GEO-865 ADVANCED QUANTITATIVE METHODS IN GEOGRAPHY
Instructor: Bruce Wm. Pigozzi
Office: 108d Natural Sciences Building
Phone: 355-4652 (leave message at 355-4649)
Office Hours: Tuesday and Thursday 1:30 - 2:30 PM,
or by appointment.
Teaching Assistant: James Biles
Details to be announced in class.
This course involves the study of statistical and mathematical
approaches to the analysis of spatial information and processes.
Emphasis will be given to geographic research using regression,
principal components, and related so-called general linear
analyses. There are many other methods we will not be able to
cover in the time allotted. However, it is argued that these,
and the research designs offered by the literature, provide a
firm base from which you can learn the others (in other courses
or perhaps on your own).
This is a research-oriented course; you are expected to READ,
DO, REPORT and CRITIQUE research. The reading portion of this
charge should be done in a critical fashion, noting the
strengths and weaknesses of the various efforts. This means
that your assessment of research design, methods, and
presentations is important and that you should become capable
of professional-level interpretation and analysis. Courses
at this level should elicit active responses from you; that is,
you should be exercising skepticism, constructive criticism,
and synthetic thinking.
· Reading List Number One.
· Reading List Number Two.
· Reading List Number Three.
· Reading List Number Four.
YOU ARE EXPECTED TO HAVE AVAILABLE, OR VERY SOON ACQUIRE, A
QUALITY RESEARCH DATA SET; those of you without such a data
set should gather it during the first week of the semester.
(See the description of exercise #1 below for more information
on the data set.) You will subject these data to a series of
manipulations in order to gain first hand knowledge of the
potentials and problems associated with the various statistical
methods and designs. We will be using SYSTAT, a
microcomputer-based statistical package, for most of the
assignments. The microcomputer-based assignments will reflect
the technology with which many of you will come into contact
upon leaving MSU.
The manipulations mentioned above will be accomplished through
the 6 exercises listed on the attached sheets. You are
encouraged to experiment on your own as well. SPSS and SAS
are comparable statistical packages. If you intend to use
one of these, be advised that while they will work, only
limited support will be available.
There will be four reading lists, one for each of the major
topics covered. These lists consist of a "required" and an
"optional" section. The readings come from the research
literature of several branches of Geography and a few other
disciplines. There will be a mid-term and a final exam which
will cover both the required readings and the lectures. The
optional readings are offered as selections for those of you
who are interested in further examples and for future reference.
At the end of these reading lists are references providing
mathematical summaries, reviews and treatments useful when
you have more time and the interest to work through more
detailed derivations. There's enough "stuff" in these for
you to continue on your own after, and beyond, this course
if you want.
GRADING:
Grades will be based upon performance in three unequally
weighted areas: exercises, presentations and examinations.
The description and points for each are discussed below.
There will be a mid-term and a final examination EACH worth
15 percent of your course grade. These examinations will
test material from the required readings and from the
lectures. A list of questions used before will be made
available early in the term. The questions on the exams
may come in part from this list of "old" questions.
There are 6 written assignments or exercises. These are
very important, not only because they contribute in a major
way to your grade (50 percent) but also because they will
help you get into the research process. Some of the analyses
done in this class in the past have led to publications.
These assignments are not of equal magnitude; you should
make careful note of the possible points for each written
exercise.
********************************************************
A REALLY, REALLY, REALLY IMPORTANT POLICY!
In prior years I have come upon the problem of late
assignments. It is unfair for an individual to consistently
turn in exercises late; in doing so the individual not only
has more time to work on the project but often systematic
errors will have been discussed in class and such an
individual would consistently have the benefit of this
information and an advantage over those who turn in the
assignment on time. Thus, I have reluctantly, but firmly,
adopted the following late policy. A "late" is defined as
each work-week day that passes after the due date.
An assignment due on Monday, but turned in on Wednesday,
incurs 2 "lates". Each student is permitted 3 lates for
the semester before penalties are applied. The penalty
is one point subtracted from your accrued assignment points
for each "late" in excess of 3. Negative points may be
accumulated up to the value of the exercise. (Thus, a
10 point assignment NOT turned in after ten days has
10 "lates". If these are beyond the 3 allowed the "score"
is a negative 10 points. If the exercise is then turned
in and receives a perfect score of 10 the gradebook would
then show a net score of zero; if it were not turned in
the gradebook would show negative 10.
Hence, the range of points for exercises is +50 points
to -47 points; the latter would result if a student turned
in no assignments at all.) Lates are counted according
to when we receive the material (it may be given to the TA
also). If neither I nor my TA is available, have one
of the secretaries (Rm 315) put it in my mail box AFTER
they've recorded the day and time on it. NEVER, EVER
SLIDE AN ASSIGNMENT UNDER MY (OR ANY) DOOR; THEY CAN BE
LOST, MUTILATED OR EVEN DESTROYED. NO ASSIGNMENT SO
DELIVERED WILL BE ACCEPTED.
*******************************************************
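The arithmetic of the late policy can be sketched in a few lines of
Python (used here only for illustration; it is not the course
package). This is one reading of the policy, not an official
calculator: it assumes that lates are counted on Monday through
Friday only, that the 3 free lates are pooled across the semester,
and that lates on a single exercise stop accumulating at that
exercise's point value. The function and variable names are
hypothetical.

```python
from datetime import date, timedelta

EXERCISE_POINTS = [5, 5, 10, 10, 10, 10]  # Exercises 1-6, from the schedule
FREE_LATES = 3                            # lates allowed before penalties


def count_lates(due, received):
    """Count work-week days (Mon-Fri) after the due date, up to the day received."""
    lates = 0
    day = due
    while day < received:
        day += timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday-Friday
            lates += 1
    return lates


def exercise_total(earned, lates):
    """Net exercise points for the semester under the assumed reading above."""
    # Assumption: lates on one exercise stop counting at that exercise's value.
    capped = [min(n, pts) for n, pts in zip(lates, EXERCISE_POINTS)]
    penalty = max(0, sum(capped) - FREE_LATES)
    return sum(earned) - penalty


# Due Monday, turned in Wednesday of the same week: 2 lates.
print(count_lates(date(1998, 2, 9), date(1998, 2, 11)))
# Nothing ever turned in: the bottom of the range stated in the policy.
print(exercise_total([0] * 6, EXERCISE_POINTS))
```

Under these assumptions the second call reproduces the -47 floor
given in the policy: 50 possible lates less the 3 that are free.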
The final 20 percent of your grade will be determined
by your participation in the research presentation portion
of the course. I have found in the past that the data sets,
analyses, and even the problems encountered by the students
in the class are important opportunities for everyone to
learn. Therefore, you will each present two of your
analyses to the entire class. These presentations will
be scored on a scale of 7 points each. The expectation
is that each student will participate in the dialogue
stimulated by these presentations. Constructive criticism,
pertinent questions, and creative suggestions are all
encouraged. Each of you will be scored on a scale of 6
for this participation.
Thus, of a total of 100 for the entire course: 30% is
from examinations, 14% from presentation of your research,
6% for contributions to discussion of the research of
others, and 50% from the written exercises.
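As a quick check on the weights just listed, the following Python
sketch (illustrative only) tallies the maximum points per area and
sums a set of scores. The dictionary keys and the sample scores are
hypothetical; only the maximum values come from this syllabus.

```python
MAX_POINTS = {
    "midterm": 15,
    "final": 15,
    "exercises": 50,
    "presentations": 14,  # two presentations at 7 points each
    "discussion": 6,
}


def course_total(scores):
    """Sum points across areas, rejecting any score above its maximum."""
    for area, pts in scores.items():
        if pts > MAX_POINTS[area]:
            raise ValueError(f"{area}: {pts} exceeds {MAX_POINTS[area]}")
    return sum(scores.values())


# Made-up scores for one student.
sample = {"midterm": 12, "final": 13, "exercises": 44,
          "presentations": 12, "discussion": 5}
print(sum(MAX_POINTS.values()), course_total(sample))
```

No lower bound is checked because exercise points can go negative
under the late policy.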
COURSE OUTLINE
1. Introduction and Survey of Mathematics
A. The concept of N-dimensional space in mathematics
B. Elements of the Calculus with applications
C. Matrix algebra with applications
2. The regression model
A. Geometric interpretation, two variable case
B. Algebraic interpretation, two variable case
C. Fitting and testing
D. Multiple regression
1. Normal equations
2. Partial regression coefficients
3. Beta coefficients
E. Regression designs
1. Stepwise
2. Complex independent variables
3. Transformations
4. Dummy variables
F. Problems
1. Multicollinearity
2. Temporally and spatially correlated errors
3. The principal components and factor analysis models
A. The geometric interpretation
B. The algebraic interpretation
C. Communality estimates and criteria for rotation
D. Fitting and interpretation
E. Factor scores
F. Applications
4. Taxonomic methods and other Advanced & Composite designs
A. N-dimensional measures of similarity
B. Discriminant analysis
C. Grouping and regionalization
D. Ventures into time-series and space-series analyses
E. Expansion and related methods
F. Location-Allocation models
SCHEDULE OF EXAMS AND EXERCISES
Exercise 1: Data and File Creation. DUE January 28, 1998.
Possible Points are 5 for this exercise. For this assignment
you will prepare and turn in a "code book". You are to have
a quality data set for this course. This data set should
consist of 12-20 interval/ratio scale variables, gathered
for at least 45 observational units. (Think of this as a
table with 12 to 20 columns [variables] and 45 or more rows
[observations].) The observational units are most commonly
geographic or geographically specific units such as census
tracts, counties, countries, specific weather stations or
sampling points. If you wish to use non-geographic data
please check with me as some can be accommodated while
others are particularly problematic. Nominal or Ordinal
variables might be used only in addition to the requisite
interval/ratio variables.
This first exercise should include a description of the
variables; that is, the full verbal description of the
variables, the sources, and the names you will use in SYSTAT.
I also want a list of the sequence of observational units
(for most of you this will be a list of geographic units
like counties, countries, or census tracts; supplemented
with a map of the units), descriptive statistics, and a
complete listing of your data after it has been prepared
for the computer, including any transformations you deem
necessary. This means I expect you will have examined
each of your variables BEFORE you turn in the first
exercise. That is, you will compute the mean, variance
and range. You might also consider such questions as:
"What does the distribution look like?", "Are there any
strange outliers?", "Is a variable normally distributed?"
(there is NO requirement that the variables BE normal)
and "Is there possible need or reason to transform a
variable?" Actually, more important than normality at
this point is being certain that you have not made
mistakes inputting your data.
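For those who want to check their SYSTAT output by hand, here is
a minimal Python sketch of the mean, variance and range for one
variable. Python is used only for illustration, the variable name
and values are made up, and the variance shown is the sample
variance (divisor n - 1), which is an assumption about which form
you want to report.

```python
import statistics


def describe(values):
    """Basic descriptive statistics for one interval/ratio variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance, n - 1
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical values for a single variable across six observational units.
pop_density = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
summary = describe(pop_density)
for name, value in summary.items():
    print(name, value)
```

Scanning the minimum and maximum for each variable is also a cheap
way to catch data entry mistakes such as a misplaced decimal point.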
This code book will be my reference document for the
term, so if you want a copy before the end of the term
make a duplicate before you turn it in. Also, please
make these code books of standard page size and DO NOT
give me fan-folded paper. I will use this document
when I look at each of your later exercises, so give
it to me in a form that can actually be used!
Exercise 2: Simple Correlations and Regressions.
Due February 9, 1998. Possible points for this exercise
are 5. This exercise has you doing your first "analyses"
with your data. Since each data set will be different
the precise form of this and the remaining assignments
will vary with each of you. (That's why I need the code
books!) This is seen as a "first-cut" assessment of the
interrelationships within your data. I am expecting you
to develop some elementary, but meaningful, hypotheses
concerning your data and to test them using simple
(READ: bivariate) correlations and regressions. Don't
include any literature reviews or extensive theory
discussion. I just want discussion of hypotheses with
BRIEF justification, a brief analysis including computer
results (including printouts) and conclusions. The
text of this assignment should not exceed 1000 words.
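The bivariate quantities involved here can be sketched as follows.
This is an illustration in Python, not the SYSTAT procedure; the
data are made up, and the t statistic shown is the usual test of
zero correlation with n - 2 degrees of freedom (it is undefined
when the correlation is exactly +1 or -1).

```python
import math


def simple_regression(x, y):
    """Least-squares line y = a + b*x, with Pearson r and a t statistic for r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))  # H0: no linear association
    return {"slope": slope, "intercept": intercept, "r": r, "t": t}


# Hypothetical data for two variables over five observational units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
result = simple_regression(x, y)
print(result)
```

In the bivariate case r squared is the proportion of variance in y
accounted for by x, so the correlation and the regression tell the
same story from two angles.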
Exercise 3: Multiple Regression, Part I.
Due February 18, 1998. Possible points for this exercise
are 10. This is the first "advanced" exercise. Based
upon the results of the previous efforts and your
familiarity with the general subject area (and data),
you are to formulate and justify at least one multiple
regression equation (with at least 3 independent
variables) and test the appropriate hypotheses concerning
the power and significance of the equation and its
component coefficients. In addition you should address
the question as to whether the general assumptions of
the regression design are satisfied. (However, this
exercise does not require you to evaluate residuals; we
do that in the next one.) You will note this is NOT a
stepwise regression. The purpose of this exercise is
for you to demonstrate an understanding of the
single-step, multiple regression research design.
(The next exercise goes further.) You should hand in
your printout(s) and a brief (3-4 page) statement (or
restatement in parts) of your hypotheses and an
interpretation of the results.
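To make the single-step design concrete, the sketch below solves
the normal equations (X'X)b = X'y for three independent variables
and reports R-squared and the overall F ratio. It is a bare-bones
Python illustration with made-up data and hypothetical function
names, not what SYSTAT does internally; a real package uses more
numerically careful methods.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, y):
    """Return coefficients (intercept first), residuals and R-squared."""
    X = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_y = sum(y) / len(y)
    ss_res = sum(e * e for e in resid)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b, resid, 1.0 - ss_res / ss_tot


# Hypothetical data: six observations, three independent variables.
rows = [(1, 2, 0), (2, 1, 3), (3, 5, 1), (4, 3, 6), (5, 8, 2), (6, 4, 4)]
y = [1.2, 5.3, 2.4, 9.3, 3.9, 10.9]
coefs, resid, r2 = fit(rows, y)
n, p = len(y), len(rows[0])
f_ratio = (r2 / p) / ((1.0 - r2) / (n - p - 1))  # overall F, df = (p, n-p-1)
print(coefs, r2, f_ratio)
```

Six observations are far too few for a real analysis; the point is
only to show where the coefficients and the F ratio come from.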
Exercise 4: Multiple Regression, Part II. Due March 2, 1998.
Possible points for this exercise are 10. The stepwise
multiple regression procedure is more than a regression
equation; it is a statistical design for determining a
"best" equation, not just the "best" coefficients. The
stepwise multiple regression design and an examination
of residuals are the core of this assignment. Again you
will probably want to use information and statistics
from previous runs to direct the design of the stepwise
process (for example, F and t statistics from Ex 3 might
prove very useful in this one). You will, however,
probably use more than three independent variables. You
are also expected to examine the residuals from a
significant equation. If your data are spatial, as
is the usual case in this class, this assignment will
likely involve mapping the residuals. (If your data
are time series you will need to examine the residuals
for serial correlation.) It's quite likely you will
need more than one run for this assignment. The "text"
of this assignment should interpret the results and
incorporate a discussion of the residuals.
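For time-series residuals, one common check for first-order serial
correlation is the Durbin-Watson statistic. It is offered here as
an example of such a check, not as the required method, and the
residuals below are made up. The statistic runs from 0 to 4;
values near 2 suggest little first-order serial correlation, values
toward 0 suggest positive and values toward 4 negative correlation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals, listed in time order.
residuals = [0.5, 0.7, 0.4, -0.2, -0.6, -0.5, -0.1, 0.3]
print(durbin_watson(residuals))
```

The residuals must be in time order for the statistic to mean
anything; for spatial data, mapping the residuals plays the
corresponding role.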
MID-TERM EXAM, March 4, 1998
Exercise 5: Principal Components/Factor Analysis, Part I.
Due March 30, 1998. Possible points from this exercise
are 10. This is a major assignment where you examine the
multivariate characteristics of your total data set.
Principal components and Factor Analysis are accomplished
through the "FACTOR" module in SYSTAT. The latter part
of this assignment involves a comparison of PCA and
Factor Analyses; but it is suggested that you begin
with the Principal Components Analysis. You will
need to evaluate and discuss the appropriate number
of significant components to extract, rotation of
the components, interpretation of the components, and
the estimated communalities of the PCA solution.
Then you should run Factor Analyses which parallel
your PCA analyses, noting differences and improvements
where possible. (Obviously, for this assignment more
than a single computer run is required. It is not
necessary to submit ALL runs that you've done but do
include the printout for all runs which are referenced
in your write-up.)
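Two of the quantities named above can be read directly off a matrix
of unrotated component loadings: the communality of a variable is
the sum of its squared loadings across the retained components, and
the eigenvalue of a component is the sum of squared loadings down
its column. The Python sketch below illustrates this with made-up
loadings for four variables and two components. The eigenvalue
greater than 1 cutoff shown is one widely used rule of thumb
(often called the Kaiser criterion), assumed here for illustration
and not necessarily the criterion you should adopt.

```python
# Hypothetical unrotated loadings: rows are variables, columns are components.
loadings = [
    [0.8, 0.3],
    [0.8, 0.3],
    [0.6, -0.4],
    [0.6, -0.4],
]

n_vars = len(loadings)
n_comps = len(loadings[0])

# Communality: variance of each variable reproduced by the retained components.
communalities = [sum(a * a for a in row) for row in loadings]

# Eigenvalue: variance accounted for by each component across all variables.
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(n_comps)]

# With standardized variables the total variance equals the number of variables.
proportions = [ev / n_vars for ev in eigenvalues]

# Assumed cutoff for illustration: keep components with eigenvalue above 1.
retained = [j + 1 for j, ev in enumerate(eigenvalues) if ev > 1.0]

print(communalities, eigenvalues, proportions, retained)
```

A scree plot or an interpretability argument may lead you to a
different number of components than this cutoff does, and that
disagreement is worth discussing in your write-up.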
Exercise 6: Principal Components, Part II.
Due April 15, 1998. Possible points from this exercise
are 10. In this final exercise you are to examine and
use the SCORES from your final principal components
solution. This will probably require mapping the
scores, grouping the observations, and/or using them
in a regression design. The text of this assignment
should focus upon a description of the resulting scores'
(spatial) distributions, the "dimensions" each represents,
and a proper discussion of the use to which you put them.
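What a component score is can be seen in a two-variable toy case,
where the algebra is simple enough to do by hand. For two
standardized variables with correlation r, the first principal
component has eigenvalue 1 + |r| and weights each variable equally
(with opposite signs when r is negative). The Python sketch below
uses made-up data and a hypothetical grouping rule (sign of the
score); your own solution will have more variables and will come
from SYSTAT.

```python
import math
import statistics


def standardize(values):
    """Convert to z-scores using the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def first_component_scores(x, y):
    """Scores on the first principal component of two standardized variables."""
    zx, zy = standardize(x), standardize(y)
    n = len(x)
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
    sign = 1.0 if r >= 0 else -1.0  # eigenvector is (1, sign) / sqrt(2)
    scores = [(a + sign * b) / math.sqrt(2) for a, b in zip(zx, zy)]
    return scores, 1.0 + abs(r)


# Hypothetical data for two variables over five observational units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
scores, eigenvalue = first_component_scores(x, y)

# One simple grouping: units above versus below the component mean of zero.
groups = ["high" if s > 0 else "low" for s in scores]
print(scores, eigenvalue, groups)
```

These scores are left unscaled, so their variance equals the
eigenvalue; packages often rescale scores to unit variance, which
changes the numbers but not the map pattern.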
FINAL EXAMINATION: Monday May 4, 1998; 3:00 - 5:00 PM.
(Please note our class time goes over two final exam times;
this is the most convenient of the two.)