GEO-865 ADVANCED QUANTITATIVE METHODS IN GEOGRAPHY
 
Instructor: Bruce Wm. Pigozzi
Office: 108d Natural Sciences Building
Phone: 355-4652 (leave message at 355-4649)
Office Hours: Tuesday and Thursday 1:30 - 2:30 PM, 
or by appointment.
 
Teaching Assistant: James Biles
Details to be announced in class.
 
This course involves the study of statistical and mathematical 
approaches to the analysis of spatial information and processes.  
Emphasis will be given to geographic research using regression, 
principal components, and related so-called general linear 
analyses.  There are many other methods we will not be able to 
cover in the time allotted.  However, I would argue that these 
methods, together with the research designs offered by the 
literature, provide a firm base from which you can learn the 
others (in other courses or perhaps on your own).
 
This is a research-oriented course; you are expected to READ, 
DO, REPORT and CRITIQUE research.  The reading portion of this 
charge should be done in a critical fashion, noting the 
strengths and weaknesses of the various efforts.  This means 
that your assessment of research design, methods, and 
presentations is important and that you should become capable 
of professional-level interpretation and analysis.  Courses 
at this level should elicit active responses from you; that is, 
you should be exercising skepticism, constructive criticism, 
and synthetic thinking.
 

·  Reading List Number One.

·  Reading List Number Two.

·  Reading List Number Three.

·  Reading List Number Four.

 
 
YOU ARE EXPECTED TO HAVE AVAILABLE, OR VERY SOON ACQUIRE, A 
QUALITY RESEARCH DATA SET; those of you without such a data 
set should gather it during the first week of the semester.  
(See the description of exercise #1 below for more information 
on the data set.)  You will subject these data to a series of 
manipulations in order to gain first-hand knowledge of the 
potentials and problems associated with the various statistical 
methods and designs.  We will be using SYSTAT, a 
microcomputer-based statistical package, for most of the 
assignments.  The microcomputer-based assignments will reflect 
the technology with which many of you will come into contact 
upon leaving MSU.  The manipulations mentioned above will be 
accomplished through the 6 exercises listed on the attached 
sheets.  You are encouraged to experiment on your own as well.  
SPSS and SAS are comparable statistical packages.  If you 
intend to use one of these, be advised that, while they will 
work, only limited support will be available for them.
 
There will be four reading lists, one for each of the major 
topics covered.  Each list consists of a "required" and an 
"optional" section.  The readings come from the research 
literature of several branches of Geography and a few other 
disciplines.  There will be a mid-term and a final exam which 
will cover both the required readings and the lectures.  The 
optional readings are offered as selections for those of you 
who are interested in further examples and for future reference.  
At the end of these reading lists are references providing 
mathematical summaries, reviews and treatments useful when 
you have more time and the interest to work through more 
detailed derivations.  There's enough "stuff" in these for 
you to continue on your own after, and beyond, this course 
if you want.
 
GRADING:
 
Grades will be based upon performance in three unequally 
weighted areas: exercises, presentations and examinations.  
The description and points for each are discussed below.
 
There will be a mid-term and a final examination EACH worth 
15 percent of your course grade.  These examinations will 
test material from the required readings and from the 
lectures.  A list of questions used on previous exams will be 
made available early in the term.  The questions on the exams 
may come in part from this list of "old" questions.
 
There are 6 written assignments or exercises.  These are 
very important, not only because they contribute in a major 
way to your grade (50 percent) but also because they will 
help you get into the research process.  Some of the analyses 
done in this class in the past have led to publications.  
These assignments are not of equal magnitude; you should make 
careful note of the possible points for each written 
exercise.
 
********************************************************
A REALLY, REALLY, REALLY IMPORTANT POLICY!
 
In prior years I have come upon the problem of late 
assignments.  It is unfair for an individual to consistently 
turn in exercises late.  Such an individual not only has more 
time to work on the project but also benefits from class 
discussion of systematic errors, and so gains an advantage 
over those who turn in the assignment on time.  Thus, I have 
reluctantly, but firmly, adopted the following late policy.  
Each work-week day after the due date counts as one "late".  
 
An assignment due on Monday, but turned in on Wednesday, 
incurs 2 "lates".  Each student is permitted 3 lates for 
the semester before penalties are applied.  The penalty 
is the subtraction of the "lates" in excess of 3 from the 
total assignment points accrued.  Negative points may be 
accumulated up to the value of the exercise.  Thus, a 
10 point assignment NOT turned in after ten days has 
10 "lates".  If these are beyond the 3 allowed, the "score" 
is a negative 10 points.  If the exercise is then turned 
in and receives a perfect score of 10, the gradebook would 
show a net score of zero; if it were not turned in, 
the gradebook would show negative 10.  
 
Hence, the range of points for exercises is +50 points 
to -47 points; the latter would result if a student turned 
in no assignments at all.  Lates are counted according 
to when we receive the material (assignments may also be 
given to the TA).  If neither I nor my TA is available, have 
one of the secretaries (Rm 315) put it in my mail box AFTER 
they've recorded the day and time on it.  NEVER, EVER 
SLIDE AN ASSIGNMENT UNDER MY (OR ANY) DOOR; THEY CAN BE 
LOST, MUTILATED OR EVEN DESTROYED.  NO ASSIGNMENT SO 
DELIVERED WILL BE ACCEPTED.  
 
*******************************************************
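
The arithmetic of the late policy can be sketched in Python.  This is 
one reading of the policy, not an official grading tool.  It assumes 
that lates are capped at each exercise's point value before the 3 free 
lates are deducted, which reproduces the -47 floor stated in the policy.  
The function names and the `free_lates` parameter are hypothetical.

```python
from datetime import date, timedelta


def count_lates(due, received):
    """Count work-week days (Mon-Fri) after the due date, up to and
    including the day the assignment is received."""
    lates = 0
    day = due
    while day < received:
        day += timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday through Friday
            lates += 1
    return lates


def exercise_total(exercises, free_lates=3):
    """Net exercise points for the semester.

    `exercises` is a list of (possible_points, earned_points, lates).
    Assumption: lates on one exercise cannot exceed its point value,
    and only lates beyond `free_lates` are subtracted.
    """
    earned = sum(e for _, e, _ in exercises)
    lates = sum(min(l, p) for p, _, l in exercises)
    return earned - max(0, lates - free_lates)


# Due Monday, received Wednesday: 2 lates.
print(count_lates(date(1998, 2, 9), date(1998, 2, 11)))

# No exercise ever turned in (point values 5, 5, 10, 10, 10, 10).
never = [(5, 0, 5), (5, 0, 5), (10, 0, 10),
         (10, 0, 10), (10, 0, 10), (10, 0, 10)]
print(exercise_total(never))
```

Passing `free_lates=0` models a student who has already used the 
3 allowed lates on earlier work.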
 
The final 20 percent of your grade will be determined 
by your participation in the research presentation portion 
of the course.  I have found in the past that the data sets, 
analyses, and even the problems encountered by the students 
in the class are important opportunities for everyone to 
learn.  Therefore, you will each present two of your 
analyses to the entire class.  These presentations will 
be scored on a scale of 7 points each.  The expectation 
is that each student will participate in the dialogue 
stimulated by these presentations.  Constructive criticism, 
pertinent questions, and creative suggestions are all 
encouraged.  Each of you will be scored on a scale of 6 
for this participation.
 
Thus, of a total of 100 for the entire course: 30% is 
from examinations, 14% from presentation of your research, 
6% for contributions to discussion of the research of 
others, and 50% from the written exercises.
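
As a rough illustration, the weighting can be written as a short 
Python function.  It assumes, hypothetically, that each exam is 
recorded as a percentage and scaled to its 15 points, and that the 
other components are already recorded in course points; the syllabus 
does not say how exam scores are recorded.

```python
def course_total(midterm_pct, final_pct, exercise_points,
                 presentation_points, participation_points):
    """Combine the course components on a 100-point scale.

    midterm_pct, final_pct : exam scores as percentages (0-100),
                             each scaled to 15 course points
    exercise_points        : net points from the 6 exercises (max 50)
    presentation_points    : two presentation scores (max 7 each)
    participation_points   : discussion score (max 6)
    """
    exams = 15 * midterm_pct / 100 + 15 * final_pct / 100
    return (exams + exercise_points
            + sum(presentation_points) + participation_points)


# Hypothetical student: 80% and 90% on the exams, 40 exercise points,
# presentations scored 6 and 6, participation scored 5.
print(course_total(80, 90, 40, [6, 6], 5))
```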
 
 
 
 
 
 
COURSE OUTLINE
 
1. Introduction and Survey of Mathematics
        A. The concept of N-dimensional space in mathematics
        B. Elements of the Calculus with applications
        C. Matrix algebra with applications
2. The regression model
        A. Geometric interpretation, two variable case
        B. Algebraic interpretation, two variable case
        C. Fitting and testing
        D. Multiple regression
               1. Normal equations
               2. Partial regression coefficients
               3. Beta coefficients
        E. Regression designs
               1. Stepwise
               2. Complex independent variables
               3. Transformations
               4. Dummy variables
        F. Problems
               1. Multicollinearity
               2. Temporally and spatially correlated errors
3. The principal components and factor analysis models
        A. The geometric interpretation
        B. The algebraic interpretation
        C. Communality estimates and criteria for rotation
        D. Fitting and interpretation
        E. Factor scores
        F. Applications
4. Taxonomic methods and other Advanced & Composite designs
        A. N-dimensional measures of similarity
        B. Discriminant analysis
        C. Grouping and regionalization
        D. Ventures into time-series and space-series analyses
        E. Expansion and related methods
        F. Location-Allocation models
 
 SCHEDULE OF EXAMS AND EXERCISES
 
 
Exercise 1: Data and File Creation. DUE January 28, 1998. 
Possible Points are 5 for this exercise. For this assignment 
you will prepare and turn in a "code book".  You are to have 
a quality data set for this course.  This data set should 
consist of 12-20 interval/ratio scale variables, gathered 
for at least 45 observational units.  (Think of this as a 
table with 12 to 20 columns [variables] and 45 or more rows 
[observations].)  The observational units are most commonly 
geographic or geographically specific units such as census 
tracts, counties, countries, specific weather stations or 
sampling points.  If you wish to use non-geographic data 
please check with me as some can be accommodated while 
others are particularly problematic.  Nominal or Ordinal 
variables may be used only in addition to the requisite 
interval/ratio variables.
 
This first exercise should include a description of the 
variables; that is, the full verbal description of the 
variables, the sources, and the names you will use in SYSTAT. 
I also want a list of the sequence of observational units 
(for most of you this will be a list of geographic units 
like counties, countries, or census tracts; supplemented 
with a map of the units), descriptive statistics, and a 
complete listing of your data after it has been prepared 
for the computer, including any transformations you deem 
necessary.  This means I expect you will have examined 
each of your variables BEFORE you turn in the first 
exercise.  That is, you will compute the mean, variance 
and range.  You might also consider such questions as: 
"What does the distribution look like?", "Are there any 
strange outliers?", "Is a variable normally distributed?" 
(there is NO requirement that the variables BE normal) 
and "Is there a possible need or reason to transform a 
variable?"  More important than normality at this point, 
however, is making certain that you have not made mistakes 
in entering your data.
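
The course uses SYSTAT for this screening, but the same checks can 
be sketched in Python using only the standard library.  The 
function names, the z-score cutoff of 3, and the sample values 
below are assumptions made for illustration; they are not part of 
the assignment.

```python
import statistics


def describe(values):
    """Mean, sample variance (n - 1 denominator) and range of one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def flag_outliers(values, z_cutoff=3.0):
    """Return (index, value) pairs lying more than z_cutoff standard
    deviations from the mean; a quick screen for data-entry mistakes."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / sd > z_cutoff]


# Hypothetical variable in which the last value was mistyped.
population_density = [10] * 19 + [1000]
print(describe(population_density))
print(flag_outliers(population_density))
```

A flagged value is not necessarily wrong; it is only a prompt to 
check that observation against the source.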
 
This code book will be my reference document for the 
term, so if you want a copy before the end of the term 
make a duplicate before you turn it in.  Also, please 
make these code books of standard page size and DO NOT 
give me fan-folded paper.  I will use this document 
when I look at each of your later exercises, so give 
it to me in a form in which it can actually be used!
 
Exercise 2: Simple Correlations and Regressions.  
Due February 9, 1998.  Possible points for this exercise 
are 5.  This exercise has you doing your first "analyses" 
with your data.  Since each data set will be different 
the precise form of this and the remaining assignments 
will vary with each of you.  (That's why I need the code 
books!)  This is seen as a "first-cut" assessment of the 
interrelationships within your data.  I am expecting you 
to develop some elementary, but meaningful, hypotheses 
concerning your data and to test them using simple 
(READ: bivariate) correlations and regressions.  Don't 
include any literature reviews or extensive theory 
discussion.  I just want discussion of hypotheses with 
BRIEF justification, a brief analysis including computer 
results (including printouts) and conclusions.  The 
text of this assignment should not exceed 1000 words.
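
For orientation, the quantities behind a bivariate test can be 
sketched in plain Python.  This is a minimal illustration, not the 
SYSTAT procedure: it computes Pearson's r, the least-squares 
intercept and slope, and the t statistic for the hypothesis of 
zero correlation (n - 2 degrees of freedom).  The function names 
and the five data points are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares (intercept, slope) for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def t_for_r(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data for five observational units.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(r, simple_regression(x, y), t_for_r(r, len(x)))
```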
 
Exercise 3: Multiple Regression, Part I.  
Due February 18, 1998. Possible points for this exercise 
are 10.  This is the first "advanced" exercise.  Based 
upon the results of the previous efforts and your 
familiarity with the general subject area (and data), 
you are to formulate and justify at least one multiple 
regression equation (with at least 3 independent 
variables) and test the appropriate hypotheses concerning 
the power and significance of the equation and its 
component coefficients. In addition you should address 
the question as to whether the general assumptions of 
the regression design are satisfied.  (However, this 
exercise does not require you to evaluate residuals; we 
do that in the next one.)  You will note this is NOT a 
stepwise regression.  The purpose of this exercise is 
for you to demonstrate an understanding of the 
single-step, multiple regression research design.  
(The next exercise goes further.)  You should hand in 
your printout(s) and a brief (3-4 page) statement (or 
restatement in parts) of your hypotheses and an 
interpretation of the results.
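
The statistics involved in a single-step multiple regression can 
be illustrated with a short NumPy sketch.  It is not the SYSTAT 
output, only an assumed equivalent: it fits the equation by least 
squares and reports R-squared, the overall F statistic, and a 
t statistic for each coefficient.  The function name `fit_ols` and 
the simulated data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    X is an (n, k) array of independent variables, y has length n.
    Returns coefficients (intercept first), R-squared, the overall F
    statistic (k and n - k - 1 degrees of freedom), and t statistics.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    r2 = 1.0 - sse / sst
    df_resid = n - k - 1
    f_stat = (r2 / k) / ((1.0 - r2) / df_resid)
    cov = (sse / df_resid) * np.linalg.inv(A.T @ A)
    t_stats = coef / np.sqrt(np.diag(cov))
    return {"coef": coef, "r2": r2, "f": f_stat, "t": t_stats,
            "residuals": resid}


# Simulated example: 45 observations, 3 independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(45, 3))
y = 1 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=45)
result = fit_ols(X, y)
print(result["coef"], result["r2"], result["f"])
```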
 
Exercise 4: Multiple Regression, Part II. Due March 2, 1998.  
Possible points for this exercise are 10.  The stepwise 
multiple regression procedure is more than a regression 
equation; it is a statistical design for determining a 
"best" equation, not just the "best" coefficients.  The 
stepwise multiple regression design and an examination 
of residuals are the core of this assignment.  Again you 
will probably want to use information and statistics 
from previous runs to direct the design of the stepwise 
process (for example, F and t statistics from Ex 3 might 
prove very useful in this one).  You will, however, 
probably use more than three independent variables.  You 
are also expected to examine the residuals from a 
significant equation.  If your data are spatial, as 
is the usual case in this class, this assignment will 
likely involve mapping the residuals.  (If your data 
are time series you will need to examine the residuals 
for serial correlation.)  It's quite likely you will 
need more than one run for this assignment.  The "text" 
of this assignment should interpret the results and 
incorporate a discussion of the residuals.
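
To make the logic concrete, here is a simplified NumPy sketch of 
forward selection followed by a residual check.  It is an 
assumption-laden illustration, not SYSTAT's stepwise routine: 
variables enter one at a time by largest partial F, none are ever 
removed, and the F-to-enter threshold of 4.0 is an arbitrary 
illustrative value.  The Durbin-Watson statistic is shown as one 
common check for serial correlation in time-ordered residuals.

```python
import numpy as np


def _fit(columns, y):
    """Residual sum of squares and residuals for y on an intercept
    plus the given list of predictor columns."""
    A = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), resid


def forward_selection(X, y, f_to_enter=4.0):
    """Add the variable with the largest partial F at each step until
    no remaining variable reaches f_to_enter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    sse_current, resid = _fit([], y)
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            cols = [X[:, i] for i in selected + [j]]
            sse_new, resid_new = _fit(cols, y)
            df_resid = n - len(cols) - 1
            partial_f = (sse_current - sse_new) / (sse_new / df_resid)
            if best is None or partial_f > best[0]:
                best = (partial_f, j, sse_new, resid_new)
        if best[0] < f_to_enter:
            break
        _, j, sse_current, resid = best
        selected.append(j)
    return selected, resid


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest little first-order
    serial correlation in residuals ordered by time."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))
```

For spatial data the residuals returned here would be mapped by 
observational unit rather than tested with Durbin-Watson.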
 
MID-TERM EXAM, March 4, 1998
 
Exercise 5: Principal Components/Factor Analysis, Part I.  
Due March 30, 1998.  Possible points from this exercise 
are 10.  This is a major assignment where you examine the 
multivariate characteristics of your total data set.  
Principal components and Factor Analysis are accomplished 
through the "FACTOR" module in SYSTAT.  The latter part 
of this assignment involves a comparison of PCA and 
Factor Analyses; but it is suggested that you begin 
with the Principal Components Analysis.  You will 
need to evaluate and discuss the appropriate number 
of significant components to extract, rotation of 
the components, interpretation of the components, 
and the estimated communalities of the PCA solution.  
Then you should run Factor Analyses which parallel 
your PCA analyses, noting differences and improvements 
where possible.  (Obviously, for this assignment more 
than a single computer run is required.  It is not 
necessary to submit ALL runs that you've done but do 
include the printout for all runs which are referenced 
in your write-up.)
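
The unrotated principal components step can be sketched with NumPy 
as an eigen-decomposition of the correlation matrix.  This is an 
illustrative assumption about the computation, not the SYSTAT 
"FACTOR" module itself, and it omits rotation.  The eigenvalue-
greater-than-one rule used below is only one common criterion for 
the number of components; the function names and simulated data 
are hypothetical.

```python
import numpy as np


def principal_components(data):
    """Eigenvalues, eigenvectors and loadings of the correlation matrix.

    data is an (observations, variables) array.  Components are
    returned in order of decreasing eigenvalue.
    """
    data = np.asarray(data, dtype=float)
    R = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # R is symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: correlations between variables and components.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, eigvecs, loadings


def communalities(loadings, n_keep):
    """Share of each variable's variance reproduced by the first
    n_keep components (sum of squared loadings across them)."""
    return (loadings[:, :n_keep] ** 2).sum(axis=1)


# Simulated data: 45 observations, 6 variables built from 2 dimensions.
rng = np.random.default_rng(1)
base = rng.normal(size=(45, 2))
data = np.column_stack(
    [base[:, 0] + 0.3 * rng.normal(size=45) for _ in range(3)]
    + [base[:, 1] + 0.3 * rng.normal(size=45) for _ in range(3)]
)
eigvals, eigvecs, loadings = principal_components(data)
n_keep = int((eigvals > 1.0).sum())               # eigenvalue > 1 rule
print(eigvals, n_keep, communalities(loadings, n_keep))
```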
 
Exercise 6: Principal Components, Part II.  
Due April 15, 1998.  Possible points from this exercise 
are 10.  In this final exercise you are to examine and 
use the SCORES from your final principal components 
solution.  This will probably require mapping the 
scores, grouping the observations, and/or using them 
in a regression design.  The text of this assignment 
should focus upon a description of the resulting scores' 
(spatial) distributions, the "dimensions" each represents, 
and a proper discussion of the use to which you put them.  
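
Component scores for an unrotated solution can be sketched in NumPy 
as the standardized data projected onto the retained eigenvectors.  
This is a hedged illustration rather than SYSTAT's scoring method: 
the function name is hypothetical, scores are left unscaled (each 
has variance equal to its eigenvalue), and the sign of each 
component is arbitrary.

```python
import numpy as np


def component_scores(data, n_keep):
    """Scores of each observation on the first n_keep principal
    components of the correlation matrix."""
    data = np.asarray(data, dtype=float)
    # Standardize each variable (z-scores, n - 1 denominator).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_keep]
    scores = z @ eigvecs[:, order]
    return scores, eigvals[order]


# Simulated data: 45 observational units, 5 variables.
rng = np.random.default_rng(7)
data = rng.normal(size=(45, 5))
data[:, 1] += data[:, 0]            # induce some correlation
scores, kept = component_scores(data, n_keep=2)

# Rank the units on the first component, e.g. before mapping them.
ranking = np.argsort(scores[:, 0])[::-1]
print(scores.shape, kept, ranking[:5])
```

Because the scores on different components are uncorrelated, they 
can be used as independent variables in a regression without the 
multicollinearity present among the original variables.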
 
 
FINAL EXAMINATION:   Monday May 4, 1998;  3:00 - 5:00 PM. 
(Please note our class time goes over two final exam times; 
this is the most convenient of the two.)