GEO-865 ADVANCED QUANTITATIVE METHODS IN GEOGRAPHY

Instructor: Bruce Wm. Pigozzi
Office: 108d Natural Sciences Building
Phone: 355-4652 (leave message at 355-4649)
Email: pigozzi@msu.edu
Net: http://www.msu.edu/~pigozzi
Office Hours: Tuesday and Thursday 1:30 - 2:30 PM, or by appointment.

Teaching Assistant: Edna Wangui, Rm 144 Nat Sci, Phone: 353-9940,
Email: wanguiel@msu.edu, Hours: 1:00 - 3:00PM Monday, 5:00 - 7:00PM Wednesday

This course involves the study of statistical and mathematical approaches to the analysis of spatial information and processes. Emphasis will be given to geographic research using regression, principal components, and related so-called general linear analyses. There are many other methods we will not be able to cover in the time allotted. However, I would argue that these methods, and the research designs offered by the literature, provide a firm base from which you can learn the others (in other courses or perhaps on your own).

This is a research oriented course; you are expected to READ, DO, REPORT and CRITIQUE research. The reading portion of this charge should be done in a critical fashion, noting the strengths and weaknesses of the various efforts. This means that your assessment of research design, methods, and presentations is important and that you should become capable of professional level interpretation and analysis. Courses at this level should elicit active responses from you; that is, you should be exercising skepticism, constructive criticism, and synthetic thinking.

YOU ARE EXPECTED TO HAVE AVAILABLE, OR VERY SOON ACQUIRE, A QUALITY RESEARCH DATA SET; those of you without such a data set should gather it during the first week of the semester. (See the description of exercise #1 below for more information on the data set.) You will subject these data to a series of manipulations in order to gain first hand knowledge of the potentials and problems associated with the various statistical methods and designs. We will be using SYSTAT, a microcomputer-based statistical package, for most of the assignments. The microcomputer-based assignments will reflect the technology with which many of you will come into contact upon leaving MSU. The manipulations mentioned above will be accomplished through the 6 exercises listed on the attached sheets. You are encouraged to experiment on your own as well. SPSS and SAS are comparable statistical packages. If you intend to use one of these, be advised that while they will work, the support available for them will be limited.

There will be four reading lists, one for each of the major topics covered. These lists consist of a "required" and an "optional" section. The readings come from the research literature of several branches of Geography and a few other disciplines. There will be a mid-term and a final exam which will cover both the required readings and the lectures. The optional readings are offered as selections for those of you who are interested in further examples and for future reference. At the end of these reading lists are references providing mathematical summaries, reviews and treatments useful when you have more time and the interest to work through more detailed derivations. There's enough "stuff" in these for you to continue on your own after, and beyond, this course if you want.

GRADING:
Grades will be based upon performance in three unequally weighted areas: exercises, presentations and examinations. The description and points for each are discussed below.

There will be a mid-term and a final examination EACH worth 15 percent of your course grade. These examinations will test material from the required readings and from the lectures. A list of questions used before will be made available early in the term. The questions on the exams may come in part from this list of "old" questions.

There are 6 written assignments or exercises. These are very important, not only because they contribute in a major way to your grade (50 percent) but also because they will help you get into the research process. Some of the analyses done in this class in the past have led to publications. These assignments are not of equal magnitude; you should make careful note of the possible points for each written exercise.

*********************************************************************************
A REALLY, REALLY, REALLY IMPORTANT POLICY!
In prior years I have come upon the problem of late assignments. It is unfair for an individual to consistently turn in exercises late: that individual not only has more time to work on the project but also often benefits from the class discussion of systematic errors, and so gains an advantage over those who turn in the assignment on time. Thus, I have reluctantly, but firmly, adopted the following late policy. A "late" is defined as one work-week day past the due date.

An assignment due on Monday but turned in on Wednesday incurs 2 "lates". Each student is permitted 3 lates for the semester before penalties are applied. After that, each additional "late" subtracts one point from the points accrued on the assignment. Negative points may be accumulated up to the value of the exercise. (Thus, a 10 point assignment NOT turned in after ten days has 10 "lates". If these are beyond the 3 allowed, the "score" is a negative 10 points. If the exercise is then turned in and receives a perfect score of 10, the grade book would show a net score of zero; if it were not turned in, the grade book would show negative 10. Hence, the range of points for exercises is +50 points to -47 points; the latter would result if a student turned in no assignments at all.)

Lates are counted according to when we receive the material (assignments may also be given to the TA). If neither I nor my TA is available, have one of the secretaries (Room 315) put it in my mail box AFTER they've recorded the day and time on it. NEVER, EVER SLIDE AN ASSIGNMENT UNDER MY (OR ANY) DOOR; THEY CAN BE LOST, MUTILATED OR EVEN DESTROYED. NO ASSIGNMENT SO DELIVERED WILL BE ACCEPTED.

I repeat:

NEVER, EVER SLIDE AN ASSIGNMENT UNDER MY (OR ANY) DOOR; THEY CAN BE LOST, MUTILATED OR EVEN DESTROYED. NO ASSIGNMENT SO DELIVERED WILL BE ACCEPTED.
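
The late-penalty arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not an official grade-book procedure: the function names are hypothetical, and the order in which the 3 free lates and the per-exercise cap are applied is an assumption, chosen because it reproduces both the "net score of zero" example and the -47 point floor.

```python
from datetime import date, timedelta

def count_lates(due, received):
    """Count work-week days (Mon-Fri) after the due date, through the day received."""
    lates = 0
    day = due + timedelta(days=1)
    while day <= received:
        if day.weekday() < 5:  # 0-4 are Monday through Friday
            lates += 1
        day += timedelta(days=1)
    return lates

def net_score(raw_score, lates, free_lates_left, max_points):
    """Return (net score, free lates remaining) under the assumed reading."""
    capped = min(lates, max_points)          # penalty cannot exceed the exercise's value
    forgiven = min(capped, free_lates_left)  # assumption: free lates applied after the cap
    return raw_score - (capped - forgiven), free_lates_left - forgiven

# Due Monday January 24, received Wednesday January 26.
example_lates = count_lates(date(2000, 1, 24), date(2000, 1, 26))

# Worst case: nothing is ever turned in, so every exercise reaches its cap.
exercise_points = [5, 5, 10, 10, 10, 10]
free, total = 3, 0
for points in exercise_points:
    score, free = net_score(0, lates=points, free_lates_left=free, max_points=points)
    total += score
```

Here `example_lates` corresponds to the Monday-to-Wednesday case, and `total` is the semester floor for a student who turns in nothing.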

*********************************************************************************

The final 20 percent of your grade will be determined by your participation in the research presentation portion of the course. I have found in the past that the data sets, analyses, and even the problems encountered by the students in the class are important opportunities for everyone to learn. Therefore, you will each present two of your analyses to the entire class. These presentations will be scored on a scale of 7 points each. The expectation is that each student will participate in the dialogue stimulated by these presentations. Constructive criticism, pertinent questions, and creative suggestions are all encouraged. Each of you will be scored on a scale of 6 for this participation.

Thus, of a total of 100 for the entire course: 30% is from examinations, 14% from presentation of your research, 6% for contributions to discussion of the research of others, and 50% from the written exercises.
 
 

COURSE OUTLINE

 
    1. Introduction and Survey of Mathematics
    Reading List Number One.
        A. The concept of N-dimensional space in mathematics
        B. Elements of the Calculus with applications
        C. Matrix algebra with applications
    2. The regression model
    Reading List Number Two.
        A. Geometric interpretation, two variable case
        B. Algebraic interpretation, two variable case
        C. Fitting and testing
        D. Multiple regression
             1. Normal equations
             2. Partial regression coefficients
             3. Beta coefficients
        E. Regression designs
             1. Stepwise
             2. Complex independent variables
             3. Transformations
             4. Dummy variables
        F. Problems
             1. Multicollinearity
             2. Temporally and spatially correlated errors
    3. The principal components and factor analysis models
    Reading List Number Three.
        A. The geometric interpretation
        B. The algebraic interpretation
        C. Communality estimates and criteria for rotation
        D. Fitting and interpretation
        E. Factor scores
        F. Applications
    4. Taxonomic methods and other Advanced & Composite designs
    Reading List Number Four.
        A. N-dimensional measures of similarity
        B. Discriminant analysis
        C. Grouping and regionalization
        D. Ventures into time-series and space series analyses
        E. Expansion and related methods
        F. Location-Allocation models
     
     
     

    SCHEDULE OF EXAMS AND EXERCISES

    Exercise 1: Data and File Creation. DUE January 24, 2000. Possible Points are 5 for this exercise. For this assignment you will prepare and turn in a "code book". You are to have a quality data set for this course. This data set should consist of 12-20 interval/ratio scale variables, gathered for at least 45 observational units. (Think of this as a table with 12 to 20 columns [variables] and 45 or more rows [observations].) The observational units are most commonly geographic or geographically specific units such as census tracts, counties, countries, specific weather stations or sampling points. If you wish to use non-geographic data please check with me, as some can be accommodated while others are particularly problematic. Nominal or Ordinal variables might be used in addition to the requisite interval/ratio variables.
     

    This first exercise should include a description of the variables; that is, the full verbal description of the variables, the sources, and the names you will use in SYSTAT. I also want a list of the sequence of observational units (for most of you this will be a list of geographic units like counties, countries, or census tracts, supplemented with a map of the units), descriptive statistics, and a complete listing of your data after it has been prepared for the computer, including any transformations you deem necessary. This means I expect you will have examined each of your variables BEFORE you turn in the first exercise. That is, you will compute the mean, variance and range. You might also consider such questions as: "What does the distribution look like?", "Are there any strange outliers?", "Is a variable normally distributed?" (there is NO requirement that the variables BE normal) and "Is there possible need or reason to transform a variable?" More important than normality at this point, you should be certain that you have not made mistakes entering your data.
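
As a rough illustration of the variable screening described above, here is a minimal Python sketch (the course itself uses SYSTAT, so this is not the expected workflow). The function name, the made-up values, and the three-standard-deviation flag are all assumptions for illustration; the flag is simply one quick way to catch a data-entry slip such as a misplaced decimal point.

```python
import statistics

def describe(values, z_cut=3.0):
    """Mean, sample variance, range, and values far from the mean (possible typos)."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    sd = variance ** 0.5
    suspects = [v for v in values if sd > 0 and abs(v - mean) > z_cut * sd]
    return {
        "n": len(values),
        "mean": mean,
        "variance": variance,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "suspects": suspects,
    }

# Hypothetical variable: the last value was keyed as 140.0 instead of 14.0.
sample = [12, 11, 13, 14, 10, 12, 13, 11, 12, 14, 10, 13, 12, 11, 140.0]
summary = describe(sample)
```

A flagged value is not necessarily an error; it is only a prompt to go back and check the source.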
     

    This code book will be my reference document for the term, so if you want a copy before the end of the term make a duplicate before you turn it in. Also, please make these code books of standard page size and do not give me fan folded paper. I will use this document when I look at each of your later exercises, so give it to me in a form that can actually be used!

    Exercise 2: Simple Correlations and Regressions. Due February 2, 2000. Possible points for this exercise are 5. This exercise has you doing your first "analyses" with your data. Since each data set will be different the precise form of this and the remaining assignments will vary with each of you. (That's why I need the code books!) This is seen as a "first-cut" assessment of the interrelationships within your data. I am expecting you to develop some elementary, but meaningful, hypotheses concerning your data and to test them using simple (READ: bivariate) correlations and regressions. Don't include any literature reviews or extensive theory discussion. I just want discussion of hypotheses with BRIEF justification, a brief analysis including computer results (including printouts) and conclusions. The text of this assignment should not exceed 1000 words.
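
For readers who want to see the arithmetic behind a bivariate correlation and regression, the following is a small Python sketch under stated assumptions: the function name and the five data pairs are invented for illustration, and SYSTAT remains the package for the actual assignment.

```python
import math

def simple_regression(x, y):
    """Least-squares line y = intercept + slope * x, Pearson r, and the t statistic for r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # t has n - 2 degrees of freedom; it is undefined when |r| is exactly 1.
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return slope, intercept, r, t

# Made-up values, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r, t = simple_regression(x, y)
```

The t statistic is the one to compare against a critical value when testing the hypothesis that the population correlation is zero.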

    Exercise 3: Multiple Regression, Part I. Due February 14, 2000. Possible points for this exercise are 10. This is the first "advanced" exercise. Based upon the results of the previous efforts and your familiarity with the general subject area (and data), you are to formulate and justify at least one multiple regression equation (with at least 3 independent variables) and test the appropriate hypotheses concerning the power and significance of the equation and its component coefficients. In addition you should address the question as to whether the general assumptions of the regression design are satisfied. (However, this exercise does not require you to evaluate residuals; we do that in the next one.) You will note this is NOT a stepwise regression. The purpose of this exercise is for you to demonstrate an understanding of the single-step, multiple regression research design. (The next exercise goes further.) You should hand in your printout(s) and a brief (3-4 page) statement (or restatement in parts) of your hypotheses and an interpretation of the results.
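
The quantities this exercise asks you to interpret (partial regression coefficients, R-squared, the overall F, the t statistic for each coefficient, and beta coefficients) can be sketched with NumPy as follows. This is an illustration only: the function name is hypothetical, the data are synthetic, and the printout you hand in should come from your statistical package.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Single-step multiple regression with an intercept, fitted by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ coef
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    r2 = 1.0 - sse / sst
    df_resid = n - k - 1
    f_stat = (r2 / k) / ((1.0 - r2) / df_resid)   # overall F with (k, n - k - 1) df
    cov = (sse / df_resid) * np.linalg.inv(A.T @ A)
    t_stats = coef / np.sqrt(np.diag(cov))        # one t per coefficient
    betas = coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)  # standardized slopes
    return {"coef": coef, "r2": r2, "f": f_stat, "t": t_stats, "beta": betas}

# Synthetic data: 45 observations, 3 independent variables, the third irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(45, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=45)
fit = fit_multiple_regression(X, y)
```

In this synthetic case the third variable has no true effect, so its t statistic should be small relative to the other two.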

    Exercise 4: Multiple Regression, Part II. Due February 23, 2000. Possible points for this exercise are 10. The stepwise multiple regression procedure is more than a regression equation; it is a statistical design for determining a "best" equation, not just the "best" coefficients. The stepwise multiple regression design and an examination of residuals are the core of this assignment. Again you will probably want to use information and statistics from previous runs to direct the design of the stepwise process (for example, F and t statistics from Ex 3 might prove very useful in this one). You will, however, probably use more than three independent variables. You are also expected to examine the residuals from a significant equation. If your data are spatial, as is the usual case in this class, this assignment will likely involve mapping the residuals. (If your data are time series you will need to examine the residuals for serial correlation.) It's quite likely you will need more than one run for this assignment. The "text" of this assignment should interpret the results and incorporate a discussion of the residuals.
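
To make the idea of a stepwise design concrete, here is a minimal forward-selection sketch in Python with NumPy. It is not the procedure your statistical package implements: the function names, the F-to-enter threshold of 4.0, and the synthetic data are assumptions for illustration, and no variables are ever removed once entered. The Durbin-Watson statistic is included as one common check for serial correlation in time-ordered residuals; mapping of spatial residuals is not shown.

```python
import numpy as np

def residual_ss(X, y, cols):
    """Residual sum of squares and residuals for an intercept plus the chosen columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), resid

def forward_stepwise(X, y, f_to_enter=4.0):
    """Enter, one at a time, the variable with the largest partial F above the threshold."""
    n, k = X.shape
    chosen = []
    current_ss, resid = residual_ss(X, y, chosen)
    while len(chosen) < k:
        best = None
        for c in range(k):
            if c in chosen:
                continue
            new_ss, new_resid = residual_ss(X, y, chosen + [c])
            df = n - len(chosen) - 2              # residual df after adding c
            partial_f = (current_ss - new_ss) / (new_ss / df)
            if best is None or partial_f > best[0]:
                best = (partial_f, c, new_ss, new_resid)
        if best[0] < f_to_enter:
            break
        chosen.append(best[1])
        current_ss, resid = best[2], best[3]
    return chosen, resid

def durbin_watson(resid):
    """Values near 2 suggest little first-order serial correlation."""
    diffs = np.diff(resid)
    return float(diffs @ diffs / (resid @ resid))

# Synthetic data: only columns 0 and 3 truly affect y.
rng = np.random.default_rng(1)
X = rng.normal(size=(45, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=45)
chosen, resid = forward_stepwise(X, y)
dw = durbin_watson(resid)
```

Because the threshold is a choice, different values can yield different "best" equations, which is part of what the write-up should discuss.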
     

    MID-TERM EXAM, March 1, 2000.
    Exercise 5: Principal Components/Factor Analysis, Part I. Due March 20, 2000. Possible points from this exercise are 10. This is a major assignment where you examine the multivariate characteristics of your total data set. Principal components and Factor Analysis are accomplished through the "FACTOR" module in SYSTAT. The latter part of this assignment involves a comparison of PCA and Factor Analyses; but it is suggested that you begin with the Principal Components Analysis. You will need to evaluate and discuss the appropriate number of significant components to extract, rotation of the components, interpretation of the components, and the estimated communalities of the PCA solution. Then you should run Factor Analyses which parallel your PCA analyses, noting differences and improvements where possible. (Obviously, for this assignment more than a single computer run is required. It is not necessary to submit ALL runs that you've done but do include the printout for all runs which are referenced in your write-up.)
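
The unrotated principal components step can be sketched with NumPy as below. This is a hedged illustration, not a substitute for the SYSTAT runs: the function name and the synthetic data are invented, and retaining components with eigenvalues greater than 1 is only one common rule of thumb, assumed here for simplicity. Rotation and the factor analysis comparison are not shown.

```python
import numpy as np

def pca_from_correlation(data):
    """Eigen-decompose the correlation matrix; return eigenvalues, loadings, communalities."""
    data = np.asarray(data, dtype=float)
    R = np.corrcoef(data, rowvar=False)            # variables are columns
    eigvals, eigvecs = np.linalg.eigh(R)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = int((eigvals > 1.0).sum())              # assumed rule: eigenvalue > 1
    loadings = eigvecs[:, :keep] * np.sqrt(eigvals[:keep])
    communalities = (loadings ** 2).sum(axis=1)    # variance of each variable retained
    explained = eigvals / eigvals.sum()            # proportion of total variance
    return eigvals, loadings, communalities, explained

# Synthetic data: 45 observations on 6 variables driven by 2 latent dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(45, 2))
weights = np.array([[0.9, 0.8, 0.7, 0.1, 0.0, 0.2],
                    [0.1, 0.0, 0.2, 0.9, 0.8, 0.7]])
data = latent @ weights + rng.normal(scale=0.4, size=(45, 6))
eigvals, loadings, communalities, explained = pca_from_correlation(data)
```

The eigenvalues of a correlation matrix sum to the number of variables, which gives a quick check on any printout.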

    Exercise 6: Principal Components, Part II. Due April 10, 2000. Possible points from this exercise are 10. In this final exercise you are to examine and use the SCORES from your final principal components solution. This will probably require mapping the scores, grouping the observations, and/or using them in a regression design. The text of this assignment should focus upon a description of the (spatial) distributions of the resulting scores, the "dimensions" each represents, and a proper discussion of the use to which you put them.
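
As an illustration of what component scores are and how they might be classed for mapping, here is a short NumPy sketch. The function name, the synthetic data, the choice of two components, and the quartile classes are all assumptions; the scores come from an unrotated solution, so they will differ from scores based on a rotated one.

```python
import numpy as np

def component_scores(data, n_components):
    """Standardized scores of each observation on the leading principal components."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardize variables
    R = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]           # largest eigenvalues first
    raw = z @ eigvecs[:, order]
    return raw / np.sqrt(eigvals[order])                       # rescale to unit variance

# Synthetic data: 45 observational units, 6 variables, 2 latent dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(45, 2))
weights = np.array([[0.9, 0.8, 0.7, 0.1, 0.0, 0.2],
                    [0.1, 0.0, 0.2, 0.9, 0.8, 0.7]])
data = latent @ weights + rng.normal(scale=0.4, size=(45, 6))
scores = component_scores(data, n_components=2)

# Quartile classes (0-3) of the first component, e.g. for shading a map.
cuts = np.quantile(scores[:, 0], [0.25, 0.5, 0.75])
classes = np.digitize(scores[:, 0], cuts)
```

Because the scores on different components are uncorrelated, they can serve as independent variables in a regression design without multicollinearity among them.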
     
     

    FINAL EXAMINATION: Monday May 1, 2000; 3:00 - 5:00 PM.