Adam Olsen

Regression Analysis

Fall 2018

12/12/17

Final Paper Analysis: Spatial Autocorrelation from Least Squares Regression

In this paper we analyze an open-access paper on spatial autocorrelation and how it can be used to test the residuals from a least squares regression equation. The scholarly paper was written by Yanguang Chen of Peking University in China. The editor of the paper was Guy J-P. Schumann, who was a professor at the University of California, Los Angeles. We examine the implications of spatial autocorrelation techniques and how they may provide a more accurate answer than the Durbin-Watson test, which was covered in our textbook. The paper has interesting implications for examining cross-sectional data that come from random spatial sampling.

The paper begins by describing least squares regression and how it can be used to describe real-world systems. In this part of the paper Chen states, “By means of regression modeling, we can explain the causes of effects or predict the effects with causes” (Chen 1). I believe this is a fair statement of what regression modeling can produce. Chen then discusses what a good regression model looks like: its residuals should form a random series with no autocorrelation. Autocorrelation means that the residuals are correlated with one another, so the error at one observation helps predict the error at a neighboring observation in time or space. The usual way to test the residuals for autocorrelation is the Durbin-Watson test. However, this test is only useful when the data are ordered along a single dimension, because the Durbin-Watson statistic contains only a one-dimensional autocorrelation coefficient. The paper states, “the aim of this study is to develop simple methods to test residuals of regression analysis based on spatial data from a new point of view” (Chen 2). Chen goes on to look at alternatives for analyzing the autocorrelation of residuals in multidimensional regression analysis.

The next part of Chen’s paper discusses the models and methods behind the regression equation and shows the deficiency of the Durbin-Watson test. He begins by describing the multivariable linear equation as follows:

(1)

It is the same equation as the linear regression equation we learned in class; however, the residuals are treated slightly differently and must satisfy a set of conditions. The Durbin-Watson statistic is defined as follows:

(2)

When the Durbin-Watson test statistic is close to two, the residuals can be considered free of autocorrelation. However, the Durbin-Watson test only provides meaningful results when the data being analyzed form a time series or an ordered spatial series. When you perform a regression analysis on data that is “cross-sectional from spatial random samples, the residuals will form a space series” (Chen 4), which makes the Durbin-Watson test invalid. The next section of Chen’s paper discusses methods for testing serial correlation in a random spatial series.
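To make the "close to two" rule concrete, here is a minimal Python sketch. It assumes the standard textbook definition of the Durbin-Watson statistic (sum of squared successive differences of the residuals divided by the sum of squared residuals), not necessarily Chen's notation. The data are synthetic, and the sample size, seed, and AR(1) coefficient of 0.8 are arbitrary choices for illustration.

```python
import numpy as np

def durbin_watson(residuals):
    """Textbook Durbin-Watson statistic for residuals in their given order."""
    e = np.asarray(residuals, dtype=float)
    # Squared successive differences over the sum of squared residuals.
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)  # fixed seed; illustrative data only

# Uncorrelated residuals: the statistic should land near 2.
white_noise = rng.normal(size=500)
dw_random = durbin_watson(white_noise)

# Positively autocorrelated residuals built as an AR(1) series
# (coefficient 0.8 is an assumption): the statistic falls well below 2.
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + white_noise[t]
dw_ar1 = durbin_watson(ar1)

print(f"DW, random residuals: {dw_random:.2f}")
print(f"DW, AR(1) residuals:  {dw_ar1:.2f}")
```

Note that the statistic only compares each residual with the one listed immediately before it, which is why it needs a meaningful ordering of the observations.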

In Chen’s view, the best way to approach this problem is to use Moran’s index. Moran’s index is “a measure of spatial autocorrelation that is characterized by a correlation in a signal among nearby locations and heavily used in geography” (Chen 5). Chen first takes the series of residuals from the fitted model and standardizes them with the following equation:

(3)

You can then create a spatial weights matrix once you have the random sampling points. The spatial autocorrelation coefficient can then be calculated with the following formula:

(4)

From this point we can develop alternative mathematical forms that may be easier to compute. A statistical measurement comes in two versions, one based on the population and the other based on a sample. To tie the approach to principal component analysis theory, the measurement must be based on the sample. One way to construct the spatial contiguity matrix is:

(5)

This can then be used to develop a new set of indices for testing serial correlation, as follows:

(6)

The approximate residual correlation index can then be calculated:

(7)

When you compare the Durbin-Watson test statistic to the approximate residual correlation index, you can see some of the similarities between them. The Durbin-Watson test contains a first-order time lag, whereas the approximate residual correlation index contains a spatial weight function. The next part of Chen’s paper describes a case study of testing for spatial autocorrelation in regression.
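As a rough illustration of the Moran's index idea discussed above, the following Python sketch uses the common textbook form of global Moran's I, which may be normalized differently from the formulas in Chen's paper. The inverse-distance weight function, the helper names, and the six points on a line are all assumptions made for the example.

```python
import numpy as np

def inverse_distance_weights(coords):
    """Hypothetical weight choice: w_ij = 1 / d_ij, with zeros on the diagonal."""
    pts = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    weights = np.zeros_like(dist)
    off_diagonal = dist > 0
    weights[off_diagonal] = 1.0 / dist[off_diagonal]
    return weights

def morans_i(values, weights):
    """Textbook global Moran's I: (n / S0) * (z' W z) / (z' z)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # centered values
    s0 = w.sum()              # sum of all weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

# Six locations spaced one unit apart along a line (made-up coordinates).
coords = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]]
w = inverse_distance_weights(coords)

# Similar residuals sit next to each other: positive spatial autocorrelation.
i_clustered = morans_i([-1, -1, -1, 1, 1, 1], w)

# Neighbors always have opposite signs: negative spatial autocorrelation.
i_alternating = morans_i([1, -1, 1, -1, 1, -1], w)

print(f"Moran's I, clustered residuals:   {i_clustered:.3f}")
print(f"Moran's I, alternating residuals: {i_alternating:.3f}")
```

The weight matrix plays the role that the one-step lag plays in the Durbin-Watson statistic: it decides which pairs of residuals are compared and how strongly.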

In his case study he applies the autocorrelation analysis to the relationship between urbanization and economic development. The paper notes that a nonlinear function can be used to model the relationship, but a linear equation gives a workable approximation. The case study analyzes 31 provinces in China and uses two variables in the regression. The first variable is the level of urbanization and the second is gross regional product, which is similar to GDP. The level of urbanization is calculated as the proportion of the urban population to the total population in a region. The first step in Chen’s case study is to fit the regression equation, which yields the residuals and the standardized residuals. The next step in testing for autocorrelation is computing Moran’s index. Calculating the residual correlation index takes a few steps. The first is to standardize the residual vector and then construct the spatial weight matrix, which takes the place of the one-dimensional lag used in the Durbin-Watson test. From there Chen computes the spatial autocorrelation index and finally the residual correlation index. The next test Chen provides is the “Test for serial correlation for linear regression analyses” (Chen).

In his second section, Chen states, “The correlation between the level of urbanization and level of economic development is currently a hot topic in China” (Chen 12-13). In this section he creates a linear regression model that fits his data set with the following equation:

(8)

This is easily done in R; the code will be provided in a separate file. From his model he assessed the R-squared value, which indicated a high “goodness of fit”. From there he went on to compute the Durbin-Watson statistic along with the spatial autocorrelation index and the residual correlation index. One noteworthy item is that if the elements are rearranged, the Durbin-Watson value changes as well, but the residual correlation index remains the same. This is due to the fact that the spatial weight function ties each pair of residuals to their locations rather than to their position in the list, so the resulting residual correlation index is unique. In the third section of his paper, Chen also discusses how the regression equation can be transformed into a log-linear relation, and the spatial autocorrelation analysis can be applied to those models as well.
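The point about rearranging the elements can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it uses synthetic residuals and random coordinates for 31 made-up locations, the textbook Durbin-Watson and Moran's I formulas rather than Chen's exact residual correlation index, and inverse-distance weights as a hypothetical weight function.

```python
import numpy as np

def durbin_watson(e):
    """Textbook Durbin-Watson statistic; depends on the order of e."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def morans_i(e, w):
    """Textbook global Moran's I; depends on locations through w."""
    z = e - e.mean()
    return (len(e) / w.sum()) * (z @ w @ z) / (z @ z)

rng = np.random.default_rng(1)             # fixed seed; synthetic data only
n = 31                                      # same count as the case study
coords = rng.uniform(0, 100, size=(n, 2))   # made-up locations
residuals = rng.normal(size=n)              # made-up residuals

# Inverse-distance weights (an assumed weight function), zero diagonal.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
weights = np.zeros_like(dist)
off_diagonal = dist > 0
weights[off_diagonal] = 1.0 / dist[off_diagonal]

# List the same regions in a different order. Each residual keeps its
# location, so the weight matrix is permuted the same way.
order = rng.permutation(n)
residuals_shuffled = residuals[order]
weights_shuffled = weights[np.ix_(order, order)]

dw_before = durbin_watson(residuals)
dw_after = durbin_watson(residuals_shuffled)
i_before = morans_i(residuals, weights)
i_after = morans_i(residuals_shuffled, weights_shuffled)

print(f"Durbin-Watson: {dw_before:.4f} -> {dw_after:.4f}")
print(f"Moran's I:     {i_before:.4f} -> {i_after:.4f}")
```

The Durbin-Watson value shifts because it only looks at neighbors in the list, while the spatially weighted measure is unchanged because the distances between regions do not depend on the order in which the regions are listed.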

One of the last sections of the paper discusses the basic framework of Chen’s methods and how to apply them to situations other than the economic development of China. This methodology is best illustrated by Figure 4 (Chen 15) in the scholarly paper. The first step is to analyze a spatial data set using regression analysis. The next step is to standardize the residuals and create the spatial weight matrix. From this point you can calculate Moran’s index and the residual correlation index. In the last step you calculate the spatial Durbin-Watson statistic and use it to test the residuals for serial correlation. However, every model or test has deficiencies, and we must look at what those might be.
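The first steps of this workflow can be sketched in Python as follows. This is a simplified illustration, not Chen's implementation: the data are synthetic, the variable names and the inverse-distance weights are assumptions, and the later steps (the residual correlation index and the spatial Durbin-Watson statistic) are left out because they depend on the specific formulas in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)                # fixed seed; synthetic data only
n = 31
coords = rng.uniform(0, 100, size=(n, 2))      # hypothetical region locations
x = rng.uniform(5, 50, size=n)                 # hypothetical predictor
y = 20.0 + 0.9 * x + rng.normal(scale=4.0, size=n)  # hypothetical response

# Step 1: least squares fit of y = a + b * x.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef

# Step 2a: standardize the residuals using the population standard deviation.
z = (residuals - residuals.mean()) / residuals.std()

# Step 2b: build a spatial weight matrix (inverse distance is an assumption)
# and rescale it so that all of its entries sum to 1.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
w = np.zeros_like(dist)
off_diagonal = dist > 0
w[off_diagonal] = 1.0 / dist[off_diagonal]
w = w / w.sum()

# Step 3: with standardized residuals and unit-sum weights, the textbook
# Moran's I reduces to the quadratic form z' W z.
morans_i_value = z @ w @ z

print(f"intercept = {coef[0]:.2f}, slope = {coef[1]:.2f}")
print(f"Moran's I of the residuals = {morans_i_value:.4f}")
```

Because the synthetic errors here are generated independently of location, the index should usually come out close to its no-autocorrelation expectation of -1/(n - 1).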

The first deficiency in this method has to do with the weight functions and how they are used in the analysis. There are only four main types of weight functions in geographical analysis, and the problem lies in determining which one to use. Picking a weight function is difficult because one must know the physical meaning behind each function, so the statistician needs a lot of background knowledge of the subject. Another major challenge, as with any statistical modeling, is the quality of the data provided. These techniques are also not known to work as well with big data methodologies.

Overall, this paper has provided some new insights into autocorrelation and how it is used in more applied areas outside of pure statistics. Chen’s conclusion makes three main points. The first is that spatial autocorrelation can be used to test for serial correlation in least squares regression residuals. The second is that the test of residual correlation for a spatial random series can be constructed in more than one way. The last is that the Durbin-Watson tables can be adopted for testing the autocorrelation of spatial serial data.

References

1.