Adam Olsen

Regression Analysis

Fall 2018


Final Paper Analysis: Spatial Autocorrelation from Least Squares Regression

            In this paper we analyze an open-access paper on
spatial autocorrelation and how it can be used to test the
residuals from a least squares regression equation. The scholarly paper was
written by Yanguang Chen of Peking University in China. The editor of the
paper was Guy J-P. Schumann, who was a professor at the University of
California, Los Angeles. We examine the implications of
spatial autocorrelation techniques and how they may provide a more
accurate answer than the Durbin-Watson test, which was covered in our textbook.
Overall, the paper has interesting implications for examining cross-sectional
data stemming from random spatial sampling.

            The paper begins by describing how least squares
regression is implemented and how it can be used to describe
real-world systems. In this part of the paper Chen states, “By means of
regression modeling, we can explain the causes of effects or predict the
effects with causes” (Chen 1). I believe this is a fair statement
of what regression modeling can produce. Chen then discusses
what a good regression model looks like: the residuals of the model
should form a random series and show no autocorrelation. Autocorrelation is the
idea that the residuals are correlated with one another across observations,
for example across successive points in time. The standard way to
test the residuals for
autocorrelation is the Durbin-Watson test. However, this test is only useful
when the regression analysis involves a one-dimensional series. This is due to
the fact that the Durbin-Watson test only contains a one-dimensional
autocorrelation coefficient. The paper states, “the aim of this study is to
develop simple methods to test residuals of regression analysis based on
spatial data from a new point of view” (Chen 2). Chen looks at
alternatives for analyzing the autocorrelation of residuals in
multidimensional regression analysis.

            The next part of Chen’s paper discusses
the models and methods of the regression equation and shows
the deficiency of the Durbin-Watson test. He begins by describing the multivariable
linear equation, which in standard notation reads:

$$y_i = a + \sum_{j=1}^{m} b_j x_{ij} + e_i$$

This is the same equation as the linear regression equation we learned in class; however,
the residual term $e_i$ is slightly changed and must satisfy a set of conditions. The
Durbin-Watson statistic, in its standard form, is:

$$DW = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$

When the Durbin-Watson
test statistic is close to two, the residuals can be considered
free of autocorrelation. The Durbin-Watson test, however, only provides
meaningful results when the data analyzed with the regression equation form a
time series or an ordered spatial series. When you perform a regression
analysis on data that is “cross-sectional from spatial random samples, the
residuals will form a space series” (Chen 4), which renders the Durbin-Watson test
invalid. The next section of Chen’s paper discusses methods for testing
serial correlation in such random spatial samples.
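As a rough illustration of the Durbin-Watson statistic described above, the following Python sketch computes it from a list of residuals. The function name `durbin_watson` and the two residual series are made up for this example and do not come from Chen's paper.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of
    squared residuals (the textbook Durbin-Watson statistic)."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series, chosen only to show the two extremes.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours have opposite signs
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]     # neighbours have similar values

dw_alternating = durbin_watson(alternating)  # well above 2: negative autocorrelation
dw_trending = durbin_watson(trending)        # well below 2: positive autocorrelation

print(dw_alternating, dw_trending)
```

Note that the calculation depends entirely on which residual counts as the "previous" one, which is why the statistic needs the observations to have a natural order.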

            From Chen’s perspective, the
best way to approach this problem is to implement Moran’s index. Moran’s index
is “a measure of spatial autocorrelation that is characterized by a correlation
in a signal among nearby locations and heavily used in geography” (Chen 5).
Chen first takes the series of residuals from the predicted values and standardizes
them, which in standard form reads:

$$z_i = \frac{e_i - \bar{e}}{\sigma}$$

where $\bar{e}$ is the mean of the residuals and $\sigma$ is their standard deviation. You
will then be able to create a spatial weights matrix $W$ once you have the
random sampling points. The spatial autocorrelation coefficient can then be
calculated, in the usual matrix form of Moran’s index with the entries of $W$
scaled to sum to one, as:

$$I = z^{T} W z$$

From this point we can
develop alternative mathematical forms that may be easier to compute. A
statistical measurement contains two sets of parameters, one that is based on
the population and the other that is based on samples. To develop a theory
in the manner of principal component analysis, we must
be able to base that theory on sample data. Chen gives a way to construct the spatial
contiguity matrix, which can then be used to develop a new set of indices for testing
serial correlation. The approximate residual
correlation index can then be calculated from these indices.

When you compare the
Durbin-Watson test statistic to the approximate residual correlation index, you
can see some of the similarities between them. The Durbin-Watson test contains
a first-order time lag, whereas the approximate residual correlation index
contains a spatial weight function. The next part of Chen’s paper
describes a case study of testing for spatial
autocorrelation in regression.
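To make the steps above more concrete, here is a minimal Python sketch of a Moran-type index computed from standardized residuals and a spatial weights matrix. It assumes an inverse-distance weight function and the common matrix form in which the weights are scaled to sum to one; the function names, coordinates, and residual values are hypothetical, and Chen's own weight function and indices may differ.

```python
import math


def standardize(residuals):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in residuals) / n)
    return [(e - mean) / sd for e in residuals]


def inverse_distance_weights(points):
    """Weights matrix with 1/distance off the diagonal and zeros on it,
    scaled so that all entries sum to one (an assumed weight function)."""
    n = len(points)
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                raw[i][j] = 1.0 / math.dist(points[i], points[j])
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def morans_index(z, w):
    """Quadratic form z' W z for standardized residuals z and weights w."""
    n = len(z)
    return sum(z[i] * w[i][j] * z[j] for i in range(n) for j in range(n))


# Six made-up sampling points on a line, with two made-up residual patterns.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
weights = inverse_distance_weights(points)

clustered = standardize([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
alternating = standardize([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

i_clustered = morans_index(clustered, weights)      # similar values sit together
i_alternating = morans_index(alternating, weights)  # neighbours disagree

print(i_clustered, i_alternating)
```

In this toy setup the clustered pattern gives a positive index and the alternating pattern a negative one, which is the spatial analogue of the lagged comparison in the Durbin-Watson statistic.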

            In his case study Chen applies the autocorrelation analysis
to the relationship between urbanization and economic development. The paper
notes that a nonlinear function can be used to model the relationship, but a
linear equation will produce a reasonable approximation. The case study
analyzes 31 provinces in China and uses two variables in the
regression. The first variable is the level of urbanization and the
second variable is gross regional product, which is similar to GDP. The level of
urbanization is calculated as the proportion of urban population to the
total population in a region. The first step in Chen’s case study is to develop
the regression equation. The equation yields the residuals and the
standardized residuals. The next step in testing for autocorrelation is computing
Moran’s index. To calculate the residual correlation index, however,
a few steps are needed. The first step in this process is to standardize the
residual vector and then calculate the spatial weights matrix, which
takes the place of the one-dimensional lag used in the Durbin-Watson test. From there Chen
computes the spatial autocorrelation index and finally calculates
the residual correlation index. The next test Chen provides is the “Test for
serial correlation for linear regression analyses” (Chen).
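The first steps of the case study, fitting the regression and standardizing its residuals, can be sketched in Python as follows. The numbers are invented for illustration and are not Chen's data; treating the level of urbanization as the response variable is also an assumption made for this example.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up values: x stands in for gross regional product per person,
# y for the level of urbanization (urban population / total population).
grp = [1.0, 2.0, 3.0, 4.0, 5.0]
urbanization = [0.30, 0.38, 0.41, 0.52, 0.55]

a, b = fit_line(grp, urbanization)

# Residuals: observed value minus fitted value.
residuals = [yi - (a + b * xi) for xi, yi in zip(grp, urbanization)]

# Standardized residuals, using the population standard deviation.
n = len(residuals)
mean_e = sum(residuals) / n
sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in residuals) / n)
standardized = [(e - mean_e) / sd_e for e in residuals]

print(a, b)
print(standardized)
```

The standardized residuals are what would then be combined with a spatial weights matrix to compute the spatial indices.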

            In his second section Chen states,
“The correlation between the level of urbanization and level of economic
development is currently a hot topic in China” (Chen 12-13). In this section he
fits a linear regression model to his data set.
This is easily done in R,
and the code will be provided in a separate file. From his model he assessed the
r-squared value, which indicated a high “goodness of fit”. From there he went on
to compute the Durbin-Watson statistic, the spatial
autocorrelation index, and the residual correlation index. One noteworthy
point is that if the elements are rearranged, the Durbin-Watson value
changes as well, but the residual correlation index remains the same. This is due
to the fact that the weight function ties each residual to its location, so the
residual correlation index does not depend on the order of the observations. Chen also discusses in the third section of his paper
that you can transform the regression equation into a log-linear
relation and that the spatial autocorrelation analysis can be applied to those
models as well.
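The point about rearranging the elements can be checked with a small Python sketch. It uses a generic Moran-type index with inverse-distance weights as a stand-in for a location-based measure; this is an assumption for illustration, not Chen's exact residual correlation index, and the residuals, coordinates, and reordering are made up.

```python
import math


def durbin_watson(e):
    """Textbook Durbin-Watson statistic; depends on the order of e."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    return num / sum(v ** 2 for v in e)


def spatial_index(e, points):
    """Moran-type index: weighted cross-products of standardized residuals,
    with inverse-distance weights scaled to sum to one."""
    n = len(e)
    mean = sum(e) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in e) / n)
    z = [(v - mean) / sd for v in e]
    weighted, total = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                w = 1.0 / math.dist(points[i], points[j])
                total += w
                weighted += w * z[i] * z[j]
    return weighted / total


# Hypothetical residuals and the locations they belong to.
residuals = [0.8, 0.5, -0.2, -0.9, -0.4, 0.2]
points = [(0, 0), (1, 0), (2, 1), (3, 3), (1, 2), (0, 3)]

# List the same observations in a different order; each residual
# stays attached to its own location.
order = [3, 0, 5, 1, 4, 2]
shuffled_residuals = [residuals[k] for k in order]
shuffled_points = [points[k] for k in order]

dw_before = durbin_watson(residuals)
dw_after = durbin_watson(shuffled_residuals)
index_before = spatial_index(residuals, points)
index_after = spatial_index(shuffled_residuals, shuffled_points)

print(dw_before, dw_after)        # these differ
print(index_before, index_after)  # these agree up to rounding
```

Because the weights are computed from the locations themselves, relabeling the observations changes nothing in the spatial index, while the Durbin-Watson value changes with every new ordering.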

            One of the last sections of the paper discusses the
basic framework of Chen’s methods and how to apply them to situations
other than the economic development of China. This methodology is
best illustrated by Figure 4 (Chen 15) in the paper. The first step
is to analyze a spatial data set using regression analysis. The next step is to
standardize the residuals and create the spatial weight matrix. From this point
you can calculate Moran’s index and find the index for the residual correlation.
In the last step you calculate the spatial Durbin-Watson statistic and then
test the residuals for serial correlation. However, any model or test
has deficiencies, and we must look at
what those might be.

            The first deficiency in this method has to do with the
weight functions and how they are used in the regression. There are only four
main types of weight functions in geographical analysis, and the problem lies
in determining which function to use. It is very difficult to pick a weight
function because one must know the physical meanings behind the different
functions, so the statistician will need a lot of background knowledge
of the subject. Another major challenge, as with any statistical modeling, is the
quality of the data provided. These techniques are not known to work as well
with big data methodologies. Overall, this paper has provided some new insights
into autocorrelation and how it is used in more applied areas
outside of pure statistics. The paper
concludes with three main points made by Chen. The first is that
spatial autocorrelation can be used to test serial correlation of least squares
regression residuals. The next point is that a test of residual
correlation for a spatial random series can be constructed in
more than one way. The last point Chen makes is that the Durbin-Watson tables can
be adopted for testing the autocorrelation of spatial serial data.




























