Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR) woes

05-07-2010 03:31 PM
BrianWallace
New Contributor
Hello forum community,

I am a graduate student, and for my thesis I have been tasked with exploring whether archaeological "site" (or, more accurately, artifact) locations can be predicted from an existing archaeological site database.  The task is to look for correlations between existing site location (the dependent variable) and environmental independent variables.  I have looked into various logistic and multivariate regression models, but have also been reading up on the relatively recently released OLS and GWR tools in the Spatial Statistics toolbox.  I have begun working with the data using these regression tools, but I am quickly feeling overwhelmed and fear spending a significant amount of time on a goal that may not even be achievable with this method.

I have watched the free web seminar a couple of times and have read:

The ESRI Guide to GIS Analysis, Volume 2
Mitchell, Andy. ESRI Press, 2005.
Geographically Weighted Regression: the analysis of spatially varying relationships
Fotheringham, Stewart A., Chris Brunsdon, and Martin Charlton. John Wiley & Sons, 2002.

I thought I was making progress; however, I am finding very low R-squared "goodness of fit" values in every analysis attempt and feel as though I am spinning in circles.  This forum was suggested to me by regional ESRI tech support, who admitted to knowing very little about the application.  I have looked into attending the ESRI class "Performing Analysis with ArcGIS Desktop", which was also suggested after the webinar, but before pulling the trigger on spending that kind of scratch I am hoping someone can shed some light on whether I am barking up the wrong tree.

Could OLS and GWR successfully analyze the relationship between environmental (and perhaps other explanatory systemic behavioral) variables and archaeological artifact locations?  I vaguely recall that either the webinar or Mitchell's chapter on this material mentioned the potential for success, but after my early attempts I just want to make sure before continuing the pursuit.  I love the theory behind using local statistics to assess relationships with the dependent variable, and the ability to formulate a model that can later be used to help predict unknown areas, but at least with my own study I have hit the proverbial wall.

Can anyone help?

Thanks- aspiring graduate school graduate
27 Replies
LaurenRosenshein
New Contributor III
Hi Karen,

The kind of dummy variables you're talking about are fine in OLS alongside some more conventional variables.  They are also all right in GWR as long as they are not spatial regime variables.  The potential problem is that with only two possible values (i.e., with dummy variables) there is a much higher probability of getting large areas that all share the same value, which then leads to issues with local multicollinearity when using GWR.  Try it with GWR (once you've found a properly specified model using OLS).  If it is a problem:
1) GWR will not solve and will report Severe Model Design
2) If GWR does solve, be sure to check the condition number ... > 30 indicates problems with local multicollinearity (i.e., the regression models for those features are unstable because of variable redundancy and you cannot trust the results).
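
To illustrate why a dummy variable that is nearly constant within a neighborhood pushes the condition number up, here is a minimal Python sketch. It assumes one common convention (scale each column of the design matrix to unit length, then take the ratio of the largest to the smallest singular value) and considers only two columns: the intercept and the dummy. The function name and the sample neighborhoods are hypothetical, and the exact calculation ArcGIS performs may differ.

```python
import math

def condition_number_two_columns(col_a, col_b):
    """Condition number of the design matrix [col_a, col_b] after scaling
    each column to unit length. For unit-length columns, X'X is
    [[1, c], [c, 1]] with eigenvalues 1 + |c| and 1 - |c|, where c is the
    cosine of the angle between the two columns."""
    norm_a = math.sqrt(sum(a * a for a in col_a))
    norm_b = math.sqrt(sum(b * b for b in col_b))
    if norm_a == 0 or norm_b == 0:
        return math.inf  # an all-zero column carries no information
    cos = abs(sum(a * b for a, b in zip(col_a, col_b)) / (norm_a * norm_b))
    if 1.0 - cos < 1e-12:
        return math.inf  # columns are (numerically) perfectly collinear
    return math.sqrt((1.0 + cos) / (1.0 - cos))

# Hypothetical local neighborhoods of 500 features each.
intercept = [1.0] * 500
mixed_dummy = [1.0, 0.0] * 250             # half ones, half zeros
near_constant_dummy = [1.0] * 499 + [0.0]  # almost every neighbor has value 1

cn_mixed = condition_number_two_columns(intercept, mixed_dummy)
cn_near_constant = condition_number_two_columns(intercept, near_constant_dummy)
print(cn_mixed, cn_near_constant)
```

In this sketch the well-mixed dummy stays far below 30, while the nearly constant one exceeds it, because the dummy column becomes almost indistinguishable from the intercept.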

As for a logit version, at this time there is no logistic regression in ArcGIS.  One option is to use R to do a logistic regression.  We've got a sample of integrating R and ArcGIS, which you can find here.
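
For readers who want to see what a logistic regression fits before setting up the R integration, here is a minimal, non-spatial Python sketch. It is not the ArcGIS/R sample mentioned above; the function name, the single explanatory variable, and the data are all hypothetical, and a real analysis would use a statistics package rather than this hand-rolled gradient ascent.

```python
import math

def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent
    on the mean log-likelihood. Returns the intercept b0 and slope b1."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            grad0 += y - p
            grad1 += (y - p) * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

def predict_probability(b0, b1, x):
    """Predicted probability that y = 1 at a given value of x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical data: x is some explanatory variable, y is presence (1) or absence (0).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(b0, b1, predict_probability(b0, b1, 5.0))
```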
Julio_C_Verdejo
New Contributor
Hi Lauren,

I had the chance to catch the GWR workshop at this year's UC. I too have a set of binary variables I want to analyze. I have been trying to use the "logit regression (r version)" with no success. I keep getting "ERROR 000732". I changed the paths to use a double "\\", but that didn't work. Any info on this issue?

thanks

JC
AndreaSpray
New Contributor
Two GWR questions:

(1) How can I find the F-statistic for the GWR?  This is provided in the OLS output, but not GWR.

(2) How can I map the Adjusted R-squared?  ArcMap only provides a way to map the Local R2.

Thanks in advance,
A
LaurenRosenshein
New Contributor III
Hi A,

Those are two great questions!  As far as the F-Statistic for GWR, the answer to this question is actually one of the main reasons that we recommend so strongly that you find a properly specified OLS model before moving on to GWR.  OLS provides tons of great diagnostics that can help you figure out if you've met all of the assumptions of OLS.  Those diagnostics are what help us feel so confident that the model that we've found really is a model that we can trust.  To learn more about those assumptions check out the ArcUser article called Finding a Meaningful Model.  Unlike OLS, however, GWR does not have many of those great diagnostics, meaning it is a lot more difficult to figure out if you've found a model that you can trust.  The F-Statistic, as well as most of the other diagnostics, are not available for a GWR analysis.  It is for this reason that it is so important to find a model that meets all of the criteria using OLS before you move on to GWR.

As far as mapping the Adjusted R-Squared values, Adjusted R-Squared is a global value that applies to the entire study area (the entire model).  As a result, there is only one Adjusted R-Squared value for the analysis, meaning that there is nothing to map.  The only value that is output for each individual feature is the local R-Squared value, which explains why you can map the local R-Squared, but not the Adjusted R-Squared.
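
To make the "one value for the whole model" point concrete, here is a small Python sketch of the standard OLS adjusted R-squared formula. The function name and the example numbers are hypothetical, and GWR implementations may adjust using an effective number of parameters rather than the simple count used here.

```python
def adjusted_r_squared(r_squared, n_obs, n_explanatory):
    """Standard OLS adjusted R-squared:
    1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n is the number of observations and k the number of
    explanatory variables (not counting the intercept)."""
    dof = n_obs - n_explanatory - 1
    if dof <= 0:
        raise ValueError("need more observations than model parameters")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof

# Hypothetical model: R-squared of 0.50 from 101 features and 4 explanatory variables.
adj = adjusted_r_squared(0.50, 101, 4)
print(adj)  # roughly 0.479
```

The inputs are all model-wide quantities (the overall R-squared, the total number of features, the number of explanatory variables), so the result is a single number rather than a per-feature value.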
JaishreeBeedasy
New Contributor
My dependent variable is count data. Is there a way for me to use GWR for this, or to work towards a Poisson regression version?
JeffreyEvans
Occasional Contributor III
Poisson or Zero-inflated regression is not supported in GWR. It would be quite dangerous to specify a local regression using this type of response variable.
BrooksBreece
New Contributor III
Good morning, all,

A follow-up to the post below:

Is it correct to say that a non-normally distributed dependent variable and all non-normally distributed independent variables should be transformed?

Thank you for your help.

Are the relationships between my variables linear?

    This may seem like a tricky question to answer, but it is actually very simple!  You can use the Scatterplot Matrix to evaluate all of the relationships between the variables in your data.  Linear relationships would look like diagonal lines in the scatterplot matrix.  Non-linear relationships could look more like curved lines, or take some other shape. 
[ATTACH]947[/ATTACH]
If you see that the variable you are trying to model (your dependent variable) has a non-linear relationship with one of your explanatory variables then you have some work to do!  OLS is a linear regression model that assumes that the relationships between your variables are linear.  If they aren't linear, you can try to transform your variables so that the relationships become linear.  Common transformations include Log and Exponential transformations.
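
As a rough illustration of how a log transformation can straighten a curved relationship, here is a minimal Python sketch. The data are synthetic (an exact exponential curve chosen for the example) and the helper name is hypothetical; real data will rarely linearize this cleanly.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, a measure of linear association."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic example: y grows exponentially with x, so the scatterplot is curved.
x = [float(i) for i in range(1, 11)]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

r_raw = pearson_r(x, y)                         # linear fit to the curved data
r_log = pearson_r(x, [math.log(v) for v in y])  # log(y) is exactly linear in x

print(r_raw, r_log)
```

Note that a log transformation only works for strictly positive values, so variables containing zeros or negative numbers need a different approach.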

Another useful output of the scatterplot matrix is the histogram that is created for each of the variables.  You can use these histograms to figure out if your data is normally distributed, or if it is skewed or has outliers. Skewness and outliers can cause problems in many types of statistics, including regression.    You can use the same power transformations that I just mentioned to help you mitigate the impact of outliers and skewness.    This image shows the way that different types of transformations can help you get your data into its most useful form.
[ATTACH]948[/ATTACH]
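
The effect of a transformation on skewness can also be checked numerically. Here is a minimal Python sketch using the moment-based sample skewness; the function name and the data values are made up for illustration.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, where m2 and m3 are the second
    and third central moments. Zero for a symmetric sample, positive when
    the long tail is on the right."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed variable with one large outlier.
raw = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 8.0, 20.0, 100.0]
logged = [math.log(v) for v in raw]  # requires strictly positive values

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
print(skew_raw, skew_log)
```

In this example the log transformation pulls in the long right tail, so the skewness of the transformed values is noticeably closer to zero than that of the raw values.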

Lauren Rosenshein
Geoprocessing Product Engineer
ESRI | Redlands, CA
ChristopherBride
New Contributor III
Greetings contributors...this is an amazingly informative thread. I am working on an independent study for M.S., the topic is "Land Use Influence on Water Degradation".

My dependent variable is the length (in meters) of impaired rivers/streams expressed as a percentage of the length of all rivers and streams in the sub-watersheds of a larger watershed.

The explanatory variables are the percentages, and the logs of those percentages, of agriculture, grass/grain/pasture, development, and forest (to explore a negative relationship) in each subwatershed.

I have run numerous OLS processes, and none of the models gets past .9, dismally low considering I would like to see 75. Even when I isolate the watersheds with impaired water I still only get up to 35 (a drastic increase, but still low). Moran's I is good though: no spatial autocorrelation in any of my models.

Would GWR be better for this kind of analysis? What else should I try?

Also...if I take the (log) of an explanatory variable, should I also take the log of the dependent variable, or would that simply nullify the transformation?

thanks!

Chris