Mapping Geographically Weighted Regression p-values

12-16-2010 09:34 AM
AndrewTweel1
New Contributor
Hi all,

I am exploring GWR for some of my research.  I have been able to produce good models (whose residuals aren't spatially autocorrelated), but I can't figure out how to obtain a significance level (p-value) for the parameter estimates.  How can I distinguish statistically significant coefficients?  The t-statistic should be the coefficient estimate divided by its standard error, but I don't know the degrees of freedom.  Is it the "Effective Number" reported in the output window?

Most published GWR maps I see only show the parameter estimates for the statistically significant results.
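The naive calculation the question describes can be sketched in plain Python (standard library only). It forms t = coefficient / standard error and converts t into a two-sided Student t p-value through the regularized incomplete beta function, then masks the coefficient when it is not significant, the way the published maps do. The degrees of freedom are an assumption: this sketch takes them as n minus the "Effective Number" the tool reports, which is exactly what the question is unsure about. The accepted answer below explains why ArcGIS deliberately does not report per-coefficient p-values. Function names and example numbers are hypothetical.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def two_sided_t_p(t: float, df: float) -> float:
    """P(|T| >= |t|) for a Student t variable with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def naive_local_significance(coef, std_err, n, effective_number, alpha=0.05):
    """Hypothetical helper: t, p, and the coefficient (None if p >= alpha).

    ASSUMPTION: residual degrees of freedom = n - effective_number.
    """
    df = n - effective_number
    t = coef / std_err
    p = two_sided_t_p(t, df)
    return t, p, (coef if p < alpha else None)


# Made-up numbers for one feature's local equation.
t, p, shown = naive_local_significance(0.8, 0.25, n=120, effective_number=14.6)
print(f"t = {t:.2f}, p = {p:.4f}, mapped value = {shown}")
```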

Thanks,
Andrew

Accepted Solutions
LaurenRosenshein
New Contributor III
Hi Andrew,

This is a great question, and one that we get quite a bit.  With GWR, there is a local linear equation for each feature in the dataset.  The equation is weighted so that nearby features have a larger influence on the prediction of y_i than features that are farther away.  I do know that our consultants' GWR software (Fotheringham, Charlton and Martin) does compute p-values for each coefficient in every one of the local linear equations.  However, because doing so is really not appropriate (and we've discussed this with our consultants and they agree with that assessment), we do not report the coefficient p-values for our ArcGIS GWR tool.  I know our consultants are looking into other methods for computing those p-values that might be appropriate, but I don't believe they have come up with an ideal solution.
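A minimal sketch of what "a local linear equation weighted so that nearby features have a larger influence" means, for a single explanatory variable: each feature gets its own weighted least squares fit, with weights that fall off with distance. The Gaussian kernel used here is one common GWR weighting scheme and is an assumption (the post does not say which kernel ArcGIS uses); `gaussian_weight`, `local_fit`, and the toy data are hypothetical.

```python
import math


def gaussian_weight(distance: float, bandwidth: float) -> float:
    """Kernel weight: 1 at distance 0, decaying smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(i, coords, x, y, bandwidth):
    """Weighted least squares intercept and slope for the local equation at feature i."""
    w = [gaussian_weight(math.dist(coords[i], c), bandwidth) for c in coords]
    sw = sum(w)
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy data: ten features along a line; the x-y relationship flips sign halfway.
coords = [(float(k), 0.0) for k in range(10)]
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [1 + 2 * xv if cx < 5 else 1 - 2 * xv for (cx, _), xv in zip(coords, x)]

for i in (0, 9):
    b0, b1 = local_fit(i, coords, x, y, bandwidth=1.0)
    print(f"feature {i}: intercept={b0:.2f}, slope={b1:.2f}")
```

Shrinking the bandwidth makes each local equation more local; a very large bandwidth gives nearly equal weights everywhere and the local fits approach the single global OLS fit.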

Because GWR does not have the strong diagnostics (like p-values, as just one example) that OLS does, we very strongly recommend finding a properly specified OLS model before moving to GWR.  Unless you are only interested in predictions (and not in the coefficients, that is, the variable relationships), you cannot trust GWR results without being sure you've found all of the key explanatory variables for your dependent variable.  You can see this easily: run OLS or GWR with one important explanatory variable and examine the coefficients; then add a second important explanatory variable and notice that the coefficient values change, sometimes dramatically.  In fact, a coefficient can change 180 degrees (flip from positive to negative).
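The "add a second variable and watch the first coefficient change" exercise can be reproduced with ordinary least squares on a tiny made-up dataset. In this hypothetical data, y is built exactly as x1 - 2*x2, and x2 is strongly correlated with x1. Fitting y on x1 alone gives a negative slope even though x1's true coefficient is +1, which is the sign reversal described above. Plain Python, no GIS software involved.

```python
def mean(v):
    return sum(v) / len(v)


def ols_one(x, y):
    """Intercept and slope of y regressed on a single explanatory variable."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ols_two(x1, x2, y):
    """Intercept and two slopes, solving the centered normal equations by Cramer's rule."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [b - my for b in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.5, 1.8, 3.2, 4.1, 4.7, 6.3]          # strongly correlated with x1
y = [a - 2 * b for a, b in zip(x1, x2)]       # true model: y = x1 - 2*x2

print("x1 only:   ", ols_one(x1, y))          # slope on x1 comes out negative
print("x1 and x2: ", ols_two(x1, x2, y))      # recovers +1 and -2
```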

Having said that, we have just released a tool and documentation on the Geoprocessing Resource Center to help you find a properly specified OLS model: Exploratory Regression and "What they don't tell you about regression analysis".  The Exploratory Regression tool is similar to stepwise regression, except that instead of just looking for high Adjusted R2 values, it looks for models that meet all of the assumptions of the OLS method (no variable redundancy, no spatial autocorrelation in regression residuals, no model bias, statistically significant coefficients, and so on).  To download this tool and the associated documentation, check out http://bit.ly/spatialstats (look for Supplementary Spatial Statistics).
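The difference between stepwise-style selection and the approach described above can be sketched as a filter: enumerate candidate variable combinations, keep only models that pass every check, and only then rank the survivors by Adjusted R2. This is not the Exploratory Regression tool's code. The `fit` callback, the diagnostic field names, and the threshold values are all assumptions for illustration; in practice the diagnostics would come from an OLS run.

```python
from itertools import combinations

# ASSUMED cutoffs, for illustration only.
CRITERIA = {
    "min_adj_r2": 0.5,            # explanatory power
    "max_coef_p": 0.05,           # every coefficient statistically significant
    "max_vif": 7.5,               # no variable redundancy
    "min_jarque_bera_p": 0.1,     # no model bias (residuals look normal)
    "min_residual_moran_p": 0.1,  # no spatial autocorrelation in residuals
}


def passes_all(diag, c=CRITERIA):
    """True only if a model meets every criterion, not just a high Adjusted R2."""
    return (diag["adj_r2"] >= c["min_adj_r2"]
            and max(diag["coef_p"]) <= c["max_coef_p"]
            and max(diag["vif"]) <= c["max_vif"]
            and diag["jarque_bera_p"] >= c["min_jarque_bera_p"]
            and diag["residual_moran_p"] >= c["min_residual_moran_p"])


def exploratory_search(candidates, fit, max_vars=3):
    """Try every combination of up to max_vars candidates; return passing models, best first.

    `fit` is a hypothetical callback: given a tuple of variable names, it runs OLS
    and returns a dict with the diagnostic keys used in passes_all.
    """
    passing = []
    for k in range(1, max_vars + 1):
        for combo in combinations(candidates, k):
            diag = fit(combo)
            if passes_all(diag):
                passing.append((combo, diag["adj_r2"]))
    return sorted(passing, key=lambda item: item[1], reverse=True)
```

A model with the highest Adjusted R2 can still be rejected here, for example if its residuals are spatially autocorrelated.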

Hope this helps!

Lauren Rosenshein
Geoprocessing Product Engineer

AndrewTweel1
New Contributor
I see, thanks for your helpful reply.  So the p-values would be skewed, because they wouldn't weight the closer values more than the farther values.  Is there a way to tell how many neighbors the GWR is using if I use the AICc method?

Is there another way to say which GWR predictions are "good" and which are "not so good"?

I am a little confused now...  If OLS is for data with stationarity, and my data has nonstationarity (which is why I'm interested in GWR), then how would I find a properly specified model in OLS?  I read a previous post of yours about breaking the data up into smaller geographic areas and identifying proper OLS models within those areas, but this seems somewhat arbitrary.  Or is the point just to see whether all of the variation can be accounted for (at whatever scale) by the model variables?

Thank you for your help!
Andrew
LaurenScott
Occasional Contributor
Hi Andrew,
Good questions!
1) How can I determine the number of neighbors used to calibrate each local equation in GWR?
For the GWR Kernel Type parameter you have two options: FIXED or ADAPTIVE.  FIXED means a feature counts as a neighbor if it falls within some FIXED distance.  Selecting AICc or CV for the Bandwidth Method parameter will have the tool determine an "optimal" distance for you, and this distance will be reported in the Progress Window (if you run in the foreground) and in the Results Window.  ADAPTIVE means you specify the number of closest neighbors to include when calibrating each local equation.  Again, selecting AICc or CV for Bandwidth Method will have the GWR tool identify the "optimal" number of neighbors for you.  So, if you use ADAPTIVE, you will know the number of neighbors: it is the same for all features.  If you select FIXED, however, you don't know the number of neighbors for each feature.  Here is one possible strategy for figuring this out:
(a) If your features aren't points, convert them to points (Feature To Point).
(b) Buffer the point features using the distance returned by GWR AICc or CV.
(c) Do a spatial join to count the number of features in each buffer.
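Steps (a) to (c) amount to counting, for each feature's point, how many points fall within the GWR distance. A minimal pure-Python equivalent on projected x/y coordinates is below. It counts the feature itself (a buffer around a point contains that point) and treats points exactly on the buffer edge as inside; both are assumptions about how the spatial join would be set up. For contrast, `adaptive_radius` shows the ADAPTIVE case, where the neighbor count is fixed and the distance varies instead; whether the tool counts a feature among its own k neighbors is not stated in the post, so this sketch excludes it by assumption. Function names and coordinates are hypothetical.

```python
import math


def neighbor_counts_fixed(points, distance):
    """FIXED kernel: number of points within `distance` of each point (itself included)."""
    return [sum(1 for q in points if math.dist(p, q) <= distance) for p in points]


def adaptive_radius(points, k):
    """ADAPTIVE kernel: distance from each point to its k-th nearest other point."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(neighbor_counts_fixed(pts, distance=1.5))  # [2, 3, 2, 1]: count varies by feature
print(adaptive_radius(pts, k=1))                 # [1.0, 1.0, 1.0, 8.0]: distance varies instead
```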

2) How do I tell which GWR predictions are "good" and which are not so good?
The default output from GWR is a residual map, which shows where the model over- and underpredicts.  Small residuals indicate good predictions; large residuals indicate poorer ones.  The output feature class from GWR also includes a local R2 value for each feature.  Mapping these local R2 values is another way to see where the model is predicting well and where it is not.
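Both checks can be reproduced outside the tool once you have observed values, predictions, and (for local R2) one feature's kernel weights and local coefficients. The sketch below standardizes residuals by their standard deviation and flags large ones, and computes a kernel-weighted R2 for a single local equation with one explanatory variable. The 2.5 cutoff, the use of the population standard deviation, and the helper names are assumptions; the GWR tool's own standardized residual and local R2 fields may be computed somewhat differently.

```python
import statistics


def standardized_residuals(observed, predicted):
    """Residuals divided by their standard deviation."""
    resid = [o - p for o, p in zip(observed, predicted)]
    sd = statistics.pstdev(resid)
    return [r / sd for r in resid]


def flag_large(z_scores, cutoff=2.5):
    """True where a prediction is unusually poor (ASSUMED cutoff)."""
    return [abs(z) > cutoff for z in z_scores]


def local_r2(weights, x, y, intercept, slope):
    """Weighted R2 of one local equation, using that feature's kernel weights."""
    wsum = sum(weights)
    y_bar = sum(w * yj for w, yj in zip(weights, y)) / wsum
    sse = sum(w * (yj - (intercept + slope * xj)) ** 2 for w, xj, yj in zip(weights, x, y))
    sst = sum(w * (yj - y_bar) ** 2 for w, yj in zip(weights, y))
    return 1.0 - sse / sst
```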

3)  Why would I start with OLS when I know the relationships I'm trying to model are non-stationary?
Unfortunately, GWR does not have the strong diagnostics that OLS has to determine whether or not you have a properly specified model.  Further, the fact that you don't have spatial autocorrelation in your GWR residuals is not sufficient evidence that you've found all of the key explanatory variables.  One of the most valuable things about GWR is that you can examine coefficient surfaces to see how the relationships you are modeling change across your study area.  Unfortunately, if you are missing a key explanatory variable, you cannot trust the coefficients 😞  Try this exercise: run OLS with several variables and look at the coefficients... now remove an important variable and re-run OLS... notice that the coefficients have changed (the exercise has the same effect with GWR; it's just easier to see the differences with OLS).  In fact, adding or removing a key explanatory variable can cause the coefficients to change 180 degrees (to go from positive to negative or vice versa).  Because of this, we take a very conservative stance and strongly recommend that you always start with OLS and do all that you can to find a properly specified OLS model before moving to GWR.  There are only two instances where I might move to GWR without first finding a properly specified OLS model:
(a) If the best OLS model I found passes the Jarque-Bera test (the p-value is NOT statistically significant so my model is NOT biased), AND I am only interested in the predictions (not in the coefficients).
(b) If I felt absolutely confident that the ONLY reason I wasn't getting a properly specified OLS model is non-stationarity.  In this case, I would want to have strong evidence that my variables were, in fact, strongly non-stationary (switching signs).  I would run my best OLS model in GWR and examine the coefficients for each explanatory variable to make sure there is strong non-stationarity.  I would want to be able to confidently argue why a particular explanatory variable might be non-stationary.  For example, why would my Income variable be a strong predictor in the northern part of my study area, but not such a great predictor in the southern part of my study area?  Often, if we force ourselves to try to explain why a variable would be non-stationary, we find the key explanatory variables we are missing, and adding these to OLS provides a properly specified model.  With the Income variable example, I would try a spatial regime (dummy) variable in my OLS model that has a value of 1 for features in the north and a value of 0 for features in the south (I would remove this variable when I moved to GWR).  Perhaps I would realize, oh, there is quite a bit of military housing (or whatever) in the southern part of the study area... perhaps adding a variable that reflects this to my OLS model will yield a properly specified model.
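Case (a) hinges on the Jarque-Bera test on the OLS residuals. As a sketch of what that p-value measures: the standard statistic compares the residuals' skewness and kurtosis with those of a normal distribution, and its p-value comes from a chi-square distribution with 2 degrees of freedom, whose upper tail has the closed form exp(-JB/2). It is a large-sample test, and the residual lists below are made up purely to show the mechanics; in practice you would read the p-value from the OLS output rather than compute it yourself.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, p-value) for a list of regression residuals."""
    n = len(residuals)
    mu = sum(residuals) / n
    dev = [r - mu for r in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # chi-square(2) survival function


symmetric = [-1.0, 1.0, -1.0, 1.0]
skewed = [0.0] * 9 + [10.0]
print(jarque_bera(symmetric))  # small JB, large p: not statistically significant
print(jarque_bera(skewed))     # large JB, tiny p: statistically significant
```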

The OLS tool in ArcGIS automatically computes standard errors that are robust to non-stationarity, so unless the non-stationarity is severe (changing signs), as long as you've included all of the key explanatory variables, you will very often find a properly specified OLS model despite the non-stationarity.
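To make "robust standard errors" concrete, here is the classical slope standard error next to White's heteroskedasticity-consistent (HC0) version for a one-variable regression. HC0 is one standard robust estimator; treating it as what the ArcGIS OLS tool computes is an assumption of this sketch. The classical version assumes a single error variance for every observation, while HC0 lets each observation contribute its own squared residual (weighted by its squared deviation in x), so it holds up better when the error variance changes across the study area. The data are invented so that the residuals come out exactly [1, -2, 0, 2, -1].

```python
import math


def slope_with_standard_errors(x, y):
    """OLS slope, classical SE, and White (HC0) robust SE for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (b - my) for d, b in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Classical: one pooled error variance for every observation.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    classical_se = math.sqrt(sigma2 / sxx)
    # HC0 sandwich: each observation contributes its own squared residual.
    robust_se = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid))) / sxx
    return slope, classical_se, robust_se


print(slope_with_standard_errors([1, 2, 3, 4, 5], [3, 2, 6, 10, 9]))
```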

There is a sample script on the resource center called Exploratory Regression that can help you find a properly specified OLS model (www.bit.ly/spatialstats ... Supplementary Spatial Statistics).  The documentation for that tool provides strategies for fixing OLS models that aren't properly specified.  Again, my strong recommendation is that you start with OLS, then move to GWR once you find a properly specified OLS model.

Because geography does matter, your chances of finding a properly specified OLS model increase as the size of your study area decreases (keep in mind that GWR requires a minimum of about 100 features).  So if I wanted to find a model for a huge study area, I would break it up into regions and work with each piece separately.  After finding properly specified OLS models for each region, I would then move to GWR with the full set of explanatory variables from all regions (the GWR model would not have to be region by region).

I hope this helps.  Thanks so much for posting your questions!
Very best wishes,
Lauren M Scott, PhD
ESRI
Geoprocessing, Spatial Statistics
LaurenScott
Occasional Contributor
Sorry, I know my post is already beyond long... but one more thing:
When I say "best OLS model" I, of course, mean a model with explanatory variables that are supported by theory, common sense, guidance from experts... 
I think that's all 🙂
Lauren