Issue with GWR

03-29-2011 12:32 PM
LeandroGonzalez
New Contributor
Hello Community!

I'm currently working on a regression analysis with census data.  I ran the OLS analysis for my dependent variable using 5 explanatory variables.  I then moved on to GWR using the same dependent and explanatory variables.  The problem is that I keep getting an error when I try to run the GWR analysis.  It appears that GWR does not like one of my explanatory variables (Edad - age in Spanish).  I checked the VIF values after running OLS and all values are well below 7.5.  I also created thematic maps of all the explanatory variables.  Age and distance to coast may show some degree of multicollinearity; however, this collinearity does not hold throughout the entire study area.  I am attaching a document with all the images and the error.

Any help regarding this issue will be greatly appreciated.  Thank you so much for your time and feedback.

Leandro
5 Replies
LaurenScott
Occasional Contributor
Hi Leandro,
I'm sorry you are having problems with this.
Please try transforming the Edad variable into a deviation from the mean:
1) Create a new field in your data set called something like TrEdad.
2) Determine the mean for the Edad variable... let's call that value MeanEdad
3) Calculate the new TrEdad field to be:  Edad - MeanEdad
4) Use the TrEdad variable instead of the Edad variable in OLS and notice that the results are the same as they were before (subtracting a constant shifts only the intercept; the coefficients and diagnostics for your explanatory variables will not change).
5) Move to GWR with the TrEdad variable and see if this resolves the Severe model design issue.
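
Steps 1 to 4 can be sketched in plain Python, outside ArcGIS. This is only an illustration: the `edad` and `y` values are made up, and `ols_fit` is a hypothetical helper for a one-variable least-squares fit, not the OLS tool itself. It shows that centering leaves the slope unchanged and moves the intercept to the mean of the dependent variable.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) for a simple least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up values standing in for the Edad field and a dependent variable.
edad = [23.0, 35.0, 41.0, 52.0, 67.0]
y = [10.0, 14.0, 15.0, 21.0, 24.0]

mean_edad = mean(edad)                   # step 2: MeanEdad
tr_edad = [v - mean_edad for v in edad]  # step 3: TrEdad = Edad - MeanEdad

b0_raw, b1_raw = ols_fit(edad, y)        # fit with the original variable
b0_ctr, b1_ctr = ols_fit(tr_edad, y)     # fit with the centered variable

print("slope (raw, centered):", b1_raw, b1_ctr)
print("intercept (raw, centered):", b0_raw, b0_ctr)
```

The two slopes should agree up to floating-point rounding, while the centered intercept equals the mean of `y`.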

Other things that might cause this problem are:
a) you have too few features (with GWR you should really have *at least* 80 or 90 features)
b) you are using a kernel that is too small (you will want to use AICc or CV for the Bandwidth Method parameter so that GWR can find the optimal distance/number of neighbors for you)
c) you are having issues with local multicollinearity ... When you specify AICc or CV for the Bandwidth Method, GWR will be trying a bunch of different distances/number of neighbors in an effort to find one that is optimal.  If, along the way, it encounters issues with local multicollinearity (even on one of those trials), it will fail with Severe Model Design (unfortunately...).  If this is the problem you are having, you will need to figure out where the multicollinearity is and try to sneak up on determining the optimal distance band/number of neighbors:
  ** Run GWR and specify "Adaptive" for the Kernel Type parameter
  ** Just as a test, select "Bandwidth Parameter" for the Bandwidth Method and set the number of neighbors to 40 (I pulled that number off the top of my head, btw, it is not magic or special).
  ** Run GWR and see if it solves.  If it does, map the Condition Numbers in the output feature class.  Condition Numbers above 30 indicate the portions of your study area where you very likely are having trouble with local multicollinearity.
  ** Unless a large portion of the features in your dataset have condition numbers larger than 30, temporarily remove those features from your dataset.  (If a large portion of your features do have condition numbers larger than 30, you have one or more variables that, while they may not be redundant globally, are in fact redundant locally.)
  ** Re-run GWR on the subset data, this time specifying Fixed/Adaptive for Kernel Type (whichever you think is most appropriate for your analysis... my bias is to use Fixed), and specifying AICc for Bandwidth Method.  My guess is that GWR will solve this time.
  ** If GWR does solve with the subset, write down the optimal distance or number of neighbors it reports in the progress window (or in the Results Window if you are running in Background).
  ** Now try running GWR on the full dataset using the optimal distance or number of neighbors (reported in the last step) and "Bandwidth Parameter" for the Bandwidth Method parameter.
  ** Hopefully GWR will solve now.  If it does, it most likely means there are local multicollinearity issues with the smaller distances/number of neighbors.  Even though GWR did solve, you will still want to map the condition numbers reported in the output feature class.  You should not have confidence in the results for those locations of your study area where the condition number is greater than 30 (because of local multicollinearity problems).
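
The condition-number check in the steps above could be sketched roughly as follows. The rows are invented, and the field name `Cond` is an assumption about how the condition number appears in the GWR output feature class; check your own output table for the actual name. The helper `split_by_condition` is hypothetical.

```python
COND_THRESHOLD = 30.0  # threshold quoted in the steps above

# Invented rows standing in for the GWR output feature class.
# "Cond" is an assumed name for the condition-number field.
rows = [
    {"id": 1, "Cond": 12.4},
    {"id": 2, "Cond": 31.7},
    {"id": 3, "Cond": 18.9},
    {"id": 4, "Cond": 45.2},
    {"id": 5, "Cond": 9.8},
]

def split_by_condition(rows, threshold=COND_THRESHOLD):
    """Split rows into (keep, flagged) by their condition number."""
    keep = [r for r in rows if r["Cond"] <= threshold]
    flagged = [r for r in rows if r["Cond"] > threshold]
    return keep, flagged

keep, flagged = split_by_condition(rows)
share_flagged = len(flagged) / len(rows)

print("flagged ids:", [r["id"] for r in flagged])
print("share of features flagged:", share_flagged)
```

If `share_flagged` is small, the `keep` subset is what you would re-run GWR on; if it is large, the problem is more likely a locally redundant variable than a handful of problem locations.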

I hope this answers your question.  Please let me know if these strategies work for you.
Thanks so much for posting your question!  I will check the documentation to make sure this information is included and clear.  If something I've written above is not clear, please let me know so that I can improve my explanation.  Again thank you, and I'm sorry you are having problems with this.
Best wishes,
Lauren

Lauren M. Scott, PhD
Esri
Geoprocessing/Spatial Statistics Product Engineer
RoshanBhandari
New Contributor

Dear @LeandroGonzalez, your idea of subtracting the mean from the data was extremely helpful. Thanks a lot. By the way, is there a reference in the literature for this idea that I can cite in my methods?

RoshanBhandari
New Contributor

Dear @LaurenScott, your idea of subtracting the mean from the data was extremely helpful. Thanks a lot. By the way, is there a reference in the literature for this idea that I can cite in my methods?

LeandroGonzalez
New Contributor
Thank you so much for your quick reply, Dr. Scott.  I tried the first option you suggested (find the mean of the Edad column, subtract it from each individual value in the table, and create a new column with those values).  It worked!

I read the article you wrote with Rosenshein and Pratt (ArcUser Magazine - Winter 2011). I found it extremely useful.

Unfortunately, my graphic output from the Moran's I analysis shows that both the OLS and the GWR residuals are clustered.  According to what I read in another paper on regression analysis and how to interpret its results (from the professional library of the resource center), the Moran's I output should not show clustered residuals (at least in the GWR case).
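
For readers unfamiliar with the statistic, here is a minimal sketch of the global Moran's I formula applied to residuals. The residual values and the four-feature neighbor matrix are invented for illustration, and `morans_i` is a hypothetical helper; it does not reproduce the significance test (z-score and p-value) that the ArcGIS tool reports.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]            # deviations from the mean
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j]
              for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den

# Four invented features in a row; each is a neighbor of the adjacent ones.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [2.0, 1.0, -1.0, -2.0]    # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbors have opposite residuals

print(morans_i(clustered, weights))
print(morans_i(alternating, weights))
```

A positive value indicates that similar residuals tend to be neighbors (clustering), and a negative value indicates that neighbors tend to differ (dispersion).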
I looked at the log transformations in the Geostatistical Analyst extension.  I can see those transformations happening, but I do not know exactly how to apply that conversion to the data (i.e. create a new field with the log-transformed values so I can use them and reduce bias).

Thank you again.

Leandro
AndrewTimleck
New Contributor

I looked at the log transformations in the Geostatistical Analyst extension.  I can see those transformations happening, but I do not know exactly how to apply that conversion to the data (i.e. create a new field with the log-transformed values so I can use them and reduce bias).


This may be the blind leading the blind, but I'll give it a shot:

1) Add a new field, call it something like log_var1
2) I used Field Type: DOUBLE, Precision: 18, Scale: 9
3) Right click on the header of the column and select Field Calculator
4) from the "Functions" menu on right of box click "Log ()"... and it's added to the "log_var1 = " box
5) From the Fields list, above, find the variable you want to transform and double click it to add it.
6) Click on 'OK' to compute the field.

You can then right click on the header and select "Statistics" to quickly see how the distribution of the variable changes.
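
The same calculation can be sketched in plain Python, outside the Field Calculator. The `var1` values are made up and `log_transform` is a hypothetical helper. It uses the natural log and returns `None` for values that are zero or negative, where the log is undefined, so those records would need separate handling before regression.

```python
import math
from statistics import mean

def log_transform(values):
    """Natural log of each value; None where the log is undefined."""
    return [math.log(v) if v is not None and v > 0 else None
            for v in values]

# Made-up values standing in for the field you want to transform.
var1 = [1.0, 10.0, 250.0, 0.0, 4000.0]
log_var1 = log_transform(var1)

# Quick before/after summary, ignoring records that could not be transformed.
valid = [v for v in log_var1 if v is not None]
print("mean before:", mean(var1))
print("mean after: ", mean(valid))
print("not transformed:", log_var1.count(None))
```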

Hope that helps.

Andrew