Hello Community!

I'm currently trying to do some regression analysis with census data. I ran the OLS analysis for my dependent variable using 5 explanatory variables. I then moved on to GWR using the same dependent and explanatory variables. The problem is that I keep getting an error when trying to perform the GWR analysis. It appears that GWR does not like one of my explanatory variables (Edad, "age" in Spanish). I checked the VIF values after running the OLS and all values are way below 7.5. I created thematic maps of all the explanatory variables. Perhaps age and distance to coast show some degree of multicollinearity; however, this collinearity does not continue throughout the entire study area. I am attaching a document with all the images and the error.

Any help regarding this issue will be greatly appreciated. Thank you so much for your time and feedback.

Leandro

I'm sorry you are having problems with this.

Please try transforming the Edad variable into a deviation from the mean:

1) Create a new field in your data set called something like TrEdad.

2) Determine the mean for the Edad variable... let's call that value MeanEdad

3) Calculate the new TrEdad field to be: Edad - MeanEdad

4) Use the TrEdad variable instead of the Edad variable in OLS and notice that the results are the same as they were before (centering the variable does not change its coefficient or the model fit; only the intercept shifts).

5) Move to GWR with the TrEdad variable and see if this resolves the Severe Model Design error.
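
As a rough check on steps 1 to 4, here is a minimal Python sketch (plain Python, not ArcGIS code) with a single explanatory variable. The numbers are made up for illustration, and the helper names center and simple_ols are hypothetical. It only shows that subtracting the mean leaves the OLS slope unchanged while the intercept moves to the mean of the dependent variable.

```python
from statistics import mean

def center(values):
    """Return the values expressed as deviations from their mean."""
    m = mean(values)
    return [v - m for v in values]

def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical values, for illustration only (not the poster's data).
edad = [23.0, 31.0, 35.0, 42.0, 48.0, 55.0, 61.0]
y = [10.2, 12.9, 13.1, 16.4, 17.0, 20.3, 21.8]

tr_edad = center(edad)                   # TrEdad = Edad - MeanEdad
b0_raw, b1_raw = simple_ols(edad, y)     # model with the original variable
b0_ctr, b1_ctr = simple_ols(tr_edad, y)  # model with the centered variable

print("slope (Edad):  ", b1_raw)
print("slope (TrEdad):", b1_ctr)
print("intercepts:    ", b0_raw, "vs", b0_ctr)
```

With several explanatory variables the same holds for the centered variable's coefficient, but this sketch does not demonstrate that case.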

Other things that might cause this problem are:

a) you have too few features (with GWR you should really have *at least* 80 or 90 features)

b) you are using a kernel that is too small (you will want to use AICc or CV for the Bandwidth Method parameter so that GWR can find the optimal distance/number of neighbors for you)

c) you are having issues with local multicollinearity ... When you specify AICc or CV for the Bandwidth Method, GWR will be trying a bunch of different distances/number of neighbors in an effort to find one that is optimal. If, along the way, it encounters issues with local multicollinearity (even on one of those trials), it will fail with Severe Model Design (unfortunately...). If this is the problem you are having, you will need to figure out where the multicollinearity is and try to sneak up on determining the optimal distance band/number of neighbors:

** Run GWR and specify "Adaptive" for the Kernel Type parameter

** Just as a test, select "Bandwidth Parameter" for the Bandwidth Method and set the number of neighbors to 40 (I pulled that number off the top of my head, btw, it is not magic or special).

** Run GWR and see if it solves. If it does, map the Condition Numbers in the output feature class. Condition Numbers above 30 indicate the portions of your study area where you very likely are having trouble with local multicollinearity.

** Unless a large portion of the features in your dataset have condition numbers larger than 30, temporarily remove those features from your dataset. (If a large portion of your features have condition numbers larger than 30, you have one or more variables that, while perhaps not redundant globally, are in fact redundant locally.)

** Re-run GWR on the subset data, this time specifying Fixed/Adaptive for Kernel Type (whichever you think is most appropriate for your analysis... my bias is to use Fixed), and specifying AICc for Bandwidth Method. My guess is that GWR will solve this time.

** If GWR does solve with the subset, write down the optimal distance or number of neighbors it reports in the progress window (or in the Results Window if you are running in Background).

** Now try running GWR on the full dataset using the optimal distance or number of neighbors (reported in the last step) and "Bandwidth Parameter" for the Bandwidth Method parameter.

** Hopefully GWR will solve now. If it does, it most likely means there are local multicollinearity issues with the smaller distances/number of neighbors. Even though GWR did solve, you will still want to map the condition numbers reported in the output feature class. You should not have confidence in the results for those locations of your study area where the condition number is greater than 30 (because of local multicollinearity problems).
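
The condition-number screening in the steps above can be sketched in plain Python. This is only an illustration under assumptions: the condition numbers are invented, they are assumed to have already been read out of the GWR output feature class into a dictionary keyed by feature id, and the function name triage_by_condition_number is hypothetical. The threshold of 30 is the one quoted above.

```python
COND_LIMIT = 30.0  # threshold quoted in the reply above

def triage_by_condition_number(cond_by_id, limit=COND_LIMIT):
    """Split feature ids into (keep, flagged) and return the flagged share."""
    flagged = sorted(fid for fid, cond in cond_by_id.items() if cond > limit)
    keep = sorted(fid for fid, cond in cond_by_id.items() if cond <= limit)
    share = len(flagged) / len(cond_by_id) if cond_by_id else 0.0
    return keep, flagged, share

# Hypothetical condition numbers keyed by feature id (not real output).
cond_by_id = {1: 12.4, 2: 18.9, 3: 33.1, 4: 9.7,
              5: 41.5, 6: 22.0, 7: 15.3, 8: 27.8}

keep, flagged, share = triage_by_condition_number(cond_by_id)
print("features to set aside temporarily:", flagged)
print("share of features above the limit:", share)
```

If the flagged share is small, the ids in keep are the subset to re-run with AICc. If it is large, the subset approach is unlikely to help and the variables themselves probably need another look.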

I hope this answers your question. Please let me know if these strategies work for you.

Thanks so much for posting your question! I will check the documentation to make sure this information is included and clear. If something I've written above is not clear, please let me know so that I can improve my explanation. Again thank you, and I'm sorry you are having problems with this.

Best wishes,

Lauren

Lauren M. Scott, PhD

Esri

Geoprocessing/Spatial Statistics Product Engineer