Geographically Weighted Regression: differences between the Desktop and Pro versions

MWmep013 · ‎07-05-2024

Hi,

I am trying to identify a relationship between water demand and land use/landform variables.

However, when I model water demand against Residential land use (% of each municipal ward/census block unit), the Desktop and Pro versions give me different R2 values, and both are on the low end.

In Pro version, I am using number of neighbours and golden search for neighbourhood selection.

In Desktop version, I can select kernel width and bandwidth method as AICc.

Pro version value: R2 0.0584

Desktop version value: R2 0.117

Update: in the Pro version, using distance band for neighbourhood selection lowers the R2 further.

I am unable to understand what might be wrong. Logically, Residential land use should have a positive relationship with water demand, so I had expected higher R2 values and a positive relationship between the variables. Why is there a difference between the Desktop and Pro versions? And what should I do, since neither version offers the options available in the other?

EricKrause · ‎07-05-2024

Hi @MWmep013,

The reason for the discrepancy is that the GWR tool was reimplemented in ArcGIS Pro 2.3, and the previous version (equivalent to ArcMap) was deprecated. Among other things, the newer version uses a different and more common formula for global and local R-squared and optimizes bandwidths differently. The newer version follows the design and formulas of the GWR4 software (not from Esri).

While you will not find it in the Geoprocessing pane, the deprecated version can still be used through arcpy (for example, in a Python Notebook or the Python Window) with arcpy.stats.GeographicallyWeightedRegression().
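As a rough sketch of what that call could look like: the layer name, field names, and output name below are hypothetical placeholders, and ADAPTIVE is only an assumed kernel type (the original post does not say which one was used in Desktop). The positional order follows the deprecated tool's documented parameters (input features, dependent field, explanatory field, output feature class, kernel type, bandwidth method). The arcpy import is kept inside the function because arcpy is only available in an ArcGIS Python environment.

```python
def legacy_gwr_arguments(in_features, dependent_field, explanatory_field, out_featureclass):
    """Build the positional arguments for the deprecated GWR tool.

    Kernel type ADAPTIVE is an assumption; FIXED is the other option.
    Bandwidth method AICc matches what the original post used in Desktop.
    """
    return [
        in_features,
        dependent_field,
        explanatory_field,
        out_featureclass,
        "ADAPTIVE",  # kernel_type (assumed)
        "AICc",      # bandwidth_method
    ]


def run_legacy_gwr(arguments):
    """Run the deprecated tool; only works where arcpy is installed."""
    import arcpy  # imported here so the sketch can be read and tested without ArcGIS

    return arcpy.stats.GeographicallyWeightedRegression(*arguments)


# Hypothetical layer and field names, for illustration only.
arguments = legacy_gwr_arguments(
    "wards", "WATER_DEMAND", "PCT_RESIDENTIAL", "wards_gwr_legacy"
)
print(arguments)
```

In a Python Notebook inside ArcGIS Pro, passing that list to run_legacy_gwr(arguments) would then execute the tool.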

You can see the documentation for the deprecated version here:

https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/geographically-weighted-r...

Using the deprecated version should provide the same results as the ArcMap version. As for why using Distance Band lowers the R-squared, I am not certain, but it likely has something to do with your particular data.

Please let me know if you have any other questions.

-Eric

MWmep013 · ‎07-09-2024

Thank you so much Eric.

Just posted a question for Lauren. Can you please check?

LaurenGriffin · ‎07-05-2024

Hi,

I'm sorry the different versions of GWR are creating confusion. If you look at the desktop GWR documentation, you will see a note stating that an enhanced version of GWR is only available with ArcGIS Pro. We recommend you use GWR with ArcGIS Pro.

Regarding your unexpected low R2 values: OLS, GWR, and MGWR all model linear relationships. If the relationship you're modeling is not linear, that would explain the low R2 value. You can create a scatterplot or, even better, use the Local Bivariate Relationships (LBR) tool to test this. With LBR, I recommend you try different scales of analysis (increasing number of neighbor values). When the relationship isn't linear for at least one scale of analysis, you can try transforming the explanatory variable (or both the explanatory and dependent variables) to try to coerce a linear relationship. The Transform Field tool can help with this.
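To illustrate why a transformation can matter, here is a minimal pure-Python sketch. The data are synthetic (demand is constructed as a logarithmic function of area), the variable names are made up, and the log transform is only one possible choice. It also checks the global relationship only, whereas LBR looks at local relationships, so treat it as an illustration of the idea rather than a substitute for the tools above.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a simple linear fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Synthetic example: demand grows with the logarithm of residential area.
area = [float(v) for v in range(1, 51)]
demand = [3.0 * math.log(v) + 2.0 for v in area]

r2_raw = r_squared(area, demand)                         # linear fit on raw area
r2_log = r_squared([math.log(v) for v in area], demand)  # linear fit on log(area)

print(f"R2 with raw area: {r2_raw:.3f}")
print(f"R2 with log(area): {r2_log:.3f}")
```

With these synthetic values the fit on log(area) is essentially perfect while the fit on raw area is weaker, which is the pattern a non-linear relationship tends to produce.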

Another thing that's important to keep in mind with regression models... you need to find a model that is fully specified. You can use GWR in an exploratory manner to look at the relationship between a dependent variable and explanatory variable (you'll still want to ensure the relationship is linear), but once you start including multiple variables, unless you are ONLY interested in prediction, you'll want to strive to find a fully specified model (most important, a model that includes all key explanatory variables). I'm happy to provide additional tips for finding a fully specified GWR model, if you're interested.

Good question! I'm glad you asked it.

Best wishes with your project,

Lauren Griffin, Esri

MWmep013 · ‎07-09-2024

Hello Lauren and Eric,

Thank you for your response.

I did try something else. Instead of using % of land use occupied (Residential or any other), I tried using area as it is... it fares better. R2 is now 0.25 for one of the models.

I also changed how I group the land use classes. Earlier I had clubbed different Residential values together (primary residential, informal housing and mixed land use). Now that I have separated them, the model improves. There are positive relationships for primary residential and informal housing, but mixed land use has a fairly horizontal line graph.

While I do understand the basic concepts, I am not good with statistics as a whole.

I would be very glad if you can guide me in finding a good GWR model. Tips and tricks are highly welcome.

Thanks!!!

LaurenGriffin · ‎07-09-2024

Hi again 🙂

Eric will likely have suggestions as well, but here are some of the guidelines I follow for my own projects.

While no empirical workflow (step by step recipe) can guarantee you've found a fully specified GWR/MGWR model, completing the following checks will greatly improve your chances:

1) Ask yourself if you are modeling processes that would behave differently in different locations. You should be able to explain why it is plausible that the relationship between water demand and the area of a particular land use would vary across space. Local models like GWR or MGWR are only appropriate if you have reason to believe the *spatial* processes influencing the relationships are different in different locations. GWR/MGWR isn't appropriate if all your variable relationships are purely deterministic. Hmmm... for your analysis, it might be that the people in some neighborhoods are more dedicated to water conservation than people in other areas?

2) Ask yourself if your data is appropriate for GWR/MGWR model calibration.

a. Do you have enough observations? (Usually, 100 or more features is sufficient).

b. Do the data cover the study area consistently? You shouldn't have isolated clusters of features or other strong spatial discontinuities.

c. Have you consulted the literature, theory, related/relevant applications, community advocates, and common sense to ensure you're starting with a comprehensive and appropriate list of candidate explanatory variables? Can you articulate the expected relationship for each one?

d. Are local relationships between Y and every candidate X variable linear? (see my earlier notes about modeling linear relationships).

3) Are you starting with a good, robust, defensible global model (see Fotheringham, 2022, citation below)? Finding a properly specified OLS model might be a good way to ensure you're including at least some of the key explanatory variables in your model. The Exploratory Regression tool in ArcGIS Pro can be very helpful here.

If you can't find a properly specified OLS model before moving to GWR/MGWR you will need to come up with some other way to ensure your model includes all key explanatory variables (see 2c and number 4).

4) Does your final GWR/MGWR model pass these additional tests?

a. The data are scaled (especially for MGWR).

b. Residuals from your GWR/MGWR model are free from statistically significant spatial autocorrelation (this can be checked with the Global Moran's I tool).

c. There is no evidence of local multicollinearity (all Condition Numbers are less than 30, as a rule of thumb).

d. Every model explanatory variable is significant for at least some portion of the study area and the coefficient patterns are plausible (the significant coefficients should have the expected sign).

e. GWR/MGWR produces better results than OLS: a larger AdjR2 value and smaller AICc value. If not, OLS is the better model.

f. You're modeling linear relations only (see Sachdeva, Fotheringham, Li, and Yu 2022 for details about how to check this from the model output ... or ask me to send you instructions for creating the charts they recommend).

g. None of the scaled coefficients for the intercept are statistically significant (any variation in coefficient values reflects noise only). For MGWR, ideally the spatial scale for the intercept is global. Failing this check suggests variables may have been omitted from the model (Fotheringham, 2022).
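Some of the checks in step 4 can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names, the dictionary keys, and the tiny four-feature example are made up, and the Moran's I function returns the statistic only, without the significance test that the Global Moran's I tool reports.

```python
def morans_i(values, weights):
    """Global Moran's I for values, given weights as {(i, j): w_ij}."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = 0.0
    cross_product = 0.0
    for (i, j), w in weights.items():
        weight_sum += w
        cross_product += w * deviations[i] * deviations[j]
    sum_of_squares = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross_product / sum_of_squares)


def no_local_multicollinearity(condition_numbers, threshold=30.0):
    """Check 4c: every local condition number is below the rule-of-thumb threshold."""
    return all(c < threshold for c in condition_numbers)


def prefer_local_model(ols, local):
    """Check 4e: the local model needs a larger AdjR2 and a smaller AICc than OLS."""
    return local["adj_r2"] > ols["adj_r2"] and local["aicc"] < ols["aicc"]


# Made-up example: four features in a row, each neighboring the next one.
residuals = [1.0, 2.0, 3.0, 4.0]
neighbors = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0,
             (2, 1): 1.0, (2, 3): 1.0, (3, 2): 1.0}
i_statistic = morans_i(residuals, neighbors)
print(f"Moran's I of residuals: {i_statistic:.4f}")
```

A clearly positive value like the one in this toy example means neighboring residuals are similar, which is the kind of pattern check 4b is meant to catch.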

References:

Fotheringham, A. S., Yang, W., & Kang, W. 2017. Multiscale Geographically Weighted Regression (MGWR). Annals of the American Association of Geographers, 107(6), 1247-1265. https://doi.org/10.1080/24694452.2017.1352480

Fotheringham, A. S. 2022. A Comment on “A Route Map for Successful Applications of Geographically Weighted Regression”: The Alternative Expressway to Defensible Regression-Based Local Modeling. Geographical Analysis. https://doi.org/10.1111/gean.12347

Sachdeva, M., Fotheringham, A. S., Li, Z., & Yu, H. 2022. Are We Modelling Spatially Varying Processes or Non-linear Relationships? Geographical Analysis, 54(4), 715-738. https://doi.org/10.1111/gean.12297

Your call, but you might want to try MGWR since it automatically reports GWR results as well. Then you can decide if GWR or MGWR is the better model (if results are very similar, I might go with GWR because it's the simpler model). GWR allows coefficients to vary across the study area for a fixed spatial scale. MGWR does this also, but in addition, it allows each explanatory variable to have its own fixed spatial scale.

Other tips: Unless I have a good justification for using number of neighbors vs distance band, I try both to see which provides the best result. Same for Bisquare vs Gaussian for the Local Weighting Scheme (I try both and go with the best result).
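The "try both" approach could be organized along these lines. This is a sketch under assumptions: fit_aicc is a hypothetical callback that you would write yourself to run GWR with a given combination and return its AICc, using AICc as the measure of "best" is my assumption, and the numbers in the example are placeholders, not real results.

```python
from itertools import product

NEIGHBORHOOD_TYPES = ("NUMBER_OF_NEIGHBORS", "DISTANCE_BAND")
WEIGHTING_SCHEMES = ("BISQUARE", "GAUSSIAN")


def best_configuration(fit_aicc):
    """Try every neighborhood type / weighting scheme pair and keep the lowest AICc.

    fit_aicc(neighborhood_type, weighting_scheme) is a user-supplied
    (hypothetical) function that runs the model and returns its AICc.
    """
    results = {
        combo: fit_aicc(*combo)
        for combo in product(NEIGHBORHOOD_TYPES, WEIGHTING_SCHEMES)
    }
    best = min(results, key=results.get)
    return best, results


# Placeholder AICc values standing in for real model runs.
placeholder_scores = {
    ("NUMBER_OF_NEIGHBORS", "BISQUARE"): 812.4,
    ("NUMBER_OF_NEIGHBORS", "GAUSSIAN"): 809.1,
    ("DISTANCE_BAND", "BISQUARE"): 820.7,
    ("DISTANCE_BAND", "GAUSSIAN"): 815.3,
}
best, results = best_configuration(lambda n, w: placeholder_scores[(n, w)])
print("Lowest AICc:", best)
```

Keeping all four results, not just the winner, makes it easier to see whether the differences between configurations are large enough to matter.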

I hope this is helpful!

Best wishes on your project! Keep us posted 🙂

Lauren