ArcGIS Pro 3.0.2: GWR generates a “multicollinearity” error even though only one explanatory variable is used

02-08-2024 01:53 PM
JamalNUMAN
Legendary Contributor


I can't figure out how multicollinearity can be an issue when running GWR with only one explanatory variable.

 

What could be the issue here?

 

Clip_765.jpg

----------------------------------------
Jamal Numan
Geomolg Geoportal for Spatial Information
Ramallah, West Bank, Palestine

Accepted Solutions
EricKrause
Esri Regular Contributor

Hi @JamalNUMAN,

While the error message only mentions correlations between explanatory variables (which obviously cannot be a problem with a single explanatory variable), a couple of other things can also cause this error.

GWR builds regression models using neighborhoods around each feature, and if any of these neighborhoods have a constant value for the dependent variable or any of the explanatory variables, you will also encounter this error.
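To illustrate why a constant neighborhood behaves like perfect collinearity, here is a minimal sketch in plain Python. It is not the ArcGIS implementation; the function name and the sample values are made up for illustration. A local model with an intercept and one explanatory variable has to solve a 2x2 system, and when x is constant within the neighborhood, the x column is just a multiple of the intercept column, so the determinant of that system is zero.

```python
def normal_matrix_determinant(xs, weights=None):
    """Determinant of X^T W X for a local model y = b0 + b1 * x.

    xs: explanatory values in one neighborhood.
    weights: optional kernel weights (defaults to 1 for every neighbor).
    """
    if weights is None:
        weights = [1.0] * len(xs)
    s_w = sum(weights)                                    # sum of w
    s_wx = sum(w * x for w, x in zip(weights, xs))        # sum of w * x
    s_wxx = sum(w * x * x for w, x in zip(weights, xs))   # sum of w * x^2
    # X^T W X = [[s_w, s_wx], [s_wx, s_wxx]]
    return s_w * s_wxx - s_wx * s_wx


varied = [2.0, 3.0, 5.0, 7.0]     # hypothetical neighborhood with variation
constant = [4.0, 4.0, 4.0, 4.0]   # hypothetical neighborhood with one repeated value

det_varied = normal_matrix_determinant(varied)
det_constant = normal_matrix_determinant(constant)
print(det_varied)    # positive, so the coefficients can be solved for
print(det_constant)  # zero, so intercept and slope cannot be separated
```

The sketch only covers the explanatory-variable case; how the tool itself detects a constant dependent variable is not shown here.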

You should try using different neighborhood settings (generally larger neighborhoods), or attempt to locate the areas of constant value. The Neighborhood Summary Statistics tool can be used to find local standard deviations, which can help you identify areas where the variables have constant values.
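For readers who want to see the idea behind that check outside of ArcGIS, here is a rough sketch in plain Python. It does not call the Neighborhood Summary Statistics tool; the function name, the sample coordinates, and the choice to count the focal feature as part of its own neighborhood are all assumptions made for illustration. It computes a standard deviation over each feature's k nearest neighbors and flags the features where that value is zero.

```python
import math
import statistics


def local_std_devs(points, values, k):
    """Population standard deviation of `values` over each point's k nearest
    features (the point itself is counted as part of its neighborhood)."""
    result = []
    for px, py in points:
        # Rank all features by distance to the focal point.
        order = sorted(
            range(len(points)),
            key=lambda j: math.hypot(points[j][0] - px, points[j][1] - py),
        )
        neighborhood = [values[j] for j in order[:k]]
        result.append(statistics.pstdev(neighborhood))
    return result


# Hypothetical data: the left cluster has a constant value, the right one varies.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
values = [5.0, 5.0, 5.0, 1.0, 2.0, 3.0]

stds = local_std_devs(points, values, k=3)
flagged = [i for i, s in enumerate(stds) if s == 0.0]
print(flagged)  # indices of features whose neighborhood has no variation
```

Mapping the flagged features would show where local models are likely to fail at that neighborhood size.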

I hope this helps, and please let me know if you have any other questions.


AlbertoNieto1
Esri Contributor

Hi Jamal,

Thank you for posting your question, and hope you're doing well. 

The error is a bit nuanced - while multicollinearity is typically associated with multiple variables, the error can also occur even with a single variable that has low variation in a small neighborhood for a feature. Please forgive me if you're already aware, but GWR works with the concept of neighborhoods: 

AlbertoNieto1_0-1707436614823.png

Within this neighborhood for each feature, the explanatory and dependent variables need to vary for a local regression model to be built. When the variables within that neighborhood do not vary, you can run into this error, even when using a single variable.

Here's a simple check to see if this is the case: create a map of your explanatory variable and consider the smallest neighborhood size the tool is run with (30 neighbors by default). Is it possible that a neighborhood of that size has no variation in your explanatory variable? Hint: if you're on Pro 3.2, you can also use Neighborhood Explorer to check this.
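The same question can be asked numerically. The sketch below, in plain Python, searches for the smallest neighbor count at which every feature's neighborhood contains at least two distinct values. It is only an illustration under stated assumptions: the function name and sample data are hypothetical, neighborhoods are taken as the k nearest features with the focal feature included, and ArcGIS may count neighbors differently.

```python
import math


def smallest_varying_k(points, values, start_k=2):
    """Smallest neighbor count k (focal feature included) for which every
    feature's k-nearest neighborhood contains at least two distinct values.
    Returns None if the variable is constant everywhere."""
    n = len(points)
    # For each feature, list the values of all features ordered by distance.
    ordered = []
    for px, py in points:
        order = sorted(
            range(n),
            key=lambda j: math.hypot(points[j][0] - px, points[j][1] - py),
        )
        ordered.append([values[j] for j in order])
    for k in range(start_k, n + 1):
        if all(len(set(vals[:k])) > 1 for vals in ordered):
            return k
    return None


# Hypothetical data: three features share one value, three others differ.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
values = [5.0, 5.0, 5.0, 1.0, 2.0, 3.0]

min_k = smallest_varying_k(points, values)
print(min_k)  # neighbor count below which some neighborhood is constant
```

A result larger than the neighborhood size being used would suggest that at least one local model sees no variation in the variable.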

There are a few things you can try in order to proceed with this single variable:

1. Increase the starting neighborhood size. Larger neighborhoods often have a better chance of including variation needed for those local models. To increase the starting neighborhood size, set the Neighborhood Selection Method to "User defined" and test with various increasing sizes. 

AlbertoNieto1_1-1707437413241.png

 

2. Use the Gaussian Kernel. The Gaussian Kernel essentially makes all features neighbors of all features, increasing the neighborhood size but diminishing the effect of distant neighbors. This may help, as the model essentially uses all the data and allows the full variation in your variable to be used in the local model.

AlbertoNieto1_0-1707437825137.png
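To make the difference between the two kernels concrete, here is a small sketch in plain Python. The formulas are the common textbook forms of the bisquare and Gaussian kernels, used here as an assumption; the exact scaling that ArcGIS applies may differ, and the function names, distances, and bandwidth are made up for illustration.

```python
import math


def bisquare_weight(distance, bandwidth):
    """Bisquare kernel: the weight is exactly zero at and beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: the weight decays with distance but never reaches zero."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


distances = [0.0, 5.0, 10.0, 20.0, 40.0]   # hypothetical distances to neighbors
bandwidth = 10.0                           # hypothetical bandwidth

bisquare = [bisquare_weight(d, bandwidth) for d in distances]
gaussian = [gaussian_weight(d, bandwidth) for d in distances]

for d, b, g in zip(distances, bisquare, gaussian):
    print(f"distance={d:5.1f}  bisquare={b:.4f}  gaussian={g:.6f}")
```

With a kernel that cuts off at the bandwidth, features beyond it contribute nothing to the local model, whereas the Gaussian form keeps every feature in play with a small weight. That is why it can bring in variation that a small neighborhood lacks.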

 

 

Despite these steps, please be aware that GWR really shines when local variation is present, and the fact that you're running into this error may indicate data problems that should be corrected. That is not guaranteed to be the case, but please consider it if you proceed with that single variable.

Hope this helps, and thanks again for your question Jamal. 

Alberto

PS: Just realized that Eric already answered your question more concisely! 


JamalNUMAN
Legendary Contributor

Thank you, Eric and Alberto, for the very useful input. It works fine with the settings shown in the screenshot below, following Alberto's guidance.

Clip_781.jpg

 
