How to interpret condition numbers from GWR/MGWR analysis?

JustinLeeMPH · ‎11-30-2023

I have a question on how to interpret the condition number. As I understand it, condition numbers are a measure of local collinearity, and values higher than 30 indicate that certain explanatory variables are highly correlated and thus redundant. There's a great Esri video on YouTube where it is stated that the Condition Number tells us "how hard the tool had to work to find a suitable equation." Can anyone explain to me the relationship between how hard the tool has to work and local collinearity?

EricKrause · ‎11-30-2023

I've heard variations of that phrasing many times, and I don't think it's wrong, but I'd argue there are better ways to conceptualize the condition number. It's more about the stability of the estimated coefficients for a given set of explanatory variable values. The coefficients are estimated by inverting a matrix of data values, and the condition number measures how sensitive the coefficients are to small changes in the data values. For low condition numbers, you can alter or remove some of the data, and the coefficients should not change drastically (in other words, the estimated coefficients are stable). But for matrices with very large condition numbers, even small changes to the data values can wildly change the estimated coefficients (meaning that the estimated coefficients are not stable or reliable).
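
To make this concrete, here is a minimal NumPy sketch of the idea. It is not the GWR tool's own code: the data are synthetic, the helper name coefficient_shift is made up for illustration, and the tool may scale its matrix differently, so the numbers are not comparable to the tool's output. It fits the same regression twice, once with two unrelated explanatory variables and once with two nearly identical ones, then nudges the dependent variable slightly and measures how far the coefficients move.

```python
import numpy as np

def coefficient_shift(X, y, noise):
    """Distance the least-squares coefficients move when y is nudged by `noise`."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_perturbed, *_ = np.linalg.lstsq(X, y + noise, rcond=None)
    return np.linalg.norm(beta_perturbed - beta)

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2_independent = rng.normal(size=n)            # unrelated to x1
x2_collinear = x1 + 1e-4 * rng.normal(size=n)  # almost a copy of x1

# Design matrices: intercept column plus two explanatory variables.
X_stable = np.column_stack([np.ones(n), x1, x2_independent])
X_unstable = np.column_stack([np.ones(n), x1, x2_collinear])

y = 1.0 + 2.0 * x1 + rng.normal(scale=0.1, size=n)
noise = 0.01 * rng.normal(size=n)              # a small change to the data

cond_stable = np.linalg.cond(X_stable)
cond_unstable = np.linalg.cond(X_unstable)
shift_stable = coefficient_shift(X_stable, y, noise)
shift_unstable = coefficient_shift(X_unstable, y, noise)

print(f"independent variables: condition number {cond_stable:.1f}, shift {shift_stable:.4f}")
print(f"collinear variables:   condition number {cond_unstable:.1f}, shift {shift_unstable:.4f}")
```

The same small change to the data barely moves the coefficients in the first case and moves them a lot in the second, which is the instability a large condition number warns about.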

This is a bit easier to understand using simple numbers rather than matrices. Inverting a matrix with a large condition number is analogous to finding the inverse of a number that is very close to 0. For example, the inverse of 0.001 is 1,000, and the inverse of 0.0001 is 10,000. Even though 0.001 and 0.0001 are very close in absolute value (they're both close to 0), their inverses are very different in absolute value (1,000 vs. 10,000). To put it another way, for values very close to 0, the inverse is very sensitive to small changes in the number. This sensitivity of the inverse is what condition numbers measure, for matrices rather than single numbers.
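
The arithmetic above can be checked with a few lines of Python. This is only an illustration of the single-number analogy; the helper name inverse_shift and the comparison value of 10 are made up for the example.

```python
def inverse_shift(x, delta=0.0009):
    """Absolute change in 1/x when x is reduced by a small amount delta."""
    return abs(1.0 / (x - delta) - 1.0 / x)

# Near zero: moving from 0.001 to 0.0001 changes the inverse by about 9,000.
shift_near_zero = inverse_shift(0.001)

# Far from zero: the same change applied to 10 barely moves the inverse.
shift_far_from_zero = inverse_shift(10.0)

print(f"change in inverse near 0:  {shift_near_zero:.1f}")
print(f"change in inverse near 10: {shift_far_from_zero:.2e}")
```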

I hope that helps, and let me know if any of that was not clear. There are also many resources available to learn about condition numbers, as they are usually taught in Linear Algebra courses rather than geography or statistics.

JustinLeeMPH · ‎11-30-2023

Thanks, that's helpful! Do large condition numbers help explain why, if I run GWR say 10 times, the coefficients and their statistical significance across the study area can vary substantially from run to run? For example, if I were running it at the county level for a single state, the sample size would be quite small.

EricKrause · ‎12-01-2023

I should have been clearer about this: the GWR model as a whole does not have a condition number. Instead, every local regression has its own. Some locations may have large condition numbers (meaning that the coefficients in that area are unstable and unreliable), while other areas have low condition numbers, meaning that the coefficients there are more reliable and precise.
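
As a rough sketch of what "one condition number per local regression" means, the NumPy snippet below weights the observations around a location and computes the condition number of the weighted matrix. Everything here is an assumption for illustration: the Gaussian kernel, the fixed bandwidth, the function name local_condition_number, and the synthetic data are not taken from the GWR tool, which may weight and scale things differently.

```python
import numpy as np

def local_condition_number(coords, X, target, bandwidth):
    """Condition number of the design matrix after weighting rows by distance to `target`."""
    distances = np.linalg.norm(coords - target, axis=1)
    weights = np.exp(-0.5 * (distances / bandwidth) ** 2)  # Gaussian kernel (assumed)
    X_weighted = X * np.sqrt(weights)[:, None]
    return np.linalg.cond(X_weighted)

rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))  # synthetic point locations

# x2 is almost a copy of x1 in the west and mostly independent in the east.
x1 = rng.normal(size=n)
easting_fraction = coords[:, 0] / 10.0    # 0 in the west, 1 in the east
x2 = x1 + (0.001 + easting_fraction) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

cond_west = local_condition_number(coords, X, np.array([0.5, 5.0]), bandwidth=1.0)
cond_east = local_condition_number(coords, X, np.array([9.5, 5.0]), bandwidth=1.0)

print(f"local condition number in the west: {cond_west:.1f}")
print(f"local condition number in the east: {cond_east:.1f}")
```

With this synthetic data, the same model has a higher condition number in the west, where the two variables are nearly redundant, than in the east.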

I'm also not completely sure what you mean by rerunning GWR multiple times. If you rerun it with the same data and the same settings, you should get the same coefficients each time. The condition number has more to do with whether you should trust the values of the coefficients.