Spatial Weight Matrix, why same weights?

JianLiu2 · ‎01-06-2011

I created a spatial weights matrix table using "Inverse distance" as the Conceptualization of Spatial Relationships, and I set "Nearest Neighbors" to 8. Strangely, within each neighborhood the weights for every pair of features are exactly the same! For example:

Feature from i to j and weight:
                  20   22         0.0769
                  20   21         0.0769
                  20   9          0.0769
                  ....   ...          ....
                  20   29         0.0769

The weights sum to 1. But I thought that under the "Inverse distance" option, weights should reflect the distances between features. The distance from feature 20 to 22 is clearly different from the distance from feature 20 to 21, so why were the same weights given?

Thanks so much!!

LaurenScott · ‎01-07-2011

Hi Jian,
I think I may know why you are getting those results, but let's make sure I understand what you've done first:
1) Your Conceptualization of Spatial Relationships is Inverse Distance
2) You've set Number of Neighbors to 8 to ensure each feature has at least 8 neighbors
3) You are taking the default threshold distance (you didn't enter anything for Threshold Distance)
4) Row Standardization was checked ON

I did the above and got expected results with and without Row Standardization. As expected, the weights were different for feature pairs.

You, however, are seeing identical weights for all neighbors associated with a particular feature.

This is what I think might be happening: Because Inverse Distance is unstable for distances less than 1, our inverse distance calculation treats all distances less than 1 as 1. Suppose you are working in a small-ish study area and are using unprojected data (a Geographic Coordinate System instead of a Projected Coordinate System), so that your units are degrees. With unprojected data, for a study area that has less than a 1 degree extent, all of your distances will be less than 1.0. All of the weights will get set to 1.0, and when you row standardize, all of the weights for a feature's neighbors will be equal. To remedy this, please project your data prior to analysis (always a good idea, but especially a good idea when your analyses involve distance measurements).
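To make the effect concrete, here is a minimal sketch in plain Python. It is not the ArcGIS implementation; the function names and the distances are made up for illustration. It applies the rule described above (distances less than 1 are treated as 1) and then row standardizes. Note that 0.0769 is 1/13 rounded, which would be consistent with a feature that has 13 neighbors, each with a raw weight of 1.

```python
def inverse_distance_weights(distances, exponent=1.0):
    """Raw inverse distance weights; distances below 1 are treated as 1."""
    return [1.0 / max(d, 1.0) ** exponent for d in distances]


def row_standardize(weights):
    """Scale one feature's neighbor weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances from one feature to 13 neighbors.
# In decimal degrees (unprojected data) every distance is below 1.
degrees = [0.02 * k for k in range(1, 14)]
# Made-up distances for the same neighbors in meters (projected data).
meters = [2000.0 * k for k in range(1, 14)]

w_degrees = row_standardize(inverse_distance_weights(degrees))
w_meters = row_standardize(inverse_distance_weights(meters))

# Every weight is 1/13 (about 0.0769) because each raw weight was 1.
print([round(w, 4) for w in w_degrees])
# With projected units the weights differ and decrease with distance.
print([round(w, 4) for w in w_meters])
```

With the distances in degrees, every neighbor ends up with the same weight; with the distances in meters, nearer neighbors get larger weights, as expected.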

If this is *not* what's happening I will need additional information so that I can try to reproduce the problem. What version of ArcGIS are you using? Might you be able to send me your data? (I would not need any of the attributes, only the feature geometry).

Thanks for asking your question! I hope this resolves your problem; if not we'll try again 🙂
Best wishes,
Lauren M. Scott, PhD
ESRI
Geoprocessing, Spatial Statistics

JianLiu2 · ‎01-09-2011

Thanks indeed! You are right: it was because I used a GCS, and almost all features were within 1 degree of each other, which made all the spatial weights equal to 1. I've changed to a PCS instead and now get reasonable results. 🙂

Just another question: in an OLS analysis, does a perfectly normal distribution of the model residuals mean the model already includes all important factors, so that spatial autocorrelation analysis of the residuals is no longer necessary?
(I think that whatever the residual distribution looks like, the spatial autocorrelation test is still needed, so there is no direct relationship between these two tests... I guess.)

And is there any relationship between the Koenker (BP) statistic and spatial autocorrelation in the residuals? I feel they somehow have similar implications...

Finally, I think there might be a mistake in the Anselin Local Moran's I equation in the "Spatial Statistical Toolbox-Mapping Clusters Toolset" ArcGIS 10 online reference. The second "Xi" should be "Xj", I suppose.

see:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Cluster_and_Outlier_Analysis_Ansel...

Thank you very much again 😮

Jian

LaurenScott · ‎01-10-2011

Hi Bill,
It's always nice to see your posts 🙂
Maybe "unstable" is a poor choice of words... perhaps "problematic" would be better. The issue, as you know, is when you invert distances less than one, the weights quickly become very large (particularly in relation to weights for distances that are greater than 1). The more serious issue is with coincident points when the distance is exactly zero. To avoid a zero divide we could set those distances to something very, very small... not a great solution, and this would strongly impact results because of the huge weights created. We could add 1 to all distances... also not a great solution. So instead we set all distances less than 1 to be 1. This is actually documented in all of the tools that include the Conceptualization of Spatial Relationships parameter and offer Inverse Distance methods. (It is also documented in the source code). This is the usage tip relating to this:

INVERSE_DISTANCE or INVERSE_DISTANCE_SQUARED
<snip...>
Weights for distances less than 1 become unstable. The weighting for features separated by less than one unit of distance (common with geographic coordinate system projections) is 1.

Caution: Analysis on features with a geographic coordinate system projection is not recommended when you select any of the inverse distance-based spatial conceptualization methods (INVERSE_DISTANCE, INVERSE_DISTANCE_SQUARED, or ZONE_OF_INDIFFERENCE).

For these Inverse Distance options, any two points that are coincident are given a weight of 1 to avoid zero division. This ensures features are not excluded from analysis.

True, some people might actually want the weights to soar for distances less than 1.0 and may prefer a different solution for coincident points. One of the nice things about the Spatial Stat tools being written in Python is that a user can modify the code if he/she doesn't like our implementation. The function to create weights from distances is in <ArcGIS>/ArcToolbox/Scripts/WeightsUtilities.py, called Distance2Weight. Hopefully our solution will be acceptable to most, though.
Thanks for posting your comments Bill!
Lauren
ESRI
Geoprocessing, Spatial Statistics

LaurenScott · ‎01-10-2011

Hi Jian,
I'm glad you are getting the weights you expect now. Great! 🙂

I am checking the Local Moran's I math... another customer (or perhaps it was you) brought this to my attention. I will certainly make corrections, if necessary, as quickly as I can. Thanks so much!

Regarding OLS diagnostics:
When an OLS model is not properly specified, often several of the diagnostic tests will fail. For example, if you are missing a key explanatory variable you will see statistically significant spatial autocorrelation in your regression residuals, but you might also see a biased model (non-normally distributed residuals) and poor Adj R-squared values. There are several diagnostic checks you should pass in order to feel confident that you've found a properly specified OLS model. These include:
1) You want the coefficients for your explanatory variables to be statistically significant and have the expected sign (+/-) ...
2) You want to find explanatory variables that get at different facets of what you are modeling (your dependent variable). Said another way: you don't want explanatory variables that are redundant (multicollinearity). The VIF values for your explanatory variables should be less than about 7.5 (smaller is better).
3) You want your residuals to be free from statistically significant spatial autocorrelation. Most often, spatial autocorrelation (any kind of "structure") in your residuals indicates you are missing a key explanatory variable, but it could also be because you are trying to model non-linear relationships, or due to strong non-stationarity (significant Koenker)... Until you find explanatory variables that capture the spatial structure in your dependent variable, it will be difficult to get rid of the spatial autocorrelation in your regression residuals.
4) The Jarque-Bera test assesses the distribution of the residuals. The null hypothesis for this test is that the residuals are normally distributed. So this is one of the diagnostics that you do not want to be statistically significant. If it is significant, it means the model is biased... perhaps the model predicts well in some locations, but not in others... or perhaps it predicts well for low values, but not so well for high values.
5) You want a model that performs well... high adjusted R2.
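Checks 3 and 4 above look at different things: Jarque-Bera depends only on the set of residual values, while spatial autocorrelation depends on where those values fall. Here is a minimal sketch in plain Python, not the ArcGIS implementation, that computes the Global Moran's I index for two made-up residual patterns containing exactly the same values. The locations are assumed to lie along a line, and the binary "adjacent neighbor" weights are an assumption chosen only to keep the example short. The sketch computes the index only, not its z-score or p-value.

```python
def morans_i(values, weights):
    """Global Moran's I index for a dense weights matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def chain_weights(n):
    """Hypothetical weights: each location neighbors the ones next to it."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


# Two made-up residual patterns with identical values, so any test based
# only on the distribution of the values cannot tell them apart.
clustered = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
alternating = [-3.0, 3.0, -2.0, 2.0, -1.0, 1.0, 0.0]

w = chain_weights(len(clustered))
print(round(morans_i(clustered, w), 3))    # positive: similar values adjoin
print(round(morans_i(alternating, w), 3))  # negative: unlike values adjoin
```

Both patterns have the same histogram, yet one is strongly clustered and the other strongly dispersed, which is why a good residual distribution does not make the spatial autocorrelation check unnecessary.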

We have a new sample script called Exploratory Regression available for download from www.bit.ly/spatialstats that you might want to check out (see Supplementary Spatial Statistics). It is a bit like Stepwise Regression except that it tries all variable combinations and does not just look for high Adj R-Squared... it looks for models that pass the checks listed above. Be sure to read the cautions that come with the documentation for that tool.

You might also be interested in this post (the answer to the question: Why would I start with OLS when I know the relationships I'm trying to model are non-stationary?): http://forums.arcgis.com/threads/19614-Mapping-Geographically-weighted-regression-p-values

I hope this helps! Thanks so much for your questions and comments!

Best wishes,
Lauren

MollyCohn · ‎01-19-2011

I have a somewhat related question: does anyone know what default formula is used to calculate inverse distance, or where I can obtain this information? Is it as simple as 1/d (or e^-d)?

Thanks!

LaurenScott · ‎01-19-2011

The formula is 1/d**(exponent) for d > 1.0.
All of the code for the weights is in weightsutilities.py, found in <ArcGIS>/ArcToolbox/Scripts
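As a rough illustration of that formula, here is a short Python sketch. It is not the code from weightsutilities.py; the helper name distance_to_weight is hypothetical, and using exponent 1 for inverse distance and exponent 2 for inverse distance squared is an assumption based on the option names. Distances less than 1 (including 0 for coincident points) are treated as 1, as described earlier in this thread.

```python
def distance_to_weight(d, exponent=1.0):
    """Hypothetical helper: 1/d**exponent, with d below 1 treated as 1."""
    d = max(d, 1.0)
    return 1.0 / d ** exponent


# Compare the two assumed exponents over a few sample distances.
for d in [0.0, 0.5, 1.0, 2.0, 10.0]:
    inverse = distance_to_weight(d, exponent=1.0)
    inverse_squared = distance_to_weight(d, exponent=2.0)
    print(d, inverse, inverse_squared)
```

So it is the simple 1/d form (not e^-d), with the weight capped at 1 for short distances.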

I hope this helps,
Lauren Scott
ESRI
Geoprocessing, Spatial Statistics