PROBLEMS WITH GWR ANALYSIS

ArnauFernández · ‎08-19-2010

Dear Community,

I'm trying to model climate phenomena such as convection using several variables: evaporation, temperature, moisture, rainfall, etc. My goal is to get a more specific model than traditional climate projections by exploring the relationships between these variables.

First I ran an OLS analysis to see which variables have the strongest coefficients and which global model best fits the phenomenon I want to predict. Then I moved to GWR to see the geographic variation in these relationships, but I have run into some problems with this analysis.

Here are the steps I follow:

1. I input my data as txt files (capture 1) and create a raster map using "Conversion Tools/To Raster/ASCII to Raster" (capture 2). Then I create a grid of points using "Conversion Tools/From Raster/Raster to Point" (capture 3), and finally I join my variables into a single grid using a "Spatial Join" (capture 4).
2. I run the GWR analysis, and this happens: some analyses go well (capture 5), but in others "gaps" appear in my attribute table as null values, and consequently in my map too (capture 6). I've deduced that this only happens with moisture (it is in %, only two digits) (capture 1), because the gaps appear when I use this variable, but I don't understand why.

Could somebody help me?
I would be very grateful.
Thanks

Arnau

PS: You can see my captures on Facebook:
http://www.facebook.com/album.php?aid=2086715&l=38d02008fd&id=1432887823

LaurenRosenshein · ‎09-08-2010

Hi Arnau,

My first guess about your NULL output: when the result of a computation is infinity or undefined, the result for non-shapefile outputs will be Null; for shapefiles the result will be -DBL_MAX = -1.7976931348623158e+308. If your output is a feature class and the result is infinity, then the NULL would make sense.
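To find the affected rows in an exported output table, a check along these lines may help. It is only a sketch: the `coefficients` list is a made-up stand-in for a column read from the GWR output, and `is_undefined` is a hypothetical helper, not part of any ArcGIS API.

```python
import math
import sys

# Placeholder value quoted above for undefined results written to shapefiles.
SHAPEFILE_PLACEHOLDER = -1.7976931348623158e+308


def is_undefined(value):
    """Return True for None (Null), NaN, infinities, or the -DBL_MAX placeholder."""
    if value is None:
        return True
    return math.isnan(value) or math.isinf(value) or value == -sys.float_info.max


# Hypothetical coefficient column read from a GWR output table.
coefficients = [0.42, None, SHAPEFILE_PLACEHOLDER, 1.7, float("inf")]

# Row positions whose result was undefined, whatever form the output took.
undefined_rows = [i for i, v in enumerate(coefficients) if is_undefined(v)]
print(undefined_rows)
```

Mapping the flagged rows should show whether the gaps coincide with areas where one variable barely changes.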

The real question then becomes "Why are you getting results of infinity?" One thing to keep in mind is that with the kind of data you are working with (i.e. continuous surfaces created from rasters, where presumably you have many neighboring points with identical or extremely similar values), you are very likely dealing with local multicollinearity. The type of data is one clue, and another is the Condition Numbers that are part of the GWR output. In general, a condition number over 30 indicates issues with multicollinearity, and from looking at your images it appears that many of your condition numbers are in the hundreds, some closer to the thousands. This type of local multicollinearity indicates that your results are unstable. To learn more about these issues, see the conceptual documentation on GWR: How GWR Works.
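To see why locally identical values inflate the condition number, here is a rough NumPy sketch. It rests on several assumptions: a bi-square kernel, unit-length column scaling, and the ratio of largest to smallest singular value as the condition number. The GWR tool may compute its Condition Number differently, and the grid, variables, and function names are invented for illustration.

```python
import numpy as np


def bisquare_weights(coords, focus, bandwidth):
    """Bi-square kernel (assumed): weight falls to zero at the bandwidth distance."""
    d = np.linalg.norm(coords - focus, axis=1)
    return np.where(d < bandwidth, (1.0 - (d / bandwidth) ** 2) ** 2, 0.0)


def local_condition_number(X, coords, focus, bandwidth):
    """Condition number of the weighted, column-scaled design matrix at one location."""
    w = bisquare_weights(coords, focus, bandwidth)
    Xw = X * np.sqrt(w)[:, None]           # apply the square root of the weights to each row
    Xw = Xw / np.linalg.norm(Xw, axis=0)   # unit-length columns so units do not dominate
    return np.linalg.cond(Xw)


rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

# Synthetic explanatory variables: moisture is a flat 55% in the west, varied in the east.
temperature = rng.normal(0.0, 3.0, size=100)
moisture = np.where(coords[:, 0] < 5, 55.0, rng.uniform(40.0, 90.0, size=100))
X = np.column_stack([np.ones(100), temperature, moisture])

cond_flat = local_condition_number(X, coords, np.array([2.0, 2.0]), bandwidth=2.5)
cond_varied = local_condition_number(X, coords, np.array([7.0, 7.0]), bandwidth=2.5)
print(cond_flat, cond_varied)
```

Inside the flat patch the moisture column is an exact multiple of the intercept column, so the local regression has no unique solution there. That is the situation that leads to infinite or undefined coefficients.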

One potential way to deal with this type of multicollinearity is to increase the cell size of your raster, which might have the effect of increasing the variation in both your dependent and independent variables. The help states some additional methods:

Try creating a thematic map for each explanatory variable. If the map reveals spatial clustering of identical values, consider removing those variables from the model or combining those variables with other explanatory variables to increase value variation. If, for example, you are modeling home values and have variables for both bedrooms and bathrooms, you may want to combine these to increase value variation or represent them as bathroom/bedroom square footage. Avoid using spatial regime dummy/binary variables, spatially clustering categorical/nominal variables, or variables with very few possible values when constructing GWR models.
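As a numeric companion to the thematic map, one could count how many distinct values each point sees among its nearest neighbors. This is only a sketch on synthetic data: `distinct_values_nearby` is a hypothetical helper, and the choice of `k=5` is arbitrary.

```python
import numpy as np


def distinct_values_nearby(coords, values, k):
    """For each point, count distinct values among its k nearest points (itself included)."""
    # Full pairwise distance matrix: fine for small samples, too large for big grids.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.unique(values[idx]).size for idx in nearest])


xs, ys = np.meshgrid(np.arange(6), np.arange(6))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

# Hypothetical moisture in whole percent: a flat block of 55 in the west, a gradient in the east.
moisture = np.where(coords[:, 0] < 3, 55, 60 + 5 * coords[:, 0] + coords[:, 1]).astype(int)

counts = distinct_values_nearby(coords, moisture, k=5)
flagged = np.flatnonzero(counts == 1)   # points whose whole neighborhood shares one value
print(flagged)
```

Points flagged this way are where a variable contributes no local variation and is a candidate for removal or combination with another variable.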

Hopefully this will help point you in the right direction. The online documentation for GWR is a great resource, and may help you solve some of your problems!

Lauren Rosenshein
Geoprocessing Product Engineer
Spatial Statistics

YaoYan · ‎09-27-2010

Hi Lauren:

I have been looking for ESRI experts on GWR for some time but have not gotten good feedback yet.

I am glad to see you here but cannot find a better way to contact you. Here are 3 questions, and I would appreciate any comments:
1. How are the coefficient maps created? IDW interpolation? Why?
2. Can we interpolate the regression (OLS and GWR) results (points) to create a raster map, like a Kriging map?
3. Are missing data points permissible for some predictor variables? What is the impact on the results?

I am looking forward to hearing from you!

LaurenScott · ‎11-09-2010

Hi Yao,
I'm pretty sure I answered all of your questions in an email recently, but I'm glad you posted your questions here too.

1)     The coefficient surfaces are created using a weighted least squares estimator. The method is described on pages 52 to 54 of Geographically Weighted Regression by Fotheringham, Brunsdon and Charlton, Wiley 2002. The formula is labeled (2.11) and looks something like this:

$$\hat{\beta}(i) = \left(X^{T} W(i)\, X\right)^{-1} X^{T} W(i)\, Y$$

Basically, GWR estimates the coefficient value at each raster cell using the same formula that it uses to estimate the coefficient values at each feature. For each raster cell, a weights matrix is constructed relating that raster cell location to every feature in the dataset; nearby features have a bigger weight than features that are farther away. The weighting function itself depends on what you select for Kernel Type (FIXED/ADAPTIVE) and Bandwidth Method (the distance or number of neighbors) when you run GWR. Even though the raster cell being estimated may not be associated with a specific feature (so it doesn't have a specific dependent variable or explanatory variables), it still has weighted explanatory variables and can be associated with weighted dependent variables. In fact, the math to estimate the coefficient at a location that coincides with a feature is the same for a location that doesn't coincide with a feature; in both cases the coefficient is estimated using weighted X and Y variables.

Because of the weighting function, it helps me to think of the weighted least squares estimator as a type of interpolator; nearby X and Y values provide the data necessary to estimate the coefficient value at each raster cell.
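The estimator can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the tool's implementation: it uses a fixed Gaussian kernel, synthetic data, and a hypothetical function name `gwr_coefficients`. The point is that the location passed in does not have to coincide with any feature.

```python
import numpy as np


def gwr_coefficients(X, y, coords, location, bandwidth):
    """Estimate beta(i) = (X^T W(i) X)^-1 X^T W(i) y at a single location."""
    d = np.linalg.norm(coords - location, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel (assumed for illustration)
    XtW = X.T * w                              # equivalent to X.T @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)


rng = np.random.default_rng(1)
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
x1 = rng.uniform(0.0, 10.0, size=100)
X = np.column_stack([np.ones(100), x1])        # intercept plus one explanatory variable

# Case 1: a spatially constant relationship is recovered at an off-feature location.
y_const = 2.0 + 3.0 * x1
beta_const = gwr_coefficients(X, y_const, coords, np.array([3.3, 6.7]), bandwidth=2.0)

# Case 2: the slope grows from west to east, and the local estimates follow it.
y_vary = 1.0 + (0.5 + 0.1 * coords[:, 0]) * x1
beta_west = gwr_coefficients(X, y_vary, coords, np.array([1.5, 4.5]), bandwidth=2.0)
beta_east = gwr_coefficients(X, y_vary, coords, np.array([8.5, 4.5]), bandwidth=2.0)
print(beta_const, beta_west[1], beta_east[1])
```

Calling the function once per raster cell center would produce a coefficient surface in the sense described above.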

2)     Yes you can interpolate the predicted values from OLS and GWR if what you are modeling (your Y variable) is actually continuous (elevation, temperature, etc.). However, realize that your sampled data come from predictions (not the actual values). The result will be a prediction surface. My recommendation would be to use the actual Y values, where you have them, then obtain predicted Y values for all locations where you can obtain X values, but don't have the actual Y values. I hope that makes sense. Then use something like Kriging for the interpolation, if your data can be modeled using a semi-variogram.

3)     OLS and GWR work fine with sampled data, so if you are missing some points, that's fine, you just use those points that have data. OLS and GWR do not recognize "-999" or some other numeric code as missing data (they will interpret those values as REAL data values). You can use all locations with a full set of X and Y values to calibrate your model, then predict Y values for locations with a full set of X variables, but no Y values.
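A small sketch of that workflow, using plain least squares on made-up numbers (the arrays and the -999 code are assumptions for illustration): rows with a complete X and Y calibrate the model, rows with X but no Y receive predictions, and rows without X are left out.

```python
import numpy as np

MISSING = -999.0  # hypothetical missing-data code used in the input table

x = np.array([10.0, 20.0, 30.0, 40.0, MISSING, 60.0])
y = np.array([7.0, 12.0, MISSING, 22.0, 30.0, MISSING])

has_x = x != MISSING
has_y = y != MISSING
calibrate = has_x & has_y        # full set of X and Y values
predict = has_x & ~has_y         # X available, Y missing

# Calibrate on complete rows only.
A = np.column_stack([np.ones(calibrate.sum()), x[calibrate]])
intercept, slope = np.linalg.lstsq(A, y[calibrate], rcond=None)[0]
y_predicted = intercept + slope * x[predict]

# For contrast: fitting every row treats -999 as a real measurement.
A_all = np.column_stack([np.ones(x.size), x])
naive_intercept, naive_slope = np.linalg.lstsq(A_all, y, rcond=None)[0]
print(slope, naive_slope, y_predicted)
```

Comparing `slope` with `naive_slope` shows how far a few unmasked codes can pull the fit.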

Your follow up email also asked about the best distance to use (scale of analysis). Please check out the supplementary spatial statistics tools available for download from the Geoprocessing Resource Center (www.bit.ly/spatialstats). The Incremental Spatial Autocorrelation tool can help you find the distance where spatial processes promoting clustering are most pronounced. These tools include full documentation.

I hope this helps.
Best wishes,
Lauren M Scott, PhD
ESRI
Geoprocessing, Spatial Statistics