POST

Hi Leandro,

I'm sorry you are having problems with this. Please try transforming the Edad variable into a deviation from the mean:

1) Create a new field in your data set called something like TrEdad.
2) Determine the mean for the Edad variable... let's call that value MeanEdad.
3) Calculate the new TrEdad field to be: Edad - MeanEdad
4) Use the TrEdad variable instead of the Edad variable in OLS and notice that the results are the same as they were before (centering the variable shifts only the intercept; the coefficient for the variable and the model fit will not change).
5) Move to GWR with the TrEdad variable and see if this resolves the Severe Model Design issue.

Other things that might cause this problem are:

a) you have too few features (with GWR you should really have *at least* 80 or 90 features)
b) you are using a kernel that is too small (you will want to use AICc or CV for the Bandwidth Method parameter so that GWR can find the optimal distance/number of neighbors for you)
c) you are having issues with local multicollinearity

When you specify AICc or CV for the Bandwidth Method, GWR will try a bunch of different distances/numbers of neighbors in an effort to find one that is optimal. If, along the way, it encounters issues with local multicollinearity (even on one of those trials), it will fail with Severe Model Design (unfortunately...). If this is the problem you are having, you will need to figure out where the multicollinearity is and try to sneak up on determining the optimal distance band/number of neighbors:

** Run GWR and specify "Adaptive" for the Kernel Type parameter.
** Just as a test, select "Bandwidth Parameter" for the Bandwidth Method and set the number of neighbors to 40 (I pulled that number off the top of my head, btw, it is not magic or special).
** Run GWR and see if it solves. If it does, map the Condition Numbers in the output feature class. Condition Numbers above 30 indicate the portions of your study area where you very likely are having trouble with local multicollinearity.
** Unless a large portion of the features in your dataset have condition numbers larger than 30, temporarily remove those features from your dataset. (If a large portion of your features have condition numbers larger than 30, you have one or more variables that, while they may not be redundant globally, are in fact redundant locally.)
** Rerun GWR on the subset data, this time specifying Fixed/Adaptive for Kernel Type (whichever you think is most appropriate for your analysis... my bias is to use Fixed), and specifying AICc for Bandwidth Method. My guess is that GWR will solve this time.
** If GWR does solve with the subset, write down the optimal distance or number of neighbors it reports in the progress window (or in the Results Window if you are running in Background).
** Now try running GWR on the full dataset using the optimal distance or number of neighbors (reported in the last step) and "Bandwidth Parameter" for the Bandwidth Method parameter.
** Hopefully GWR will solve now. If it does, it most likely means there are local multicollinearity issues with the smaller distances/numbers of neighbors. Even though GWR did solve, you will still want to map the condition numbers reported in the output feature class. You should not have confidence in the results for those locations of your study area where the condition number is greater than 30 (because of local multicollinearity problems).

I hope this answers your question. Please let me know if these strategies work for you. Thanks so much for posting your question! I will check the documentation to make sure this information is included and clear. If something I've written above is not clear, please let me know so that I can improve my explanation. Again thank you, and I'm sorry you are having problems with this.

Best wishes,
Lauren

Lauren M. Scott, PhD
Esri Geoprocessing/Spatial Statistics Product Engineer
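As a rough illustration of steps 1) to 5) above, here is a minimal sketch in Python with NumPy (not the ArcGIS tools themselves). The data are synthetic and the helper names `ols_coefficients` and `condition_number` are made up for this example. It shows that centering leaves the slope unchanged while lowering the condition number of the design matrix; GWR may compute its reported condition numbers differently, so treat the numbers as indicative only.

```python
import numpy as np

def ols_coefficients(x, y):
    """Return (intercept, slope) from a least squares fit of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1]

def condition_number(x):
    """Condition number of the design matrix [1, x] (illustrative only)."""
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.cond(design)

rng = np.random.default_rng(0)
edad = rng.uniform(20, 80, size=200)               # hypothetical ages
y = 3.0 + 0.5 * edad + rng.normal(0, 2, size=200)  # hypothetical dependent variable

mean_edad = edad.mean()        # step 2: MeanEdad
tr_edad = edad - mean_edad     # step 3: TrEdad = Edad - MeanEdad

b0_raw, b1_raw = ols_coefficients(edad, y)
b0_ctr, b1_ctr = ols_coefficients(tr_edad, y)

print("slope (raw, centered):", b1_raw, b1_ctr)          # same slope
print("intercept (raw, centered):", b0_raw, b0_ctr)      # intercept shifts
print("condition number (raw, centered):",
      condition_number(edad), condition_number(tr_edad))  # centered is smaller
```

The slope is identical in both fits, which is why the transformation is safe to carry over from OLS to GWR.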
03-30-2011
10:37 AM


POST

Hi Todd,

Our mathematics for the Global Moran's I tool is given here: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Spatial_Autocorrelation_Global_Moran_s_I_works/005p0000000t000000/

This document also provides information about interpretation and FAQs. I hope this helps 🙂

Your post said something about testing your OLS regression residuals in order to determine if GWR is appropriate for your data. Please keep in mind that spatial autocorrelation in your OLS residuals almost always means you are missing a key explanatory variable from your model. GWR is a regression method that deals with nonstationarity... it is not a fix for misspecification nor a method specifically designed to address spatially autocorrelated residuals.

In case you might be interested, we have lots of resources about the tools in the Spatial Statistics toolbox: www.bit.ly/spatialstats . We have a sample script, for example, called Exploratory Regression. The documentation for that tool includes strategies for finding a properly specified OLS model.

Anyway, I hope very much that this information is helpful to you.

Best wishes,
Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics
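For readers who want to see the index itself, here is a minimal sketch of the textbook Global Moran's I in Python with NumPy. It is not the ArcGIS implementation, and the function name `morans_i` and the toy weights matrix are assumptions made for illustration. It returns only the observed index and its expected value under randomness, not the z-score or p-value.

```python
import numpy as np

def morans_i(values, weights):
    """Textbook Global Moran's I: (n / S0) * (z' W z) / (z' z).

    values  : 1-D sequence of attribute values (e.g. OLS residuals)
    weights : n x n spatial weights matrix with a zero diagonal
    Returns (observed index, expected index under no spatial pattern).
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    n = x.size
    s0 = w.sum()                     # sum of all weights
    observed = (n / s0) * (z @ w @ z) / (z @ z)
    expected = -1.0 / (n - 1)
    return observed, expected

# Four features in a row; each feature neighbors the one(s) next to it.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], w))
```

An observed index well above the expected value suggests clustering of similar values, but significance still has to come from the z-score and p-value the tool reports.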
01-21-2011
01:47 PM


POST

The formula is 1/d**(exponent) for d > 1.0. All of the code for the weights is in weightsutilities.py, found in <ArcGIS>/ArcToolbox/Scripts.

I hope this helps,
Lauren Scott
ESRI Geoprocessing, Spatial Statistics
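A minimal sketch of that formula in Python with NumPy follows. It is not the code from the Esri script; the function name `inverse_distance_weight` is hypothetical. The treatment of distances below 1 (set to 1 before inverting) follows the behavior described elsewhere in these posts.

```python
import numpy as np

def inverse_distance_weight(distances, exponent=1.0):
    """Return 1 / d**exponent, treating every distance below 1 as 1."""
    d = np.maximum(np.asarray(distances, dtype=float), 1.0)  # clamp d < 1 to 1
    return 1.0 / d ** exponent

# Coincident points (d = 0) and very close points get a weight of 1.
print(inverse_distance_weight([0.0, 0.5, 1.0, 2.0, 4.0], exponent=2.0))
```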
01-19-2011
12:28 PM


POST

Hi Karl,

There really isn't any way to interpret the General G index directly. If you look at the math for the General G equation ( http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_High_Low_Clustering_Getis_Ord_General_G_works/005p0000000q000000/ ), you see that the numerator is the local product (a running sum of what you get if you multiply each feature's value by all its neighbors' values, for all features). The denominator is the global product (the sum of the products of all feature values with each other). If we shuffle up the values so that all the high values are next to each other, the numerator gets bigger. If we shuffle them up so that all the low values are together (but the high values remain random), the numerator gets smaller. The index is simply the ratio of the local product to the global product.

But the index will be very different depending on the magnitude of the values involved (i.e., all values range from 0.01 to 0.05 vs. all values range from 1234500 to 1234600), and depending on your conceptualization of spatial relationships (if everyone is a neighbor of everyone else, the numerator gets larger; a polygon contiguity conceptualization with lots of features, however, will result in a small numerator and very large denominator). So there isn't a fixed interpretation for the index value itself.

The rest of the math for the General G involves figuring out the expected index: the expected index is what that ratio would look like if the values were randomly distributed among your features. Next the tool compares the expected to the observed index values. It is the relationship between the observed/actual and the expected values that determines if the General G index is significant or not.

You can think of the p-value as the answer to this question: what are the chances that my values would be arranged as they are, if the spatial processes promoting the observed spatial pattern were random? Small p-values mean the pattern would be very unlikely if the processes were random. That's why we focus on looking at the z-score and p-value when we talk about interpreting General G results.

I sure hope this helps! Best wishes with your research!

Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics
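To make the ratio concrete, here is a minimal sketch of the textbook Getis-Ord General G in Python with NumPy. It is not the ArcGIS implementation; the function name `general_g`, the toy weights matrix, and the sample values are assumptions for illustration. It computes only the observed index and its expected value, not the z-score or p-value.

```python
import numpy as np

def general_g(values, weights):
    """Observed and expected General G for binary or distance weights.

    observed = sum_ij(w_ij * x_i * x_j) / sum_ij(x_i * x_j), for i != j
    expected = sum_ij(w_ij) / (n * (n - 1))
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 0.0)           # a feature is not its own neighbor
    cross = np.outer(x, x)             # all pairwise products x_i * x_j
    np.fill_diagonal(cross, 0.0)       # drop the i == j terms
    observed = (w * cross).sum() / cross.sum()
    n = x.size
    expected = w.sum() / (n * (n - 1))
    return observed, expected

# Four features in a row; each feature neighbors the one(s) next to it.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

# Same arrangement (low to high), very different value magnitudes.
print(general_g([0.01, 0.02, 0.04, 0.05], w))
print(general_g([1234500, 1234530, 1234570, 1234600], w))
```

Both runs share the same expected index, but the observed index differs, which is one way to see why the raw index has no fixed interpretation.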
01-12-2011
06:29 PM


POST

Hi Jian,

I'm glad you are getting the weights you expect now. Great! 🙂 I am checking the Local Moran's I math... another customer (or perhaps it was you) brought this to my attention. I will certainly make corrections, if necessary, as quickly as I can. Thanks so much!

Regarding OLS diagnostics: When an OLS model is not properly specified, often several of the diagnostic tests will fail. For example, if you are missing a key explanatory variable you will see statistically significant spatial autocorrelation in your regression residuals, but you might also see a biased model (non-normally distributed residuals) and poor Adj R-squared values. There are several diagnostic checks you should pass in order to feel confident that you've found a properly specified OLS model. These include:

1) You want the coefficients for your explanatory variables to be statistically significant and have the expected sign (+/-).
2) You want to find explanatory variables that get at different facets of what you are modeling (your dependent variable). Said another way: you don't want explanatory variables that are redundant (multicollinearity). The VIF values for your explanatory variables should be less than about 7.5 (smaller is better).
3) You want your residuals to be free from statistically significant spatial autocorrelation. Most often, spatial autocorrelation (any kind of "structure") in your residuals indicates you are missing a key explanatory variable, but it could also be because you are trying to model nonlinear relationships, or due to strong nonstationarity (significant Koenker)... Until you find explanatory variables that capture the spatial structure in your dependent variable, it will be difficult to get rid of the spatial autocorrelation in your regression residuals.
4) The Jarque-Bera test assesses the distribution of the residuals. The null hypothesis for this test is that the residuals are normally distributed. So this is one of the diagnostics that you do not want to be statistically significant. If it is significant, it means the model is biased... perhaps the model predicts well in some locations, but not in others... or perhaps it predicts well for low values, but not so well for high values.
5) You want a model that performs well... high adjusted R2.

We have a new sample script called Exploratory Regression available for download from www.bit.ly/spatialstats that you might want to check out (see Supplementary Spatial Statistics). It is a bit like Stepwise Regression except that it tries all variable combinations and does not just look for high Adj R-Squared... it looks for models that pass the checks listed above. Be sure to read the cautions that come with the documentation for that tool.

You might also be interested in this post (the answer to the question: Why would I start with OLS when I know the relationships I'm trying to model are nonstationary?): http://forums.arcgis.com/threads/19614MappingGeographicallyweightedregressionpvalues

I hope this helps! Thanks so much for your questions and comments!

Best wishes,
Lauren
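The VIF check in item 2) can be sketched in a few lines of Python with NumPy. This uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing each explanatory variable on the others; it is not the OLS tool's own code, and the function name `vif` and the synthetic variables are assumptions for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k columns)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    result = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])   # intercept + other variables
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        ss_res = ((target - design @ coeffs) ** 2).sum()
        ss_tot = ((target - target.mean()) ** 2).sum()
        result[j] = ss_tot / ss_res                      # equals 1 / (1 - R^2)
    return result

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                 # unrelated to x1
x3 = x1 + 0.1 * rng.normal(size=200)      # nearly a copy of x1 (redundant)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)   # x1 and x3 should be far above 7.5, x2 close to 1
```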
01-10-2011
10:22 AM


POST

Hi Bill,

It's always nice to see your posts 🙂 Maybe "unstable" is a poor choice of words... perhaps "problematic" would be better. The issue, as you know, is that when you invert distances less than one, the weights quickly become very large (particularly in relation to weights for distances that are greater than 1). The more serious issue is with coincident points, when the distance is exactly zero. To avoid a zero divide we could set those distances to something very, very small... not a great solution, and this would strongly impact results because of the huge weights created. We could add 1 to all distances... also not a great solution. So instead we set all distances less than 1 to be 1.

This is actually documented in all of the tools that include the Conceptualization of Spatial Relationships parameter and offer Inverse Distance methods. (It is also documented in the source code.) This is the usage tip relating to this:

INVERSE_DISTANCE or INVERSE_DISTANCE_SQUARED <snip...> Weights for distances less than 1 become unstable. The weighting for features separated by less than one unit of distance (common with geographic coordinate system projections) is 1.

Caution: Analysis on features with a geographic coordinate system projection is not recommended when you select any of the inverse distance-based spatial conceptualization methods (INVERSE_DISTANCE, INVERSE_DISTANCE_SQUARED, or ZONE_OF_INDIFFERENCE).

For these Inverse Distance options, any two points that are coincident are given a weight of 1 to avoid zero division. This ensures features are not excluded from analysis.

True, some people might actually want the weights to soar for distances less than 1.0 and may prefer a different solution for coincident points. One of the nice things about the Spatial Stat tools being written in Python is that a user can modify the code if he/she doesn't like our implementation. The function to create weights from distances is in <ArcGIS>/ArcToolbox/Scripts/WeightsUtilities.py, called Distance2Weight. Hopefully our solution will be acceptable to most, though.

Thanks for posting your comments Bill!

Lauren
ESRI Geoprocessing, Spatial Statistics
01-10-2011
09:55 AM


POST

Sorry, I know my post is already beyond long... but one more thing: When I say "best OLS model" I, of course, mean a model with explanatory variables that are supported by theory, common sense, guidance from experts... I think that's all 🙂 Lauren
01-08-2011
11:49 AM


POST

Hi Andrew,

Good questions!

1) How can I determine the number of neighbors used to calibrate each local equation in GWR?

For the GWR Kernel Type parameter you have two options: FIXED or ADAPTIVE. FIXED means you decide who is a neighbor by whether or not a feature is within some FIXED distance. Selecting AICc or CV for Bandwidth Method will determine an "optimal" distance for you, and this distance will be reported in the Progress Window (if you run in foreground) and in the Results Window. ADAPTIVE means you specify the number of closest neighbors to include in local equation calibration. Again, selecting AICc or CV for Bandwidth Method will have the GWR tool identify the "optimal" number of neighbors for you. Soooo, if you use ADAPTIVE, you will know the number of neighbors... it will be the same for all features. If you select FIXED, however, you don't know the number of neighbors for each feature. Here is one possible strategy for figuring this out:

(a) If your features aren't points, convert them to points (Feature to Points).
(b) Buffer the point features using the distance returned by GWR AICc or CV.
(c) Do a spatial join to count the number of features in each buffer.

2) How do I tell which GWR predictions are "good" and which are not so good?

The default output from GWR is a residual map; this shows the model over and under predictions. Small residuals are good predictions; large residuals are not so good predictions. The output feature class from GWR also includes local R2 values for each feature. Mapping these local R2 values is another way to see where the model is predicting well and not so well.

3) Why would I start with OLS when I know the relationships I'm trying to model are nonstationary?

Unfortunately, GWR does not have the strong diagnostics that OLS has to determine whether or not you have a properly specified model. Further, the fact that you don't have spatial autocorrelation in your GWR residuals is not sufficient evidence that you've found all of the key explanatory variables. One of the most valuable things about GWR is that you can examine coefficient surfaces to see how the relationships you are modeling are changing across your study area. Unfortunately, if you are missing a key explanatory variable, you cannot trust the coefficients 😞

Try this exercise: run OLS with several variables and look at the coefficients... now remove an important variable and rerun OLS... notice that the coefficients have changed (this exercise has the same effect with GWR, it's just easier to see the differences with OLS). In fact, adding or removing a key explanatory variable can cause the coefficients to change 180 degrees (to go from positive to negative or vice versa). Because of this, we take a very conservative stance and strongly recommend that you always start with OLS and do all that you can to find a properly specified OLS model before moving to GWR.

There are only two instances where I might move to GWR without first finding a properly specified OLS model:

(a) If the best OLS model I found passes the Jarque-Bera test (the p-value is NOT statistically significant so my model is NOT biased), AND I am only interested in the predictions (not in the coefficients).
(b) If I felt absolutely confident that the ONLY reason I wasn't getting a properly specified OLS model is because of nonstationarity. In this case, I would want to have strong evidence that my variables were, in fact, strongly nonstationary (switching signs). I would run my best OLS model in GWR and examine the coefficients for each explanatory variable to make sure there is strong nonstationarity. I would want to be able to confidently argue why a particular explanatory variable might be nonstationary. For example, why would my Income variable be a strong predictor in the northern part of my study area, but not such a great predictor in the southern part of my study area?

Often if we force ourselves to try to explain why a variable would be nonstationary, we find the key explanatory variables we are missing, and adding these to OLS provides a properly specified model. With the Income variable example, I would try a spatial regime (dummy) variable in my OLS model that has a value of 1 for features in the north, and a value of 0 for features in the south (I would remove this variable when I moved to GWR). Perhaps I would realize, oh, there is quite a bit of military housing (or whatever) in the southern part of the study area... perhaps adding a variable to reflect this to my OLS model will yield a properly specified model.

The OLS tool in ArcGIS automatically computes standard errors that are robust to nonstationarity, so unless the nonstationarity is severe (changing signs), as long as you've included all of the key explanatory variables, you will very often find a properly specified OLS model despite the nonstationarity. There is a sample script on the resource center called Exploratory Regression that can help you find a properly specified OLS model ( www.bit.ly/spatialstats ... Supplementary Spatial Statistics). The documentation for that tool provides strategies for fixing OLS models that aren't properly specified.

Again, my strong recommendation is that you start with OLS, then move to GWR once you find a properly specified OLS model. Because Geography does matter, your chances of finding a properly specified OLS model increase as the size of your study area decreases (keep in mind GWR requires a minimum of about 100 features). So if I wanted to find a model for a huge study area, I would break it up into regions and work with each piece separately. After finding properly specified OLS models for each region, I would then move to GWR with the full set of explanatory variables from all regions (the GWR model would not have to be region by region).

I hope this helps. Thanks so much for posting your questions!

Very best wishes,
Lauren M Scott, PhD
ESRI Geoprocessing, Spatial Statistics
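Outside of ArcGIS, the buffer-and-spatial-join strategy from question 1 amounts to counting, for each feature, how many other features fall within the fixed distance. Here is a minimal sketch in Python with NumPy, assuming projected point coordinates; the function name `neighbor_counts` and the sample coordinates are made up for illustration. Note that it excludes the feature itself, whereas a buffer-based spatial join would normally count the feature inside its own buffer.

```python
import numpy as np

def neighbor_counts(coords, bandwidth):
    """Count the other features within `bandwidth` of each feature.

    coords    : n x 2 array of projected x, y coordinates
    bandwidth : the fixed distance reported by GWR (same units as coords)
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # all pairwise offsets
    dist = np.sqrt((diff ** 2).sum(axis=2))          # n x n distance matrix
    within = dist <= bandwidth
    return within.sum(axis=1) - 1                    # subtract the feature itself

points = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(neighbor_counts(points, bandwidth=1.5))
```

The full distance matrix is fine for a few thousand features; for much larger datasets a spatial index would be a better fit.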
01-08-2011
11:32 AM


POST

Hi Jian,

I think I may know why you are getting those results, but let's make sure I understand what you've done first:

1) Your Conceptualization of Spatial Relationships is Inverse Distance
2) You've set Number of Neighbors to 8 to ensure each feature has at least 8 neighbors
3) You are taking the default threshold distance (you didn't enter anything for Threshold Distance)
4) Row Standardization was checked ON

I did the above and got expected results with and without Row Standardization. As expected, the weights were different for feature pairs. You, however, are seeing identical weights for all neighbors associated with a particular feature.

This is what I think might be happening: Because Inverse Distance is unstable for distances less than 1, our inverse distance calculation treats all distances less than 1 as 1. Suppose you are working in a smallish study area and are using unprojected data (Geographic Coordinate System instead of a Projected Coordinate System) so that your units are in Degrees. With unprojected data, for a study area that has less than a 1 degree extent, all of your distances will be less than 1.0. All of the weights will get set to 1.0, and when you row standardize, all of the weights for a feature's neighbors will be equal. To remedy, please project your data prior to analysis (always a good idea, but especially a good idea when your analyses involve distance measurements).

If this is *not* what's happening I will need additional information so that I can try to reproduce the problem. What version of ArcGIS are you using? Might you be able to send me your data? (I would not need any of the attributes, only the feature geometry.)

Thanks for asking your question! I hope this resolves your problem; if not we'll try again 🙂

Best wishes,
Lauren M. Scott, PhD
ESRI Geoprocessing, Spatial Statistics
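The effect described above is easy to reproduce in a few lines of Python with NumPy. This is only a sketch of the behavior as described in the post, not the tool's code; the function name `row_standardized_weights` and the sample distances are assumptions for illustration.

```python
import numpy as np

def row_standardized_weights(distances, exponent=1.0):
    """Inverse distance weights for one feature's neighbors, row standardized.

    Distances below 1 are treated as 1 before inverting, then the weights
    are divided by their sum so that they add up to 1.
    """
    d = np.maximum(np.asarray(distances, dtype=float), 1.0)
    w = 1.0 / d ** exponent
    return w / w.sum()

# Unprojected data: neighbor distances in decimal degrees, all below 1.
print(row_standardized_weights([0.02, 0.10, 0.40, 0.90]))      # all equal

# The same neighbors after projecting: distances in meters.
print(row_standardized_weights([2000, 10000, 40000, 90000]))   # nearer = larger
```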
01-07-2011
02:26 PM


POST

Hi Yao,

I'm pretty sure I answered all of your questions in an email recently, but I'm glad you posted your questions here too.

1) The coefficient surfaces are created using a weighted least squares estimator... the method is described on pages 52 to 54 of Geographically Weighted Regression by Fotheringham, Brunsdon and Charlton, Wiley 2002... the formula is labeled (2.11) and looks something like this:

β̂(i) = (X^T W(i) X)^(-1) X^T W(i) Y

Basically, GWR estimates the coefficient value at each raster cell using the same formula that it uses to estimate the coefficient values at each feature. For each raster cell, a weights matrix is constructed relating that raster cell location to every feature in the dataset... nearby features have a bigger weight than features that are farther away. The weighting function itself depends on what you select for Kernel Type (FIXED/ADAPTIVE) and Bandwidth Method (the distance or number of neighbors) when you run GWR. Even though the raster cell being estimated may not be associated with a specific feature (so it doesn't have a specific dependent variable or explanatory variables)... it still has weighted explanatory variables and can be associated with weighted dependent variables. In fact, the math to estimate the coefficient at a location that coincides with a feature is the same for a location that doesn't coincide with a feature; in both cases the coefficient is estimated using weighted X and Y variables. Because of the weighting function, it helps me to think of the weighted least squares estimator as a type of interpolator; nearby X and Y values provide the data necessary to estimate the coefficient value at each raster cell.

2) Yes, you can interpolate the predicted values from OLS and GWR if what you are modeling (your Y variable) is actually continuous (elevation, temperature, etc.). However, realize that your sampled data come from predictions (not the actual values). The result will be a prediction surface. My recommendation would be to use the actual Y values, where you have them, then obtain predicted Y values for all locations where you can obtain X values, but don't have the actual Y values. I hope that makes sense. Then use something like Kriging for the interpolation, if your data can be modeled using a semivariogram.

3) OLS and GWR work fine with sampled data, so if you are missing some points, that's fine, you just use those points that have data. OLS and GWR do not recognize "-999" or some other numeric code as missing data (they will interpret those values as REAL data values). You can use all locations with a full set of X and Y values to calibrate your model, then predict Y values for locations with a full set of X variables, but no Y values.

Your follow up email also asked about the best distance to use (scale of analysis). Please check out the supplementary spatial statistics tools available for download from the Geoprocessing Resource Center ( www.bit.ly/spatialstats ). The Incremental Spatial Autocorrelation tool can help you find the distance where spatial processes promoting clustering are most pronounced. These tools include full documentation.

I hope this helps.

Best wishes,
Lauren M Scott, PhD
ESRI Geoprocessing, Spatial Statistics
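The weighted least squares estimator in answer 1) can be sketched in Python with NumPy as follows. This is an illustration of formula (2.11) only, not the GWR tool's code: the function name `gwr_local_coefficients`, the Gaussian kernel, and the synthetic data are all assumptions. The location passed in can be a raster cell center that does not coincide with any feature.

```python
import numpy as np

def gwr_local_coefficients(location, coords, X, y, bandwidth):
    """Estimate local coefficients at `location` as (X'WX)^-1 X'Wy.

    location  : (x, y) of the point being estimated, e.g. a raster cell center
    coords    : n x 2 array of feature coordinates
    X, y      : explanatory variable(s) and dependent variable at the features
    bandwidth : kernel distance; a Gaussian kernel is assumed here
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.norm(coords - np.asarray(location, dtype=float), axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # nearby features weigh more
    design = np.column_stack([np.ones(len(y)), X])   # intercept + explanatory variables
    xtw = design.T * w                               # X^T W(i), W(i) is diagonal
    return np.linalg.solve(xtw @ design, xtw @ y)

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(50, 2))   # hypothetical feature locations
x = rng.normal(size=50)                     # hypothetical explanatory variable
y = 2.0 + 3.0 * x                           # a perfectly stationary relationship

# Estimate at a location that is not one of the features.
beta = gwr_local_coefficients((5.0, 5.0), coords, x, y, bandwidth=3.0)
print(beta)   # intercept and slope at that location
```

Because the synthetic relationship is the same everywhere, the local estimate recovers the global intercept and slope; with nonstationary data the result would vary with the location.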
11-09-2010
01:50 PM


