POST
Hmmm... not sure what could be causing this. When you hover your cursor over the red X, what error does it report? Please determine if script tools outside of the Spatial Statistics toolbox have the same issues (the Multi-Ring Buffer tool, for example). If all script tools are having problems, my guess is that you have a software installation problem. If, by chance, you have access to another computer with the same software, please try to determine if the problem is machine specific. If you only see the problem on one specific computer, this is more evidence that the problem relates to a software installation issue (and the solution would be to reinstall the software ... but Esri Tech Support might have other suggestions). If you are seeing the problem across more than one computer, the problem might be data specific:
1) Try those tools with different input data to see if the problem is consistent. If you continue to get the problem with different data sets, we're back to thinking it's a software installation problem.
2) If the tools work with other data sets, however, the problem could be a data corruption issue. Try right-clicking the feature class in the Table of Contents and selecting Export. Export the data to a shapefile. See if the spatial statistics tools work with the new shapefile.
So very sorry you're having problems. Lauren Esri Geoprocessing, Spatial Statistics
Posted 12-12-2011 11:52 AM
POST
Great questions! Using Global Moran's I to find the peak z-score in order to help you decide the appropriate scale of your analysis (fixed distance band) is an excellent strategy. [Just by the way, if you are using ArcGIS 10.0, we have a sample script tool called Incremental Spatial Autocorrelation that automates this process and can save you some time. If you are interested in getting this tool, please go to our resources page, www.esriurl.com/spatialstats, and look for "Supplementary Spatial Statistics".] The message about some features not having any neighbors is an indication that your distances are not large enough to ensure that every feature has at least one neighbor. To get a quick description of neighbor distances, you can run the Calculate Distance Band From Neighbor Count tool (in the Spatial Statistics toolbox, Utilities toolset). It returns the minimum nearest neighbor distance (the distance between the two features that are closest together in your dataset), the average nearest neighbor distance (the average distance each feature is from its nearest neighbor), and the maximum nearest neighbor distance (the smallest distance that will ensure that EVERY feature in your dataset has at least one neighbor). Sometimes when you have a couple of outlier features, the distance required to ensure every feature has at least one neighbor gets rather large ... possibly larger than is effective for your analysis. You can check this by using the measurement tool to "draw" the maximum nearest neighbor distance. When outliers are forcing you to use distances that are too large (too large to effectively capture the spatial processes you believe are at work in your data), we recommend: 1) Select all but the outlier features. 2) Run Incremental Spatial Autocorrelation (or Global Moran's I for increasing distances) on the selection set.
[If you don't enter a beginning distance, btw, it will use the distance that ensures every feature has at least one neighbor.] This analysis will give you the peak distance for the majority of your features (unbiased by the handful of outliers). You still want to include the outliers in your final analysis, however, so: 3) Run the Generate Spatial Weights Matrix tool, selecting Fixed Distance, and using the distance you found in (2). Also, check ON Row Standardization (more about that below), and put 2 for the Number of Neighbors parameter <- the 2 will force each feature (even the outliers) to have at least 2 neighbors and you won't get the message about invalid results. With this parameter, the tool will look farther than the distance provided, only if required and only for the outliers, in order to ensure every feature gets at least 2 neighbors. 4) Run your Hot Spot Analysis and/or your Cluster and Outlier Analysis using the .swm file created in (3). You do this by selecting "Get Weights From File" for the Conceptualization of Spatial Relationships parameter and then specifying the path to the .swm file for the Spatial Weights Matrix File parameter. About your concerns with regard to the different sampling densities... On the one hand, any bias in your sampling scheme will be reflected in your results. What that means:
- The Hot Spot Analysis (Gi* statistic) works by looking at each feature (each sample) within the context of neighboring features. It compares the local mean of the total positives to the global mean of the total positives and decides if the difference is statistically significant or not.
- If you were to ONLY sample in the high total positive areas (for whatever reason, as one example of bias), the global mean would be higher than if you had a truly random sample of the entire study area. Because the global mean is higher, you will see fewer hot spots than you might with a truly random sample.
On the other hand, if your samples are truly representative of the broader population (what you would see if you could sample all across Canada using a random sampling design), then the fact that you have more samples in some areas and fewer in others is not so much of a concern. The reason is that the Gi* statistic is conceptually looking at the local mean in relation to the global mean, and so the number of features isn't so important. The number of features is still considered in determining the z-score, however, but only in the sense that more features means more information. So when a feature has very few neighbors, Gi* still does the very best it can, but it has less information to come up with a result. When a feature has LOTS of neighbors, the Gi* statistic has more information to compute a result. There is not an overcount or undercount bias... instead, there are just differences in how confident you can be about the results in places with few samples. Does that make sense? If not, please ask and I can try again 🙂 Regarding the extreme z-scores for Global Moran's I, 2 things: 1) When you run Global Moran's I with increasing distances, or when you run the Incremental Spatial Autocorrelation sample script tool, be sure to check ON Row Standardization (this doesn't make a difference for Hot Spot Analysis, but it is important for the Global tools). If your points were very reflective of the spatial distribution of what you are sampling, you would not check ON Row Standardization (example: when we have point data reflecting ALL crimes, we see lots of points where there are LOTS of crimes and few points where there are few crimes, and that difference in point densities is reflective of crime patterns in our study area).
In your case, you have dense samples in some areas and less dense samples in others, and it probably has more to do with where you decided to sample rather than the underlying distribution of the total positives data (I hope that makes sense). To compensate for any bias in your sampling scheme (the idea that some places happened to get lots of samples and others happened to get very few), check ON Row Standardization. 2) If the Total Positives data is skewed (my guess is that it may be; you can check this by creating a histogram of the Total Positives data values... if it deviates from a bell curve, your data is skewed), you want to make sure that on average each feature has 8 or more neighbors. When you use the Generate Spatial Weights Matrix tool to create a file that represents the Conceptualization of Spatial Relationships among your features (as recommended above), you automatically get a summary of the number of neighbors. [For ArcGIS 10.0: You can access this information from the Results window... or you will see it automatically if you disable background processing. If you need additional information about this, please don't hesitate to ask :)]. Because you put 2 for the Number of Neighbors parameter, the minimum number of neighbors a feature will have is 2. You want the average to be 8 or more (but more than, say, 100 is starting to get silly). The Gi* statistic is asymptotically normal: as long as you ensure every feature has at least a few neighbors and none of the features have everyone as a neighbor, you can trust your z-score results. With skewed data and either too few neighbors (zero) or too many (everyone is a neighbor), the skewness in the data spills over into the z-score results and you have less confidence in those values. Okay, lots of information. I hope this is helpful! Best wishes, Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
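The three distances reported by Calculate Distance Band From Neighbor Count can be illustrated with a small pure-Python sketch (made-up coordinates; this is an illustration of the idea, not Esri's code):

```python
import math

def nearest_neighbor_distances(points):
    """For each point, find the distance to its single nearest neighbor,
    then summarize: minimum, average, and maximum nearest-neighbor
    distance. The maximum is the smallest fixed distance band that
    guarantees every feature has at least one neighbor."""
    nn = []
    for i, (x1, y1) in enumerate(points):
        d = min(math.hypot(x1 - x2, y1 - y2)
                for j, (x2, y2) in enumerate(points) if i != j)
        nn.append(d)
    return min(nn), sum(nn) / len(nn), max(nn)

# Three clustered points plus one outlier: the outlier inflates the
# maximum nearest-neighbor distance, just as described above.
pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
mn, avg, mx = nearest_neighbor_distances(pts)
```

Notice how one outlier drives the maximum far above the average, which is exactly the situation where the selection-set strategy above pays off.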
Posted 12-05-2011 02:21 PM
POST
Hi Mareike, Good questions! First, I would recommend the following steps:
1) Overlay your island with the fishnet grid.
2) Do the spatial join.
3) Remove any grid cells that fall off the island, or where it would be impossible to have points. You remove these because the Gi* statistic conceptually compares the local mean to the global mean, then decides if the difference is significant... when you have cells that fall outside the study area (so that lots of cells have zero values), it brings the global mean down. With a "sea of zeros" in your dataset, you tend to see anything that is non-zero appearing as a hot spot.
4) If you have a good distance (scale of analysis) for the Distance Band or Threshold Distance parameter in mind (based on your knowledge of what you are studying), great!!! If not, then if you are using ArcGIS 10.0 you will find the Incremental Spatial Autocorrelation sample script (www.esriurl.com/spatialstats ... find "Supplementary Spatial Statistics" for the download) helpful. If you don't have ArcGIS 10.0, you can get the same results by running the Spatial Autocorrelation tool for increasing distances and looking for a peak value (I can give you more information about that if you need me to).
5) Run Hot Spot Analysis on your remaining fishnet grid cells, using the distance you discovered in (4).
Next, to answer your question about the fishnet grid area (how much of a cell falls into the ocean vs how much falls on the island... I think that's what you are asking): for any of the distance conceptualizations (Fixed Distance, for example), the Hot Spot Analysis tool "sees" and treats polygons as point centroids. Does this help? Please let me know if I didn't answer your question and I am happy to try again 🙂 Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
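The "sea of zeros" effect in step 3 is easy to see with toy numbers (hypothetical counts, not real data): keeping the empty ocean cells drags the global mean down so far that an ordinary island neighborhood looks hot.

```python
# Counts of points per fishnet cell: island cells, plus ocean cells
# that inevitably received a count of 0.
island = [4, 5, 3, 6, 5, 4]
ocean_zeros = [0] * 30

with_zeros = island + ocean_zeros
global_mean_all = sum(with_zeros) / len(with_zeros)   # dragged down by zeros
global_mean_island = sum(island) / len(island)        # honest baseline

# A typical local neighborhood on the island.
local_mean = sum([4, 5, 3]) / 3
```

Against the zero-inflated global mean the local mean looks extreme; against the island-only global mean it is unremarkable, which is why the off-island cells should be removed before running Hot Spot Analysis.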
Posted 12-05-2011 10:17 AM
POST
Hi Thom, Good questions 🙂 Let's think about this with points first... two examples involving variations in my point densities: 1) Suppose we have ALL crime incidents. In some parts of our study area there are lots of points because those are places with lots of crime. In other parts, there are few points, because those are low crime areas. The density of the points is a very good reflection (is representative) of what I'm trying to understand: crime spatial patterns. 2) Suppose I've taken soil samples. For some reason (the weather was nice or I happened to be in a location where I didn't have to climb fences, swim through swamps, hike to the top of a mountain, etc.), I have lots of samples in some parts of my study area, but fewer in others. In other words, the density of my points is not strictly the result of a carefully planned random sample; some of my own biases may have been introduced. Further, where I have more points is not necessarily a reflection of the underlying spatial distribution of the data I'm analyzing. For case 1, whether a feature has more neighbors or not is a reflection of actual crime densities. While it is fine to row standardize, in this case I'd rather have the density of my points play a role in my analysis, because they are reflective of what I'm studying. For case 2, I want to minimize any bias that may have been introduced during sampling. When you row standardize, the fact that one feature has 2 neighbors and another has 18 doesn't have a big impact on the results; all the weights sum to 1. Make sense? Okay, polygons... Whenever we aggregate our data we are imposing a structure on it. If that structure is a good reflection of the data I'm studying, I might decide not to row standardize ... but to be honest, I can't think of a good example off the top of my head where that would be the case. 
Some might argue that census polygons (like tracts) are designed around population, so if the data I'm analyzing has to do with people, I might not want to row standardize... but the way tracts appear in the census represent just one of many, many, many ways they could have been drawn. So with polygon data, I always apply row standardization. I hope this helps! Lauren Lauren M. Scott, PhD Esri Geoprocessing and Analysis, Spatial Statistics
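What row standardization actually does to the weights can be sketched in a few lines of Python (a toy weights structure, not the .swm format or Esri's implementation):

```python
def row_standardize(weights):
    """Divide each feature's neighbor weights by their row sum, so every
    row sums to 1. A feature with 2 neighbors and a feature with 18
    then carry equal total influence in the analysis."""
    out = {}
    for fid, nbrs in weights.items():
        total = sum(nbrs.values())
        out[fid] = {n: w / total for n, w in nbrs.items()}
    return out

# Feature 'a' has 2 neighbors, feature 'b' has 4; before standardization
# 'b' would contribute twice the total weight of 'a'.
w = {"a": {"b": 1.0, "c": 1.0},
     "b": {"a": 1.0, "c": 1.0, "d": 1.0, "e": 1.0}}
rs = row_standardize(w)
```

After standardization every row sums to 1, which is why sampling-density differences (case 2 above) stop dominating the result.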
Posted 11-22-2011 11:59 AM
POST
Hi Ellen, I will look at the data you sent to me. Thank you. With regard to the Expected K values being exactly equal to the Distance values, that is what you will always get. The reason is because we are using a transformation that converts the Expected K value to be equal to distance. For more information on this, please see: Getis, A. Interactive Modeling Using Second-Order Analysis. Environment and Planning A, 16: 173-183. 1984. The actual formula for the L(d) transformation is given in: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Multi_Distance_Spatial_Cluster_Analysis_Ripley_s_K_function_works/005p0000000s000000/ I do have a bug in for myself to improve the K Function documentation. Sorry for the confusion! (I can't believe I include the L(d) formula and then don't actually tell you what it does... my very bad! So sorry!). Thank you for your post and for sending me the data. More soon. Lauren Lauren M. Scott, PhD Esri Geoprocessing, Spatial Statistics
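The textbook form of that transformation (Besag's L) shows why Expected K equals distance; note this is the generic formula, and ArcGIS's exact version in the linked help topic also includes study-area terms:

```python
import math

def l_transform(k_value):
    """Besag's L transformation of Ripley's K: L(d) = sqrt(K(d) / pi).
    Under complete spatial randomness the expected K(d) is pi * d**2,
    so the expected L(d) is exactly d -- which is why the tool's
    Expected K column equals the Distance column."""
    return math.sqrt(k_value / math.pi)

d = 250.0
expected_k = math.pi * d ** 2   # expected K under spatial randomness
```

Applying the transformation to the expected K returns the distance itself, so any plotted deviation of observed L(d) from the diagonal is a deviation from randomness.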
Posted 10-26-2011 09:28 AM
POST
Hi Mareike, Sorry you're having trouble with this! The K Function works by simply counting feature pairs: the tool "visits" each feature in the dataset, selects all features within a specified distance of the target feature, and counts the number of feature pairs among the selected features... feature pair counts are accumulated as the tool visits every feature. The distance is then increased and the counting repeated... and this process continues however many times you've specified for the Number of Distance Bands parameter. These accumulated counts (one for each distance) are converted to an index and plotted on a line graph. When your points tend to be clustered, the accumulated counts are higher, and the index falls above the blue diagonal expected line. When the points tend to be dispersed, counts are lower and the index falls below the expected line. To decide if the clustering or dispersion is significantly different from what you would get if the points were randomly distributed in your study area, the tool uses simulation. The tool randomly pitches your points into your study area 9, 99, or 999 times and for each simulation, it performs the whole distance/counting thing. From all the simulations, it remembers (for each distance) the most clustered index obtained from the random process of pitching your points into the study area, and it remembers the most dispersed index obtained. These extreme values form the confidence envelope, and they show you (given X number of points and the peculiarities of your study area) the range of possible indices you can obtain from a random process. For a weighted K Function, the confidence envelope follows the observed line and the simulation process is a bit different than I described above. From the graphic you sent, my guess is you are using the unweighted K Function, but please let me know if I've guessed incorrectly.
For the unweighted K Function, if the study area has a very simple shape (circle, rectangle) the confidence envelope will enclose the expected line. When the study area isn't simple (there are peninsulas, or you are working with an "L" shape, for example) then the study area itself can force randomly placed features to be far away from each other, so the confidence envelope appears below the expected line (more dispersed). Okay, so why might someone run the unweighted K Function? The K Function provides a kind of spatial "fingerprint" of how spatial clustering among your point features changes across multiple scales (across increasing distances). Why is this interesting? Whenever we see clustering in the landscape, we are seeing evidence of underlying spatial processes at work. Statistically significant peaks or dips of the observed index are evidence that spatial processes are operating at the associated spatial scale. Sometimes knowing something about these statistically significant spatial scales provides clues about the underlying processes at work. Comparing the spatial "fingerprints" for two different point datasets within the exact same study area can tell you if their spatial patterns are being influenced by the same or different spatial processes. Some questions for you: You indicated you are analyzing birds on an island. Are you providing a study area polygon when you run the K Function? If so, might that polygon be forcing a structure on the simulations that would explain why the confidence envelope falls below the expected line? Do the points you have reflect a sample of bird sightings, or do they represent ALL possible data (like ALL bird nests on the island)? Sampled data, especially when the samples might be biased by observer behavior or the sampling scheme, are not good candidates for the K Function... there is the risk that you will model observer behavior rather than bird behavior.
You mentioned a projection/transformation warning message or error... that sounds like a problem. If possible, I'm hoping you can send me your data so that we can figure out exactly why you are getting the unexpected results. Please contact me directly at LScott@Esri.com if that might be possible. Again, I'm sorry you are having problems with the K Function. I hope this information is helpful to you. If anything is unclear, please contact me or reply here and I will do my very best to clarify. Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
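The pair-counting step described at the start of this post can be sketched directly (a toy illustration of the counting, without the index conversion, edge correction, or simulation envelope):

```python
import math

def pair_counts(points, distances):
    """Accumulate, for each distance band, the number of ordered point
    pairs whose separation is within that distance -- the raw counts
    from which the K Function index is built."""
    counts = []
    for d in distances:
        c = sum(1
                for i, (x1, y1) in enumerate(points)
                for j, (x2, y2) in enumerate(points)
                if i != j and math.hypot(x1 - x2, y1 - y2) <= d)
        counts.append(c)
    return counts

# A tight square cluster: the counts saturate quickly as distance grows,
# which is the signature of clustering at small scales.
pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
counts = pair_counts(pts, [1.0, 1.5, 2.0])
```

The accumulated counts are monotonically non-decreasing with distance; clustering shows up as counts that rise faster than a random pattern would produce.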
Posted 10-21-2011 04:00 PM
POST
Hi Zahi, If you are using ArcGIS 10.0, you should be able to see the numerical output by: 1) Disabling background processing (click the Geoprocessing menu, then Geoprocessing Options... UNcheck "Enable" for background processing). OR 2) Opening the Results window (Geoprocessing menu, then Results). You will see an entry for your model... open that, right click on Messages and select View. The "*" indicating statistical significance isn't written to the coefficient or diagnostic tables (as you noticed), but you can easily interpret p-value significance as follows:
* P < 0.10 means statistically significant at the 90% confidence level (less conservative)
* P < 0.05 means statistically significant at the 95% confidence level
* P < 0.01 means statistically significant at the 99% confidence level (more conservative)
To learn more about interpreting z-scores and p-values, please see: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/What_is_a_z_score_What_is_a_p_value/005p00000006000000/ The VIF values don't reflect results that are either significant or not significant... rather, the rule of thumb is that if a VIF value is larger than about 7.5 there are issues with variable redundancy (multicollinearity) that could potentially lead to model instability. For more information about interpreting OLS diagnostics, please see: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Interpreting_OLS_results/005p00000030000000/ With regard to removing dummy variables when you move from OLS to GWR: this strong recommendation applies to spatial regime dummy variables, where you have a bunch of 1's spatially clustered and/or a bunch of 0's spatially clustered. The reason this is a problem for GWR is what's called "local multicollinearity". GWR creates a separate equation for each feature and calibrates it (i.e., computes the coefficients) using nearby features (rather than using ALL features).
When the values for a variable cluster spatially (e.g., all 1's) there is the potential that the nearby features used for calibrating an equation will have all the same values and this would result in perfect multicollinearity with the Intercept ...and GWR cannot solve (calibrate) in that situation. Even if there isn't perfect local multicollinearity, when there is very little variation in a variable's values, results can be unstable. You can tell if you are having this problem by looking at your output feature class from GWR. When the condition number for a feature is larger than about 30, there are issues with local multicollinearity and you have less confidence in the results associated with those features. I hope this is clear... if not, please let me know and I can try again 🙂 You also asked about the validity of using GWR for linear data. This is fine as long as you recognize that network relationships are not used to define which features are "nearby" or how much weighting a nearby feature has with regard to calibration. Remember, calibration of the equation associated with a particular feature is based on features that are "nearby"... the idea is that nearby features better reflect relationships between your dependent variable and the explanatory variables than features that are far away. "Nearby" is a function of your answers for the Kernel Type and Bandwidth Method parameters and we can talk more about that if you want, but the point is that distances to determine which features are nearby are computed using plain ol' Euclidean straight-line, as-the-crow-flies distance. With a road network, we might expect two points that are on the same street to be more alike (to deserve a larger weight/influence) than two points that are the same distance but on parallel streets... these type of network relationships won't be considered. 
So if you use GWR, you should ask yourself whether including network relationships in the calibrations for each feature equation is important to the question you want to answer. For other tools in the Spatial Statistics toolbox you can create spatial relationships based on a road network (Generate Network Spatial Weights tool), but unfortunately our GWR tool currently cannot take advantage of this option. I hope this answers your questions! Best wishes, Lauren Lauren M. Scott, PhD Esri Geoprocessing, Spatial Statistics
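The p-value interpretation rule near the top of this post is simple enough to capture in a tiny helper (a hypothetical convenience function, not part of the OLS tool's output):

```python
def significance_stars(p_value):
    """Map a p-value to the conventional confidence levels listed
    above. Smaller p-values correspond to more conservative
    (higher-confidence) significance claims."""
    if p_value < 0.01:
        return "*** (99% confidence)"
    if p_value < 0.05:
        return "** (95% confidence)"
    if p_value < 0.10:
        return "* (90% confidence)"
    return "not significant"
```

For example, a coefficient with p = 0.03 is significant at the 95% level but not at the 99% level.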
Posted 06-21-2011 03:06 PM
POST
Hi Bilal, You can add a new field to your table. Right click on that field and select Field Calculator. You can use the calculator to populate the new field with the log of the original values or the original values raised to whatever exponent you think works best. The Add Field and Calculate Field tools can also be used. I hope this helps, Lauren Scott Esri
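The same transformations the Field Calculator would apply can be sketched in plain Python (an illustration of the math, not the Field Calculator itself):

```python
import math

def transform(values, method="log", exponent=2):
    """Populate a new field the way the Field Calculator would:
    natural log of each value, or each value raised to a power."""
    if method == "log":
        return [math.log(v) for v in values]
    return [v ** exponent for v in values]

logged = transform([1.0, math.e, math.e ** 2])       # log transform
squared = transform([2, 3], method="power")          # power transform
```

As with the Field Calculator, the log transform requires strictly positive input values.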
Posted 05-11-2011 03:30 PM
POST
Hi Mike, Yes, it's Standardized Residual. Thanks for pointing out that this isn't documented! We will get this corrected. Best wishes, Lauren Scott Esri Geoprocessing, Spatial Statistics
Posted 05-11-2011 03:24 PM
POST
Hi Carrie, A great place to start is our resources page: www.esriurl.com/spatialstats You'll find short videos, a free 1 hour training seminar on spatial pattern analysis, and a couple hot spot analysis tutorials (in fact, one of the tutorials actually uses 911 emergency call data). Best wishes! Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
Posted 04-27-2011 09:49 AM
POST
Hi Leandro, I'm sorry you are having problems with this. Please try transforming the Edad variable into a deviation from the mean:
1) Create a new field in your dataset called something like TrEdad.
2) Determine the mean for the Edad variable... let's call that value MeanEdad.
3) Calculate the new TrEdad field to be: Edad - MeanEdad
4) Use the TrEdad variable instead of the Edad variable in OLS and notice that the results are the same as they were before (transforming the variable will not change your OLS results).
5) Move to GWR with the TrEdad variable and see if this resolves the Severe Model Design issue.
Other things that might cause this problem are: a) you have too few features (with GWR you should really have *at least* 80 or 90 features); b) you are using a kernel that is too small (you will want to use AICc or CV for the Bandwidth Method parameter so that GWR can find the optimal distance/number of neighbors for you); c) you are having issues with local multicollinearity. When you specify AICc or CV for the Bandwidth Method, GWR will try a bunch of different distances/numbers of neighbors in an effort to find one that is optimal. If, along the way, it encounters issues with local multicollinearity (even on one of those trials), it will fail with Severe Model Design (unfortunately...). If this is the problem you are having, you will need to figure out where the multicollinearity is and try to sneak up on the optimal distance band/number of neighbors:
** Run GWR and specify "Adaptive" for the Kernel Type parameter.
** Just as a test, select "Bandwidth Parameter" for the Bandwidth Method and set the number of neighbors to 40 (I pulled that number off the top of my head, btw; it is not magic or special).
** Run GWR and see if it solves. If it does, map the condition numbers in the output feature class. Condition numbers above 30 indicate the portions of your study area where you very likely are having trouble with local multicollinearity.
** Unless a large portion of the features in your dataset have condition numbers larger than 30, temporarily remove those features from your dataset. (If a large portion of your features have condition numbers larger than 30, you have one or more variables that, while they may not be redundant globally, are in fact redundant locally.)
** Re-run GWR on the subset data, this time specifying Fixed/Adaptive for Kernel Type (whichever you think is most appropriate for your analysis... my bias is to use Fixed), and specifying AICc for Bandwidth Method. My guess is that GWR will solve this time.
** If GWR does solve with the subset, write down the optimal distance or number of neighbors it reports in the progress window (or in the Results window if you are running in background).
** Now try running GWR on the full dataset using the optimal distance or number of neighbors (reported in the last step) and "Bandwidth Parameter" for the Bandwidth Method parameter.
** Hopefully GWR will solve now. If it does, it most likely means there were local multicollinearity issues with the smaller distances/numbers of neighbors. Even though GWR did solve, you will still want to map the condition numbers reported in the output feature class. You cannot have confidence in those locations of your study area where the condition number is greater than 30 (because of local multicollinearity problems).
I hope this answers your question. Please let me know if these strategies work for you. Thanks so much for posting your question! I will check the documentation to make sure this information is included and clear. If something I've written above is not clear, please let me know so that I can improve my explanation. Again thank you, and I'm sorry you are having problems with this. Best wishes, Lauren Lauren M. Scott, PhD Esri Geoprocessing/Spatial Statistics Product Engineer
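Steps 1-3 of the deviation-from-mean transformation are a one-liner in any language; here is a pure-Python sketch (illustrating the arithmetic, not the Field Calculator):

```python
def deviation_from_mean(values):
    """Center a variable on its mean (steps 1-3 above). The centered
    variable gives identical OLS results, but can help GWR avoid
    local collinearity between the variable and the intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical Edad (age) values; TrEdad = Edad - MeanEdad.
tr_edad = deviation_from_mean([20, 30, 40, 50])
```

The centered values always sum to zero; only the intercept shifts, which is why the OLS results are unchanged in step 4.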
Posted 03-30-2011 10:37 AM
POST
Hi Todd, Our mathematics for the Global Moran's I tool is given here: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Spatial_Autocorrelation_Global_Moran_s_I_works/005p0000000t000000/ This document also provides information about interpretation and FAQs. I hope this helps 🙂 Your post said something about testing your OLS regression residuals in order to determine if GWR is appropriate for your data. Please keep in mind that spatial autocorrelation in your OLS residuals almost always means you are missing a key explanatory variable from your model. GWR is a regression method that deals with non-stationarity... it is not a fix for misspecification, nor a method specifically designed to address spatially autocorrelated residuals. In case you might be interested, we have lots of resources about the tools in the Spatial Statistics toolbox: www.bit.ly/spatialstats. We have a sample script, for example, called Exploratory Regression. The documentation for that tool includes strategies for finding a properly specified OLS model. Anyway, I hope very much that this information is helpful to you. Best wishes, Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
Posted 01-21-2011 01:47 PM
POST
The formula is 1/d**(exponent) for d > 1.0. All of the code for the weights is in weightsutilities.py, found in <ArcGIS>/ArcToolbox/Scripts. I hope this helps, Lauren Scott ESRI Geoprocessing, Spatial Statistics
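A minimal sketch of that formula in Python; note that clamping the weight to 1 for distances at or below 1.0 is my reading of the d > 1.0 condition, not a quote from weightsutilities.py:

```python
def inverse_distance_weight(d, exponent=1.0):
    """Inverse distance weight: 1 / d**exponent for d > 1.0.
    Distances of 1.0 or less are clamped so the weight never
    exceeds 1 (assumption based on the stated condition)."""
    if d <= 1.0:
        return 1.0
    return 1.0 / d ** exponent
```

With exponent 2, doubling the distance quarters the weight, so influence falls off much faster than with exponent 1.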
Posted 01-19-2011 12:28 PM
POST
Hi Karl, There really isn't any way to interpret the General G index directly. If you look at the math for the General G equation (http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_High_Low_Clustering_Getis_Ord_General_G_works/005p0000000q000000/), you see that the numerator is the local product (a running sum of what you get if you multiply each feature's value by all its neighbors' values, for all features). The denominator is the global product (the sum of the products of every feature's value with every other feature's value). If we shuffle up the values so that all the high values are next to each other, the numerator gets bigger. If we shuffle them up so that all the low values are together (but the high values remain random), the numerator gets smaller. The index is simply the ratio of the local product to the global product. But the index will be very different depending on the magnitude of the values involved (i.e., all values range from 0.01 to 0.05 vs. all values range from 1234500 to 1234600), and depending on your conceptualization of spatial relationships (if everyone is a neighbor of everyone else, the numerator gets larger; a polygon contiguity conceptualization with lots of features, however, will result in a small numerator and a very large denominator). So there isn't a fixed interpretation for the index value itself. The rest of the math for the General G involves figuring out the expected index: the expected index is what that ratio would look like if the values were randomly distributed among your features. Next the tool compares the expected to the observed index value. It is the relationship between the observed/actual and the expected values that determines if the General G index is significant or not. You can think of the p-value as the answer to this question: what are the chances that my values would be arranged as they are, if the spatial processes promoting the observed spatial pattern were random?
Small p-values mean the pattern would be very unlikely if the processes were random. That's why we focus on looking at the z-score and p-value when we talk about interpreting General G results. I sure hope this helps! Best wishes with your research! Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
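The local-product/global-product ratio described above can be demonstrated with a toy example (observed index only; the real tool also computes the expected value, z-score, and p-value, and this binary-weights sketch is an illustration, not Esri's code):

```python
def general_g(values, neighbors):
    """Ratio of the 'local product' (each value times its neighbors'
    values, summed over all features) to the 'global product' (every
    value times every other value, summed)."""
    n = len(values)
    num = sum(values[i] * values[j]
              for i in range(n) for j in neighbors[i])
    den = sum(values[i] * values[j]
              for i in range(n) for j in range(n) if i != j)
    return num / den

# A chain of 4 features; each feature's neighbors are the adjacent ones.
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
g_clustered = general_g([9, 9, 1, 1], nbrs)   # high values adjacent
g_mixed = general_g([9, 1, 9, 1], nbrs)       # high values separated
```

Placing the high values next to each other inflates the numerator while leaving the denominator unchanged, so the clustered arrangement yields the larger index, exactly as the post describes.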
Posted 01-12-2011 06:29 PM
POST
Hi Jian, I'm glad you are getting the weights you expect now. Great! 🙂 I am checking the Local Moran's I math... another customer (or perhaps it was you) brought this to my attention. I will certainly make corrections, if necessary, as quickly as I can. Thanks so much! Regarding OLS diagnostics: When an OLS model is not properly specified, often several of the diagnostic tests will fail. For example, if you are missing a key explanatory variable you will see statistically significant spatial autocorrelation in your regression residuals, but you might also see a biased model (non-normally distributed residuals) and poor Adj R-squared values. There are several diagnostic checks you should pass in order to feel confident that you've found a properly specified OLS model. These include: 1) You want the coefficients for your explanatory variables to be statistically significant and have the expected sign (+/-) ... 2) You want to find explanatory variables that get at different facets of what you are modeling (your dependent variable). Said another way: you don't want explanatory variables that are redundant (multicollinearity). The VIF values for your explanatory variables should be less than about 7.5 (smaller is better). 3) You want your residuals to be free from statistically significant spatial autocorrelation. Most often, spatial autocorrelation (any kind of "structure") in your residuals indicates you are missing a key explanatory variable, but it could also be because you are trying to model non-linear relationships, or due to strong non-stationarity (significant Koenker)... Until you find explanatory variables that capture the spatial structure in your dependent variable, it will be difficult to get rid of the spatial autocorrelation in your regression residuals. 4) The Jarque-Bera test assesses the distribution of the residuals. The null hypothesis for this test is that the residuals are normally distributed. 
So this is one of the diagnostics that you do not want to be statistically significant. If it is significant, it means the model is biased... perhaps the model predicts well in some locations, but not in others... or perhaps it predicts well for low values, but not so well for high values. 5) You want a model that performs well... a high adjusted R-squared. We have a new sample script called Exploratory Regression available for download from www.bit.ly/spatialstats that you might want to check out (see Supplementary Spatial Statistics). It is a bit like Stepwise Regression except that it tries all variable combinations and does not just look for a high adjusted R-squared... it looks for models that pass the checks listed above. Be sure to read the cautions that come with the documentation for that tool. You might also be interested in this post (the answer to the question: Why would I start with OLS when I know the relationships I'm trying to model are non-stationary?): http://forums.arcgis.com/threads/19614-Mapping-Geographically-weighted-regression-p-values I hope this helps! Thanks so much for your questions and comments! Best wishes, Lauren
Posted 01-10-2011 10:22 AM