|
POST
|
Hi Franky, You are right that if your data is skewed you want to ensure that your features all have at least several neighbors, and 8 is a good rule of thumb. Whether to use K-nearest neighbors or a fixed distance really depends on the question that you're asking. Fixed distance is often a good option because it keeps your scale of analysis consistent across the whole study area. If you also want to ensure that all of your features have at least 8 neighbors, you can use the Generate Spatial Weights Matrix tool, which lets you set a fixed distance (chosen according to the question that you're asking) and then optionally set a minimum number of neighbors. That way it will use the fixed distance band everywhere, but for those features where the fixed distance does not yield 8 neighbors, it will extend the distance just for those features so that they have the minimum number of neighbors that you set. Hope this helps. Lauren Rosenshein Geoprocessing Product Engineer
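To make the "fixed distance band with a minimum number of neighbors" idea concrete, here is a minimal sketch in plain Python. It only illustrates the neighbor-selection logic described above and is not the tool's implementation; the function name `neighbors_with_minimum` and the sample coordinates are made up for this example.

```python
import math

def neighbors_with_minimum(points, band, min_neighbors):
    """Return {index: [neighbor indices]} using a fixed distance band,
    widened per feature only when the band yields too few neighbors."""
    result = {}
    for i, p in enumerate(points):
        # Distances from feature i to every other feature, nearest first.
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        within = [j for d, j in ranked if d <= band]
        if len(within) < min_neighbors:
            # Too few neighbors inside the band: use the nearest ones instead.
            within = [j for _, j in ranked[:min_neighbors]]
        result[i] = within
    return result

# Four features close together and one isolated feature (hypothetical data).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
nbrs = neighbors_with_minimum(points, band=1.5, min_neighbors=2)
```

The clustered features keep the neighbors found inside the band, while the isolated feature gets its two nearest features even though they are far outside the band.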
06-14-2011
09:38 AM
|
0
|
0
|
1006
|
|
POST
|
Hi Franky, That is a great question! Basically, the zone of indifference is not an option in the Generate Spatial Weights Matrix tool because we made the decision that for almost all use cases the difference between zone of indifference and fixed distance band is very, very small. Once the distance band (d) gets large (even above 10, for instance), the weight of features outside of that distance band is 1/d, which becomes negligible for most use cases. We still have zone of indifference in the Hot Spot Analysis tool and others because once you provide a parameter option you have to continue to provide it for backward compatibility, but when we designed the Generate Spatial Weights Matrix tool we decided not to include it. I would suggest running the same analysis with Fixed Distance Band, and you should see very, very similar results. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
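As a rough illustration of why the two conceptualizations give similar results, the sketch below compares the weights. The inverse-distance falloff beyond the band is an assumption based on the 1/d description above, not the tool's exact formula, and both function names are hypothetical.

```python
def fixed_band_weight(dist, band):
    # Neighbors inside the band count fully; everything else is ignored.
    return 1.0 if dist <= band else 0.0

def zone_of_indifference_weight(dist, band):
    # Assumption: full weight inside the band, inverse-distance weight beyond it.
    return 1.0 if dist <= band else 1.0 / dist

band = 500.0  # e.g. meters
for dist in (100.0, 500.0, 600.0, 2000.0):
    print(dist, fixed_band_weight(dist, band), zone_of_indifference_weight(dist, band))
```

With a band of 500, any feature beyond the band gets a weight below 1/500 = 0.002 under this assumption, compared with 1.0 for features inside it, so the extra neighbors contribute very little.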
04-20-2011
10:47 AM
|
0
|
0
|
1177
|
|
POST
|
Hi Bilal, That's a great question, and brings up a very important requirement of GWR. Unfortunately, Geographically Weighted Regression does work best when you have lots of observations with geographic coordinate systems. Ideally, you'll have at least 80 or so features. The dataset that you have seems very interesting, but with only five features for analysis GWR is not going to be the right tool for the job. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
04-20-2011
10:41 AM
|
0
|
0
|
1146
|
|
POST
|
Hi Franky, I'm glad to hear you've found the resources and are looking at the peak z-score distances to choose a distance band for hot spot analysis. Hopefully you're using the Incremental Spatial Autocorrelation tool to do this, which you can find in our Supplementary Spatial Statistics toolbox. This is a great question that you bring up, and it's especially common, for instance, when you have a lot of features with similar sizes (small polygons in the more urban areas), and then a couple of features that are much larger (big polygons in more rural/suburban areas). A good option for dealing with this is to create a Spatial Weights Matrix. From the Generate Spatial Weights Matrix tool, you can set the Distance Band to the distance at which the z-score peaks, and then use the Number of Neighbors parameter to make sure that those features that don't have any neighbors at the distance that you chose will have at least the number of neighbors that you set. So, for example, if you choose a distance of 500m, and there are a couple of features that don't have any neighbors at 500m, but you also set Number of Neighbors to 2, then for those features WITHOUT any neighbors at the specified distance band we'll increase the threshold to ensure those particular features have at least 2 neighbors. You can only do this using the Generate Spatial Weights Matrix option, and then from Hot Spot Analysis you'll choose to Get Spatial Weights From File. We often use this option when dealing with this issue. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer Check out the latest Spatial Statistics resources at http://esriurl.com/spatialstats
04-18-2011
04:28 PM
|
0
|
0
|
1459
|
|
POST
|
Hi Matt, That's a great question, and actually the answer is a little more involved than it might seem. There is a whitepaper that may help explain some of the basic principles that are being used by Integrate. I think the important thing to remember is that if 2 points are within the threshold distance of each other, they will both be moved to a point in between them, and that will continue happening as the tool continues to look at more points to see if they are within the threshold distance. Picking a threshold that is as small as possible within the context of your analysis is ideal. Check out this whitepaper: Understanding Coordinate Management in the Geodatabase for more details. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
03-29-2011
03:51 PM
|
0
|
0
|
3849
|
|
POST
|
Hi Mike, Two more great questions! 🙂 First, you are absolutely right that the Koenker (BP) Statistic is a Breusch-Pagan test for heteroscedasticity. As far as the robust probabilities are concerned, those are calculated using White's heteroscedasticity-consistent standard errors. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
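For readers who want to see the arithmetic behind these two terms, here is a minimal sketch for a regression with a single explanatory variable. It uses the textbook forms (Koenker's studentized Breusch-Pagan statistic as n times the R-squared of regressing the squared residuals on x, and White's HC0 standard error for the slope); it is not the OLS tool's code, and the function names and data are invented for the example.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals

def koenker_bp(x, y):
    """Studentized Breusch-Pagan statistic and its chi-square (1 df) p-value."""
    _, _, resid = fit_line(x, y)
    squared = [e * e for e in resid]
    # Auxiliary regression: do the squared residuals depend on x?
    _, _, aux_resid = fit_line(x, squared)
    mean_sq = sum(squared) / len(squared)
    total = sum((s - mean_sq) ** 2 for s in squared)
    r2 = 1.0 - sum(e * e for e in aux_resid) / total
    stat = len(x) * r2
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

def slope_standard_errors(x, y):
    """Classical and White (HC0) standard errors of the slope."""
    _, _, resid = fit_line(x, y)
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    white = math.sqrt(
        sum((xi - mean_x) ** 2 * e * e for xi, e in zip(x, resid))
    ) / sxx
    return classical, white

# Hypothetical data whose error spread grows with x (heteroscedastic).
x = list(range(1, 21))
y = [2.0 * xi + 0.5 * xi * (-1) ** xi for xi in x]
stat, p_value = koenker_bp(x, y)
classical_se, white_se = slope_standard_errors(x, y)
```

A small `p_value` here flags heteroscedasticity, which is the situation in which the robust (White) standard error is the one to rely on rather than the classical one.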
03-28-2011
08:58 AM
|
0
|
0
|
2335
|
|
POST
|
Hi Juan! I'm glad to hear that you're finding the resources useful. It sounds like you are on the right track using the Incremental Spatial Autocorrelation tool to graph the intensity of spatial clustering at increasing distances, and it's great that you found a pronounced peak, indicating that at that distance the spatial clustering of your phenomenon is most intense. All of that sounds great. Your concern about running the Spatial Autocorrelation tool with a distance that does not ensure that every feature has at least one neighbor is very valid. If the distance that you choose does not ensure that every feature has at least one neighbor, then you cannot trust the results of your analysis. This is also true for the Hot Spot Analysis tool, or any of the spatial statistics tools that use a conceptualization of spatial relationships. So, running the Calculate Distance Band from Neighbor Count and Incremental Spatial Autocorrelation tools were important first steps to ensure that you have enough neighbors being considered so that you can trust the results of your analysis. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer Check out the latest Spatial Statistics resources at http://esriurl.com/spatialstats
03-22-2011
08:44 AM
|
0
|
0
|
373
|
|
POST
|
Hi mbrigha1, The tool that you are looking for is called Integrate, in the Data Management toolbox. The Integrate tool will snap features that are close together to a common location (based on a threshold that you define). Once you use Integrate, you can then use Collect Events to get a count of the number of points at each location. One word of caution is that Integrate actually changes your input data, so make sure that you do a Copy Features before you use the Integrate tool so that you don't lose your original data. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
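The copy, snap, then count workflow can be sketched in plain Python as below. This is only a conceptual stand-in: the greedy centroid snapping is an assumption chosen for simplicity and is not how Integrate works internally, and `snap_points` and the sample coordinates are hypothetical.

```python
import math
from collections import Counter

def snap_points(points, tolerance):
    """Greedy single-pass snapping: each point joins the first cluster whose
    running centroid is within the tolerance, otherwise it starts a new one."""
    clusters = []    # each entry: [sum_x, sum_y, count]
    assignment = []  # cluster index for each input point
    for x, y in points:
        for idx, (sx, sy, n) in enumerate(clusters):
            if math.dist((x, y), (sx / n, sy / n)) <= tolerance:
                clusters[idx] = [sx + x, sy + y, n + 1]
                assignment.append(idx)
                break
        else:
            clusters.append([x, y, 1])
            assignment.append(len(clusters) - 1)
    # Move every point to the final centroid of its cluster.
    return [
        (clusters[i][0] / clusters[i][2], clusters[i][1] / clusters[i][2])
        for i in assignment
    ]

original = [(0.0, 0.0), (0.4, 0.0), (5.0, 5.0), (5.0, 5.2), (9.0, 1.0)]
working_copy = list(original)                    # like Copy Features
snapped = snap_points(working_copy, tolerance=0.5)
counts = Counter(snapped)                        # like Collect Events
```

`counts` maps each snapped location to the number of points that ended up there, and `original` is left untouched because the snapping ran on a copy.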
03-18-2011
08:25 AM
|
0
|
0
|
3849
|
|
POST
|
Hi Alexei, The post about fuzzy c-means clustering may help you moving forward with your analysis, so definitely check that out. From there we link out to a sample script that uses R to do some c-means cluster analysis. I also mention that we're working on a Group Similar Features tool for ArcGIS 10.1 that is an implementation of K-Means++ Clustering, with optional space/time constraints, which we're really excited about! Alternatively, GeoDa has Bivariate Global and Local Spatial Autocorrelation analysis, so you may want to check that out. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
03-16-2011
09:52 AM
|
0
|
0
|
1845
|
|
POST
|
Hi Arnold, That's a great question. I'm guessing that you are using the Generate Spatial Weights Matrix tool, which would explain why you aren't seeing the Polygon Contiguity (First Order) option in the drop-down. In the Generate Spatial Weights Matrix tool, there are actually two options for polygon contiguity: CONTIGUITY_EDGES_ONLY and CONTIGUITY_EDGES_CORNERS. The CONTIGUITY_EDGES_ONLY option is the exact same conceptualization as Polygon Contiguity (First Order). They both use edges only to determine if features are neighbors, while the CONTIGUITY_EDGES_CORNERS option uses either edges OR corners. A good way to think about this is by visualizing a chess board. The CONTIGUITY_EDGES_ONLY and Polygon Contiguity (First Order) options consider features neighbors ONLY if they are directly next to each other (i.e. they share edges). The CONTIGUITY_EDGES_CORNERS option considers features neighbors if they are next to each other, and also if they are diagonal to each other with a corner touching. See attached image. So, there are several tools in the Spatial Statistics toolbox that refer to the edges-only option as Polygon Contiguity (First Order), including Hot Spot Analysis, Spatial Autocorrelation, Cluster and Outlier Analysis, and High/Low Clustering. The Generate Spatial Weights Matrix tool, however, calls this option CONTIGUITY_EDGES_ONLY because there are two options and those names are more descriptive. Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
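The chess board picture translates directly into a few lines of Python. This sketch works on a regular grid of square cells only (real polygons need actual geometry tests), and the function name `grid_neighbors` is made up for the illustration.

```python
def grid_neighbors(row, col, n_rows, n_cols, include_corners):
    """Neighbors of a cell on a chess-board-like grid.

    include_corners=False mimics an edges-only rule;
    include_corners=True also counts diagonal cells that touch at a corner.
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if include_corners:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
    ]

# Center square of a 3 x 3 board.
edges_only = grid_neighbors(1, 1, 3, 3, include_corners=False)
edges_corners = grid_neighbors(1, 1, 3, 3, include_corners=True)
```

For the center square, the edges-only rule finds 4 neighbors and the edges-or-corners rule finds 8, which is the whole difference between the two options.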
03-14-2011
01:13 PM
|
0
|
0
|
1177
|
|
POST
|
Hi torock83, This is a great question, and we've got a ton of resources that should help you moving forward! You are absolutely right that you need to run OLS before you can move on to GWR to ensure that you've found a properly specified model. There are actually 6 things you have to check before you can move on to GWR. We recently published an ArcUser article that outlines these checks, Finding a Meaningful Model. This should be very helpful. There are also a ton of other resources at http://esriurl.com/spatialstats, including a Regression Analysis Tutorial. As far as using the Koenker test to determine if GWR may improve your OLS model, you are absolutely right. A statistically significant Koenker test does mean that there may be nonstationarity in your model that GWR can account for. That being said, a statistically significant Koenker test would have an asterisk next to it, and its probability would be lower than 0.05. In your case, your Koenker test is actually not statistically significant. BUT...don't be discouraged. That doesn't mean GWR won't help you. You've actually got some work to do before you can determine if GWR will help, because you have other issues with your OLS model. From the output you included we can see that the Jarque-Bera statistic is statistically significant (notice the asterisk*), which means that you have a biased model, and can't trust the results of your OLS analysis. Once you go through the 6 checks described in the ArcUser article you may find that your Koenker test has changed and GWR may help after all. It's all a matter of finding a good OLS model first! Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
03-14-2011
10:27 AM
|
0
|
0
|
2566
|
|
POST
|
Hi Alper, Thanks for your question. If I understand your question correctly, you want to know which individual features are related to the peaks in your resulting graph from Incremental Spatial Autocorrelation. Is that right? Actually, the peaks in the graph don't represent individual features; they represent the entire dataset at a particular distance band. What does this mean? Spatial Autocorrelation is a global spatial autocorrelation tool, which means that it is measuring the intensity of clustering for the entire dataset given a particular distance band. A peak in the graph indicates a distance at which the clustering is the most intense. So, if you have several peaks, that means that there are several distances, or neighborhood sizes, that reflect intense spatial processes (clustering). That distance can then be used as the distance band for further analysis, for instance in a Hot Spot Analysis, or as the radius size in a density analysis. We talk a lot more about using this tool in this free training seminar: Introduction to Spatial Pattern Analysis. If you are interested in changes in clustering over time, then what you'd want to do is run the Spatial Autocorrelation (Moran's I) tool multiple times using data from different time periods. You could do this by doing a Select by Attribute on your data, choosing only data from, for instance, the first month of your dataset, and running Spatial Autocorrelation on that subset to see how intense the clustering is. You could then do the same analysis for the next month's worth of data, and so on. We've got some great examples of this type of "change over time" analysis in this tutorial: Spatial Statistics Modelbuilder Tutorial for ArcGIS 10. The tutorial does a Hot Spot Analysis instead of a Spatial Autocorrelation analysis, but it helps explain the idea. This video also talks about analysis over time: The Spatial Distribution of Piracy. Hope this helps!
Lauren Rosenshein Geoprocessing Product Engineer
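To show what "global" means here, the sketch below computes a plain Moran's I with binary fixed-distance-band weights and then repeats it for two time periods. It returns one number for the whole dataset, not one per feature. It is a simplified textbook version (no row standardization, no z-score), not the tool's code, and the function name, point layout and monthly values are invented.

```python
import math

def morans_i(points, values, band):
    """Global Moran's I with binary weights: w_ij = 1 when i and j are within band."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    weight_sum = 0.0
    cross_products = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                weight_sum += 1.0
                cross_products += z[i] * z[j]
    if weight_sum == 0.0:
        raise ValueError("no feature has a neighbor at this distance band")
    return (n / weight_sum) * cross_products / sum(zi * zi for zi in z)

# Six features along a line; values per month are hypothetical.
points = [(float(i), 0.0) for i in range(6)]
monthly_values = {
    "month_1": [1, 1, 1, 10, 10, 10],   # similar values sit together
    "month_2": [1, 10, 1, 10, 1, 10],   # high and low values alternate
}
monthly_i = {
    month: morans_i(points, values, band=1.0)
    for month, values in monthly_values.items()
}
```

Running the same function at several `band` values and plotting the result against distance is the idea behind the incremental graph; running it per month, as above, is the "change over time" comparison.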
03-14-2011
10:16 AM
|
0
|
0
|
1100
|
|
POST
|
Hi Mike, That's a GREAT question, and one that we get often! The simple answer is that you are right, and dummy variables should be represented as individual fields for each category. That being said, that's only necessary when you have more than 2 "dummy categories". In your example, where you have d1 and d2, it would be fine to have just one field representing one of those categories, either d1 or d2, and have zeros and ones for that field, since they would be mutually exclusive. It gets trickier when you have more than 2 categories, however, and when that's the case your example works well. In the example below, there are 3 dummy categories (think urban, suburban, rural for instance). There are actually two ways to do this. One way is to just create 2 fields, one for d1 and one for d2. While you aren't creating a field for d3, it is still represented, because those features that don't fall into d1 or d2 will have zeros for both and the calculations will reflect that. The other way is to create a field for each category. This is especially useful if you are doing an exploratory regression analysis or a stepwise regression analysis and you want to find out which variables are most important when explaining your dependent variable. In this case, having the 3rd field to represent the 3rd dummy category would be important.

| Factor | d1 | d2 | d3 |
|---|---|---|---|
| 1 | 0 | 0 | 1 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 |

Hope this helps! Lauren Rosenshein Geoprocessing Product Engineer
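Both ways of coding the categories can be sketched in a few lines of Python. The helper name `dummy_fields`, the `is_` field prefix and the land-use values are made up for this illustration.

```python
def dummy_fields(categories, reference=None):
    """Build 0/1 fields from a list of category labels.

    With reference=None every category gets its own field.
    With a reference category, that category gets no field and is
    represented by zeros in all of the remaining fields.
    """
    levels = sorted(set(categories))
    if reference is not None:
        levels = [level for level in levels if level != reference]
    return {
        f"is_{level}": [1 if c == level else 0 for c in categories]
        for level in levels
    }

land_use = ["rural", "urban", "suburban", "urban"]
all_fields = dummy_fields(land_use)                      # one field per category
two_fields = dummy_fields(land_use, reference="rural")   # rural is implied by zeros
```

In `two_fields` the first feature (rural) has a zero in both `is_suburban` and `is_urban`, which is how the omitted category is still represented.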
03-14-2011
10:03 AM
|
0
|
0
|
7147
|
|
POST
|
Hi Matthew, I'm really glad to hear that you're finding the Incremental Spatial Autocorrelation tool useful! Actually, we are working really hard on some improvements right now and the tool is going to be CORE in ArcGIS with the 10.1 release, which we're really excited about! One of the improvements that we're making relates to what you're asking about as far as the name of the graphs. In the 10.1 version of the tool we're making the output a PDF, for which you will choose the name and location. This should help with the issue of overwriting graphs. For now though, what I've been doing is renaming the graph once it is created, so that on the next run of the tool the new graph won't overwrite the old one (since it won't have the same name). Sorry for the inconvenience there! As far as the performance issue, I need a little more information to understand what's going on and hopefully get to the bottom of it. What beginning distance and distance increment are you using? How does that compare to the full extent of your data? Also, how many distance bands are you testing? A screenshot of the tool right before you run it will help me get a picture of exactly what you're doing. Also, are you seeing this performance issue only when you run the tool from inside of a model? If so, what exactly does the model do? A screenshot of the model would help as well. Lastly, is it possible to share your data? The best way for us to figure out what's going on is to try to repro it, ideally with your data. If that's not possible, the more specific you can be about the data (i.e. the geographic extent, the type of values in your analysis field, etc.), the better. Hopefully we'll be able to get to the bottom of this! Lauren Rosenshein Geoprocessing Product Engineer
03-14-2011
09:48 AM
|
0
|
0
|
452
|