POST

These help topics should help: http://resources.arcgis.com/en/help/main/10.1/#/What_is_a_z_score_What_is_a_p_value/005p00000006000000/ http://resources.arcgis.com/en/help/main/10.1/#/How_Hot_Spot_Analysis_Getis_Ord_Gi_works/005p00000011000000/ The Gi* result (shown in the math in the second link) is the z-score. The z-score is a standard deviation. I hope this provides what you need. Best wishes, Lauren Lauren M Scott, PhD Esri
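Because the Gi* z-score is a standard-normal standard deviation, converting it to the two-tailed p-value described in the first help topic takes only the complementary error function. A minimal plain-Python sketch (not the ArcGIS implementation; the function name is made up for illustration):

```python
import math

def z_to_p(z):
    """Two-tailed p-value for a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# A z-score of +/-1.96 corresponds to the familiar 95% confidence level.
print(round(z_to_p(1.96), 3))  # ~0.05
```

The same conversion is what lets you read a hot spot's z-score directly as a statistical significance level.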
02-11-2013
11:35 AM


POST

Hi Peter,

Bounding Geometry: If you are using Integrate and Collect Events to create weighted points from incident data, there is no minimum bounding geometry. (The only two tools in the Spatial Statistics toolbox that use a bounding geometry are K Function and Average Nearest Neighbor.) Because Hot Spot Analysis requires weighted points, you need to aggregate incident data. When you run Integrate (after you make a backup copy of your original input dataset), features within the distance you specify are snapped together. The input feature geometry is modified so that instead of clusters of nearby features, you get stacks of coincident features. When you run Collect Events, you replace those stacks with a single point attributed with the number of incidents on the stack... so you get weighted points. With the fishnet aggregation scheme the fishnet itself imposes a bounding geometry, so you DO have to worry about zero cells (dead space); with Integrate/Collect Events you do not.

Edge Effects: Hot Spot Analysis visits each weighted point, computes a local mean (based on the target feature and its nearby neighbors), and compares it to the global mean (based on all features in the dataset). You specify a fixed distance band to indicate which features to consider "neighbors". You can think of this distance band as a circular window that moves around the study area, stopping at each weighted point to compute the local mean for the features that fall within the window. Some weighted points will have lots of neighbors, others will have few, but this does not impact the result. If the global average number of incidents (based on all of the weighted points in your study area) is 3, then the expectation is that the average number of incidents anywhere on the map is 3. Compute the average number of incidents per point for just the points in the north, or for just the points in the center, or just the bottom... the expectation is that the average number of incidents per weighted point feature will be 3 everywhere in the study area. It doesn't matter whether a feature has 10 or 20 neighboring features because in the end we are comparing the local *average* to the global average. When we get local mean values that are much higher than expected, we have a hot spot. When the local mean is much lower, we have a cold spot. The edge effect for the Gi* statistic (hot spot analysis), then, is not an undercount problem at all. The only bias is that when a feature has very few neighbors, the local mean that gets computed is based on less information than for a feature with lots of neighbors. I hope that makes sense.

Street Network: If you have a street map for your study area, you can create distance relationships based on your road network. You would: 1) Create a spatial weights matrix file (.swm) using the Generate Network Spatial Weights tool. 2) Select Get Spatial Weights From File for the Hot Spot Analysis Conceptualization of Spatial Relationships parameter. 3) Specify the .swm created in step 1 for the Hot Spot Analysis Spatial Weights Matrix File parameter. If you don't have a street feature class, Manhattan Distance should still provide a better solution than Euclidean Distance for your urban study area.

Typical biases: You can account for typical biases associated with incident data by running hot spot analysis on rate values rather than count values. If you run hot spot analysis on the raw aggregated incidents you are asking the question: "Where do we have lots of incidents?" If you run hot spot analysis on a rate (like incidents per person, or incidents this week per incidents all year) you are asking: "Where do we have a higher than expected number of incidents given <some bias like population or typical patterns represented by yearly incidents>?" In order to get the denominator for the rate values, you will need to aggregate the incidents to a consistent set of polygon boundaries, like administrative units (census blocks). You would use Spatial Join to count the number of incidents within each polygon, then calculate the rate as the number of incidents divided by population, yearly incidents, etc.

I hope this is helpful. Best wishes, Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
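The local-mean versus global-mean comparison described above can be sketched in plain Python. This is a conceptual illustration only: the points, counts, distance band, and helper name are all invented, and the real Gi* statistic additionally accounts for variance and the number of neighbors before declaring significance.

```python
import math

# Hypothetical weighted points: (x, y, incident_count)
points = [(0, 0, 9), (1, 0, 8), (0, 1, 7),      # a "hot" pocket
          (10, 10, 1), (11, 10, 2), (10, 11, 1)]

def local_mean(i, pts, band):
    """Average count for point i and its neighbors within the distance band
    (the target point itself is included, as in Gi*)."""
    xi, yi, _ = pts[i]
    vals = [c for (x, y, c) in pts if math.hypot(x - xi, y - yi) <= band]
    return sum(vals) / len(vals)

# The expectation everywhere in the study area is the global mean.
global_mean = sum(c for (_, _, c) in points) / len(points)

for i in range(len(points)):
    lm = local_mean(i, points, band=2.0)
    label = "hotter than expected" if lm > global_mean else "colder than expected"
    print(i, round(lm, 2), label)
```

Note how a point with three neighbors and a point with five would be treated the same way: only the local *average* is compared against the global average.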
12-12-2012
10:59 AM


POST

A bit more again. I found another problem in the Ripley's K function tool associated with the confidence envelope. It is most apparent when the study area is much larger than the points being analyzed, but could also show up with the Minimum Enclosing Rectangle option if the distribution of the points is not very rectangular. Unfortunately, I found this bug too late to get the fix into 10.1 (not yet released) or into 10.0 service pack 5. Consequently, I'm attaching a file that fixes this problem for ArcGIS 10.0 (it also fixes the issue described earlier relating to "simTable"). This fix will only work for ArcGIS 10.0. Here are the instructions for installing the fix: 1) Navigate to your <ArcGIS>\Desktop10.0\ArcToolbox\Scripts folder. 2) Rename the KFunction.py file (to something like KFunctionOrig.py; this is just in case...). 3) Copy the attached KFunction.py into that same Scripts folder. 4) Run the K function as usual. Please feel free to contact me if you have any questions or concerns. My sincere apologies, Lauren Lauren M. Scott, PhD LScott@esri.com Esri Geoprocessing, Spatial Statistics
05-21-2012
01:55 PM


POST

A bit more... we have identified a bug in the Multi-Distance Spatial Cluster Analysis (Ripley's K Function) tool, for ArcGIS 10.0 only, when you select Simulate Outer Boundary Values for the Boundary Correction method and also elect to compute a confidence envelope (sorry!). In this circumstance, you will notice that the observed L(d) values (the red line on the K Function graph) will have the appropriate (accurate) correction, but that the confidence envelope lines (the gray lines on the graph) will continue to droop because no correction is applied. I'm very sorry that we didn't catch this problem sooner! Fortunately, since almost all of the tools in the Spatial Statistics toolbox are written using Python, you have our source code and can correct the bug if you so choose. Below are instructions for making the correction (it involves changing one word in the source code). If you are not comfortable making this change, but need this fix, please contact me and I'm happy to send you the corrected Python script file. To make the correction yourself: 1) Navigate to the Scripts folder and locate the KFunction.py script: <ArcGIS>\Desktop10.0\ArcToolbox\Scripts 2) Create a backup copy of this script file (name the copy something like KFunctionSave.py)... this is just in case something goes wrong. 3) Open KFunction.py with any text editor (like Notepad, for example). Alternatively, from within ArcMap you can also just right-click on the K Function tool (via the Catalog or the ArcToolbox pane) and select Edit to access the source code.
4) Locate the following section of code (at about line 517) and make the change indicated by the comment below:

    #### Resolve Simulate Points ####
    if self.simulate:
        simTable = GAPY.ga_table()
        tempN = len(newTable)
        simID = self.maxID + 1
        for i in xrange(tempN):
            row = newTable[i]
            id = row[0]
            x, y = row[1]
            simTable.insert(id, (x, y), 1.0)
            if near[id] <= self.stepMax:
                nearX, nearY = nearXY[id]
                dX = nearX + (nearX - x)
                dY = nearY + (nearY - y)
                point = (dX, dY)
                inside = UTILS.pointInPoly(point, self.studyAreaPoly,
                                           tolerance = self.tolerance)
                if not inside:
                    # change "newTable" to "simTable" on this line:
                    simTable.insert(simID, point, 1.0)
                    newSimDict[simID] = id
                    simID += 1

Again, my sincere apologies for this error. Please contact me (or contact Tech Support) if you have any questions or concerns. Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics LScott@Esri.com
03-13-2012
03:14 PM


POST

Hi Arina, What you will notice is that if you run the Hot Spot Analysis tool with row standardization or without row standardization, the results will be exactly the same. That's why that option is disabled. This ONLY applies to the Hot Spot Analysis tool. For Anselin Local Moran's I, row standardization will impact the Index value, but not the Z score. Row standardization has a definite impact on the other tools that include the Row Standardization parameter. When I first wrote the script for Gi* I stupidly added the Row Standardization parameter (not realizing it has no impact on the results). Unfortunately, once Esri releases a tool we cannot remove parameters (that's so we don't break existing workflows or models). The only thing I could do was to gray out that parameter, which I did. I'm very sorry for the confusion. Shoot 😞 I just checked the documentation and it does not explain this. I will correct this immediately. Thank you for posting your question. Again, my apologies, Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
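For anyone curious why the option has no effect for Gi*: the Gi* z-score for a feature depends only on that feature's own row of weights, in both its numerator and its denominator, so multiplying the whole row by a constant (which is all row standardization does) cancels out. A small sketch of row standardization itself, on a made-up binary weights matrix (illustrative only, not the Esri implementation):

```python
def row_standardize(w):
    """Divide each row of a spatial weights matrix by its row sum,
    so every feature's neighbor weights sum to 1."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

# Binary contiguity weights for three features (1 = neighbor)
w = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
rs = row_standardize(w)
for row in rs:
    print(row)  # each row now sums to 1
```

For tools where the statistic mixes rows (like Global Moran's I), this rescaling does change the result, which is why the parameter matters there.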
03-07-2012
10:18 AM


POST

Hi Jessica, Global Moran's I (the Spatial Autocorrelation tool), Hot Spot Analysis (Getis-Ord Gi*), and Cluster and Outlier Analysis (Anselin Local Moran's I) require an analysis field. Any of these tools could be used to answer the questions: Are there statistically significant clusters of high catchment, and where are the statistically significant clusters of high catchment? For the presence/absence question, though, you have binary data (turtles found/not found), and that type of data isn't appropriate for those tools. Here are some suggestions for your analysis. I hope they're helpful!

1) My guess is (because you said you have a ton of zeros for catchment) that you might be able to tell just by looking at the map of presence/absence whether there is clustering. Still, you may want to measure the intensity of that clustering so that you can track whether it is increasing or decreasing over time, and/or understand at what spatial scales the clustering is most intense. To do this, here are some strategies (along with the gotchas associated with each method):

* Run Incremental Spatial Autocorrelation on the point data with the analysis field set to the number of turtles. You are answering the questions: Are the high catchment locations clustered? At what distances are they most clustered? How intense is the clustering? The line graph returned by the Incremental Spatial Autocorrelation tool could be compared to random trawls in the future to answer questions about whether the intensity of clustering is changing over time.

* Run the K Function (no weight field) on just the points where turtles were present. This tool also creates a line graph showing the intensity of clustering. Elect to create a confidence envelope (start with 9 permutations: this is THE most computationally intense tool in the Spatial Statistics toolbox... it will take a while to finish with 4500+ points... but you are only working with the points where at least one turtle was found, so maybe it won't be so bad).
Because you are only using points where turtles were found, you are answering the questions: Are the turtle presence locations clustered? At what spatial scale are they most clustered? The K function is very sensitive to study area size and shape... you may want to use a convex hull (see the Minimum Bounding Geometry tool) to create a study area that encloses all of the trawl locations (including those where no turtles were found). You would then do a second analysis where you run the K Function using the catchment values as the Weight field. Create a single line graph (from the table output) with:
- Expected line
- Observed line for just the presence points (no weight field... only the points where at least one turtle was found)
- Confidence envelope for just the presence points (no weight field... only the points where at least one turtle was found)
- Observed line for the weighted presence points (catchment is the weight field... and you are only using nonzero points)
Interpretation:
= The expected line is what you would see if turtle presence were randomly distributed across the study area (the study area would be the convex hull based on all trawl points).
= The observed line is the actual clustering of turtle presence.
= Where the observed line extends beyond/outside the confidence envelope, the clustering (or dispersion) is statistically significant.
= The observed line for weighted presence shows the clustering of catchment values (high vs low catchment) beyond any clustering found in the presence-only point locations. Comparing the observed weighted line to the observed presence-only line (no weight), you can see how much the variation in catchment values impacts clustering... If the weighted line looks a lot like the unweighted line, then the *number* of turtles (variation in catchment) isn't necessarily clustered beyond the presence of turtles.
Please also see the discussion of interpretation here: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Multi_Distance_Spatial_Cluster_Analysis_Ripley_s_K_function_works/005p0000000s000000/

* If you are interested in a simple comparison of clustering over time (if more than one set of trawls will be taken), you can run the Average Nearest Neighbor tool on the presence-only points. Use the AREA of the convex hull polygon (created using Minimum Bounding Geometry around all trawl points). Because the z-score computation for Average Nearest Neighbor is very sensitive to study area size, think of those z-scores as descriptive only... if the study area remains fixed, however, it is valid to compare changes in z-score values over time and to say something about the clustering increasing or decreasing.

* Overlay a fishnet grid on the data and use Spatial Join to count the number of presence points in each grid cell. Remove cells with counts of zero that fall outside of the convex hull (around all trawl points). Also remove cells with counts of zero where NO trawls were done (there are no trawl points). Run Global Moran's I on these counts to answer: Do the locations with turtles (presence/absence) cluster spatially, and is that clustering statistically significant? You could use Incremental Spatial Autocorrelation to find an appropriate distance band. If you don't get a peak, you will need to use some other criteria to determine an appropriate fixed distance band value.

2) Okay, on to hot spot analysis 🙂 Here you are answering the following questions: Where are the statistically significant hot spots of high catchment? Where are the statistically significant cold spots of low catchment? Where are the spatial outliers (high catchment surrounded by low catchments, or vice versa)?

a) Decide the scope of your question. You mention that there are a ton of zeros.
The hot spot tool works by (conceptually) comparing the local mean (the average catchment for a feature and its neighbors within the fixed distance specified) to the global mean (the mean for all points) and deciding whether that difference is statistically significant or not (based on variance and number of points). With a ton of zeros, pretty much ALL nonzero points will show up as hot spots. One very nice thing about the Hot Spot Analysis z-scores, however, is that the larger the z-score, the hotter the hot spot (same for cold spots... the more negative, the colder the cold spot)... just be sure to only compare statistically significant z-scores. Alternatively, you can remove all points with zero values and only analyze the nonzero points. You are then answering a different question, though: Of those locations where you found turtles, where are the statistically significant hot spots? Where are the statistically significant cold spots?

b) You can use the Incremental Spatial Autocorrelation tool to find a fixed distance value. If you have points that are outliers (far from all the other points)... well, let me know and I will explain a strategy for dealing with those so that you don't use a distance band that is too large.

c) Run hot spot analysis using the catchment values as your analysis field and the distance you got from Incremental Spatial Autocorrelation. Creating a spatial weights matrix up front is not needed (it used to improve performance for larger datasets, but doesn't really make that much of a difference with ArcGIS 10.0 and beyond... it does provide a good strategy for dealing with spatial outliers, though, when you have points that are far away from the herd).
d) Run Cluster and Outlier Analysis to find spatial outliers (those features whose catchment values are very different from surrounding catchment values):
= HH is a high catchment surrounded by other high catchments (a hot spot)
= LL is a low catchment surrounded by other low catchments (a cold spot)
= HL is a high surrounded by lows
= LH is a low surrounded by highs
Note: you may see some differences between the hot/cold spots returned by the Hot Spot Analysis tool and the Cluster and Outlier Analysis tool... this is because the math is just a little different. I would either use the two results to come up with a concordance, or I would use the results from Hot Spot Analysis to define my statistically significant hot/cold spots and the results from Cluster and Outlier Analysis to define spatial outliers. Have fun! I hope this is helpful 🙂 Very best wishes, Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
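The fishnet aggregation suggested above (counting presence points per grid cell before running Global Moran's I) can be sketched without ArcGIS. Everything here is illustrative: the cell size, the points, and the helper name are invented, and a real workflow would also drop the zero-count cells outside the convex hull.

```python
from collections import Counter

def fishnet_counts(points, cell):
    """Count points per fishnet cell; a cell is keyed by its (col, row) index."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell), int(y // cell))] += 1
    return counts

# Hypothetical presence points (x, y)
pts = [(0.5, 0.5), (0.7, 0.2), (3.4, 3.9), (3.1, 3.2), (3.8, 3.3)]
counts = fishnet_counts(pts, cell=1.0)
print(sorted(counts.items()))
```

In ArcGIS the equivalent steps are Create Fishnet followed by a Spatial Join, with the join count becoming the analysis field.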
03-06-2012
09:46 AM


POST

Hi Laurence, Thanks for your question. There are a couple different reasons that the confidence envelope may not follow the expected line.

1) Differences between weighted and unweighted K function. When you run the K function just on your point features (no weight field), the confidence envelope will tend to follow the expected blue line. The confidence envelope is created by taking your point features and (conceptually) throwing them down into your study area (a rectangle if you select minimum enclosing rectangle, otherwise the polygon feature you provide). It repeats this random process of throwing down your points, letting them fall where they may within the study area, 9, 99, or 999 times. Each time it computes the K function value for all distances; the lower confidence line is derived from the lowest observed L(d) values, and the upper confidence line is derived from the largest L(d) values. If the study area is simple (rectangle, circle), the confidence envelope will enclose the expected line (but see #2 and #3 below). When you run the K function with a Weight Field, the confidence envelope will tend to follow the observed L(d) line (the red line). In this case the confidence envelope is created by throwing down the feature values (the weights) onto the existing feature locations. The locations themselves remain fixed; only the weights associated with the features are randomly redistributed for 9, 99, or 999 permutations. Because the spatial distribution of your points restricts where the values can land, the confidence envelope follows the observed L(d) line, showing you the range of outcomes given the fixed locations of your features.

2) Boundary correction. The K function works by counting all feature pairs within a given distance of each feature. When you specify NONE for the Boundary Correction method, this counting process is biased near the edges/boundaries. Imagine a circle representing the distance where pairs will be counted.
When that circle overlays a point/feature near an edge, a portion of the circle will fall outside the study area where there are no points... the counts will be smaller because there are fewer pairs within the circle. If there really are no points/features outside the study area, this drop in clustering at increasing distances is valid. If the boundaries are an artifact, you should correct for this undercounting bias by selecting a Boundary Correction method.

3) Study area size. The K Function is one of two tools in the Spatial Statistics toolbox that is VERY (VERY) sensitive to study area size (the other tool is Average Nearest Neighbor). Imagine a cluster of points enclosed by a very, very tight study area... with that configuration, the pattern appears dispersed. Now imagine that same cluster of points enclosed by a very large study area (so the cluster is at the middle with vast space all around it)... now the points would definitely appear clustered. For a graphic, please see: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Multi_Distance_Spatial_Cluster_Analysis_Ripley_s_K_Function/005p0000000m000000/ (about the 12th usage tip, which starts: "The K function statistic is very sensitive to the size of the study area.").

4) Study area shape. In #1 above, I described how the confidence envelopes are constructed. In essence, features are pitched onto your study area, each feature landing where it may. When you have a very convoluted study area, this can impact where features are allowed to land. Hmmm... okay, imagine a square study area with two long skinny arms, two long skinny legs, and a head 🙂 Features that fall into the arms and legs will have fewer neighbors because the study area itself doesn't allow many features to fall into the skinny parts... (does that make sense?). But this kind of thing can also happen if you select the Minimum Enclosing Rectangle study area when your features aren't very rectangular.
Imagine a set of features randomly distributed into a circle. Then imagine a rectangular study area around it. In the corners of the study area there will be no features. When the K function starts counting pairs near those corners, the pair counts will drop. This can result in a drooping confidence envelope for weighted K function. I hope this helps. If you still have questions, please feel free to contact me. I am happy to look at your data and evaluate the results to see why you might be seeing the drooping confidence envelope even when you apply a boundary correction method. Best wishes, Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics LScott@esri.com
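The edge undercount in #2 can be made concrete with a quick Monte Carlo estimate of how much of a search circle actually falls inside a rectangular study area. All numbers and the function name are invented for illustration; this is not how the tool's boundary corrections are implemented.

```python
import random

def inside_fraction(cx, cy, r, xmax, ymax, n=20000, seed=42):
    """Estimate the fraction of a circle (center cx, cy, radius r) that lies
    inside the rectangular study area [0, xmax] x [0, ymax]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Sample a uniform random point in the circle (rejection sampling).
        while True:
            dx = rng.uniform(-r, r)
            dy = rng.uniform(-r, r)
            if dx * dx + dy * dy <= r * r:
                break
        if 0 <= cx + dx <= xmax and 0 <= cy + dy <= ymax:
            hits += 1
    return hits / n

print(inside_fraction(5, 5, 1, 10, 10))   # interior point: all of the circle counts
print(inside_fraction(0, 0, 1, 10, 10))   # corner point: only ~a quarter does
```

A feature at the corner "sees" only about a quarter of its search circle, which is exactly why pair counts droop near boundaries unless a correction method is applied.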
03-06-2012
07:33 AM


POST

Hi Lauren, You will get the same results from the Spatial Autocorrelation tool using either the residuals or the standardized residuals, so use either. Regression does sound like a good strategy for your analysis. Please check out our resources page, which includes a free one-hour web seminar on regression and also tutorials that will walk you through an analysis start to finish. The tutorial will take you about an hour to complete. We also have a sample script called Exploratory Regression that can help you find a properly specified model. It works with ArcGIS 10.0 (and it will be core functionality in ArcGIS 10.1... which will likely release this summer). Exploratory Regression works similarly to Stepwise Regression, but instead of just looking for models with a high Adj R2, it looks for models that meet all of the requirements of the OLS method (including checking for spatial autocorrelation in your model residuals). Unfortunately, GWR does not have strong diagnostics to tell you whether you have a properly specified model or not... and if you are missing key explanatory variables, you can't really trust model coefficients. Consequently, we always recommend that you start with OLS, find a properly specified model, then move to GWR. This is all explained in the tutorial and also in the documentation that comes with the Exploratory Regression tool. To download Exploratory Regression, the free video seminar, the tutorial and other resources, please check out: www.esriurl.com/spatialstats (for Exploratory Regression look for "Supplementary Spatial Statistics"). I hope this helps! Best wishes, Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
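Since the post leans on Adj R2 as one of the model-search criteria, here is a minimal, library-free sketch of how adjusted R-squared penalizes plain R-squared for the number of explanatory variables. The toy data and function name are invented; Exploratory Regression applies many more checks than this.

```python
def adj_r2(y, yhat, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    n = len(y)
    ybar = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))   # residual sum of squares
    ss_tot = sum((a - ybar) ** 2 for a in y)              # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Toy observed values and fitted values from a hypothetical one-variable model
y    = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yhat = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
print(round(adj_r2(y, yhat, k=1), 3))
```

Adding variables can only raise plain R2, but it shrinks the (n - k - 1) term, so adjusted R2 falls unless the new variable genuinely helps; that is why it is a better (though still insufficient) model-comparison yardstick.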
02-08-2012
08:04 AM


POST

Hi, You have 20 hospitals, but lots of data over time at each hospital, is that correct? I ask because for most statistics it is important to have at least 30 observations in order to trust your results. We have lots of resources to help you learn more about hot spot analysis in ArcGIS. Please check out the short videos, free web seminars, and tutorials at www.esriurl.com/spatialstats At present, the most straightforward way to do a space-time hot spot analysis is to create time snapshots of your data, run hot spot analysis on each time frame, then present the results via animation or small map multiples (creating a map for each time period and presenting them together). Alternatively, you could create a custom spatial weights matrix file that links features based on both spatial and temporal proximity. This would be a bit of work. At 10.1 (the next release of the ArcGIS software), our Generate Spatial Weights Matrix tool will allow you to create those space-time relationships very easily and then use the resultant spatial weights matrix file to perform space-time hot spot analysis. If you can get into the beta program for ArcGIS 10.1, you could get access to that functionality now. If you are interested in constructing the space-time spatial weights matrix manually, here is some information about the file format. You would want features that were within the same space window (like at the same or a nearby hospital) AND within the same time window (malaria count values within 3 days of each other, for example) to be assigned a weighting of 1... otherwise the weighting should be 0. Once you create the space-time spatial weights matrix file, you would simply use it when you run the hot spot analysis tool. Please let me know if you need additional information.
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Modeling_spatial_relationships/005p00000005000000/ (please scroll to the bottom of the document, where it says Spatial Weights Matrix File (.swm)). I hope this is helpful. Please let me know if I can provide additional information. Lauren Lauren M Scott, PhD Esri Geoprocessing and Analysis, Spatial Statistics
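The manual weighting rule described above (weight = 1 only when two observations are near in both space and time, otherwise 0) can be sketched as follows. The coordinates, the 3-day window, and the helper name are all illustrative; the actual .swm file format is documented at the link above.

```python
import math

def space_time_weight(a, b, dist_band, time_window):
    """1 if observations a and b fall within both the spatial distance band
    and the temporal window, else 0. Each observation is (x, y, day)."""
    (ax, ay, at), (bx, by, bt) = a, b
    near_in_space = math.hypot(ax - bx, ay - by) <= dist_band
    near_in_time = abs(at - bt) <= time_window
    return 1 if (near_in_space and near_in_time) else 0

obs1 = (0.0, 0.0, 10)   # a hospital, day 10
obs2 = (0.5, 0.0, 12)   # nearby hospital, day 12 (neighbor in space AND time)
obs3 = (0.5, 0.0, 30)   # nearby hospital, but weeks later (not a neighbor)
print(space_time_weight(obs1, obs2, dist_band=1.0, time_window=3))  # 1
print(space_time_weight(obs1, obs3, dist_band=1.0, time_window=3))  # 0
```

Writing one such weight for every feature pair, in the .swm layout described in the linked document, produces a matrix the Hot Spot Analysis tool can consume directly.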
12-19-2011
10:31 AM


POST

Hi Jessica, Interesting research! Just curious, how did you decide where to trawl? Are the locations based on a random sampling scheme? I ask because if there are any biases in your sampling scheme (you only collected turtles where it was convenient, or where you knew you would find them, etc., vs. systematically or using some kind of random sampling scheme), this may impact how you can interpret your results. I'm also curious about what motivates your questions: Is there clustering of turtles? Is there clustering of high catches? How is knowing that information helpful to your broader research? (I ask because it is interesting to me, but also because it sometimes impacts how you set your data up for analysis.) When you say "Is there clustering of turtles?", I'm wondering how that differs from "Is there clustering of high catches?" Are you looking at two different datasets? Are you considering presence/absence (rather than the number of turtles found at each point) for the "clustering of turtles" part of the analysis? I look forward to learning more about your research and will happily help if I can. Best wishes, Lauren Lauren M. Scott, PhD Esri Geoprocessing, Spatial Statistics
12-19-2011
10:13 AM

