I have a point dataset for sea turtles. Each point represents a single trawl where 0-10 turtles were collected. There are 4500+ points, most of which have 0 catch. The study area encompasses the southeast. We want to know (a) whether there is clustering of turtles and (b) whether there is clustering of the high catch numbers. We want to know this for the overall area and by region (there are four smaller regions). 1.) Would Moran's I and Hot Spot Analysis be the best tools in the spatial stats toolset to answer these questions?
We tried aggregating the data into 1 x 1 min grids, but when we ran the 'Incremental Spatial Autocorrelation' the Z score never peaked and kept going up and up. We ran it again on the individual points, and got a peak. 2.) Is it appropriate to run either of these tools on incident data or would it be better to try to find a finer scale on which to aggregate?
Some of the Hot Spot Analysis information I read suggested that for datasets with 3000+ features, it would be advisable to construct a spatial weights matrix file. 3.) How is this better than choosing one of the other spatial relationships listed (e.g. fixed distance)?
Just curious, how did you decide where to trawl? Are the locations based on a random sampling scheme? I ask, because if there are any biases in your sampling scheme (you only collected turtles where it was convenient, or where you knew you would find them, etc... vs systematically, or using some kind of random sampling scheme), this may impact how you can interpret your results.
I'm also curious about what motivates your questions:
Is there clustering of turtles?
Is there clustering of high catches?
How is knowing that information helpful to your broader research (I ask because it is interesting to me, but also because sometimes it impacts how you set your data up for analysis)?
When you say "Is there clustering of turtles?" I'm wondering how that differs from "Is there clustering of high catches?" Are you looking at two different datasets? Are you considering presence/absence (rather than the number of turtles found at each point) for the "clustering of turtles" part of the analysis?
I look forward to learning more about your research and will happily help if I can.
Lauren M. Scott, PhD
Geoprocessing, Spatial Statistics
Thanks for responding to my e-mail. Here's some more information about the research.
Yes, the trawling locations are randomly chosen within a trawl zone (limited by depth).
We want to know whether there is a clustering of turtles (presence/absence) AND if there is clustering, where the clustering of high catch numbers is located.
Any suggestions you can give would be much appreciated.
Global Moran's I (the Spatial Autocorrelation tool), Hot Spot Analysis (Getis-Ord Gi*), and Cluster and Outlier Analysis (Anselin's Local Moran's I) require an analysis field. Any of these tools could be used to answer the questions: are there statistically significant clusters of high catchment, and where are the statistically significant clusters of high catchment? For the presence/absence question, though, you have binary data (turtles found/not found) and that type of data isn't appropriate for those tools. Here are some suggestions for your analysis. I hope they're helpful!
1) My guess is (because you said you have a ton of zeros for catchment) that you might be able to tell just by looking at the map of presence/absence if there is clustering. Still, you may want to measure the intensity of that clustering so that you can track if it is increasing or decreasing over time, and/or want to understand at what spatial scales the clustering is most intense. To do this, here are some strategies (along with the gotchas associated with each method):
* Run Incremental Spatial Autocorrelation on the point data with the analysis field set to number of turtles. You are answering the question: are the high catchment locations clustered? At what distances are they most clustered? How intense is the clustering? The line graph returned by the Incremental Spatial Autocorrelation tool could be compared to random trawls in the future to answer questions about whether the intensity of clustering is changing over time.
* Run the K Function (no weight field) on just the points where turtles were present. This tool also creates a line graph showing the intensity of clustering. Elect to create a confidence envelope (start with 9 permutations: this is THE most computationally intense tool in the Spatial Stat toolbox... it will take a while to finish with 4500+ points... but you are only working with the points where at least one turtle was found, so maybe it won't be so bad). Because you are only using points where turtles were found, you are answering the question: are the turtle presence locations clustered? At what spatial scale are they most clustered? The K function is very sensitive to study area size and shape... you may want to use a convex hull (see the Minimum Bounding Geometry tool) to create a study area that encloses all of the trawl locations (including those where no turtles were found).
You would then do a second analysis where you run the K Function using the catchment values as the Weight field. Create a single line graph (from the table output) with:
- Expected line
- Observed line for just the presence points (no weight field... only the points where at least one turtle was found)
- Confidence envelope for just the presence points (no weight field... only the points where at least one turtle was found)
- Observed line for the weighted presence points (catchment is the weight field... and you are only using non-zero points).
= The expected line is what you would see if turtle presence were randomly distributed across the study area (the study area would be the convex hull based on all trawl points).
= The observed line is the actual clustering of turtle presence.
= Where the observed line extends beyond/outside the confidence envelope, the clustering (or dispersion) is statistically significant.
= The observed line for weighted presence shows the clustering of catchment values (high vs low catchment) beyond any clustering found in the presence-only point locations. Comparing the observed weighted line to the observed presence-only line (no weight), you can see how much the variation in catchment values impacts clustering... If the weighted line looks a lot like the unweighted line, then the *number* of turtles (variations in catchment) isn't necessarily clustered beyond the presence of turtles. Please also see the discussion of interpretation here: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Multi_Distance_Spatial_Cluster_Ana...
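If it helps to see what the expected line, observed line, and confidence envelope are made of, here is a minimal Python sketch of the unweighted case. It is not the ArcGIS tool: it uses the common L(d) form of Ripley's K with no edge correction, a rectangular study area instead of a convex hull, and made-up presence points. The function names `l_function` and `envelope` are hypothetical.

```python
import math
import random

def l_function(points, distances, area):
    """Ripley's K in its L(d) form, without edge correction.

    L(d) = sqrt(area * (ordered pairs i != j within d) / (pi * n * (n - 1)))
    Under complete spatial randomness L(d) is roughly d (the expected line).
    """
    n = len(points)
    pair_d = [math.dist(points[i], points[j])
              for i in range(n) for j in range(i + 1, n)]
    result = []
    for d in distances:
        ordered_pairs = 2 * sum(1 for x in pair_d if x <= d)
        result.append(math.sqrt(area * ordered_pairs / (math.pi * n * (n - 1))))
    return result

def envelope(n, distances, width, height, permutations=9, seed=0):
    """Lowest and highest L(d) over random point sets in a width x height box."""
    rng = random.Random(seed)
    sims = []
    for _ in range(permutations):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append(l_function(pts, distances, width * height))
    low = [min(col) for col in zip(*sims)]
    high = [max(col) for col in zip(*sims)]
    return low, high

# Made-up presence points: two tight groups inside a 100 x 100 study area.
rng = random.Random(42)
presence = ([(rng.gauss(25, 3), rng.gauss(25, 3)) for _ in range(30)]
            + [(rng.gauss(75, 3), rng.gauss(75, 3)) for _ in range(30)])

distances = [5, 10, 15, 20]
observed = l_function(presence, distances, 100 * 100)
low, high = envelope(len(presence), distances, 100, 100)

for d, obs, lo_env, hi_env in zip(distances, observed, low, high):
    note = "above envelope (clustered)" if obs > hi_env else ""
    print(d, round(obs, 1), round(lo_env, 1), round(hi_env, 1), note)
```

Each printed row is one distance: the expected value is the distance itself, followed by the observed L(d) and the envelope bounds from 9 random permutations. A weighted run would need the catchment values folded into the pair counts, which this sketch leaves out.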
* If you are interested in a simple comparison of clustering over time (if more than one set of trawls will be taken), you can run the Average Nearest Neighbor tool on the presence-only points. Use the AREA of the convex hull polygon (created using Minimum Bounding Geometry around all trawl points). Because the z-score computation for Average Nearest Neighbor is very sensitive to study area size, think of those z-scores as descriptive only... if the study area remains fixed, however, it is valid to compare changes in z-score values over time and to say something about the clustering increasing or decreasing.
* Overlay a fishnet grid over the data and use spatial join to count the number of presence points in each grid cell. Remove cells with counts of zero that fall outside of the convex hull (around all trawl points). Also remove cells with counts of zero where NO trawls were done (there are no trawl points). Run Global Moran's I on these counts to answer: do the locations with turtles (presence/absence) cluster spatially and is that clustering statistically significant? You could use Incremental Spatial Autocorrelation to find an appropriate distance band. If you don't get a peak, you will need to use some other criteria to determine an appropriate fixed distance band value.
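The fishnet idea can be sketched in plain Python as below, under several assumptions: the trawl points are invented, cells are 10 x 10 units, weights are binary inside a fixed distance band (no row standardization), and the z-score comes from shuffling values across cells rather than from the analytical formula the ArcGIS tool reports. The names `neighbor_pairs`, `morans_i`, and `permutation_z` are hypothetical. The convex hull clipping step is not shown.

```python
import math
import random
import statistics

def neighbor_pairs(points, band):
    """Unordered pairs (i, j) no farther apart than the fixed distance band."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= band]

def morans_i(values, pairs):
    """Global Moran's I with binary weights (1 inside the band, 0 outside)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = 2 * len(pairs)  # every unordered pair counts in both directions
    cross = 2 * sum(dev[i] * dev[j] for i, j in pairs)
    return (n / s0) * cross / sum(d * d for d in dev)

def permutation_z(values, pairs, permutations=199, seed=0):
    """Pseudo z-score: observed I against I for randomly shuffled values."""
    rng = random.Random(seed)
    observed = morans_i(values, pairs)
    shuffled = list(values)
    sims = []
    for _ in range(permutations):
        rng.shuffle(shuffled)
        sims.append(morans_i(shuffled, pairs))
    return (observed - statistics.mean(sims)) / statistics.stdev(sims)

# Invented trawls (x, y, turtle_present): turtles only in the south-west corner.
rng = random.Random(1)
trawls = []
for _ in range(400):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    present = x < 30 and y < 30 and rng.random() < 0.8
    trawls.append((x, y, present))

# Count presence points per fishnet cell. A cell with no trawls never gets a
# key, so it is left out; a trawled cell with no turtles keeps a count of 0.
cell = 10
counts = {}
for x, y, present in trawls:
    key = (int(x // cell), int(y // cell))
    counts[key] = counts.get(key, 0) + int(present)

points = [((cx + 0.5) * cell, (cy + 0.5) * cell) for cx, cy in counts]
values = list(counts.values())

# Repeat Moran's I over a series of distance bands, as the incremental tool does.
z_by_band = {}
for band in (12, 22, 32, 42):
    z_by_band[band] = permutation_z(values, neighbor_pairs(points, band))
    print(band, round(z_by_band[band], 2))
```

If the z-scores rise and then fall, the band with the highest z-score is a candidate fixed distance. If they only keep rising, as happened with the 1 x 1 min grid, some other criterion is needed.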
2) Okay, on to hot spot analysis 🙂 Here you are answering the following questions:
* Where are the statistically significant hot spots of high catchment? Where are the statistically significant cold spots of low catchment? Where are the spatial outliers (high catchment surrounded by low catchment, or vice versa)?
a) Decide the scope of your question. You mention that there are a ton of zeros. The hot spot tool works by (conceptually) comparing the local mean (the average catchment for a feature and its neighbors within the fixed distance specified) to the global mean (the mean for all points) and deciding if that difference is statistically significant or not (based on variance and number of points). With a ton of zeros, pretty much ALL non-zero points will show up as hot spots. One very nice thing about the Hot Spot Analysis z-scores, however, is that the larger the z-score, the hotter the hot spot (same for cold spots... the more negative, the colder the cold spot)... just be sure to only compare statistically significant z-scores. Alternatively, you can remove all points with zero values and only analyze the non-zero points. You are then answering a different question, though: of those locations where you found turtles, where are the statistically significant hot spots? Where are the statistically significant cold spots?
b) You can use the Incremental Spatial Autocorrelation tool to find a fixed distance value. If you have points that are outliers (far from all the other points)... well, let me know and I will explain a strategy for dealing with those so that you don't use a distance band that is too large.
c) Run hot spot analysis using the catchment values as your analysis field and the distance you got from Incremental Spatial Autocorrelation. Creating a Spatial Weights Matrix up front is not needed (it used to improve performance for larger datasets, but doesn't really make that much of a difference with ArcGIS 10.0 and beyond... it does provide a good strategy for dealing with spatial outliers, though -- when you have points that are far away from the herd).
d) Run Cluster and Outlier analysis to find spatial outliers (those features whose catchment values are very different from surrounding catchment values):
= HH is a high catchment surrounded by other high catchments (a hot spot)
= LL is a low catchment surrounded by other low catchments (a cold spot)
= HL is a high surrounded by lows
= LH is a low surrounded by highs
Note: you may see some differences between the hot/cold spots returned for the Hot Spot Analysis tool and the Cluster/Outlier Analysis tool... this is because the math is just a little different. I would either use the two results to come up with a concordance or I would use the results from Hot Spot Analysis to define my statistically significant hot/cold spots and the results from the Cluster and Outlier Analysis to define spatial outliers.
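To make the "local mean vs global mean" idea from step a) concrete, here is a rough Python sketch of the Gi* z-score. It assumes binary fixed-distance weights with each feature counted as its own neighbor, and it runs on an invented 10 x 10 set of points where only one corner has non-zero catchment. The function name `gi_star` is hypothetical, and no correction for multiple testing is applied.

```python
import math

def gi_star(points, values, band):
    """Getis-Ord Gi* z-score per point, binary weights inside a fixed band."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for p in points:
        # Neighbors within the band; the point itself is included (distance 0).
        local = [v for q, v in zip(points, values) if math.dist(p, q) <= band]
        w_sum = len(local)  # sum of weights, also the sum of squared weights
        numerator = sum(local) - mean * w_sum
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator else 0.0)
    return scores

# Invented data: mostly zeros, with catches of 1-5 in one 3 x 3 corner.
points = [(i, j) for i in range(10) for j in range(10)]
values = [1 + i + j if i < 3 and j < 3 else 0 for i, j in points]

z = gi_star(points, values, band=1.5)

hot = [p for p, score in zip(points, z) if score > 1.96]
cold = [p for p, score in zip(points, z) if score < -1.96]
print("hot spots:", len(hot), "cold spots:", len(cold))
```

Rerunning `gi_star` on only the non-zero points changes the global mean and therefore the question being asked, which is the alternative described in step a).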
I hope this is helpful 🙂
Very best wishes,
Lauren M Scott, PhD
Geoprocessing, Spatial Statistics
This document was generated from the following discussion: Performing spatial stats on turtle trawl data