Hello,

I have started to play with the Spatial Statistics toolbox using three shapefiles:

1) Evenly-spaced point data (generated from Create Fishnet tool)

2) Random point data (generated from Create Random Points tool)

3) Real-world point data (results of research in Australia)

The outputs of (1) and (2) are rectangular in extent, constrained by the extent of my Australia (3) shapefile. I ran the Average Nearest Neighbor tool on all three datasets and received the expected results (based on visual examination):

1) Dispersed (with associated p-values and z-scores)

2) Random (ditto)

3) Clustered (ditto)

Next, I clipped (1) and (2) by the actual outline of Australia (3) and re-ran the first two analyses. This is where I noticed an interesting - but troubling - problem:

1) Dispersed (no problem)

2) Clustered (?!?)

The clipped dataset contains ~80% of the original points, each in their original, random location (on terrestrial Australia). Thus, they should still output as Random, but they output as Clustered.

I have replicated this problem using multiple shapefiles of random points. In each case of using the Average Nearest Neighbor tool, the full rectangle of random points gives a result of Random but clipping out the middle of the rectangle gives a result of Clustered. I have also created shapefiles of random points using the Constraining Feature Class (versus the Constraining Extent) option and achieved the same result (Clustered).

I'm hoping to understand why this would happen in order to understand the results of more advanced analyses (such as Ripley's K and Moran's I). Anyone have any ideas about this problem? Cheers,

Chris

This is a good question, and my initial reaction is that this does not sound like a bug (although I do understand your concern). The first part of your question involves clipping a dataset that was originally created using the Create Random Points tool. Depending on the polygon you use to clip those points, you may be imposing a structure on what were once randomly distributed points, which could lead to a clustered result.

In terms of the new dataset that you created using the Australia boundary, what does that constraining dataset look like? Is it one polygon representing Australia, or does it have multiple polygons? If it has multiple polygons (regions, counties, etc.), then Create Random Points generates the user-specified number of random points in each one of those polygons. If the constraining dataset contains both smaller and larger polygons, there will be 100 points (for example) in each of the smaller polygons and 100 points in each of the larger ones. Within each individual polygon the features will be "random", but across the entire study area you will have imposed some definite clustering in the smaller polygons. So that's one thing to think about.
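
To make the per-polygon effect concrete, here is a minimal Python sketch, not the tool's actual implementation. It assumes two hypothetical rectangular "polygons" (a narrow strip and a wide one) that together fill a unit square, places the same number of random points in each, and compares the result with points placed uniformly over the whole square. The helper names (ann_ratio_and_z, random_points_in_rect) are made up for illustration, and the expected-distance and standard-error formulas are the ones I understand the Average Nearest Neighbor documentation to describe.

```python
import math
import random


def ann_ratio_and_z(points, area):
    """Return (nearest neighbor ratio, z-score) for points in a study area."""
    n = len(points)
    # Observed mean distance from each point to its nearest neighbor (brute force).
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance and standard error for a random pattern
    # with the same number of points over the same area.
    expected = 0.5 / math.sqrt(n / area)
    std_err = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_err


def random_points_in_rect(rng, count, xmin, xmax, ymin, ymax):
    """Place `count` uniformly random points inside a rectangle."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(count)]


rng = random.Random(42)
per_polygon = 400

# Hypothetical constraining polygons: a narrow strip and a wide strip
# that together cover the unit square. Each gets the same point count.
small = random_points_in_rect(rng, per_polygon, 0.0, 0.05, 0.0, 1.0)
large = random_points_in_rect(rng, per_polygon, 0.05, 1.0, 0.0, 1.0)
per_polygon_points = small + large

# Reference: the same total number of points, random over the whole square.
uniform_points = random_points_in_rect(rng, 2 * per_polygon, 0.0, 1.0, 0.0, 1.0)

ratio_pp, z_pp = ann_ratio_and_z(per_polygon_points, 1.0)
ratio_u, z_u = ann_ratio_and_z(uniform_points, 1.0)

print(f"equal count per polygon: ratio={ratio_pp:.3f}, z={z_pp:.2f}")
print(f"uniform over study area: ratio={ratio_u:.3f}, z={z_u:.2f}")
```

In this sketch the per-polygon dataset should come out with a ratio well below 1 (clustered), because the narrow strip is packed far more densely than the wide one, while the uniform dataset should stay close to 1.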

The other thing to think about, which is touched on briefly in the documentation for Average Nearest Neighbor, is how sensitive the Average Nearest Neighbor (ANN) tool is to the study area or extent of your analysis. Essentially, ANN looks at the average distance between each feature and its closest feature in relation to the area of the analysis, and compares that to the distances between random features in a circle of the same area. So, for instance, the exact same distribution of points could be considered random or clustered depending on the extent/bounding geometry used for the analysis. For this reason, one of the ways that we recommend using ANN is for making comparisons between multiple distributions within the same study area. For instance, if you had points representing the locations of various types of trees in Australia, you could use ANN to compare those distributions, because the point locations/distributions would change but the bounding geometry would stay the same. That isn't to say that you cannot use ANN for your purposes; it is just important to remember the impact that your bounding geometry has on your output.
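
The sensitivity to the study area can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than what the tool does internally: the "clip" shape is a hypothetical disk inside a unit square (which happens to keep roughly 80% of the points), the helper name ann_ratio_and_z is made up, and the formulas are the expected-distance and standard-error expressions as I understand them from the ANN documentation. The same clipped points are scored twice, once against the area of their bounding rectangle and once against the area of the shape they were clipped to.

```python
import math
import random


def ann_ratio_and_z(points, area):
    """Return (nearest neighbor ratio, z-score) for points in a study area."""
    n = len(points)
    # Observed mean distance from each point to its nearest neighbor (brute force).
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance and standard error for a random pattern
    # with the same number of points over the same area.
    expected = 0.5 / math.sqrt(n / area)
    std_err = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_err


rng = random.Random(7)

# Random points over a rectangular extent (the unit square).
candidates = [(rng.random(), rng.random()) for _ in range(1500)]

# "Clip" to an irregular study area: here a disk of radius 0.5.
center, radius = (0.5, 0.5), 0.5
clipped = [p for p in candidates if math.dist(p, center) <= radius]

# Area of the bounding rectangle of the clipped points.
xs = [x for x, _ in clipped]
ys = [y for _, y in clipped]
bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))

# Area of the shape the points actually occupy.
disk_area = math.pi * radius ** 2

ratio_bbox, z_bbox = ann_ratio_and_z(clipped, bbox_area)
ratio_disk, z_disk = ann_ratio_and_z(clipped, disk_area)

print(f"bounding-rectangle area {bbox_area:.3f}: ratio={ratio_bbox:.3f}, z={z_bbox:.2f}")
print(f"clip-shape area {disk_area:.3f}: ratio={ratio_disk:.3f}, z={z_disk:.2f}")
```

The points are identical in both calls; only the area changes. Scored against the bounding rectangle, the empty corners make the pattern look clustered, whereas scored against the clip shape's own area the ratio should sit near 1. If I recall correctly, the ANN tool has an optional Area parameter, so supplying the area of the Australia polygon there may be worth trying.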