In ArcGIS 10's help under Hot Spot Analysis, it says: "The Input Field should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; it cannot solve if all input values are 1, for example. If you want to use this tool to analyze the spatial pattern of incident data, consider aggregating your incident data."

My question is how much variation is needed? I want to identify clusters of incidents using a point dataset, but the majority of my points (about 90%) have only 1 incident. If I don't aggregate, how will this impact my results? If I should aggregate the points (e.g., using a fishnet grid), how much variation should I be aiming for?

Thank you for any help/suggestions!

This is a really good question, and one that we actually get quite a bit. Glad the answer will be here on the forums now!

To start, I just want to give a bit of context about why variation in your analysis variable is so important. When doing a hot spot analysis, the resulting z-score is a function of both the number of features in your dataset and the variance in the values associated with those features. What that means is that if you've got hundreds of features with a value of 1, and then just a couple with values like 2 or 3, those 2s and 3s are likely to show up as hot spots, even though they aren't necessarily that "hot": in comparison to all of the 1s, they really are high values. Hopefully that makes sense and provides a bit of context. It's not that the resulting z-score will be wrong; it's just that it may not be answering the question you're asking. If you want those 2s and 3s to show up as hot spots, go ahead and use the data you've got. But if that doesn't sound like it will show you the patterns you're interested in, you probably want to aggregate.
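To make that concrete, here's a minimal sketch in plain Python with made-up incident counts. It uses a simple standardized score rather than the actual Gi* statistic (which also folds in each feature's spatial neighborhood), but the intuition is the same:

```python
import statistics

# Hypothetical incident counts: 90 points with 1 incident, a few with 2 or 3
values = [1] * 90 + [2, 2, 3]

mean = statistics.mean(values)
stdev = statistics.pstdev(values)  # population standard deviation

# How many standard deviations above the mean each distinct value sits.
# Against a sea of 1s, the 2s and 3s land several standard deviations out,
# which is why they tend to light up as "hot" even though the raw counts
# are tiny.
for v in (1, 2, 3):
    print(f"value {v}: {(v - mean) / stdev:+.2f} standard deviations")
```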

There are two main ways we suggest aggregating: using a fishnet (or polygons that make sense for your analysis, such as census blocks), or using a method that combines the Integrate tool and the Collect Events tool. Both of these methods (with the exception of using pre-existing polygons) require you to figure out the right scale for aggregation. For a fishnet this is the cell size, and for the integrate-collect events method this is the distance to use when integrating. There are a couple of ways to choose a good fishnet cell size or snapping distance.

In general, the goal is to identify a cell size that isn't so small that you end up with a lot of empty cells, but isn't so big that you lose too much information about the underlying spatial pattern of your points.

Some researchers suggest a cell whose area is twice the mean area per feature, which gives a cell side (or snapping distance) of:

sqrt(2 * (Area / n))

where n is the number of incidents and Area is either the extent of your features or the area of your study area. You also want to make sure you remove cells containing zeros before running your hot spot analysis. This same value can be used as the snapping distance in the Integrate tool.
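As a quick sketch of that arithmetic (plain Python, with a made-up study area and incident count):

```python
import math

def suggested_cell_size(study_area, n_incidents):
    """Side length of a square cell whose area is twice the mean area
    per feature: sqrt(2 * (Area / n))."""
    return math.sqrt(2 * (study_area / n_incidents))

# Hypothetical example: a 10 km^2 study area (10,000,000 m^2) with 500
# incidents suggests a cell size (or snapping distance) in metres:
print(suggested_cell_size(10_000_000, 500))  # → 200.0
```

The units of the result follow the units of your area, so make sure your data is in a projected coordinate system before applying this.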

Another way to choose a distance is to use the Calculate Distance Band from Neighbor Count tool, in the Utilities toolset of the Spatial Statistics toolbox. If you use a neighbor count of 1, the average distance at which features have one neighbor will be a good distance/cell size to use.
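The tool itself requires ArcGIS, but the statistic behind it, the average distance from each feature to its nearest neighbor, can be sketched in plain Python for small datasets (brute force, with hypothetical coordinates):

```python
import math

def avg_nearest_neighbor_distance(points):
    """Mean distance from each point to its closest other point.
    Brute-force O(n^2); the ArcGIS tool reports this kind of statistic
    for a chosen neighbor count."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        nearest = min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)

# Hypothetical coordinates, in the units of your projection (e.g. metres)
pts = [(0, 0), (10, 0), (0, 10), (50, 50)]
print(avg_nearest_neighbor_distance(pts))
```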

To learn more about the integrate-collect events method, check out the Hot Spot Analysis Tutorial, which walks through the process. You can learn more about all of the spatial statistics tools, with videos, tutorials, and free training seminars, here: http://esriurl.com/spatialstats.

Hope this helps.

Lauren Rosenshein

Geoprocessing Product Engineer