Hi Peter,
Bounding Geometry: If you are using Integrate and Collect Events to create weighted points from incident data, there is no minimum bounding geometry. (The only two tools in the Spatial Statistics toolbox that use a bounding geometry are K Function and Average Nearest Neighbor.) Because Hot Spot Analysis requires weighted points, you need to aggregate incident data. When you run Integrate (after you make a backup copy of your original input dataset), features within the distance you specify are snapped together. The input feature geometry is modified so that instead of clusters of nearby features, you get stacks of coincident features. When you run Collect Events, each stack is replaced by a single point attributed with the number of incidents in the stack... so you get weighted points. With the fishnet aggregation scheme, the fishnet itself imposes a bounding geometry, so you DO have to worry about zero cells (dead space); with Integrate/Collect Events you do not.
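To make the Integrate/Collect Events idea concrete, here is a minimal plain-Python sketch (not arcpy; the helper names and the grid-snapping shortcut are my own simplifications): snap points within a tolerance so nearby points become coincident, then collapse each stack into one weighted point.

```python
from collections import Counter

def integrate(points, tolerance):
    # Snap each point to a tolerance-sized grid cell so nearby points
    # become exactly coincident (a rough stand-in for Integrate, which
    # actually snaps features to each other within the tolerance).
    return [(round(x / tolerance) * tolerance,
             round(y / tolerance) * tolerance) for x, y in points]

def collect_events(points):
    # Replace each stack of coincident points with a single location
    # whose count (ICOUNT in the real tool) is the stack size.
    return Counter(points)

incidents = [(0.1, 0.2), (0.2, 0.1), (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
weighted = collect_events(integrate(incidents, tolerance=1.0))
# weighted maps each snapped location to its incident count.
```

No fishnet is involved, so there are no empty cells to worry about: every weighted point corresponds to at least one real incident.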
Edge Effects: Hot Spot Analysis visits each weighted point and computes a local mean (based on the target feature and its nearby neighbors) and compares it to the global mean (based on all features in the dataset). You will specify a fixed distance band to indicate which features to consider "neighbors". You can think of this distance band as a circular window that moves around the study area, stopping at each weighted point to compute the local mean for the features that fall within the window. Some weighted points will have lots of neighbors, others will have few, but this does not impact the result. If the global average number of incidents (based on all of the weighted points in your study area) is 3, then the expectation is that the average number of incidents anywhere on the map is 3. Compute the average number of incidents per point for just the points in the north, or for just the points in the center, or just the south... the expectation is that the average number of incidents per weighted point feature will be 3 everywhere in the study area. It doesn't matter if a feature has 10 or 20 neighboring features because in the end we are comparing the local *average* to the global average. When we get local mean values that are much higher than expected, we have a hot spot. When the local mean is much lower, we have a cold spot.
The edge effect for the Gi* statistic (hot spot analysis), then, is not an undercount problem at all. The only bias is that when a feature has very few neighbors the local mean that gets computed is based on less information than for a feature with lots of neighbors. I hope that makes sense.
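The local-mean-versus-global-mean comparison can be sketched in plain Python like this (this is only the intuition, not the full Gi* statistic, which also standardizes by a variance term to produce a z-score; the point coordinates and weights are made up):

```python
import math

def local_vs_global(points, weights, band):
    # For each weighted point, average the weights of all points within
    # the fixed distance band (including the point itself) and compare
    # to the global average.  Positive differences suggest hot spots,
    # negative differences suggest cold spots.
    global_mean = sum(weights) / len(weights)
    diffs = []
    for (x0, y0) in points:
        neighbors = [w for (x, y), w in zip(points, weights)
                     if math.hypot(x - x0, y - y0) <= band]
        local_mean = sum(neighbors) / len(neighbors)
        diffs.append(local_mean - global_mean)
    return diffs

pts = [(0, 0), (0, 1), (1, 0), (10, 10)]
wts = [9, 8, 9, 1]   # a cluster of high counts plus one low outlier
diffs = local_vs_global(pts, wts, band=2.0)
```

Notice that the isolated point at (10, 10) has only itself in its window: its local mean is based on less information, which is the only edge-related bias, but the comparison against the global mean is still meaningful.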
Street Network: If you have a street map for your study area, you can create distance relationships based on your road network. You would:
1) Create a spatial weights matrix file (.swm) using the Generate Network Spatial Weights tool
2) Select Get Spatial Weights From File for the Hot Spot Analysis Conceptualization of Spatial Relationships parameter
3) Specify the .swm created in step 1 for the Hot Spot Analysis Spatial Weights Matrix File parameter.
If you don't have a street feature class, using Manhattan Distance should still provide a better solution than Euclidean Distance for your urban study area.
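The difference between the two distance measures is easy to see with a small example (hypothetical coordinates):

```python
import math

def euclidean(p, q):
    # Straight-line ("as the crow flies") distance.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    # City-block distance: travel along a rectilinear street grid.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

a, b = (0, 0), (3, 4)
d_euclid = euclidean(a, b)     # 5.0
d_manhattan = manhattan(a, b)  # 7
```

Manhattan distance is never shorter than Euclidean distance and better approximates actual travel distance on a gridded street network, which is why it is usually the better default for urban incident data when no street feature class is available.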
Typical biases: You can account for typical biases associated with incident data by running hot spot analysis on rate values rather than count values. If you run hot spot analysis on the raw aggregated incidents you are asking the question: "where do we have lots of incidents?"
If you run hot spot analysis on a rate (like incidents per person, or incidents this week per incidents all year) you are asking: "where do we have a higher than expected number of incidents given <some bias like population or typical patterns represented by yearly incidents>?"
In order to get the denominator for the rate values, you will need to aggregate the incidents to a consistent set of polygon boundaries like administrative units (census blocks). You would use Spatial Join to count the number of incidents within each polygon, then calculate the rate as the number of incidents divided by population, yearly incident counts, etc.
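The rate calculation itself is simple once the counts are aggregated. A small sketch with made-up census block IDs, counts, and populations (the Spatial Join step that produces the counts is assumed to have already happened):

```python
# Hypothetical incident counts per census block (e.g., from a Spatial
# Join of incidents to block polygons) and each block's population.
counts = {"block_A": 30, "block_B": 5, "block_C": 12}
population = {"block_A": 1000, "block_B": 100, "block_C": 1200}

# Rate = incidents per person; often scaled, e.g., per 1,000 residents.
rates = {blk: counts[blk] / population[blk] for blk in counts}
per_1000 = {blk: 1000 * r for blk, r in rates.items()}
```

Note that block_B has far fewer incidents than block_A but a higher rate; running hot spot analysis on the rate field surfaces exactly this kind of pattern, which a raw-count analysis would hide.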
I hope this is helpful.
Best wishes,
Lauren
Lauren M Scott, PhD
Esri
Geoprocessing, Spatial Statistics