Cluster detection!

09-18-2012 06:34 AM
NTHOESELELETOAO
New Contributor
Hello,

I have a sample of 142 households, represented as points in a neighbourhood. One person from each household has been tested for certain diseases, and each result is either positive or negative. I am now trying to detect clusters of positives and negatives for each disease across the study area. Which tool is best suited for this? I tried all the tools under Analyzing Patterns and Mapping Clusters, but each tool gives me a different pattern for the same disease.

Please help.
3 Replies
ThomMackey
New Contributor III
Hi,

When using the Mapping Clusters tools, it's important to have a question in mind before you choose which tool to use. For example, if your question is "which households have an illness rate which is higher than we'd expect, compared to their neighbours?" you'd use the Hot Spot Analysis tool. If your question is "which households have higher (or lower) values than we'd expect, and are surrounded by households with lower (or higher) values than we'd expect?" then you might use the Cluster and Outlier Analysis tool.

I'd suggest you think specifically about exactly what you want to ask of your data, and then refer to the documentation to find which tool answers your question. It's entirely possible that you want to ask a question which isn't directly answerable by any of the tools in the spatial statistics toolbox!

From what you've described, if I were you I'd be using the Cluster and Outlier Analysis tool. That will tell you where there are groups of low values, or a high value surrounded by low values.
JeffreyEvans
Occasional Contributor III
A similar problem, and answer, was recently posed in this thread: http://forums.arcgis.com/threads/67233-Distance-from-one-set-of-xy-coordinatesto-another-set-of-xy-c...

I want to chime in here because Esri is not very good at providing statistical expectations around their spatial statistics tools. When I refer to "expectations" I mean the data requirements, assumptions and expected distributional output for a specified model. These are critical considerations in using a given method. While I wholeheartedly agree with Thom about having a testable hypothesis (question) and not just throwing methods at the wall, I should point out that none of the tools available in ArcGIS are suitable for the question posed in the original post. The original post states the problem as "positives/negatives", which is a binary problem. The cluster/outlier tools in ArcGIS are specifically the local Moran's I (LISA) and the Getis-Ord Gi* statistics. Both of these models expect the input data to follow a normal, continuous distribution (because both equations use the mean). This effectively rules out specifying the model(s) using binary input data.
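For reference, this is how the two local statistics are commonly written. The notation is an assumption made here, not taken from the thread: $x_i$ is the value at location $i$, $w_{ij}$ is the spatial weight between locations $i$ and $j$, and $n$ is the number of features. Both are built around the global mean $\bar{X}$, which is the point being made above.

```latex
% Local Moran's I (Anselin's LISA)
I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j \neq i} w_{ij}\,(x_j - \bar{X}),
\qquad
S_i^2 = \frac{\sum_{j \neq i} (x_j - \bar{X})^2}{n - 1}

% Getis-Ord Gi*
G_i^* = \frac{\sum_{j} w_{ij}\, x_j - \bar{X} \sum_{j} w_{ij}}
             {S \sqrt{\dfrac{n \sum_{j} w_{ij}^2 - \left(\sum_{j} w_{ij}\right)^2}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n}\sum_{j} x_j,
\quad
S = \sqrt{\frac{1}{n}\sum_{j} x_j^2 - \bar{X}^2}
```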

We rarely point out assumptions in autocorrelation and point pattern statistics, but they do exist. In Moran's I and Geary's C (global and local) the data are expected to be normally distributed, although non-normal data can be accounted for with a Monte Carlo simulation. The specification of the spatial weights matrix (Wij) can also have profound effects on the results. In point pattern statistics (e.g., Ripley's K) the assumption is that the data are homogeneous, meeting model-II stationarity assumptions. This is because the null hypothesis that is tested against to infer spatial clustering follows a Poisson CSR (Complete Spatial Randomness) process. If the data exhibit nonstationarity, the null no longer holds and becomes an intensity function, effectively providing invalid results. So, statistical ramblings aside, in addition to a well-defined hypothesis it is also quite important to look at model assumptions.
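A minimal sketch of the Monte Carlo idea mentioned above, assuming a global Moran's I with a binary distance-band weights matrix. The function names, the 6 x 6 grid and the threshold are all hypothetical and chosen only for illustration; this is not how any ArcGIS tool is implemented. The observed statistic is compared against values obtained by randomly shuffling the observations across the locations, so the pseudo p-value does not rest on a normality assumption.

```python
import math
import random


def distance_band_weights(points, threshold):
    """Binary weights: w[i][j] = 1 if j is within `threshold` of i (i != j)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                w[i][j] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I for `values` under the weights matrix `w`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def permutation_test(values, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign values to locations at random
        if morans_i(shuffled, w) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical data: a 6 x 6 grid where the left half is 1 and the right half is 0.
points = [(x, y) for x in range(6) for y in range(6)]
values = [1 if x < 3 else 0 for x, y in points]
w = distance_band_weights(points, threshold=1.0)  # rook-style neighbours

observed, p_value = permutation_test(values, w)
print(f"Moran's I = {observed:.3f}, pseudo p-value = {p_value:.3f}")
```

Changing `threshold` changes which households count as neighbours, which is one way to see how strongly the weights specification drives the result.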
VladimirKekez
New Contributor
Hello everybody!

I have one question!

Where can we find Geary's C in the Spatial Statistics toolbox, and how can we calculate it?
If it is not there, is it perhaps in the PySAL toolbox, and what kind of weights matrix does it need?

Thanks! 🙂
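On how it is calculated: a minimal sketch of global Geary's C in plain Python, assuming a binary distance-band weights matrix. The function names and grid data are hypothetical and for illustration only. The formula itself accepts any spatial weights matrix (contiguity, distance band, k-nearest neighbours), so the choice is a modelling decision and not a requirement of the statistic. PySAL's `esda` package also provides a `Geary` class, which may be more convenient for real data.

```python
import math


def distance_band_weights(points, threshold):
    """Binary weights: w[i][j] = 1 if j is within `threshold` of i (i != j)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                w[i][j] = 1.0
    return w


def gearys_c(values, w):
    """Global Geary's C.

    C = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * S0 * sum_i (x_i - mean)^2)
    Values below 1 suggest positive spatial autocorrelation, values above 1
    suggest negative autocorrelation, and values near 1 suggest neither.
    """
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(
        w[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    den = sum((v - mean) ** 2 for v in values)
    return ((n - 1) * num) / (2.0 * s0 * den)


# Hypothetical 6 x 6 grid with rook-style neighbours (distance <= 1).
grid_points = [(x, y) for x in range(6) for y in range(6)]
grid_w = distance_band_weights(grid_points, threshold=1.0)

clustered = [1 if x < 3 else 0 for x, y in grid_points]           # two blocks
checkerboard = [1 if (x + y) % 2 == 0 else 0 for x, y in grid_points]

c_clustered = gearys_c(clustered, grid_w)
c_checkerboard = gearys_c(checkerboard, grid_w)
print(f"clustered: C = {c_clustered:.3f}")
print(f"checkerboard: C = {c_checkerboard:.3f}")
```

The two toy patterns fall on opposite sides of 1, which is the behaviour the statistic is meant to capture.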