I computed Anselin's local Moran's I and am trying to understand how ArcGIS moves from the test statistic to the HH, HL, LH, or LL classification in the COType field. From what I can tell, the test statistic only indicates whether an observation is similar to its neighbors (HH and LL clusters give a high test statistic) or dissimilar from them (HL and LH outliers give a low test statistic). However, I do not see how the magnitude of the test statistic distinguishes between the different types of clusters or outliers. That is, can I tell just by looking at the test statistic or z-score whether an observation is part of an HH rather than an LL cluster, or is an LH rather than an HL outlier? If not, how does the program fill in the COType field?
Thanks for your question; it is a really good one! You are right that looking at the z-scores for your features is not enough to determine which COType a feature will be assigned. For instance, a strongly negative z-score that is statistically significant always indicates an outlier, but whether it is classified as HL or LH depends on more information than the z-score alone. The same is true of strongly positive z-scores: they always indicate a cluster, but whether it is classified as HH or LL depends on more than the z-score.
For each feature in the dataset we calculate the z-score, p-value, and local mean, and then, for those features that are statistically significant, we go on to determine the classifications. The global mean is the average of all of the analysis field values; the local mean is the average analysis field value for a target feature's neighbors. For outliers (strongly negative z-scores, below -1.96), we compare the value of the target feature to the local mean: features with values higher than the local mean are classified as HL, and features with values lower than the local mean are classified as LH. For clusters of similar values (strongly positive z-scores, above 1.96), we compare the local mean to the global mean: features whose local means are higher than the global mean are classified as HH, and those whose local means are lower than the global mean are classified as LL.
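The classification rule described in this reply can be sketched in Python. This is a minimal illustration, not Esri's implementation: the function name `classify_cotype`, the neighbor representation (a dict of index lists), and the fixed ±1.96 cutoffs are assumptions made for the example. The reply does not say how ties (a value exactly equal to the local or global mean) are handled, so this sketch arbitrarily sends ties to the "low" branch.

```python
from statistics import mean


def classify_cotype(values, neighbors, z_scores, z_crit=1.96):
    """Assign a COType-like label to each feature, following the rule in the reply.

    values    -- analysis-field values, one per feature
    neighbors -- hypothetical input format: dict mapping feature index -> list of neighbor indices
    z_scores  -- local Moran's I z-scores, one per feature (assumed already computed)
    z_crit    -- significance cutoff; 1.96 is assumed here, as quoted in the reply
    """
    global_mean = mean(values)
    labels = []
    for i, z in enumerate(z_scores):
        # Local mean: average value of the target feature's neighbors.
        local_mean = mean(values[j] for j in neighbors[i])
        if z > z_crit:
            # Cluster of similar values: compare the local mean to the global mean.
            labels.append("HH" if local_mean > global_mean else "LL")
        elif z < -z_crit:
            # Outlier: compare the feature's own value to its local mean.
            labels.append("HL" if values[i] > local_mean else "LH")
        else:
            # Not statistically significant: left unclassified.
            labels.append("")
    return labels


# Four features with identical (made-up) z-scores still receive different labels.
print(classify_cotype([10.0, 12.0, 1.0, 2.0], {0: [1], 1: [0], 2: [3], 3: [2]}, [2.5] * 4))
# prints ['HH', 'HH', 'LL', 'LL']
```

The toy call illustrates the point made above: the same z-score of 2.5 yields both HH and LL, because the label comes from comparing means, not from the z-score itself.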
My understanding is that the local Moran's I for observation i is calculated by multiplying the deviation of that observation from the global mean by the weighted sum, over the spatial weights matrix, of each neighbor observation's deviation from the global mean:

$$
I_i = (x_i - \bar{X}) \sum_{j=1,\, j \neq i}^{n} w_{i,j}\,(x_j - \bar{X})
$$
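That unscaled definition can be written as a small Python function. This is a sketch under stated assumptions: the name `local_morans_i` is hypothetical, the weights are passed as a dense list-of-lists matrix with zeros on the diagonal, and no row standardization or variance scaling is applied.

```python
def local_morans_i(x, w):
    """Unscaled local Moran's I: I_i = (x_i - xbar) * sum_{j != i} w[i][j] * (x_j - xbar).

    x -- observation values
    w -- n x n spatial weights matrix (list of lists); diagonal assumed to be 0
    """
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]  # deviations from the global mean
    result = []
    for i in range(n):
        # Weighted sum of the neighbors' deviations from the global mean.
        lag = sum(w[i][j] * dev[j] for j in range(n) if j != i)
        result.append(dev[i] * lag)
    return result


# Two pairs of neighbors: a high pair (10, 12) and a low pair (1, 2).
w = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(local_morans_i([10.0, 12.0, 1.0, 2.0], w))
# prints [21.5625, 21.5625, 22.3125, 22.3125]
```

Both the high pair and the low pair get positive values, since two negative deviations multiply to a positive number. This is why the sign of the statistic separates clusters from outliers but cannot, on its own, separate HH from LL (or HL from LH).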
This is very different from the formula posted by ESRI. I am having particular trouble understanding two terms in ESRI's equation. The first is the denominator of the first term, where the deviation of x_i is "normalized" by the average weight minus the global mean squared. This is not intuitive to me. Second, the equation does not take into account the deviations of the neighboring observations from the global mean. Instead, in the second term, the weights matrix is multiplied by the deviation of observation i from the global mean. Could this be a typo?
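To see what a variance-style denominator does, here is a sketch of one common scaled form, $I_i = \frac{x_i - \bar{X}}{m_2} \sum_{j \neq i} w_{i,j}(x_j - \bar{X})$ with $m_2 = \frac{1}{n}\sum_i (x_i - \bar{X})^2$. Treating ESRI's denominator as this kind of variance scaling is an assumption, not something confirmed in this thread, and the function name `scaled_local_morans_i` is hypothetical. In this form, $m_2$ is a single positive number shared by every observation, so dividing by it rescales all $I_i$ by the same factor without changing their signs.

```python
def scaled_local_morans_i(x, w):
    """Local Moran's I divided by the variance m2 = sum((x_i - xbar)^2) / n.

    Assumption: this population-variance scaling is one common convention and is
    used here only for illustration; it is not claimed to be ESRI's exact formula.
    w -- n x n spatial weights matrix (list of lists); diagonal assumed to be 0
    """
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]           # deviations from the global mean
    m2 = sum(d * d for d in dev) / n        # one positive scale shared by all i
    return [
        (dev[i] / m2) * sum(w[i][j] * dev[j] for j in range(n) if j != i)
        for i in range(n)
    ]


w = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
r = scaled_local_morans_i([10.0, 12.0, 1.0, 2.0], w)
print([v > 0 for v in r])
# prints [True, True, True, True]
```

Under this assumption, the denominator changes the magnitude of the statistic (making values comparable across datasets with different spreads) but not whether a feature reads as a cluster or an outlier.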
Any insight you could provide as to where this formula is coming from is greatly appreciated! Thank you so much again for your help!