For my master's thesis, I'm using the High/Low Clustering (Getis-Ord General G) tool to analyze similarity in spatial value distribution between a raster showing accumulation of anomalously low freezing temperatures sustained during Spring 2007, and a timeseries of MODIS EVI rasters from the 2007 growing season showing vegetative productivity before and after the freeze. These two datasets have been combined into a timeseries of "association indexes" (two sets of timeseries, actually: one for freeze vs. average EVI, and another for freeze vs. 2007 EVI anomaly), and I have applied the Getis-Ord tool to each of these rasters.
The statistics returned for each of these rasters are Observed General G, Expected General G, Variance, Z-Score, and p-value. In all the literature I've been able to obtain about Getis-Ord, which includes Lauren's explanations from the old forum, the Resource Center documentation on the tool published by ESRI, and the original Getis-Ord article from 1992, I haven't been able to find anything that discusses the meaning of the General G statistic itself. All I've been able to find is that you use the Z-Score and p-value to assess the degree of clustering vs. dispersion for high/low values. My results make sense when interpreted using only the Z-Score, even though the observed and expected General G values returned were 0.000004 and 0.000005 across the whole set, while the Z-scores ranged from -5 to 40.
So, my question is, what exactly does the General G statistic mean? I know it typically ranges from 0-1, but what can I understand directly from the General G statistic itself, that I cannot interpret indirectly from its associated Z-Score?
Thanks very much to anyone who can offer any help or insight! It is kindly appreciated!
Hi Karl,
There really isn't any way to interpret the General G index directly. If you look at the math for the General G equation (http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_High_Low_Clustering_Getis_Ord_Gene...), you see that the numerator is the local product (a running sum of what you get if you multiply each feature's value by all its neighbors' values, for all features). The denominator is the global product (the sum of the products of every feature's value with every other feature's value). If we shuffle up the values so that all the high values are next to each other, the numerator gets bigger. If we shuffle them up so that all the low values are together (but the high values remain random), the numerator gets smaller. The index is simply the ratio of the local product to the global product.
But the index will be very different depending on the magnitude of the values involved (i.e., all values range from 0.01 to 0.05 vs. all values range from 1234500 to 1234600), and depending on your conceptualization of spatial relationships (if everyone is a neighbor of everyone else, the numerator gets larger; a polygon contiguity conceptualization with lots of features, however, will result in a small numerator and a very large denominator). So there isn't a fixed interpretation for the index value itself.
The rest of the math for the General G involves figuring out the expected index: the expected index is what that ratio would look like if the values were randomly distributed among your features. Next, the tool compares the expected index value to the observed one. It is the relationship between the observed (actual) and the expected values that determines whether the General G index is significant or not. You can think of the p-value as the answer to this question: what are the chances that my values would be arranged as they are, if the spatial processes promoting the observed spatial pattern were random? Small p-values mean the pattern would be very unlikely if the processes were random.
That's why we focus on looking at the z-score and p-value when we talk about interpreting General G results.
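To make the ratio described above concrete, here is a minimal Python sketch using only NumPy. It is not the ArcGIS implementation: the function names (`general_g`, `expected_g`, `permutation_z`), the 20-feature chain with binary adjacency weights, and the toy values are all assumptions made for illustration. In particular, the z-score here is estimated by randomly shuffling the values, which is a stand-in for the analytical variance the tool reports.

```python
import numpy as np

def general_g(x, w):
    """Observed General G: local (weighted) products over global products, i != j."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float).copy()
    np.fill_diagonal(w, 0.0)          # a feature is not its own neighbor
    cross = np.outer(x, x)            # x_i * x_j for every pair
    np.fill_diagonal(cross, 0.0)      # drop the i == j terms
    return (w * cross).sum() / cross.sum()

def expected_g(w):
    """Expected General G under random arrangement: sum of weights / (n * (n - 1))."""
    w = np.asarray(w, dtype=float).copy()
    np.fill_diagonal(w, 0.0)
    n = w.shape[0]
    return w.sum() / (n * (n - 1))

def permutation_z(x, w, n_perm=999, seed=0):
    """Z-score of the observed G against G values from shuffled data (an approximation)."""
    rng = np.random.default_rng(seed)
    observed = general_g(x, w)
    sims = np.array([general_g(rng.permutation(x), w) for _ in range(n_perm)])
    return (observed - sims.mean()) / sims.std(ddof=1)

# Hypothetical layout: 20 features in a row, each a neighbor of the adjacent ones.
n = 20
w = np.zeros((n, n))
idx = np.arange(n - 1)
w[idx, idx + 1] = 1.0
w[idx + 1, idx] = 1.0

# Toy values: five high values side by side, the rest low.
x = np.array([10.0] * 5 + [1.0] * 15)

g_obs = general_g(x, w)
g_exp = expected_g(w)
z = permutation_z(x, w)

# Same spatial pattern, but every value shifted by a large constant.
g_shifted = general_g(x + 1000.0, w)

print("observed G:", g_obs)
print("expected G:", g_exp)
print("permutation z-score:", z)
print("observed G after adding 1000 to every value:", g_shifted)
```

In this sketch the clustered high values push the observed index above the expected one, and shifting every value by a large constant pulls the observed index back toward the expected value even though the arrangement is unchanged. With binary neighbor weights the expected index is roughly the average number of neighbors divided by (n - 1), so a dataset with very many features and only a few neighbors each will naturally produce very small observed and expected values.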
I sure hope this helps! Best wishes with your research!
Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics