Performing Cluster analysis on Soil samples for Mineral Exploration

Discussion created by khaledmd on Nov 14, 2011
Latest reply on Dec 5, 2011 by lscott-esristaff
I've been trying to perform a cluster analysis on soil/till samples that were collected for mineral exploration. The data covers all of Canada. One field, 'Total_Positives', gives the total number of indicator grains found in each sample. My goal is to find statistically significant clusters of high values. I've read through the help documentation and watched some of the training videos.

Here is what I've done:

(1) I ran the Spatial Autocorrelation (Moran's I) tool at different distance bands to find the optimum distance band for clustering, i.e. the distance that gives the highest Z-score. I used the 'Total_Positives' field as my Input Field. As suggested in the help documentation, I used "Fixed Distance Band" as the conceptualization of spatial relationships.

(2) Using the distance band corresponding to the highest Z-score, I ran the Hot Spot Analysis (Getis-Ord Gi*) tool.
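
To make step (1) concrete, here is a minimal pure-Python sketch of a distance-band sweep. It is not the ArcGIS implementation: it uses the textbook global Moran's I with binary fixed-distance-band weights and the z-score under the randomization assumption. The function names (`morans_i_z`, `sweep_bands`, `best_band`) and the toy 4 x 4 grid are made up for illustration only.

```python
import math

def morans_i_z(points, values, band):
    """Global Moran's I and z-score (randomization assumption) with binary
    fixed-distance-band weights: w_ij = 1 if i != j and d_ij <= band."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev)
    m4 = sum(d ** 4 for d in dev)

    neighbors = [0] * n   # number of neighbors per feature
    cross = 0.0           # sum over i, j of w_ij * dev_i * dev_j
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            if d <= band:
                neighbors[i] += 1
                cross += dev[i] * dev[j]

    s0 = float(sum(neighbors))          # sum of all weights
    if n < 4 or s0 == 0 or m2 == 0:
        return None                     # statistic is undefined
    i_stat = (n / s0) * (cross / m2)

    # For symmetric binary weights: S1 = 2 * S0 and S2 = 4 * sum(row sums^2).
    s1 = 2.0 * s0
    s2 = 4.0 * sum(k * k for k in neighbors)
    b2 = n * m4 / (m2 * m2)             # sample kurtosis of the values
    expected = -1.0 / (n - 1)
    a = n * ((n * n - 3 * n + 3) * s1 - n * s2 + 3 * s0 * s0)
    b = b2 * ((n * n - n) * s1 - 2 * n * s2 + 6 * s0 * s0)
    c = (n - 1) * (n - 2) * (n - 3) * s0 * s0
    variance = (a - b) / c - expected ** 2
    if variance <= 0:
        return None                     # e.g. every feature neighbors every other
    return i_stat, (i_stat - expected) / math.sqrt(variance)

def sweep_bands(points, values, bands):
    """Return (band, I, z) for every band where the statistic is defined."""
    results = []
    for band in bands:
        out = morans_i_z(points, values, band)
        if out is not None:
            results.append((band, out[0], out[1]))
    return results

def best_band(results):
    """Pick the band with the highest z-score."""
    return max(results, key=lambda r: r[2])

# Toy data (hypothetical): high counts in the western half of a 4 x 4 grid.
points = [(x, y) for x in range(4) for y in range(4)]
values = [10 if x < 2 else 0 for (x, y) in points]

for band, i_stat, z in sweep_bands(points, values, [1.0, 1.5, 2.0, 3.0]):
    print(f"band={band:4.1f}  I={i_stat:6.3f}  z={z:6.2f}")
```

The band returned by `best_band` would then be the one passed to the hot spot analysis in step (2). The double loop is O(n^2), so this is only meant for small test sets, not for a national dataset.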

The problems I have encountered are:

(a) When I run the Spatial Autocorrelation (Moran's I) tool at various distance bands, I get the message "xx features had no neighbors which generally invalidates the statistical properties of a test." The number of features with no neighbors decreases as the distance band increases. Does this message invalidate the calculated Z-scores, or can I still use them to determine the optimum distance band for my clustering?

(b) The sampling density is quite variable across Canada, and the sample spacing differs between project areas. Does this variation affect my clustering calculation? Because we haven't sampled all of Canada in detail, the sample locations themselves are clustered, which is not the clustering of high values that we are looking for. Assuming this does have an effect, how can I remove the influence of variable sampling density from my cluster analysis?

(c) The Z-scores by distance are highly variable and do not show a gradual increase to a maximum followed by a gradual decline. However, all of the Z-scores are very high: every one is above 2.85 and many are over 80. With some datasets, I was getting Z-scores over 250. Are these values meaningful? In all cases, the p-value is shown as 0.
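
To illustrate what the "no neighbors" message in (a) is counting, and how it ties in with the uneven sampling in (b), here is a small pure-Python sketch. It does not say whether the resulting statistics are valid; it only counts isolated features at a given distance band and finds the smallest band at which every feature has at least one neighbor. The function names and the sample coordinates are hypothetical.

```python
import math

def _dist(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def count_no_neighbors(points, band):
    """Number of features with no other feature within `band`."""
    isolated = 0
    for i, p in enumerate(points):
        if not any(i != j and _dist(p, q) <= band
                   for j, q in enumerate(points)):
            isolated += 1
    return isolated

def min_band_all_have_neighbor(points):
    """Smallest band at which every feature has at least one neighbor,
    i.e. the largest nearest-neighbor distance in the dataset."""
    return max(
        min(_dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )

# Hypothetical layout: one densely sampled project area, two widely spaced
# samples, and one remote sample.
samples = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (13, 0), (30, 30)]

for band in (1, 3, 10):
    print(band, count_no_neighbors(samples, band))
print(min_band_all_have_neighbor(samples))
```

With unevenly spaced samples like these, a single remote sample can force the "everyone has a neighbor" band to be far larger than the spacing inside the dense project areas.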