Hi, I am looking to map a dataset with very skewed input data. We have cancer rates for over 500 villages. Over 300 of the villages have zero cases, so rate = 0. Is this OK to use with Moran's I and LISA, or should we use the natural log of the rates when looking for high/low clusters? Because the tests rely on a z-score and a p-value, we were thinking our data should be normally distributed. We have also thought about imputing a VERY small number of cases (0.01) for each village with a zero rate, then calculating the rate and taking its log. *Looking* at the data, it appears that the results using the natural log of the rate capture the data better than the results using the direct rate.
Any thoughts? Has anyone encountered this issue before?
Don't add a small constant to the data just to get around a limitation of the transformation.
Is it skewed because of the zeros?
Just confirming: are the zero values actual observations, and not a substitute for "nodata"?
You have "rates". What about the absolute values? What is their distribution? (Your distribution could change completely if the actual values were used.)
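On the normality worry: significance for Moran's I and LISA is commonly assessed by random permutation of the observed values, which does not rely on the rates being normally distributed. Here is a minimal sketch of the global statistic with a permutation pseudo p-value. It assumes NumPy, a made-up 10 x 10 grid of "villages" with rook contiguity, and synthetic zero-inflated rates. None of this is your data, and the function names are only for illustration.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a 1-D array and an n x n spatial weights matrix."""
    z = values - values.mean()
    cross = (weights * np.outer(z, z)).sum()   # sum_ij w_ij * z_i * z_j
    return (len(values) / weights.sum()) * (cross / (z ** 2).sum())

def permutation_test(values, weights, n_perm=999, seed=0):
    """Observed Moran's I and a one-sided pseudo p-value (positive autocorrelation)."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    # Reference distribution: shuffle the values over the locations.
    sims = np.array([morans_i(rng.permutation(values), weights)
                     for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical layout: 100 villages on a 10 x 10 grid (assumption, not real data).
side = 10
n = side * side
rows, cols = np.divmod(np.arange(n), side)

# Rook contiguity: two cells are neighbors if they share an edge.
grid_dist = np.abs(rows[:, None] - rows[None, :]) + np.abs(cols[:, None] - cols[None, :])
weights = (grid_dist == 1).astype(float)

# Synthetic zero-inflated rates: zero everywhere except one 4 x 4 corner cluster.
rng = np.random.default_rng(42)
rates = np.zeros(n)
cluster = (rows < 4) & (cols < 4)
rates[cluster] = rng.gamma(shape=2.0, scale=5.0, size=cluster.sum())

observed, p_value = permutation_test(rates, weights)
print("share of zero rates:", np.mean(rates == 0))
print("Moran's I:", observed, "pseudo p-value:", p_value)
```

With real data you would swap in your own weights matrix and rates. The point is only that the reference distribution comes from shuffling the observed values, zeros included, so no log transform is needed for the test itself.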