Average Nearest Neighbor - Influence of sample size?

JakobEder · ‎10-21-2014

Hello.

I have a simple question about the Average Nearest Neighbor tool in the Spatial Statistics toolbox. I understand that the result is heavily influenced by the size of the study area. This shouldn't be a problem for me, since I am always studying the same city. I want to compare five different samples (incident data) to each other by looking at the z-scores and p-values.

The problem is that the biggest sample contains 500 incidents and the smallest one 100 incidents. Hence my question: does the number of incidents also influence my results, and are comparisons between the samples therefore not valid?

At least it seems to me that this is the case: a higher number of incidents appears to produce stronger clustering...

Thanks for your help,

Jakob

DanPatterson_Retired · ‎10-21-2014

Did you check the equations used for this analysis in the help files? They show how Do and De vary with n and root(n/A) respectively.

JakobEder · ‎10-21-2014

Thanks for your answer. I looked at the equations again and I understand now that the number of features has a certain influence on the NNA values. Nevertheless, I have to admit that I am not able to estimate the degree of the distortion just by looking at the equations. Could you maybe provide a non-technical summary?

Is there a way to deal with this problem?

DanPatterson_Retired · ‎10-21-2014

I am not sure what you mean by distortion?

JakobEder · ‎10-21-2014

Maybe I should rephrase: I want to compare different samples within a city. I have the impression that the sample size influences (distorts) my results, so I am not sure whether comparing the z-scores is valid. Is it? Meaning: is a higher z-score caused only by the actual degree of clustering, or also by the number of incidents included in the analysis?

Thanks.

DanPatterson_Retired · ‎10-21-2014

Jakob

Give me 30 mins...putting together a demo so you can see and explore the parameters yourself

DanPatterson_Retired · ‎10-21-2014

Sorry Jakob... I got carried away making it into something that could be used as a demo in my class, so I stopped. In the attached file, I have the HTML results for a case that I produced with 4 evenly spaced points within their own convex hull. The sum of the distances was 400 m (i.e. the points form a square with 100 m between points), n = 4 and area = 10,000 sq m. The results of the NN analysis are included in the HTML file. I then created a small Python script which you can simply load into your Python IDE (not the ArcMap one), check, and then run. It first replicates the single result for this configuration, then runs a loop that keeps n and the sum of distances the same but increases the study area. The effect can readily be seen in the other output parameters, as described in the help files. You can play with this to suit your purposes. Hope this helps.
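
A minimal sketch of such a script (not the attached file itself), assuming the formulas given in the ArcGIS help for Average Nearest Neighbor: Do = sum of distances / n, De = 0.5 / sqrt(n/A), SE = 0.26136 / sqrt(n^2/A) and z = (Do - De) / SE. The function name `ann_stats` and the list of areas are illustrative choices, not taken from the original script.

```python
import math


def ann_stats(n, sum_dist, area):
    """Return (Do, De, ratio, z) for Average Nearest Neighbor.

    Formulas assumed from the ArcGIS help for the tool.
    """
    d_obs = sum_dist / n                      # Do: observed mean nearest-neighbor distance
    d_exp = 0.5 / math.sqrt(n / area)         # De: expected mean distance for a random pattern
    se = 0.26136 / math.sqrt(n ** 2 / area)   # standard error
    ratio = d_obs / d_exp                     # nearest neighbor ratio (< 1 clustered, > 1 dispersed)
    z = (d_obs - d_exp) / se                  # z-score
    return d_obs, d_exp, ratio, z


# Base case from the post: 4 points on a 100 m square, area 10,000 sq m.
N = 4
SUM_DIST = 400.0
base = ann_stats(N, SUM_DIST, 10_000.0)

# Keep n and the sum of distances fixed, increase only the study area.
# These area values are illustrative.
results = {}
for area in (10_000.0, 20_000.0, 40_000.0, 80_000.0, 160_000.0):
    results[area] = ann_stats(N, SUM_DIST, area)
    d_obs, d_exp, ratio, z = results[area]
    print(f"A={area:>9.0f}  Do={d_obs:6.1f}  De={d_exp:6.1f}  "
          f"ratio={ratio:5.2f}  z={z:6.2f}")
```

With the point pattern unchanged, Do stays at 100 m while De grows with the square root of the area, so both the ratio and the z-score fall as the study area gets larger.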

JakobEder · ‎10-23-2014

Dear Dan,

Thank you for your efforts. I really appreciate it. I think I now understand (hopefully) how Average Nearest Neighbor works and how the parameters affect the z-score.

So, just to conclude, is it valid to say: "The clustering according to ANN depends on the area, on the distances between the features and on the number of features."?

DanPatterson_Retired · ‎10-23-2014

I would say so...
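
To see the effect of the number of features in isolation, here is a minimal sketch under the same assumed formulas from the ArcGIS help (De = 0.5 / sqrt(n/A), SE = 0.26136 / sqrt(n^2/A), z = (Do - De) / SE). It holds the study area and the nearest neighbor ratio Do/De constant and changes only n. The area, the ratio of 0.8 and the function name `z_for_ratio` are hypothetical values chosen for illustration; only the sample sizes 100 and 500 come from the question.

```python
import math


def z_for_ratio(n, ratio, area):
    """z-score for a pattern whose observed/expected distance ratio is `ratio`.

    Formulas assumed from the ArcGIS help for Average Nearest Neighbor.
    """
    d_exp = 0.5 / math.sqrt(n / area)         # De: expected mean distance
    se = 0.26136 / math.sqrt(n ** 2 / area)   # standard error
    d_obs = ratio * d_exp                     # Do implied by the chosen ratio
    return (d_obs - d_exp) / se


AREA = 1_000_000.0   # hypothetical study area, held constant
RATIO = 0.8          # hypothetical ratio, i.e. the same relative clustering

z_small = z_for_ratio(100, RATIO, AREA)   # smallest sample in the question
z_large = z_for_ratio(500, RATIO, AREA)   # largest sample in the question

print(f"n=100  z={z_small:.2f}")
print(f"n=500  z={z_large:.2f}")
print(f"z(500) / z(100) = {z_large / z_small:.3f}  (sqrt(5) = {math.sqrt(5):.3f})")
```

Under these formulas the z-score for a fixed ratio grows in magnitude with sqrt(n), so the sample with 500 incidents gets a more extreme z-score than the sample with 100 incidents even when the ratio is identical. That matches the impression that more incidents lead to "more clustering" in the z-scores.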