Average Nearest Neighbor - Influence of sample size?

10-21-2014 11:13 AM
JakobEder
Deactivated User

Hello.

 

I have a simple question about the Average Nearest Neighbor tool in the Spatial Statistics toolbox. I understand that the result is heavily influenced by the size of the study area. This shouldn't be a problem for me, since I am always studying the same city. I want to compare five different samples (incident data) with each other by looking at the z-scores and p-values.

The problem is that the biggest sample contains 500 incidents and the smallest one 100 incidents. Hence my question: does the number of incidents also influence my results, so that comparisons between the samples are not valid?

 

At least it seems to me that this is the case: a higher number of incidents produces stronger clustering...

 

Thanks for your help,

 

Jakob

9 Replies
IbrahimMohammed_Ahmed
Deactivated User

F

DanPatterson_Retired
MVP Emeritus

Did you check the equations used for this analysis in the help files, which show how Do and De vary with n and root(n/A) respectively?
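For reference, the equations in the Average Nearest Neighbor help can be written as below, as I read them. Here d_i is the distance from feature i to its nearest neighbor, n is the number of features and A is the study area; the symbol names are my own shorthand.

```latex
\bar{D}_O = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad
\bar{D}_E = \frac{0.5}{\sqrt{n/A}}, \qquad
ANN = \frac{\bar{D}_O}{\bar{D}_E}, \qquad
z = \frac{\bar{D}_O - \bar{D}_E}{SE}, \qquad
SE = \frac{0.26136}{\sqrt{n^2/A}}
```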

JakobEder
Deactivated User

Thanks for your answer. I looked at the equations again and I understand now that the number of features has a certain influence on the NNA values. Nevertheless, I have to admit that I am not able to estimate the degree of the distortion just by looking at the equations. Could you maybe provide a non-technical summary?

Is there a way to deal with this problem?

DanPatterson_Retired
MVP Emeritus

I am not sure what you mean by distortion?

JakobEder
Deactivated User

Maybe I should rephrase: I want to compare different samples within a city. I have the impression that the sample size influences (distorts) my results, so I am not sure whether comparing the z-scores is valid. Is it? In other words, is a higher z-score caused only by the actual degree of clustering, or also by the number of incidents included in the analysis?

Thanks.

DanPatterson_Retired
MVP Emeritus

Jakob

Give me 30 mins... I'm putting together a demo so you can see and explore the parameters yourself.

DanPatterson_Retired
MVP Emeritus

Sorry Jakob... I got carried away making it into something that could be used as a demo in my class, so I stopped. In the attached file, I have the HTML results for a case that I produced with 4 evenly spaced points within their own convex hull. The sum of the distances was 400 m (i.e. the points form a square with 100 m between points), n = 4 and area = 10,000. The results of the NN analysis are included in the HTML file. I then created a small Python script, which you can simply load into your Python IDE (not the ArcMap one), check, then run. It replicates the single result for the configuration, then loops, keeping n and the sum of distances the same but increasing the study area. The effect can be readily seen in the outputs of the other parameters as described in the help files. You can play with this to suit your purposes. Hope this helps.
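The attached script is not reproduced in this thread. A minimal sketch of the same idea might look like the following, assuming the formulas from the Average Nearest Neighbor help (expected distance 0.5 / sqrt(n/A), standard error 0.26136 / sqrt(n^2/A)). The function name and the list of area multipliers are hypothetical choices for illustration, not taken from the original file.

```python
import math


def average_nearest_neighbor(sum_dist, n, area):
    """Return (observed mean, expected mean, ratio, z-score).

    Assumes the formulas given in the Average Nearest Neighbor help;
    sum_dist is the sum of nearest-neighbor distances over all n features.
    """
    d_obs = sum_dist / n                      # observed mean distance
    d_exp = 0.5 / math.sqrt(n / area)         # expected mean distance (random pattern)
    se = 0.26136 / math.sqrt(n ** 2 / area)   # standard error
    ratio = d_obs / d_exp                     # nearest neighbor ratio
    z = (d_obs - d_exp) / se                  # z-score
    return d_obs, d_exp, ratio, z


# Base case from the post: 4 points, distances summing to 400 m, area 10,000.
n, sum_dist, base_area = 4, 400.0, 10_000.0

# Keep n and the sum of distances fixed and only enlarge the study area.
# The multipliers below are illustrative, not from the original demo.
print(f"{'area':>10} {'D_obs':>8} {'D_exp':>8} {'ratio':>8} {'z':>8}")
for factor in (1, 2, 4, 8, 16):
    area = base_area * factor
    d_obs, d_exp, ratio, z = average_nearest_neighbor(sum_dist, n, area)
    print(f"{area:>10.0f} {d_obs:>8.2f} {d_exp:>8.2f} {ratio:>8.3f} {z:>8.3f}")
```

With the point pattern unchanged, the ratio and the z-score both fall as the study area grows, which is the study-area effect described in the help files.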

JakobEder
Deactivated User

Dear Dan,

Thank you for your efforts. I really appreciate it. I think I (hopefully) understand now how Average Nearest Neighbor works and how the parameters affect the z-score.

So, just to conclude, is it valid to say: "The clustering according to ANN depends on the area, on the distances between the features and on the number of features."?
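To check the part about the number of features, one can hold the nearest neighbor ratio and the study area fixed and change only n. This is a minimal sketch, again assuming the formulas from the help files; the ratio of 0.8 and the area value are made-up numbers, and the function name is hypothetical.

```python
import math


def z_for_fixed_ratio(ratio, n, area):
    """z-score for a pattern whose observed/expected ratio is held fixed.

    Assumes the Average Nearest Neighbor formulas from the help files.
    """
    d_exp = 0.5 / math.sqrt(n / area)         # expected mean distance
    d_obs = ratio * d_exp                     # observed mean implied by the ratio
    se = 0.26136 / math.sqrt(n ** 2 / area)   # standard error
    return (d_obs - d_exp) / se


area = 1.0e8   # hypothetical study area (same city for every sample)
ratio = 0.8    # hypothetical, identical degree of clustering in every sample

for n in (100, 200, 300, 400, 500):
    print(f"n = {n:>3}  ratio = {ratio:.2f}  z = {z_for_fixed_ratio(ratio, n, area):.3f}")
```

Under these assumptions the z-score grows in magnitude in proportion to sqrt(n) even though the ratio stays the same, so a sample of 500 incidents would show a more extreme z-score than a sample of 100 with the identical ratio. That suggests z-scores from samples of different sizes are not directly comparable as measures of the degree of clustering.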

DanPatterson_Retired
MVP Emeritus

I would say so...
