Ripley's K Confidence Envelope doesn't follow the blue expected line

09-15-2011 05:36 AM
Mareike_TabeaScheller
Hello!
I'm working for the first time with the Spatial Statistics tools.
I have point data of birds on an island. I want to calculate the K-Function, but the result graph always shows a Confidence Envelope that is under the blue expected line. Why is that? Shouldn't the confidence envelope follow the expected line?
I would be glad if anyone had a suggestion as to what mistake I might have made here.

Mareike
Mareike_TabeaScheller
I'm working in Iceland, so I use the ISN_2004 Projected Coordinate System. In the results window of the K-function, the field "Geographic transformations" always shows that the tool used NAD_1927_to_NAD_1983_NADCON. This just appears automatically. Could this be the "error" here? What are these geographic transformations for, and which transformation should I use?
I would be really, really glad if someone had a suggestion!
LaurenRosenshein
Hi Mareike,

I'm really sorry that you're having trouble! Would it be possible for you to attach a copy of the graphical output from K-Function?  That would help us start to figure out exactly what's going on. 

Hopefully we'll be able to figure this out and get you up and running!

Lauren Rosenshein
Geoprocessing Product Engineer
Mareike_TabeaScheller
Hi!

Thanks for your answer. Here is my graph. I noticed that if I switch off the Boundary Correction Method (I used "Simulate Outer Boundaries"), the confidence envelope follows the observed line, which is of course also wrong. If I try to use the other two methods, I get an error. So I'm still confused.

Mareike
JeffreyEvans
My guess is that your point process is in fact inhomogeneous (nonstationary). One of the underlying assumptions of point pattern analysis (PPA) is that your point pattern represents a stationary random field. Some tests of this assumption are closed-space distance, Morishita, or Fry plots. The null hypothesis you are testing against with the K statistic is a CSR (Complete Spatial Randomness) process. If the underlying random field is not CSR but is conditional on the intensity of the measured process, the null is invalid and the expected line will not follow the Gibbs randomization used to generate the simulation envelope. Unfortunately, inhomogeneous PPA statistics are not widely available and are very computationally expensive. There is an inhomogeneous K available in Spatstat. As an alternative, I would recommend fitting an empirical point process model using covariates. This has the potential to detrend a variable that may be conditioning the intensity. An option without covariates is to fit a 2nd-order polynomial to detrend the point process. However, this assumes that the nonstationarity is the result of 1st-order spatial variation, and in ecological processes this is rarely the case.
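Morishita plots are one of the diagnostics listed above. As a minimal sketch, assuming Morishita's index of dispersion I = Q * sum(n_i * (n_i - 1)) / (N * (N - 1)), computed on a regular grid of Q quadrats over a rectangular extent: values near 1 are consistent with randomness at that quadrat size, values above 1 suggest clustering, and values below 1 suggest regularity. Plotting the index against quadrat size gives the Morishita plot. The function names and the grid layout are illustrative assumptions, not part of ArcGIS or Spatstat.

```python
from collections import Counter


def morishita_index(points, xmin, ymin, xmax, ymax, k):
    """Morishita's index of dispersion on a k x k grid of quadrats.

    points: list of (x, y) tuples inside the rectangle [xmin, xmax] x [ymin, ymax].
    Returns Q * sum(n_i * (n_i - 1)) / (N * (N - 1)), where Q = k * k.
    """
    n_total = len(points)
    if n_total < 2:
        raise ValueError("need at least two points")
    width = (xmax - xmin) / k
    height = (ymax - ymin) / k
    counts = Counter()
    for x, y in points:
        # Clamp so points lying exactly on the max edge fall in the last row/column.
        col = min(int((x - xmin) / width), k - 1)
        row = min(int((y - ymin) / height), k - 1)
        counts[(row, col)] += 1
    q = k * k
    same_quadrat_pairs = sum(c * (c - 1) for c in counts.values())
    return q * same_quadrat_pairs / (n_total * (n_total - 1))


def morishita_plot_data(points, xmin, ymin, xmax, ymax, grid_sizes):
    """(quadrat width, index) for several grid resolutions, for a Morishita plot."""
    return [((xmax - xmin) / k, morishita_index(points, xmin, ymin, xmax, ymax, k))
            for k in grid_sizes]


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    pts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(500)]
    for size, index in morishita_plot_data(pts, 0, 0, 10, 10, [2, 4, 8]):
        print(f"quadrat width {size:.2f}: I = {index:.2f}")
```

If the index drifts well away from 1 as the quadrat size changes, that is a hint that the intensity is not constant across the study area.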
LaurenScott
Hi Mareike,
Sorry you're having trouble with this!
The K Function works by simply counting feature pairs: the tool "visits" each feature in the dataset, selects all features within a specified distance of the target feature, and counts the number of feature pairs among the selected features. Feature pair counts are accumulated as the tool visits every feature. The distance is then increased and the counting repeated, and this process continues however many times you've specified for the Number of Distance Bands parameter.
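The counting described above can be sketched in a few lines. This is a minimal brute-force illustration, not Esri's implementation. It assumes plain Euclidean distance, no weights, and no boundary correction, and it counts ordered pairs: each pair is seen once from each end, which is how the counts build up when every feature is visited in turn. The function name `accumulated_pair_counts` is hypothetical.

```python
import math


def accumulated_pair_counts(points, distances):
    """For each distance d, count ordered pairs (i, j), i != j, within d.

    Visits every point in turn, counts the other points within d of it,
    and accumulates those counts over all visited points.
    """
    counts = []
    for d in distances:  # one pass per distance band
        total = 0
        for i, (xi, yi) in enumerate(points):
            for j, (xj, yj) in enumerate(points):
                if i != j and math.hypot(xi - xj, yi - yj) <= d:
                    total += 1
        counts.append(total)
    return counts


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5)]
    print(accumulated_pair_counts(pts, [1.0, 1.5, 10.0]))  # [4, 6, 12]
```

The brute-force double loop is O(n^2) per distance band; it is written that way only to mirror the "visit each feature" description.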

These accumulated counts (one for each distance) are converted to an index and plotted on a line graph.  When your points tend to be clustered, the accumulated counts are higher, and the index falls above the blue diagonal expected line.  When the points tend to be dispersed, counts are lower and the index falls below the expected line. 
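Converting the counts into the plotted index can be sketched as follows. This assumes the common unweighted transformation L(d) = sqrt(A * C(d) / (pi * n * (n - 1))), where A is the study-area size, n is the number of points, and C(d) is the accumulated ordered-pair count at distance d. Check the ArcGIS help linked later in the thread for the exact formula the tool uses. Under complete spatial randomness, L(d) is approximately d. That is why clustering pushes the line above the diagonal and dispersion pulls it below. The function names are hypothetical.

```python
import math


def l_index(pair_count, n_points, area):
    """Unweighted L(d) from an accumulated ordered-pair count at distance d.

    pair_count: number of ordered pairs (i, j), i != j, within distance d.
    n_points:   number of features n.
    area:       study-area size A, in squared map units.
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    k_value = area * pair_count / (n_points * (n_points - 1))  # Ripley's K estimate
    return math.sqrt(k_value / math.pi)


def interpret(l_value, d):
    """Compare the index with the expected diagonal L(d) = d."""
    if l_value > d:
        return "clustered"
    if l_value < d:
        return "dispersed"
    return "as expected"


if __name__ == "__main__":
    # 100 points in a 100 x 100 area; about 311 ordered pairs within d = 10 is roughly CSR.
    for count in (100, 311, 600):
        value = l_index(count, 100, 10_000)
        print(count, round(value, 2), interpret(value, 10))
```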

To decide if the clustering or dispersion is significantly different from what you would get if the points were randomly distributed in your study area, the tool uses simulation.  The tool randomly pitches your points into your study area 9, 99, or 999 times and for each simulation, it performs the whole distance/counting thing.  From all the simulations, it remembers (for each distance) the most clustered index obtained from the random process of pitching your points into the study area, and it remembers the most dispersed index obtained.  These extreme values form the confidence envelope, and they show you (given X number of points and the peculiarities of your study area), what is the range of possible indices you can obtain from a random process.
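The simulation step can be sketched like this. It is a simplified illustration under several assumptions. Points are placed uniformly by rejection sampling inside a study area supplied as a point-in-area test. No boundary correction is applied. The index is the unweighted L(d) computed from ordered-pair counts. The envelope is the per-distance lowest and highest simulated value, as described above. The `permutations` argument plays the role of the 9, 99, or 999 runs. All names (`inside`, `simulate_envelope`, and so on) are hypothetical, and this is not the tool's implementation.

```python
import math
import random


def l_values(points, distances, area):
    """Unweighted L(d) for each distance, using brute-force ordered-pair counts."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    result = []
    for d in distances:
        count = sum(
            1
            for i, (xi, yi) in enumerate(points)
            for j, (xj, yj) in enumerate(points)
            if i != j and math.hypot(xi - xj, yi - yj) <= d
        )
        result.append(math.sqrt(area * count / (math.pi * n * (n - 1))))
    return result


def random_points(n, inside, bbox, rng):
    """Place n points uniformly in the study area by rejection sampling from its bounding box."""
    xmin, ymin, xmax, ymax = bbox
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if inside(x, y):
            pts.append((x, y))
    return pts


def simulate_envelope(n, distances, inside, bbox, area, permutations=99, seed=0):
    """Per-distance (lowest, highest) L(d) over random placements of n points."""
    rng = random.Random(seed)
    low = [math.inf] * len(distances)
    high = [-math.inf] * len(distances)
    for _ in range(permutations):
        sim = l_values(random_points(n, inside, bbox, rng), distances, area)
        low = [min(a, b) for a, b in zip(low, sim)]   # most dispersed so far
        high = [max(a, b) for a, b in zip(high, sim)]  # most clustered so far
    return low, high


if __name__ == "__main__":
    def in_box(x, y):
        return 0 <= x <= 10 and 0 <= y <= 10

    low, high = simulate_envelope(30, [1, 2, 3], in_box, (0, 0, 10, 10), 100.0)
    for d, lo, hi in zip([1, 2, 3], low, high):
        print(f"d={d}: envelope {lo:.2f} .. {hi:.2f}")
```

Because the random points are drawn only where `inside` returns true, the shape of the study area is carried straight into the envelope.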

For a weighted K function, the confidence envelope follows the observed line, and the simulation process is a bit different from what I described above. From the graphic you sent, my guess is that you are using the unweighted K Function, but please let me know if I've guessed incorrectly.

For the unweighted K function, if the study area has a very simple shape (circle, rectangle), the confidence envelope will enclose the expected line. When the study area isn't simple (there are peninsulas, or you are working with an "L" shape, for example), the study area itself can force randomly placed features to be far away from each other, so the confidence envelope appears below the expected line (more dispersed).
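You can check the effect of study-area shape with a quick Monte Carlo experiment. The sketch below uses two hypothetical study areas of equal size (9 square units): a 3 x 3 square, and an L shape made of two 5 x 1 arms. For each, it estimates the share of random point pairs that fall within distance d of each other. Boundary correction methods exist to offset this kind of edge effect, and the sketch leaves them out so the influence of shape is visible. Without correction, both shares come out below the pi * d^2 / A you would expect if edges did not matter. The narrow L shape has much more boundary for its area, so its share comes out lower still. Fewer close pairs mean lower K and L values, which pushes the simulated envelope below the expected line.

```python
import math
import random


def in_square(x, y):
    """Hypothetical 3 x 3 square study area, area 9."""
    return 0 <= x <= 3 and 0 <= y <= 3


def in_l_shape(x, y):
    """Hypothetical L shape: 5 x 1 horizontal arm plus 1 x 5 vertical arm, area 5 + 5 - 1 = 9."""
    return (0 <= x <= 5 and 0 <= y <= 1) or (0 <= x <= 1 and 0 <= y <= 5)


def share_of_close_pairs(inside, bbox, d, n_pairs, rng):
    """Monte Carlo estimate of P(distance <= d) for two uniform random points in the area."""
    xmin, ymin, xmax, ymax = bbox

    def draw():
        # Rejection sampling: keep drawing from the bounding box until the point is inside.
        while True:
            x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
            if inside(x, y):
                return x, y

    close = 0
    for _ in range(n_pairs):
        (x1, y1), (x2, y2) = draw(), draw()
        if math.hypot(x1 - x2, y1 - y2) <= d:
            close += 1
    return close / n_pairs


if __name__ == "__main__":
    rng = random.Random(42)
    d, area = 0.5, 9.0
    no_edges = math.pi * d * d / area  # share expected if the boundary played no role
    square = share_of_close_pairs(in_square, (0, 0, 3, 3), d, 50_000, rng)
    l_shape = share_of_close_pairs(in_l_shape, (0, 0, 5, 5), d, 50_000, rng)
    print(f"no edges {no_edges:.4f}  square {square:.4f}  L shape {l_shape:.4f}")
```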

Okay, so why might someone run the unweighted K function? The K function provides a kind of spatial "fingerprint" of how spatial clustering among your point features changes across multiple scales (across increasing distances). Why is this interesting? Whenever we see clustering in the landscape, we are seeing evidence of underlying spatial processes at work. Statistically significant peaks or dips of the observed index are evidence that spatial processes are operating at the associated spatial scale. Sometimes knowing something about these statistically significant spatial scales provides clues about the underlying processes at work. Comparing the spatial "fingerprints" for two different point datasets within the exact same study area can tell you if their spatial patterns are being influenced by the same or different spatial processes.
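Picking out the statistically significant peaks and dips amounts to comparing the observed index with the confidence envelope at each distance, rather than with the diagonal. A minimal sketch, assuming you already have the distances, observed L(d) values, and envelope bounds as parallel lists (for example, read from the tool's output table; no particular field names are assumed). The function name is hypothetical, and the numbers in the usage example are made up for illustration.

```python
def classify_distances(distances, observed, env_low, env_high):
    """Label each distance band by where the observed index sits relative to the envelope.

    Above the envelope: more clustering than any random placement produced at that scale.
    Below the envelope: more dispersion than any random placement produced.
    Inside: not distinguishable from the simulated random patterns.
    """
    labels = []
    for d, obs, lo, hi in zip(distances, observed, env_low, env_high):
        if obs > hi:
            labels.append((d, "significant clustering"))
        elif obs < lo:
            labels.append((d, "significant dispersion"))
        else:
            labels.append((d, "not significant"))
    return labels


if __name__ == "__main__":
    # Made-up values for illustration only.
    for d, label in classify_distances([5, 10, 15], [7, 9, 12], [4, 8, 13], [6, 11, 16]):
        print(d, label)
```

The envelope represents the range of indices random placement can produce in this particular study area. Because the comparison is made against the envelope, an envelope that sits below the diagonal (as with an irregular outline) is still the yardstick for significance.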

Some questions for you:  You indicated you are analyzing birds on an island.  Are you providing a study area polygon when you run the K function?  If so, might that polygon be forcing a structure on the simulations that would explain why the confidence envelope falls below the expected line?

Do the points you have reflect a sample of bird sightings, or do they represent ALL possible data (like ALL bird nests on the island)? Sampled data, especially when the samples might be biased by observer behavior or the sampling scheme, are not good candidates for the K Function: there is the risk that you will model observer behavior rather than bird behavior.

You mentioned a projection/transformation warning message or error; that sounds like a problem. If possible, I'm hoping you can send me your data so that we can figure out exactly why you are getting the unexpected results. Please contact me directly at LScott@Esri.com if that might be possible.

Again, I'm sorry you are having problems with the K Function. I hope this information is helpful to you. If anything is unclear, please contact me or reply here and I will do my very best to clarify.
Lauren

Lauren M Scott, PhD
Esri
Geoprocessing, Spatial Statistics
EllenKersten
I am having the same problem with the confidence intervals. I have a rectangular study area and no projection problems.
Also, in the table output, the ExpectedK values appear to just be the distance thresholds that are evaluated (5, 10, 15, 20, etc.) and are not actually calculated ExpectedK values. Why is this?
LaurenScott
Hi Ellen,
I will look at the data you sent to me.  Thank you.
With regard to the Expected K values being exactly equal to the Distance values, that is what you will always get. This is because we are using a transformation that converts the Expected K value to be equal to distance.
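Here is why the ExpectedK column always equals the distance. The sketch assumes the standard relationship L(d) = sqrt(K(d) / pi) together with the complete-spatial-randomness expectation K(d) = pi * d^2. Substituting one into the other gives L(d) = sqrt(pi * d^2 / pi) = d. The function names are hypothetical.

```python
import math


def expected_k(d):
    """Ripley's K expected under complete spatial randomness: pi * d**2."""
    return math.pi * d * d


def to_l(k_value):
    """L transformation: L(d) = sqrt(K(d) / pi)."""
    return math.sqrt(k_value / math.pi)


if __name__ == "__main__":
    for d in (5, 10, 15, 20):
        # The transformed expected value reproduces the distance band itself
        # (up to floating-point rounding).
        print(d, to_l(expected_k(d)))
```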

For more information on this, please see:
Getis, A. Interactive Modeling Using Second-Order Analysis. Environment and Planning A, 16: 173-183. 1984.

The actual formula for the L(d) transformation is given in:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Multi_Distance_Spatial_Cluster_Ana...

I do have a bug in for myself to improve the K Function documentation.  Sorry for the confusion!  (I can't believe I include the L(d) formula and then don't actually tell you what it does... my very bad!  So sorry!).

Thank you for your post and for sending me the data.  More soon.
Lauren

Lauren M. Scott, PhD
Esri
Geoprocessing, Spatial Statistics
EllenKersten
Thanks. I understand what the L transformation does, so if that is indeed the formula that is being used then it seems like the output.dbf should describe the fields as ExpectedL(Distance) and ObservedL rather than saying K. The .dbf fields differ from the results window box, which labels the output fields Distance and L(d).

My reading on the L transformation suggests that it is used to make graphical interpretation of results more straightforward. In that case, it would be more helpful if the graphical output displayed L(d)-d on the y axis so that the expected line (which from my understanding represents complete spatial randomness) is equal to y=0 rather than a line with a slope of 1. Also, the legend of the graphic should say ExpectedL and ObservedL to be consistent with the formula that is used.
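The L(d) - d view suggested above is easy to produce from the output table yourself. A minimal sketch, assuming the distance values and the observed L(d) values (or either envelope bound) have already been read into parallel lists. As noted above, the .dbf labels these fields as K even though they hold L values, so no column names are assumed. After the shift, the CSR expectation is the horizontal line y = 0: positive values indicate clustering and negative values indicate dispersion. The function name is hypothetical, and the usage values are made up.

```python
def centered_l(distances, l_values):
    """Return L(d) - d for each distance so the CSR expectation becomes y = 0."""
    return [l - d for d, l in zip(distances, l_values)]


if __name__ == "__main__":
    # Made-up observed values for illustration; apply the same shift to both envelope bounds.
    print(centered_l([5, 10, 15], [6.5, 9.0, 15.0]))  # [1.5, -1.0, 0.0]
```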

I look forward to hearing your response for why the confidence intervals at some distances do not include the expected value for L (CSR).
Mareike_TabeaScheller
Thank you very much for your answer, Lauren. It's true that I have a very complicated study area, because it is the polygon of the whole island, so that explains the problems with the confidence envelope! That's good to know. The points reflect bird sightings. As you suggested, that might be the reason why Ripley's K shows strange results. I also used the Spatial Autocorrelation tool to analyze the clustered areas. The results from this tool were right, I suppose, but Ripley's K-Function showed completely different results. So in the end I used Spatial Autocorrelation in my report. I think I initially had some problems with the projection/transformation because I had two coordinate systems in the ArcMap document, ISN1993 and ISN2004.

Thanks for your help,
Mareike