Creating Sample Points for Accuracy Assessment of a Polygon Layer in Data Reviewer

Discussion created by susansnet on May 15, 2012
Latest reply on Jul 16, 2012 by paniello-esristaff
I would greatly appreciate input on the following methodological issue:

I have a polygon layer with a binary attribute (0,1). The polygons represent areas that contain roads (value 0) or unroaded areas (wilderness study areas and similar lands that are important for conservation, value 1). I generated the polygon layer through an automated process and want to do a statistically based accuracy assessment using orthophotos as a reference. I'm aware that I can use Data Reviewer to calculate a sample for accuracy checking. In this case, however, it is not logical or meaningful to use polygons as the basis for the accuracy assessment, because the polygons range in size from 1,000 acres to hundreds of thousands of acres. It is inappropriate (and impossible) to check the imagery of a 100,000 acre polygon to see if there is evidence of a single road in it, and it would be inappropriate to "fail" it if, for example, 99.9% of that polygon was mapped correctly and only 1 small error existed. Rather, we want to use points. Each point is viewed over imagery, and if that point, according to the orthophoto, falls within an area that meets our guidelines (contiguous area without roads of 1,000 acres or greater, > 1000 m from a road, etc.), then it's considered roadless, otherwise not. We overlay our image-checked points with our mapped polygons and calculate the error matrix from there.
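To make the last step concrete, here is a minimal sketch of the error matrix calculation I have in mind, in plain Python. The function names and the eight sample labels are made up for illustration; the real inputs would be the image-checked value and the mapped polygon value for each point.

```python
from collections import Counter

def error_matrix(reference, mapped, classes=(0, 1)):
    """Cross-tabulate reference labels (rows) against mapped labels (columns)."""
    counts = Counter(zip(reference, mapped))
    return [[counts[(r, m)] for m in classes] for r in classes]

def accuracy_summary(matrix):
    """Return overall accuracy plus producer's and user's accuracy per class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n)) / total
    # Producer's accuracy: correct points / all reference points of that class (row total).
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # User's accuracy: correct points / all points mapped as that class (column total).
    users = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return overall, producers, users

# Hypothetical labels: 0 = roaded, 1 = roadless.
reference = [1, 1, 0, 0, 1, 0, 1, 0]   # what the orthophoto shows at each point
mapped    = [1, 0, 0, 0, 1, 1, 1, 0]   # what the polygon layer says at each point

matrix = error_matrix(reference, mapped)
overall, producers, users = accuracy_summary(matrix)
print(matrix)    # [[3, 1], [1, 3]]
print(overall)   # 0.75
```

This sketch assumes every class appears at least once in both the reference and the mapped labels; otherwise the per-class divisions would fail.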

Here's the dilemma - how to get to a statistically valid sample of points when starting with polygons? The process I used was to find an appropriate minimum spacing of sample points that I thought would minimize spatial autocorrelation while still allowing for a reasonable number of sample points. Moran's I is inappropriate to use with binary data, but the Join Count statistic (sometimes written Joint Count) that IS suitable is not part of the Spatial Statistics toolbox. Instead, I made a rough estimate, using the diameter of the median roadless polygon, and bumped the spacing up a bit. (It is impossible, however, given the huge polygon sizes and the natural clumping of roadless and not-roadless lands on the landscape, to avoid spatial autocorrelation, no matter how large the spacing between points - surely this is a typical problem?!) I decided the minimum distance between points should be 5 km, so I created a fishnet with label points at 5 km spacing. This gave me a "population" of 1,192 points in our study area. I used the Sampling Check in Data Reviewer to select a random sample from these points for a 95% confidence level and 4% margin of error, which yielded 399 points. (But if I'd chosen a 10-km distance, my starting "population" would have been smaller and the needed sample size smaller, too, so the sample size depends on a spacing I picked somewhat arbitrarily.)
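To show how the sample size hangs on the size of the fishnet "population", here is a rough sketch using the standard sample size formula for a proportion with a finite population correction. I don't know the exact formula Data Reviewer uses, so this is an assumption on my part, as are the worst-case proportion p = 0.5 and z = 1.96 for 95% confidence. The 298-point population is also hypothetical: roughly a quarter of 1,192, which is about what a 10 km fishnet would give over the same area.

```python
def sample_size(population, margin=0.04, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    population: number of candidate points (e.g. fishnet label points)
    margin:     desired margin of error (0.04 = 4%)
    z:          z-score for the confidence level (1.96 is roughly 95%)
    p:          assumed proportion; 0.5 is the most conservative choice
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population sample size
    return n0 / (1 + (n0 - 1) / population)        # shrink for a finite population

n_5km = sample_size(1192)   # fishnet population at 5 km spacing
n_10km = sample_size(298)   # hypothetical population at 10 km spacing

print(round(n_5km, 1))
print(round(n_10km, 1))
```

With 1,192 points this comes out just under 400, which is in line with the 399 points the Sampling Check returned, and the coarser grid needs roughly half as many. Either way, the "required" sample size follows directly from the spacing chosen for the fishnet.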

So really I'm wondering - HOW BOGUS IS MY METHODOLOGY and WHAT IS THE APPROPRIATE WAY TO DO THIS??? Surely this type of question is something that must commonly be dealt with in accuracy assessments???

Thanks for reading this way-too-long post and offering your insights!