Geocoding precision - how to determine

07-18-2017 02:30 PM
ArthurNazarian
New Contributor

I would say the following is basic but crucial information, yet I couldn't find a good explanation of it. I'll explain what I have in mind; there might be some errors in my line of thought...

I have geocoded my data with the World Geocoder. For further analysis, I'd like to use only the data that I consider accurate enough. For this, I thought that Addr_type ("The type of address that was geocoded. This attribute indicates to what kind of feature the address was matched") would be a good feature to analyse for precision. There you have types like PointAddress, StreetAddress and Postal, referring to a pin-pointed building as a location, a street segment, and a postal code respectively.

For example, if Postal results had an average radius of 500 m (depending on the polygons of the postal codes), the actual location could be up to 500 m off. I would then ignore all Postal data because I consider that not accurate enough. Or I could of course look case by case, ignoring every individual record that is not accurate enough.
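The filtering idea described above could be sketched as follows. This is only a minimal illustration: the sample records, the `id` field, the `ACCEPTED_TYPES` set and the `keep_precise` helper are all hypothetical; only the `Addr_type` field name and its values come from the geocoder output discussed here.

```python
# Hypothetical geocode results; only the Addr_type field name and its
# values are taken from the thread, everything else is made up.
results = [
    {"id": 1, "Addr_type": "PointAddress"},
    {"id": 2, "Addr_type": "StreetAddress"},
    {"id": 3, "Addr_type": "Postal"},
]

# Assumption: these match types are considered precise enough.
ACCEPTED_TYPES = {"PointAddress", "StreetAddress"}


def keep_precise(records, accepted=ACCEPTED_TYPES):
    """Return only the records whose Addr_type is in the accepted set."""
    return [r for r in records if r.get("Addr_type") in accepted]


precise = keep_precise(results)
```

Which types belong in the accepted set depends on the analysis; the set above is just one possible choice.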

For this analysis, I thought that Xmin, Xmax, Ymin, Ymax would contain the appropriate information. For Xmin, the link I mentioned before states: "The minimum x-coordinate for the display extent of a feature returned by the locator. The Xmin, Xmax, Ymin, and Ymax values can be combined to set the map extent for displaying a geocode result. The extent coordinates use the spatial reference of the locator (default)." I might be wrong, but this seems to have the correct information.

So what I did was take the difference between Xmax and Xmin (which is always the same as the difference between Ymax and Ymin) and analyse that. However, looking at those results, PointAddress has more or less the same range as StreetAddress, which I would say is odd. Even stranger, Postal has the same range too (with almost no deviation between results). But a postal code cannot be as accurate as a pinpointed building, right?

What am I doing wrong?

(By the way, to get a better sense of "accuracy" I would like to have the "radius" or "range" in meters. For this, I transform the locations to UTM coordinates during post-processing and then perform the same analysis, taking the difference between Xmax and Xmin. Would this be okay, or are there better ways?)
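One possible alternative to the UTM step, sketched under the assumption that the extent values are longitude/latitude in degrees (WGS84), is to measure the extent directly with a great-circle (haversine) distance. The function names `haversine_m` and `extent_size_m` are made up for this illustration, and the spherical Earth radius is an approximation.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def extent_size_m(xmin, ymin, xmax, ymax):
    """Width and height in meters of an extent given in degrees.

    Width is measured along the extent's middle latitude and height
    along its middle longitude.
    """
    ymid = (ymin + ymax) / 2
    xmid = (xmin + xmax) / 2
    width = haversine_m(xmin, ymid, xmax, ymid)
    height = haversine_m(xmid, ymin, xmid, ymax)
    return width, height
```

Note that an extent with equal differences in degrees is not square on the ground away from the equator: its width in meters shrinks roughly with the cosine of the latitude.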

4 Replies
ChrisDonohue__GISP
MVP Frequent Contributor

(Tagging Addressing for greater visibility).

Chris Donohue, GISP

JoeBorgione
MVP Esteemed Contributor

Accuracy or precision?  Two different beasts IMHO.  See:  Data accuracy for emergency response .  If you want a geocoder to be precise enough to nail a building location, using some online world geocoder isn't what you want to use.  If you are happy to get a match inside the right postal polygon, you should be good to go.

If you hit the target but with a large group as shown below right, you're accurate.  On the left the group is precise (nice and tight) but not so accurate if you're trying to hit the bullseye.  Everybody wants to stack their shots in the smallest circle, but everything comes with a price.  Maybe something a little more precise than the right and a little more accurate than the left is good enough....  What are you using the results for and how much are you willing to 'pay' for high accuracy AND high precision?

can't wait to retire....
ArthurNazarian
New Contributor

Thanks Joe! Good point, I indeed mean precision rather than accuracy! Accuracy can probably be judged from the Score attribute. In any case, I've checked the results, and they are accurate enough. I'll change the opening post, thanks again!

However, what I'd like to know is indeed the precision of the geocoded results. For example, if my data (i.e. address strings) only had a postal code, ArcGIS will give a Postal Addr_type after geocoding. That is what it should do with the data provided, hence the accuracy is excellent.

But the actual pinpointed location is somewhere within that postal code. So the precision depends on the size of the (polygon of the) postal code. For example, if the postal code covers 100 square meters, then I know that the actual location must be somewhere within that area, and a possible deviation within 100 square meters is good enough. However, if the particular postal code covers 100 square kilometers, the precision of the actual location is arguably not good enough.
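To turn the two areas above into a rough distance, one option is the radius of a circle with the same area, r = sqrt(A / pi). This is only a crude proxy, since real postal polygons are not circular, and the `equivalent_radius_m` helper is made up for this illustration.

```python
import math


def equivalent_radius_m(area_m2):
    """Radius in meters of a circle with the given area in square meters."""
    return math.sqrt(area_m2 / math.pi)


small = equivalent_radius_m(100)           # 100 square meters
large = equivalent_radius_m(100 * 1e6)     # 100 square kilometers
```

Under that assumption, the 100 square meter case corresponds to a radius of about 5.6 m and the 100 square kilometer case to about 5.6 km.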

Like I mentioned in the opening post, I was expecting Xmin and Xmax would give me the information of the possible deviation, but the results seem to give something different. So I think I misinterpret Xmin and Xmax, and that getting the precision of the results should be done differently.

JoeBorgione
MVP Esteemed Contributor

Re-reading your original post, it seems to me you are getting hung up on the XY min/max of the data you are matching to; yes, in your second post you are correct in saying you have misinterpreted their applicability to your issue.

When you match to polygon features like a postal zone, your resulting geocoded point will be located on the polygon centroid.  It does not matter how big that polygon is.  I'm not a good enough mathematician to describe how to determine the level of precision, but in layman's terms I can understand, it's pretty low.  Using my targets example, you are going to hit the target over and over somewhere, with a very low probability of being in the bullseye and more than likely on one of the outermost rings.  Think about it; if one of your addresses is actually located on the centroid of the polygon, you'll be on the bullseye, right?  But matching 1,000 records against a 100-square-kilometer polygon is a bet I wouldn't take.  A component of precision is also how often you repeat that bullseye shot.
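If the polygon geometry is available, one rough way to put a number on this is the largest distance from the centroid to any vertex: no address inside the polygon can be farther from the centroid than that. The sketch below assumes planar coordinates in meters, a simple (non-self-intersecting) polygon, and that the geocoder really places the point on the geometric centroid; the function names and the sample rectangle are made up.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # accumulates twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def worst_case_offset(vertices):
    """Largest distance from the centroid to any polygon vertex."""
    cx, cy = polygon_centroid(vertices)
    return max(math.hypot(x - cx, y - cy) for x, y in vertices)


# Hypothetical 4 km by 3 km rectangular postal zone, coordinates in meters.
zone = [(0, 0), (4000, 0), (4000, 3000), (0, 3000)]
offset = worst_case_offset(zone)
```

For the sample rectangle the centroid is at (2000, 1500), so the worst-case offset is 2500 m. This is an upper bound, not an expected error; most addresses will be closer to the centroid than that.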

As a hobby, I shoot rifles at long distance targets, hence my propensity for the target analogy.  I can stack hits in the bullseye all day long at 200 meters.   That's the range at which I 'sight' my scope, so shooting groups of 5 shots, I expect them to be within a couple centimeters of each other and in the bullseye.  My groups expand as range increases, and that's to be expected.  However, if I'm at least hitting my target at 800+ meters one shot after another, it's a good day at the range! (The target on the right above would be a thrill at 1,000 meters; trust me!)

The issue you are actually faced with is the level of comfort you take in using a geocoder (or series of geocoders) in which you really have no knowledge of the quality of data that are used.  It's a leap of faith at best; for what I do, I'm not comfortable with that leap.  But you may be, and that's just fine!

In theory, point data will provide the most precise results, but you'd need to know how the point was derived to be absolutely sure.  If the point is 'just' the property parcel centroid, that's pretty good, but not as good as if the point was manually placed on a high resolution, well rectified image at the front door of every house.

Assuming the data you are using is of relatively good quality, the precision of the resulting geocoded points, from highest to lowest, is points, lines and polygons for the matching data.  How to quantify that level of precision is beyond my means.

The data you are actually matching plays a role in the results as well, but that will be at least another cup of coffee, perhaps two for me to discuss...

Hope this helps!

can't wait to retire....