
Data Scrubbing Addresses

09-02-2016 08:45 AM
BrandonPowell1
Occasional Contributor

This is my first ever post in GeoNet and I am new to the GIS field.

One of my first assignments is to geocode all of our customer locations in ArcMap. The task seems simple enough, but I need the data to be clean first. Are there any recommended tools for cleaning up the addresses? And how do I know whether they are accurate?

16 Replies
ChrisDonohue__GISP
MVP Alum

Just to provide a broad overview: geocoding assumes the address data will be fed to it in a certain way. Depending on which type of address locator you use (single field or multiple field), the input data will need to be in a certain format, so the cleanup process usually means arranging the data into that format. This often involves creating new fields in the input table and then calculating them from parts of the original data to produce the combination(s) the address locator needs to return a relevant geocoded result. Or you may have to parse pieces of data out of existing fields to populate new ones. For example, parsing "123 Main Street" into a StreetNumber field ("123"), a StreetName field ("Main"), and a StreetType field ("Street").
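As a rough illustration of the parsing step described above, here is a minimal Python sketch. It is not an Esri tool: the `split_address` function and the short `STREET_TYPES` list are hypothetical, and it assumes simple addresses of the form "number name type" with no unit numbers or directional prefixes.

```python
# Hypothetical, deliberately short list of street types; a real list would be longer.
STREET_TYPES = {"street", "st", "avenue", "ave", "road", "rd",
                "drive", "dr", "lane", "ln", "boulevard", "blvd"}


def split_address(address):
    """Split a simple street address into number, name, and type parts."""
    tokens = address.strip().split()
    parts = {"StreetNumber": "", "StreetName": "", "StreetType": ""}

    # Treat a leading all-digit token as the house number.
    if tokens and tokens[0].isdigit():
        parts["StreetNumber"] = tokens.pop(0)

    # Treat a trailing token as the street type only if it is in the known list.
    if tokens and tokens[-1].lower().rstrip(".") in STREET_TYPES:
        parts["StreetType"] = tokens.pop()

    # Whatever remains is taken as the street name.
    parts["StreetName"] = " ".join(tokens)
    return parts


print(split_address("123 Main Street"))
```

Anything that does not fit the assumed pattern (for example "Main Street 123") ends up entirely in StreetName, which makes those rows easy to spot for manual review.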

Note - if you don't have an address locator available, you can build one.  This will take some learning.

In terms of accuracy, that can be a major issue. As Joe Borgione mentioned, a typo where "Maine" is entered instead of "Main" will probably not produce a match when geocoding. So you may need to do some checking to look for typos, missing data components, data in the wrong sequence (for example, "Main Street 123" instead of the expected "123 Main Street"), etc.
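The checks listed above (typos, missing components, wrong sequence) could be roughed out before geocoding with something like the following Python sketch. The `check_address` function is hypothetical, and it assumes you have a list of known street names to compare against (for example, pulled from your reference data); the 0.8 similarity cutoff is an arbitrary starting point, not a recommended value.

```python
import difflib


def check_address(address, known_streets):
    """Return a list of human-readable issues found in a simple address string."""
    tokens = address.strip().split()
    if not tokens:
        return ["empty address"]

    issues = []

    # Missing component or wrong sequence: look at where the house number sits.
    if not any(t.isdigit() for t in tokens):
        issues.append("no house number")
    elif not tokens[0].isdigit() and tokens[-1].isdigit():
        issues.append("house number at end")

    # Possible typos: words that are close to, but not exactly, a known street name.
    lookup = {s.lower(): s for s in known_streets}
    for token in tokens:
        if token.isdigit() or token.lower() in lookup:
            continue
        close = difflib.get_close_matches(token.lower(), list(lookup), n=1, cutoff=0.8)
        if close:
            issues.append(f"possible typo: '{token}' (did you mean '{lookup[close[0]]}'?)")

    return issues


known = ["Main", "Oak"]  # assumed sample of valid street names
for addr in ["123 Maine Street", "Main Street 123", "123 Main Street"]:
    print(addr, "->", check_address(addr, known))
```

A screen like this only flags candidates for review; it cannot tell whether "Maine" is a typo or a genuinely different street, so the flagged rows still need a human look.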

In general, I would say a one-shot perfect geocoding run is not possible outside of a classroom. You can get many of the addresses to match on the first try, but there will always be some that elude the process. Then one has to investigate them, maybe correct the data, and do several follow-up geocoding runs with different settings to finally get them to match. Expect the ones that don't work the first time to take a bit of effort to resolve.

Geocode Addresses—Help | ArcGIS for Desktop 

Chris Donohue, GISP

BrandonPowell1
Occasional Contributor

Thanks Chris and Joe for your help. I've figured out a lot in the last 2 weeks and am starting to feel more comfortable with the ArcGIS suite. Your responses were very helpful to me. I am actually not going to be using ArcGIS Online as a geocoding source, but rather a geocoder we purchased through BA.

BrandonPowell1
Occasional Contributor

Hey cdspatial and jborgion, could I ask y'all one more question about geocoding postcodes? I've been doing some tests on geocoding our customer data again. I realized just yesterday that if I geocode an address, for example 123 main street, rochester, IN 46975, and then geocode that same address but include the zip+4 (46975-8008), I still get the same XY coordinates. It's like my geocoder doesn't care about zip+4, which is really messing with my location accuracy. Any thoughts on that?

JoeBorgione
MVP Emeritus

As I understand zip + 4,  it's used by the USPS as more or less a mail sorting tool to get mail into the correct leather bag for delivery.  If you don't use any zip, you may get several 1234 S Main St hits, but the actual location of all of them won't change by adding a zip or a zip + 4.  Using them will just limit your choices.

That should just about do it....
BrandonPowell1
Occasional Contributor

I gave a really bad example above. I should have put PO Box 111, rochester, IN 46975 vs PO Box 111, rochester, IN 46975-8008. Because the geocoder doesn't care about the PO Box # it then just looks at the city/state/zip. Because we are doing location analytics here it really makes a difference whether or not we use the Zip or the Zip+4 because the +4 is much more accurate. I actually found yesterday that my Business Analyst data came with a USA_ZIP4_LocalComposite locator that I think is going to fix this. I was using the USA_LocalComposite before.
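To get a sense of how many customer records fall into this situation, a quick pre-check along these lines might help. This is only a sketch in plain Python, not part of Business Analyst or any locator: the `classify` function and its regular expressions are hypothetical, and it assumes US-style addresses with the ZIP code at the end of the string.

```python
import re

# Assumed patterns: "PO Box 111", "P.O. Box 111", etc. at the start of the address,
# and a 5-digit ZIP with an optional +4 suffix at the end.
PO_BOX_RE = re.compile(r"^\s*P\.?\s*O\.?\s*Box\s+\d+", re.IGNORECASE)
ZIP_RE = re.compile(r"(?<!\d)(\d{5})(?:-(\d{4}))?\s*$")


def classify(address):
    """Report whether an address is a PO box and which ZIP parts it carries."""
    zip_match = ZIP_RE.search(address)
    return {
        "po_box": bool(PO_BOX_RE.match(address)),
        "zip5": zip_match.group(1) if zip_match else None,
        "zip4": zip_match.group(2) if zip_match else None,
    }


print(classify("PO Box 111, rochester, IN 46975-8008"))
print(classify("123 main street, rochester, IN 46975"))
```

Records flagged as PO boxes with a `zip4` value are the ones that would depend on a ZIP+4-aware locator for anything finer than a ZIP-level match.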

ShanaBritt
Esri Regular Contributor

Brandon:

The USA_ZIP4_LocalComposite locator included with Business Analyst will return a match for just 46975-8008 in the address "PO Box 111, rochester, IN 46975-8008". There is no ZIP+4 information associated with the PointAddress and StreetAddress locators that participate in the USA_ZIP4_LocalComposite locator. The possible candidates you would get back would be from the Zip4, PostalCode and AdminPlaces locators.

-Shana

JoeBorgione
MVP Emeritus

Geocoding PO boxes?  Not sure I see the point in that.

That should just about do it....