Data Scrubbing Addresses

09-02-2016 08:45 AM
BrandonPowell1
Occasional Contributor

This is my first ever post in GeoNet and I am new to the GIS field.

 

One of my first assignments is to geocode all of our customer locations in ArcMap. The task seems simple enough, but I need the data to be clean first. Are there any recommended tools for cleaning up the addresses? How do I know whether they are accurate?

16 Replies
ChrisDonohue__GISP
MVP Alum

If you haven't already done so, I'd recommend looking at both the geocoding process and your address data to get a feel for how it all fits together. In particular, pay attention to how the addresses are broken out into fields in your data.

Check the Help for the version of the software you have. Here's some older Help information in the same vein:

Geocoding - ArcGIS Desktop 10.0

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&sqi=2&ved=0ahUKEwi7nKD-iPHOAhVIVWMKH... 

Hopefully folks can share some tools that may help.


Chris Donohue, GISP

ChrisDonohue__GISP
MVP Alum

Some questions to better help the GeoNet community provide answers for you:

  • What GIS software are you using for geocoding? ArcGIS? ArcGIS Pro? ArcGIS Online? Another?
  • What version?
  • Does your organization use the Local Government Information Model (LGIM)?

Also, let me tag a specific addressing troublemaker expert, jborgion. Joe does 911 addressing, and I believe address scrubbing is a normal part of his work as he pulls together addressing data from several sources. If he is free and sees this, I bet he has some good leads. Though the black SUV full of National Addressing Standards police may have finally caught up with him....

Chris Donohue, GISP

BrandonPowell1
Occasional Contributor

Thanks Chris.

I'm using ArcGIS and ArcGIS Online

Version 10.4.1

To be honest, I'm not sure whether we use LGIM, so I'm leaning towards no. Like I said, I'm very new to this.

I appreciate your assistance!

JoeBorgione
MVP Emeritus

Hey now... I resemble that comment...

"... but I need the data to be clean first...."

Which data are you talking about? Your (1) table of customers or the (2) data you are matching to?

#1: This is the obstacle all of us face when geocoding. The addresses need to be in a format compatible with the data you are matching against. For example, 1234 S Main St may not get a 100% match if your data says MAINE STREET.

#2: What are you matching against, and what is the vintage or pedigree of the data? Is it current? Is it attributed properly? For example, yesterday someone sent me street data to be used in a geocoding project, but only some of the streets have names and NONE of them have address ranges. (Wow!)
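
To make point #1 a bit more concrete, here is a minimal Python sketch of the kind of format normalization that helps a customer address line up with the reference data. The function name and the abbreviation tables are made up for illustration; they are not part of ArcGIS, and they would need to be adjusted to whatever conventions your reference data actually uses.

```python
import re

# Hypothetical lookup tables; extend them to match your reference data.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
            "DRIVE": "DR", "BOULEVARD": "BLVD"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def normalize_address(raw):
    """Uppercase, drop periods and commas, collapse spaces, abbreviate."""
    words = re.sub(r"[.,]", "", raw.upper()).split()
    if not words:
        return ""
    # Abbreviate a leading direction only when a street name and a
    # suffix still follow it, so "12 NORTH STREET" keeps its name.
    if len(words) >= 4 and words[1] in DIRECTIONS:
        words[1] = DIRECTIONS[words[1]]
    # Abbreviate the street type only when it is the last word.
    if words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
    return " ".join(words)


print(normalize_address("1234 South Main Street."))
```

This only fixes formatting differences. A genuine misspelling such as MAINE for MAIN would still have to be caught by reviewing the low-scoring matches.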

That should just about do it....
ChrisDonohue__GISP
MVP Alum

No address ranges?  Awesome data for geocoding.

"WE DON'T NEED NO STINKIN' ADDRESS RANGES!" 

(Sorry, couldn't help but channel Cheech and Chong there.....)

Chris Donohue, GISP

BrandonPowell1
Occasional Contributor

Yep, sorry for being too vague.

1. We exported a list of our customers into Excel from our SQL database and this is what I need to be clean.

2. What are we matching against... I suppose I have to figure this out. As a test I have already imported the .csv file into an ArcGIS Online map as a layer, and it generally did what I expected it to do: I got a bunch of points on the map showing where our customers are located. In my Excel sheet I have columns for street address, street address 2 (P.O. box, etc.), city, state, zip, and zip+4.
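
A spreadsheet laid out like this can be pre-checked before any geocoding run. The following is only a rough sketch: the dictionary keys `street` and `zip` are assumed names standing in for the actual column headers, and the rules are simple examples of things that commonly prevent a match, such as a P.O. box in the street field or a ZIP code that lost its leading zero when Excel stored it as a number.

```python
import re

# Matches "PO Box", "P.O. Box", "P O BOX" and similar spellings.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b", re.IGNORECASE)
ZIP5 = re.compile(r"^\d{5}$")


def check_row(row):
    """Return a list of problems found in one customer record (a dict)."""
    problems = []
    street = (row.get("street") or "").strip()
    if not street:
        problems.append("missing street")
    elif PO_BOX.search(street):
        problems.append("PO box in street field")
    elif not street[0].isdigit():
        problems.append("no house number")
    # A 4-digit value usually means a leading zero was dropped.
    if not ZIP5.match((row.get("zip") or "").strip()):
        problems.append("bad zip")
    return problems


rows = [
    {"street": "1234 S Main St", "zip": "84101"},
    {"street": "P.O. Box 12", "zip": "2134"},
]
for row in rows:
    print(row["street"], check_row(row))
```

Rows that come back with an empty list are not guaranteed to geocode, but rows with problems are worth fixing or setting aside first.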

I think Chris was helpful by saying:

"In general, I would say a one-shot perfect geocoding run is not possible outside of a classroom.  You can get many of them to work by geocoding in the first try, but there will always be some addresses that will elude the process.  Then one has investigate them and maybe correct the data, and do several followup geocoding runs with different settings to finally get them to work.  Expect the ones that don't work the first time to take a bit of effort to resolve."

I admit that, to some degree, I was hoping it would just work, especially since it's over 5000 addresses!

JoeBorgione
MVP Emeritus

I re-read your original post and noticed you are using ArcGIS Online as your geocoding source; I don't use that and never have, so I can't comment on its accuracy or latency. Sounds like you are well on your way with it though. What is your ratio of hits to non-hits? That is, what percentage of your 5,000 records came back with a score of 85% or better? How many came back unmatched? Look at the unmatched ones, or the ones that scored lower than 85, and see how the address information is formatted. Those are the problem children.

BTW: 5,000 addresses is a good start for you. I've done batches of 400K+. Pretty sexy business, eh?!
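
The triage described above can be sketched in a few lines of Python once the results are exported to a table. This assumes each record carries a `Status` field ("M" for matched, "U" for unmatched, "T" for tied) and a numeric `Score` field, as ArcGIS Desktop geocoding output typically does; results from ArcGIS Online may use different field names, so treat both names as assumptions to check against your own output.

```python
def summarize(results, min_score=85):
    """Split geocoding results into good matches and records to review."""
    good, review = [], []
    for rec in results:
        matched = rec.get("Status") == "M"
        if matched and rec.get("Score", 0) >= min_score:
            good.append(rec)
        else:
            # Unmatched, tied, or low-scoring: the "problem children".
            review.append(rec)
    pct = 100.0 * len(good) / len(results) if results else 0.0
    return good, review, pct


# Made-up sample records for illustration only.
sample = [
    {"Address": "1234 S MAIN ST", "Status": "M", "Score": 100},
    {"Address": "55 OAK AVE", "Status": "M", "Score": 92},
    {"Address": "1234 S MAINE STREET", "Status": "M", "Score": 80},
    {"Address": "PO BOX 12", "Status": "U", "Score": 0},
]
good, review, pct = summarize(sample)
print(f"{pct:.1f}% at or above 85; {len(review)} to review")
for rec in review:
    print(rec["Address"], rec["Status"], rec["Score"])
```

Looking at the addresses in the review list side by side usually shows the formatting patterns that need fixing before the next run.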

That should just about do it....
ChrisDonohue__GISP
MVP Alum

Joe - did you get paid by the address match?  Hmmmm, maybe I should go back to working in the Private Sector...... 

Chris Donohue, GISP

JoeBorgione
MVP Emeritus

I may need to re-negotiate that part of my contract.  That's brilliant Chris!

That should just about do it....