Geocoding: Same Dataset, Address Locator, and Geocoding Options, Different Results

03-24-2014 01:34 PM
ChuncuiFan
New Contributor
Hi All, I geocoded a dataset with around 500,000 addresses on my computer, and my team member validated the geocoding results using the same address locator and geocoding options on her computer. A small number of the addresses (0.4% of all addresses) in the dataset have unmatched X and Y coordinates from the two separate runs. Most of the unmatched geocoded X and Y coordinates are several hundred feet apart or even closer. The address locator that we use comes from Esri StreetMap Premium. Does anybody have an idea why this happened? Thanks!
JoeBorgione
MVP Emeritus
Not sure I understand your question.  You say that you've geocoded 500K records with a 99.6% hit rate.  That's pretty awesome.

But here's where you lose me: most of the unmatched geocoded X and Y coordinates are several hundred feet apart or even closer. Closer to what?

If they are unmatched, how do you know they are 'several hundred feet apart or closer'? If my math is right, at a hit rate of 99.6% you've got 498K records geocoded, and you're wondering why the other two thousand did not match? Perhaps you could clarify.
That should just about do it....
ChuncuiFan
New Contributor
Hi Joe,

The 99.6% is not the same concept as the "match rate" in geocoding. What I mean is that we geocoded the dataset once and then validated the results by geocoding the same dataset again on another computer (same geocoding options and same address locator). We followed the procedure below and expected the two runs to produce the same X and Y coordinates for every address.
1. Import the .dbf tables into SAS
2. Keep only uniqueid, x, y
3. In one of the datasets, rename x = x1 and y = y1
4. Merge the two datasets by uniqueid
5. Find the difference between the x's and y's (diffx = x1 - x and diffy = y1 - y)
6. Run a frequency on diffx and diffy to see if there are records that had different values for x or y
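The merge-and-difference check in steps 4-6 can be sketched in Python as below. This only illustrates the logic, it is not the SAS code that was actually run: the two runs are assumed to be already loaded as dicts mapping `uniqueid` to an `(x, y)` tuple (reading the .dbf files is left out), and `compare_runs` is a hypothetical helper name.

```python
from collections import Counter


def compare_runs(run_a, run_b):
    """Merge two geocoding runs by uniqueid and tabulate coordinate differences.

    run_a, run_b: dicts mapping uniqueid -> (x, y), standing in for the two
    .dbf tables after step 2 (an assumed in-memory layout, not a file reader).
    """
    diffs = {}
    # Step 4: merge by uniqueid (keeps only IDs present in both runs)
    for uid in sorted(run_a.keys() & run_b.keys()):
        x, y = run_a[uid]
        x1, y1 = run_b[uid]
        # Step 5: diffx = x1 - x, diffy = y1 - y
        diffs[uid] = (x1 - x, y1 - y)
    # Step 6: frequency tables of diffx and diffy
    freq_x = Counter(dx for dx, _ in diffs.values())
    freq_y = Counter(dy for _, dy in diffs.values())
    return diffs, freq_x, freq_y


# Illustrative values only; run 2 differs from run 1 for ID 2.
run1 = {1: (-93.2650, 44.9778), 2: (-93.2600, 44.9800)}
run2 = {1: (-93.2650, 44.9778), 2: (-93.2610, 44.9800)}
diffs, freq_x, freq_y = compare_runs(run1, run2)
print(freq_x[0.0], freq_y[0.0])  # records with no change in x, and in y
```

Any record whose `diffx` or `diffy` is not zero falls outside the zero bucket of the frequency tables, which is where the 0.4% shows up.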

However, only 99.6% of the cases have the same X and Y coordinates in the two runs. Around a thousand geocoded addresses vary in their X and Y coordinates, and when I checked the X and Y coordinates of each of these addresses from the two separate runs, the two points are several hundred feet apart or even closer.
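To put a number on "several hundred feet", the separation of each mismatched pair can be computed directly. The sketch below assumes the X and Y fields are longitude and latitude in decimal degrees (check the output spatial reference of the locator first) and uses the haversine great-circle formula with a mean Earth radius; `haversine_feet` is a hypothetical name. If the coordinates are instead projected in feet, `math.hypot(x1 - x, y1 - y)` is enough.

```python
import math

# Mean Earth radius (6,371,008.8 m) expressed in international feet.
EARTH_RADIUS_FT = 6371008.8 / 0.3048


def haversine_feet(x, y, x1, y1):
    """Great-circle distance in feet between (x, y) and (x1, y1).

    Assumes x is longitude and y is latitude, both in decimal degrees.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (x, y, x1, y1))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


# A 0.001-degree shift in latitude is roughly 365 feet.
print(round(haversine_feet(-93.2600, 44.9800, -93.2600, 44.9810)))
```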

Feel free to ask questions if you need further clarification. Thanks!


JoeBorgione
MVP Emeritus
Still seems a little fuzzy, but for the sake of argument, let's assume it's just me...

1. These are the .dbf tables from your geocoding results? Why do you need to import them into SAS?
2-6. Personally, I'd output the geocoding results to some flavor of geodatabase; shapefiles are so 1995. I would JOIN the two TABLES, not the point features, via the UniqueID. Your temporary output table from the join should then have:

UniqueID, X, Y, X1, Y1

Then select where x <> x1 OR y <> y1, or make a selection where x <> x1 AND y <> y1. You could then do your diff analysis there. The only thing I can think of is how each computer actually computes the x,y pairs; that, or the .dbfs are the root of the problem.
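The two selections above are not equivalent: the OR query returns every record where either coordinate changed, while the AND query returns only records where both changed and would miss a point that moved due north or due east. A rough Python sketch over the joined table, using the field names listed above; `select_mismatches` is a hypothetical helper, and the optional tolerance for ignoring tiny floating-point differences is an added assumption, not part of the suggestion.

```python
def select_mismatches(rows, require_both=False, tol=0.0):
    """Return joined rows whose coordinates differ between the two runs.

    rows: iterable of dicts with keys "UniqueID", "X", "Y", "X1", "Y1".
    require_both=False mimics  x <> x1 OR y <> y1
    require_both=True  mimics  x <> x1 AND y <> y1
    tol: differences at or below this are treated as equal (0.0 = exact).
    """
    selected = []
    for row in rows:
        x_changed = abs(row["X1"] - row["X"]) > tol
        y_changed = abs(row["Y1"] - row["Y"]) > tol
        hit = (x_changed and y_changed) if require_both else (x_changed or y_changed)
        if hit:
            selected.append(row)
    return selected


# Illustrative rows: ID 2 moved in X only, ID 3 moved in both X and Y.
joined = [
    {"UniqueID": 1, "X": 10.0, "Y": 20.0, "X1": 10.0, "Y1": 20.0},
    {"UniqueID": 2, "X": 10.0, "Y": 20.0, "X1": 10.5, "Y1": 20.0},
    {"UniqueID": 3, "X": 10.0, "Y": 20.0, "X1": 10.5, "Y1": 20.5},
]
print([r["UniqueID"] for r in select_mismatches(joined)])                     # OR
print([r["UniqueID"] for r in select_mismatches(joined, require_both=True)])  # AND
```

For a validation check like this one, the OR form is the one that counts every changed record.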
RobertBorchert
Frequent Contributor III
I have had better luck downloading the TIGER centerline data set. There are two versions: one that is ready for geocoding, and one that is 500% more complete but takes a lot of work to set up.

The difference is that one has only one instance of a road and has the address ranges built in.

The other version has all instances of a road. For example, State Highway 23 runs through my town, and along that stretch it changes names 3 times (Division Street, Broadway, Roosevelt Road) before leaving town and becoming just Highway 23 again.

To make matters worse, not all segments are the same length for each instance of the name.

Then there is a separate table that has address ranges and codes to link them to features in the centerline file.

Anywho, it is a lot of work to set up, but it is more accurate.

Then you just create your own address locator in Catalog.
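For context on how a centerline locator built from these pieces turns an address range into a point: a common approach is linear interpolation along the matched segment. The sketch below assumes a hypothetical `link_id` code joining a segment to its row in the separate address-range table, and it ignores odd/even parity, side offsets, and other refinements that real locators typically apply, so treat it as an illustration rather than how StreetMap Premium or any Esri locator works internally.

```python
def interpolate_address(house_number, segment, ranges):
    """Place a house number along a centerline segment by linear interpolation.

    segment: dict with "link_id", "start" (x, y) and "end" (x, y).
    ranges: dict mapping link_id -> (from_addr, to_addr), standing in for the
    separate address-range table. Field names are hypothetical.
    Returns None if the number falls outside the segment's range.
    """
    from_addr, to_addr = ranges[segment["link_id"]]
    if not min(from_addr, to_addr) <= house_number <= max(from_addr, to_addr):
        return None
    span = to_addr - from_addr
    # Fraction of the way along the segment (0.0 at start, 1.0 at end).
    t = 0.0 if span == 0 else (house_number - from_addr) / span
    (x0, y0), (x1, y1) = segment["start"], segment["end"]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# Illustrative segment: addresses 100-200 along a 1,000-foot east-west block.
seg = {"link_id": 7, "start": (0.0, 0.0), "end": (1000.0, 0.0)}
print(interpolate_address(150, seg, {7: (100, 200)}))
```

Because the point depends on the segment's length and its range, the same house number can land in quite different places depending on which segment and range a locator matches.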

ChuncuiFan
New Contributor
Hi Joe, I agree that all the steps can be done in ArcGIS. We imported the data into SAS because we validated geocoding results by different people on different computers and not everyone was comfortable using ArcGIS for data analysis. Anyway, the join feature gave us the same results. I just wonder why we got two different sets of X and Y pairs for a few addresses. Since we are using the same address locator (from ESRI StreetMap Premium) and same geocoding settings, we should get exactly the same X and Y values for all addresses, shouldn't we? Thanks!


BradNiemand
Esri Regular Contributor

What version of ArcGIS are you using and what version of StreetMap Premium?

Brad

ChuncuiFan
New Contributor
Hi Robert, we used one of the address locators from ESRI StreetMap Premium, which we purchased from ESRI, and as we were told, it is based on TomTom's reference data. I'm using the server version of StreetMap Premium, so there is no way for us to download the address locator or modify it. Thanks!
