New geocoding engine is terrible.

01-26-2011 10:46 AM
DanMarrier
New Contributor II
I really can't figure this one out.  It flies in the face of the hierarchical priority for composite locators that we've been led to believe is in place.

I have a composite locator service that makes use of 9 individual locator services.  Each of those 9 individual locator services has been tested and works as expected when run by itself on a given input address list.

For a specific address in my data set, I know ahead of time that the first locator service in the composite will not find a match when run as a standalone, but the second locator service will, as will some of the subsequent locator services lower in the hierarchy, although the later ones are more prone to errors.
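As a minimal sketch of the hierarchical behavior expected here (not Esri's implementation), a composite can be modeled as a loop that tries each participating locator in priority order and stops at the first acceptable match. The locator names, the sample address, the scores, and the `min_score` threshold below are all made up for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical types: a locator takes an address string and returns
# (score, location), or None when it finds no candidate.
Match = Tuple[float, str]
Locator = Callable[[str], Optional[Match]]


def composite_geocode(
    address: str,
    locators: Sequence[Tuple[str, Locator]],
    min_score: float = 80.0,  # assumed cutoff, not a documented default
) -> Optional[Tuple[str, Match]]:
    """Return the first acceptable match, trying locators in priority order."""
    for name, locate in locators:
        match = locate(address)
        if match is not None and match[0] >= min_score:
            return name, match
    return None


# Toy stand-ins for the real locator services.
def parcels(address: str) -> Optional[Match]:
    return None  # first locator: no match for this address


def address_points(address: str) -> Optional[Match]:
    return (100.0, "point A")  # second locator: exact match


def centerlines(address: str) -> Optional[Match]:
    return (85.0, "interpolated B")  # lower priority, less precise


result = composite_geocode(
    "1 Main St",
    [("parcels", parcels),
     ("address_points", address_points),
     ("centerlines", centerlines)],
)
print(result)
```

Under this model the second locator should win whenever the first returns nothing, which is the expectation the rest of this post is measured against.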

However, when I run this address through the composite locator service, the second locator service is not returning a match.  It ends up defaulting to the fourth locator service in the composite, and that service is actually returning a bad point location because I'm allowing for tied scores to be matched, and it is randomly picking the wrong one.

How on earth can a locator service find a match for an address when run by itself, but not when run in a composite where no locator higher than it in the hierarchy finds a match???

As an additional slap in the face, if I input just an address and a zipcode, and the locator finds the address in a different zipcode, the reported score is 95 out of 100.  Really???  95???  I would expect the default settings of any geocoding service to report a significantly lower score than 95 when the "matched" Zone is completely different than the one I input! 

There also seems to be no heuristic or even common sense when making a match to a different zipcode.  I know of at least one case where the input zipcode has been miskeyed as "02113" instead of "02133".  But due to the highly arbitrary nature of matching tied score records... the "matched" address has a zipcode of "12508"... that's not even in the same state!!!  You'd think that there would be some sort of additional check that if scores are tied and there are different zone values in play, that the one which more closely resembles the input would get a higher priority.
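The kind of tie-break suggested above could look something like the following sketch. It is only an illustration of the idea, not anything the geocoding engine is known to do: among candidates tied on the top score, prefer the one whose ZIP code shares the most digit positions with the input. The function names and the candidate tuples are hypothetical; the ZIP values are the ones from the example above.

```python
from typing import List, Tuple

Candidate = Tuple[int, str]  # (score, matched ZIP code)


def zone_similarity(input_zip: str, candidate_zip: str) -> int:
    """Count the positions where both ZIP strings hold the same digit."""
    return sum(a == b for a, b in zip(input_zip, candidate_zip))


def break_tie(input_zip: str, candidates: List[Candidate]) -> Candidate:
    """Among top-scoring candidates, pick the ZIP most similar to the input."""
    top = max(score for score, _ in candidates)
    tied = [c for c in candidates if c[0] == top]
    return max(tied, key=lambda c: zone_similarity(input_zip, c[1]))


# Input ZIP miskeyed as 02113; two candidates tied at the same score.
candidates = [(95, "12508"), (95, "02133")]
best = break_tie("02113", candidates)
print(best)
```

A position-by-position comparison is the simplest possible measure; an edit distance would also catch transposed digits.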

Has anybody else tested a composite service where an individual locator service works for a given address but fails when implemented in the composite?
BradNiemand
Esri Regular Contributor
Ok what you described is a bug but I have a workaround for it.

1. From ArcCatalog, open the composite locator properties dialog for the composite that contains the 10 locators.
2. Delete all of the Zone fields from the "The Field containing:" box of the "Input Address Fields" section (See DeletedAddressFields.png).
3. Add all of the Zone fields back, but prepend an underscore to each name (_city, _state, _zip) (See AddAddressFields.png).
4. Remap the locators to the appropriate address fields (See RemapFields.png).
5. Click the "OK" button.

You should be good to go now.

What was happening was that the participating locators know about the city, state, and zip fields even though you did not map them.  When the composite passes these fields in, each participating locator tries to use them to geocode the address.  Because the locator knows about the fields but has no data associated with them, it deducts from the score because it thinks the values are wrong (something compared to nothing = wrong).  With the fields renamed in the composite, the participating locators no longer recognize the other fields, so they don't try to use them, and everything works as expected.
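To make the "something compared to nothing = wrong" effect concrete, here is a toy model of it. It is an assumption-laden sketch, not the real scoring code: the field names `city`, `state`, and `zip`, the flat `PENALTY`, and the sample values are all invented for illustration.

```python
from typing import Dict

# Field names the toy locator recognizes, whether or not they were mapped.
KNOWN_ZONE_FIELDS = {"city", "state", "zip"}
PENALTY = 10  # hypothetical deduction per zone field that fails to match


def score(inputs: Dict[str, str], reference: Dict[str, str]) -> int:
    """Score one candidate against the input fields passed by the composite."""
    total = 100
    for field_name, value in inputs.items():
        if field_name not in KNOWN_ZONE_FIELDS:
            continue  # unrecognized name: the locator ignores the value
        if value and reference.get(field_name) != value:
            # An input value compared with missing reference data
            # is treated the same as a wrong value.
            total -= PENALTY
    return total


# A locator built with ZIP as its only zone has no city or state data.
reference = {"zip": "02133"}

# Before the workaround: the composite passes recognizable field names.
before = score({"city": "Boston", "state": "MA", "zip": "02133"}, reference)

# After the workaround: the unmapped fields arrive under underscore names
# the locator does not recognize; only the remapped zip is compared.
after = score({"_city": "Boston", "_state": "MA", "zip": "02133"}, reference)

print(before, after)
```

In this model the same address loses points only because the locator was handed fields it recognizes but has no data for, which is the situation the renaming avoids.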

Brad
DanMarrier
New Contributor II
I was out of the office for a few days, but I will try this proposed solution the next time I have to produce a geocoded address output for one of the agencies I work with.  I'll update this thread with the results.
KarynBackus
New Contributor
Brad, 

This customization will be very helpful.  For my projects, we geocode by both town and zip code but then we want to geocode remaining addresses by just one zone.  I could not get the v10 composite address locator to properly apply the individual address locators because it always expected both town and zip even when an address locator was created using only 1 zone.

I will try your proposed solutions.  Thank you.

Also, you may want to tag or repost this thread since the title is not representative of your very valuable solution.
TracyGarrison
New Contributor III
Dan, please let us know if Brad's solution worked for you or not.

Thanks
TracyGarrison
New Contributor III
Brad, thanks for taking the time to work out this solution; your explanation of why this problem is occurring was very informative.  I have been reviewing "Customizing Locators in ArcGIS 10", and a lot of the "Gibberish" is starting to become informative as well.

Would you be able to point to the exact section of "Gibberish" in the locator XML that computes the score from all the pieces of the address string?

Thanks
BradNiemand
Esri Regular Contributor
In the grammar, you would start with the highest-level component.  For batch geocoding it would be "MultiLineAddress" and "MultiLineZone".  For single-line input, it would be "Location", but no weights are applied until you get to the "FullNormalAddress".  Each component has a "weight" that is applied to it.  Each top-level component may contain one or more child components, each of which also has a weight associated with it.  All of these components get scored and contribute to the overall score.  Later today I will attach to the geocoding resource center the presentation that my colleagues and I gave at last year's user conference; it has a section about scoring.  I hope this helps.

http://resources.arcgis.com/gallery/file/geocoding
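One way to picture weighted components rolling up into a single score is the sketch below. It assumes a plain weighted average at every level, which is a guess for illustration only; the actual combination rule lives in the locator grammar. "MultiLineAddress" and "MultiLineZone" come from the description above, while "Root", "HouseNumber", "StreetName", "Zip", and every weight and match value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    weight: float
    match: float = 1.0  # 0.0 to 1.0, used only for leaf components
    children: List["Component"] = field(default_factory=list)


def component_score(c: Component) -> float:
    """Leaf: its own match quality. Parent: weighted average of its children."""
    if not c.children:
        return c.match
    total_weight = sum(ch.weight for ch in c.children)
    weighted = sum(ch.weight * component_score(ch) for ch in c.children)
    return weighted / total_weight


address = Component("MultiLineAddress", 3, children=[
    Component("HouseNumber", 1, match=1.0),
    Component("StreetName", 3, match=1.0),
])
zone = Component("MultiLineZone", 1, children=[
    Component("Zip", 1, match=0.0),  # input ZIP does not match at all
])
root = Component("Root", 1, children=[address, zone])

overall = 100 * component_score(root)
print(overall)
```

With these made-up weights, a completely wrong zone still leaves three quarters of the score intact, so the relative weight given to the zone component determines how much a ZIP mismatch costs.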

Brad
TracyGarrison
New Contributor III
Again, thank you for your help and I will begin studying the areas you have outlined more closely.  I will look for the presentation you have mentioned later today.

Thanks
BradNiemand
Esri Regular Contributor
A little behind schedule, but the presentation has now been uploaded to the URL I posted earlier (http://resources.arcgis.com/gallery/file/geocoding).  Let me know if anyone has any questions.

Brad
DanC
New Contributor II
I am getting the exact same results.  I have a composite locator made up of point and centerline data, and addresses that should match on the 2nd locator in the hierarchy are matching to the 4th locator.  When I run the locators individually, the address matches correctly on the 2nd locator.  I tried modifying my composite locator fields as Brad describes, by deleting and re-adding the street, unit, and zone fields; however, that did not change my results.  Do I need to apply that fix to all of the individual locators, or just the composite?



I am using 9.3.1 and 10 locator styles for the composite locator, mostly US Single Address with Unit and Zone.
BradNiemand
Esri Regular Contributor
The fix should only need to be made on the composite, and it should only apply to 10.0 participating locators.  Can you attach a screenshot of the composite after the fix?

Brad