New geocoding engine is terrible.

01-26-2011 10:46 AM
DanMarrier
New Contributor II
I really can't figure this one out.  It flies in the face of the hierarchical priority for composite locators that we've been led to believe is in place.

I have a composite locator service that makes use of 9 individual locator services.  Each of those 9 individual locator services has been tested and works as expected when run by itself on a given input address list.

For a specific address in my data set, I know ahead of time that the first locator service in the composite will not find a match when run as a standalone, but the second locator service will, as will some of the subsequent locator services lower in the hierarchy, although the later ones are more prone to errors.

However, when I run this address through the composite locator service, the second locator service is not returning a match.  It ends up defaulting to the fourth locator service in the composite, and that service is actually returning a bad point location because I'm allowing for tied scores to be matched, and it is randomly picking the wrong one.

How on earth can a locator service find a match for an address when run by itself, but not when run in a composite where no locator higher than it in the hierarchy finds a match???
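
For reference, the behavior I expect from a composite is a simple ordered fallback. The sketch below is plain Python, not arcpy, and it only illustrates that expectation. The `Locator` type, the `geocode` callables, the locator names, and the sample scores are all hypothetical stand-ins.

```python
from typing import Callable, NamedTuple, Optional


class Locator(NamedTuple):
    name: str
    min_score: int
    # Hypothetical stand-in for a locator: returns (score, location) or None.
    geocode: Callable[[str], Optional[tuple]]


def composite_geocode(address, locators):
    """Try each locator in order; return the first match at or above its minimum score."""
    for loc in locators:
        result = loc.geocode(address)
        if result is not None and result[0] >= loc.min_score:
            score, location = result
            return loc.name, score, location
    return None


# Toy data: locator 1 finds nothing, locator 2 finds an exact match.
locators = [
    Locator("navteq_zip_strict", 100, lambda a: None),
    Locator("navteq_town_strict", 100, lambda a: (100, (42.36, -71.06))),
    Locator("navteq_zip_loose", 60, lambda a: (75, (42.10, -72.59))),
]
match = composite_geocode("1 Main St, Boston", locators)
```

Under this model the second locator should win whenever the first returns nothing, which is not what I am seeing.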

As an additional slap in the face, if I input just an address and a zipcode, and the locator finds the address in a different zipcode, the reported score is 95 out of 100.  Really???  95???  I would expect the default settings of any geocoding service to report a significantly lower score than 95 when the "matched" Zone is completely different than the one I input! 

There also seems to be no heuristic or even common sense when making a match to a different zipcode.  I know of at least one case where the input zipcode has been miskeyed as "02113" instead of "02133".  But due to the highly arbitrary nature of matching tied score records... the "matched" address has a zipcode of "12508"... that's not even in the same state!!!  You'd think that there would be some sort of additional check that if scores are tied and there are different zone values in play, that the one which more closely resembles the input would get a higher priority.
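
The kind of tie-break check I have in mind could be as simple as the sketch below. It is only an illustration of the idea, not anything the geocoding engine does: the candidate dictionaries and both function names are hypothetical, and position-by-position digit agreement is just one possible similarity measure.

```python
def zone_similarity(input_zip, candidate_zip):
    """Count positions where two equal-length zipcodes agree (higher is closer)."""
    if len(input_zip) != len(candidate_zip):
        return 0
    return sum(a == b for a, b in zip(input_zip, candidate_zip))


def break_tie(input_zip, tied_candidates):
    """Among candidates with tied scores, prefer the zone closest to the input."""
    return max(tied_candidates, key=lambda c: zone_similarity(input_zip, c["zip"]))


# The miskeyed input "02113" agrees with "02133" in 4 digits and with "12508" in 1.
tied = [{"zip": "12508", "score": 95}, {"zip": "02133", "score": 95}]
best = break_tie("02113", tied)
```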

Has anybody else tested a composite service where an individual locator service works for a given address but fails when implemented in the composite?
BradNiemand
Esri Regular Contributor
Can you give me a bit more information?  Can you give me the types of locators that you have in the composite and in what order?  Are these locators built with 10.0 locator styles?  What version of ArcGIS are you using?  What reference data were the locators built off of (local data, Tele Atlas, NAVTEQ, etc...) as well as what region (entire US, county, state)?  Also, can you provide me with some sample addresses that you are having issues with or a table of addresses that contains some issue addresses?

Brad
JoeBorgione
MVP Emeritus
Brad- take a look a couple of threads down in the list; Dan had posted a question regarding Tiger Line data that I responded to.

I'm all ears on this as I haven't taken the plunge into the 10.x pool nor do I use Tiger data.
DanMarrier
New Contributor II
I'm using ArcMap 10.0 (Build 2800).

The following represents the list of individual locators in the composite in top to bottom order:

-- US Address dual ranges, using an E911-information enhanced version of NAVTEQ roads that we developed and maintain internally in SDE.  This locator requires a zipcode as the zone in the input addresses.  It has a spelling sensitivity of 90, and a minimum match score of 100.  The option to match tied candidates is active.

-- US Address dual ranges, using an E911-information enhanced version of NAVTEQ roads that we developed and maintain internally in SDE.  This locator requires a town or city name as the zone in the input addresses.  It has a spelling sensitivity of 90, and a minimum match score of 100.  The option to match tied candidates is active.

-- US Address dual ranges, using a 2009 version of Census TIGER roads.  This locator requires a single address input and a zipcode as the zone in the input addresses.  It has a spelling sensitivity of 90, and a minimum match score of 100.  The option to match tied candidates is active.

-- US Address dual ranges, using the E911-information enhanced version of NAVTEQ roads.  This locator requires a zipcode as the zone in the input addresses.  It has a spelling sensitivity of 80, and a minimum match score of 60.  The option to match tied candidates is active.

-- US Address dual ranges, using the E911-information enhanced version of NAVTEQ roads.  This locator requires a town or city name as the zone in the input addresses.  It has a spelling sensitivity of 80, and a minimum match score of 60.  The option to match tied candidates is active.

-- A customized dual range locator, using a 2009 version of TIGER roads, as well as the alternate address range table that is distributed with it.  I had to modify an existing template to get this to work (the only templates provided by ESRI allow for an alias name table, but not an alternate source for the ranges), as well as modifying the address range table to match the schema of LFROM, LTO, RFROM, RTO for address ranges, since the census distributes data in a scheme using "FROM House Number", "TO House Number", "Side of Street (L or R)".  I also performed a series of relates/joins to populate L_TOWN and R_TOWN values.  The locator requires a zipcode as the zone in the input addresses.  It has a spelling sensitivity of 80, and a minimum match score of 60.  The option to match tied candidates is active.

-- The same customized 2009 TIGER locator as above, identical in all regards except that it requires a town or city name as the zone instead of a zipcode.

-- Finally a US Address - ZIP 5-Digit based locator, using a set of zipcode centroids and matching only on zipcode.


The rationale for using the same locator multiple times, but flipping between zipcode and town name, is that the address lists I receive often contain local, colloquial, neighborhood, or unofficial community names that may not be reflected in the underlying source data.  I do have a lookup table that represents my best approximation of a comprehensive list of unofficial town names in my state, but it is not always appropriate to use, since some unofficial names are the same as official names, and replacing them would swap a correct name for an incorrect one.  Furthermore, some unofficial town names correspond to more than one official town name.  Zipcodes are given a higher priority because they are usually less error-prone and don't suffer as much from spelling typos.
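
To make the lookup-table problem concrete, here is a minimal sketch of a "replace only when safe" rule. It is an illustration under assumptions, not my actual lookup process: the function name is hypothetical, and the table entries are placeholders ("MILLTOWN", "TOWN A", and "TOWN B" are made up).

```python
def normalize_town(name, aliases, official_names):
    """Return the official town name only when the replacement is unambiguous."""
    key = name.strip().upper()
    if key in official_names:
        return key  # already an official name; never overwrite it
    targets = aliases.get(key, set())
    if len(targets) == 1:
        return next(iter(targets))
    return None  # unknown or ambiguous: leave it to the zipcode-based locators


# Placeholder data for illustration only.
official_names = {"BARNSTABLE", "TOWN A", "TOWN B"}
aliases = {
    "HYANNIS": {"BARNSTABLE"},          # one unofficial name, one official town
    "MILLTOWN": {"TOWN A", "TOWN B"},   # one unofficial name, two official towns
}
```

With this rule, `normalize_town("Hyannis", aliases, official_names)` resolves to the single official name, while the ambiguous "Milltown" is left alone.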

The locators using the E911-enhanced NAVTEQ roads were created in Arc 10.  The others were created in Arc 9.  But it's the ones created in 10 that are malfunctioning.  The addresses and reference data are primarily for the state of Massachusetts, but the reference data does extend a little bit beyond the borders into neighboring states.

I noticed the problem I originally stated by geocoding some addresses that were correctly matched using versions of the locators developed in Arc 9, but were not matching using virtually identical locators developed in 10.  I can't readily share the reference or address data because some of it is proprietary/under license agreement, as well as the fact that the statewide datasets are prohibitively large.

For the time being, I am using a workaround, inserting some of the Arc 9 locators into the composite at a level high enough to supersede the locators providing a bad location.

FWIW, I sent a copy of the customized TIGER-based locator to the Danvers regional office to see if they could use it as the basis for a template, but I never heard back.  And my deadlines allow precious little time to conduct more in-house testing/development.
BradNiemand
Esri Regular Contributor
Let me try to confirm the issue that you are having.  Are you getting matches for the locators that have a minimum match score of 80?  I would also suggest that you create one locator that has all three fields (city, state, and zip) mapped.  The locator itself uses the logic that you are trying to introduce in the composite.  See below:
    <multiline_def name="multilineZone">
     <alt>
      <field_ref ref="ZIP"/>
      <elt ref="GenZIP" weight="100"/>
      <field_ref ref="City"/>
      <elt ref="OptCityNoSearch" weight="20"/>
      <field_ref ref="State"/>
      <elt ref="OptStateNoSearch" weight="20"/>
     </alt>
     <alt fallback="true">
      <field_ref ref="City"/>
      <elt ref="City" weight="40"/>
      <field_ref ref="State"/>
      <elt ref="OptState" weight="60"/>
      <field_ref ref="ZIP"/>
      <elt ref="OptZipNoSearch" weight="20"/>
     </alt>


The above section from the locator defines how zones are used for searching and scoring.  In short, the locator will search on zip first and "fallback" to city and state second.  This does apply scoring penalties for the components that are incorrect in each case (i.e., if city and state are wrong, a penalty will be applied to the zip search).  This is pretty easy to configure to work differently, and I can even send you an updated style to help make it function the way that you would like it to.  So if you would like it to search on zip first and "fallback" to city and state second, but not apply scoring to the other components, I can do that.
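
For anyone who finds the XML hard to read, the fragment can be summarized with a few lines of standard-library Python. This is only a reading aid under assumptions: the closing `</multiline_def>` tag is added so the fragment parses (the real style file may contain more), the function name is hypothetical, and the code assumes each `field_ref` is immediately followed by its `elt`, as in the fragment above. It does not reproduce how the locator actually searches or scores.

```python
import xml.etree.ElementTree as ET

ZONE_XML = """
<multiline_def name="multilineZone">
 <alt>
  <field_ref ref="ZIP"/>
  <elt ref="GenZIP" weight="100"/>
  <field_ref ref="City"/>
  <elt ref="OptCityNoSearch" weight="20"/>
  <field_ref ref="State"/>
  <elt ref="OptStateNoSearch" weight="20"/>
 </alt>
 <alt fallback="true">
  <field_ref ref="City"/>
  <elt ref="City" weight="40"/>
  <field_ref ref="State"/>
  <elt ref="OptState" weight="60"/>
  <field_ref ref="ZIP"/>
  <elt ref="OptZipNoSearch" weight="20"/>
 </alt>
</multiline_def>
"""


def summarize_zone_alternatives(xml_text):
    """List each <alt> with its (input field, element, weight) triples."""
    root = ET.fromstring(xml_text.strip())
    summary = []
    for alt in root.findall("alt"):
        children = list(alt)
        # Assumes children alternate: field_ref, elt, field_ref, elt, ...
        fields = [
            (field.get("ref"), elt.get("ref"), int(elt.get("weight")))
            for field, elt in zip(children[0::2], children[1::2])
        ]
        summary.append({"fallback": alt.get("fallback") == "true", "fields": fields})
    return summary


summary = summarize_zone_alternatives(ZONE_XML)
```

The first entry is the primary alternative (ZIP carries weight 100), and the second is the one marked `fallback="true"` (city 40, state 60).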

Brad
JamesTanis
New Contributor
I already have this bug logged with ESRI: #NIM062866

If you uninstall SP1, the problem with addresses getting high scores in the wrong zip goes away.  As for addresses not matching when using composite locators, I had to use 9.3.1 address locators to fix this issue.  You can download them from here and use them in 10.0:

http://resources.arcgis.com/gallery/file/geocoding/details?entryID=12D8D400-1422-2418-34B0-4FE1CC06C...
DanMarrier
New Contributor II
I've been busy working on a Python script to restructure, populate, and consolidate the TIGER 2010 related address range table to make it usable in geocoding, so I haven't had time to pursue this further yet.
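
The core reshaping step can be sketched roughly as follows. This is a minimal illustration, not the actual script: the field names `TLID`, `SIDE`, `FROMHN`, and `TOHN` are assumed, the sample rows are made up, and pairing the n-th left range with the n-th right range on an edge is just one simple way to consolidate them.

```python
from collections import defaultdict
from itertools import zip_longest


def pivot_ranges(rows):
    """Turn one-row-per-side ranges into LFROM/LTO/RFROM/RTO rows per edge."""
    by_edge = defaultdict(lambda: {"L": [], "R": []})
    for row in rows:
        by_edge[row["TLID"]][row["SIDE"]].append((row["FROMHN"], row["TOHN"]))

    out = []
    for tlid, sides in by_edge.items():
        # Pair the n-th left range with the n-th right range; pad with None.
        for left, right in zip_longest(sides["L"], sides["R"], fillvalue=(None, None)):
            out.append({
                "TLID": tlid,
                "LFROM": left[0], "LTO": left[1],
                "RFROM": right[0], "RTO": right[1],
            })
    return out


# Made-up sample rows in the one-row-per-side layout.
rows = [
    {"TLID": 1, "SIDE": "L", "FROMHN": "1", "TOHN": "99"},
    {"TLID": 1, "SIDE": "R", "FROMHN": "2", "TOHN": "98"},
    {"TLID": 2, "SIDE": "R", "FROMHN": "100", "TOHN": "198"},
]
table = pivot_ranges(rows)
```

Edges with a range on only one side come out with `None` in the other side's columns.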

To answer your first question, none of the locators I specified in my post have a minimum match score of 80, so I don't know which one(s) you're talking about.

And I specifically do not want locators that use both zip and city... partly because the code snippet you provided looks like gibberish and I don't see how I can institute that without editing the locator file itself outside of the standard "Create new locator" GUI, and partly because I want to be able to use the information that tells me which locator service was used to make a match that gets stored in the output.  It's easier to do that with a locator name than having to scan the standardized address output to see what was used... well, easier for the people I deliver the results to at any rate.  And those scoring penalties are misleading... I purposely split up zipcode and city/town because I know that an unofficial or unincorporated town name may be in an input address list, so while it is still colloquially "correct", it comes up as "wrong" when compared to the formal town names in the underlying reference street data.  There's still the issue of how scoring is applied in SP1 that jamest582 has addressed.  I'm already using 9.3 locators as a workaround... but I seriously hope that the problems stated in my original post are resolved in SP2.
BradNiemand
Esri Regular Contributor
Excuse my mistake.  I meant to say, are you getting matches for the locators below that have a spelling sensitivity of 80 and a minimum match score of 60?

-- US Address dual ranges, using the E911-information enhanced version of NAVTEQ roads. This locator requires a zipcode as the zone in the input addresses. It has a spelling sensitivity of 80, and a minimum match score of 60. The option to match tied candidates is active.

-- US Address dual ranges, using the E911-information enhanced version of NAVTEQ roads. This locator requires a town or city name as the zone in the input addresses. It has a spelling sensitivity of 80, and a minimum match score of 60. The option to match tied candidates is active.
BradNiemand
Esri Regular Contributor
I would really like to help you be successful with 10 locators but I am unsure what exactly it is that you need.  I understand that you have an issue with the scoring for the 10 locators but I think that I might be able to provide you with a custom style that handles things the way that you need them.

The problem with trying to create a generic solution is that it is hard to make everyone happy.  The benefit of the ArcGIS 10 locators is that they are very flexible and can be configured quite easily with some effort.  I do understand that you have deadlines and I would not expect you to be able to dedicate a lot of time to figuring this stuff out but I would be more than happy to provide you with a better solution.  I just need to know what your requirements are.
DanMarrier
New Contributor II
I thought my original post was clear, but it really all boils down to this question I asked:

"How on earth can a locator service find a match for an address when run by itself, but not when run in a composite where no locator higher than it in the hierarchy finds a match???"

There is nothing wrong with the individual locators.  They do exactly what I expect them to do, so there's really no need to merge zipcode and town information into single locators, which is what your response seemed to imply.  I just want to know why an individual locator can find a match for an address when used by itself, but not when it is nested inside a composite and no other locator higher up in the composite hierarchy has found a match.  And I want that problem fixed.

But I also understand that unless you can reproduce problems on your end, nothing can/will be done.  I will try to find the time to extract a sample if I can, but I was hoping somebody else may have experienced the same problem and could at least validate it as a known issue.