Geocoding Score Documentation:  How is the score value determined?

08-16-2012 11:26 AM
NathanLowry
New Contributor III
I know geocoding is largely controlled by the locator .xml files in C:\Program Files\ArcGIS\Desktop10.0\Locators.  However, where is it documented (in white papers, technical articles, etc.) how the geocoding Score value is calculated?

I'd like to reference the documentation in metadata.

Thanks,

Nathan Lowry, GISP

Governor's Office of Information Technology
GIS Outreach Coordinator, State of Colorado
601 East 18th Avenue, Suite 220, Denver, CO 80203-1494
303.764.7801 720.402.4462 Cell nathan.lowry@state.co.us, http://www.colorado.gov/oit

How am I doing?  Please contact my manager Jon Gottsegen (Jon.Gottsegen@state.co.us) for comments or questions.
TimSpangler
New Contributor II
Nathan;

Check the white paper from Esri: http://www.esri.com/library/whitepapers/pdfs/geoservices-rest-spec.pdf

This may help explain the process.
NathanLowry
New Contributor III

Thanks, Tim - but the GeoServices REST Specification provides information on how geocoding match scores may be returned.  It doesn't tell how a score is determined.

  • Where do I find out how the score for a geocoding result is determined?

The GeoServices REST Specification describes how geocoding match scores may be returned from a REST service, but it does not describe in detail how those scores are determined.

D.E. Wright's response at How are geocoding scores calculated in ArcGIS? at GIS Stack Exchange gives some insight, although it is quite general.

The MATCH commands in the ArcGIS 9.2 Geocoding Rule Base Developer Guide look close.  The guide is introduced at Accommodating changes in the geocoding rule base files in the ArcGIS Desktop 9.3 Help, as well as in EDN documentation and other locations.

Since many things changed in the geocoding engine with the release of ArcGIS Desktop 10.0 and (particularly) 10.1 SP1, it is not clear how much of the 9.x geocoding logic is still used in products today.

Resources listed at Good resources on geocoding algorithms on StackExchange are interesting, but they may or may not answer the specific question of how match scores are calculated in the present ArcGIS address geocoding engine(s).

A response at the Geocoding Developer's Kit for ArcGIS 10.0? thread on GeoNet refers to a technical paper whose URL is no longer valid (at least as of today).

Sincerely, Nathan Lowry

GIS Outreach Coordinator

P 303.764.7801 |  F 303.764.7764

601 East 18th Avenue, Suite 220, Desk D-23, Denver, CO 80203-1494

nathan.lowry@state.co.us  |  www.colorado.gov/oit

How am I doing?  Please contact my manager Jon Gottsegen (jon.gottsegen@state.co.us) for comments or questions.

BruceHarold
Esri Regular Contributor

Hello

Score calculation is not documented in detail, but I can give you a thumbnail.

If you open USAddress.lot.xml in Firefox from its installed location at file:///C:/Program Files (x86)/ArcGIS/Desktop10.<version>/Locators you will see a navigable tree.

In Top Level Elements navigate to FullNormalAddress; the superscript numbers for NormalAddress (70) and Zone (30) are the relative weights for score contributions from those elements.  Coincidentally they sum to 100 but only the relative weight is relevant.

Navigating further from NormalAddress, you will see that its 70/100 share of the score is split between House (15/75) and FullStreetName (60/75), where 75 is the sum of those weights. Further down, FullStreetName is made up of the elements prefix (5/92), pretype (6/92), StName (70/92), suftype (6/92) and suffix (5/92), where 92 is the sum of those weights.

An individual score for any lowest-level element (for example, the score contribution from an imperfect street name) may be determined by the Spelling/Scoring section of the XML file, if an anticipated spelling correction is required to match the reference data, or by a proprietary algorithm for unanticipated spelling errors, noise, or repeated characters, as when you have key bounce.

Scores are weight summed, with percentage normalization, from the bottom up.  Missing elements do not penalize a score, they simply do not contribute.
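To make the bottom-up weighting described above concrete, here is a minimal Python sketch. It is not Esri's implementation: the weights are the ones quoted in this post, but the leaf scores, the function name `element_score`, and the reading of "missing elements do not contribute" as "renormalize over the elements that are present" are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Relative weights as quoted in this post (superscripts in USAddress.lot.xml).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "FullNormalAddress": {"NormalAddress": 70, "Zone": 30},
    "NormalAddress": {"House": 15, "FullStreetName": 60},
    "FullStreetName": {
        "prefix": 5, "pretype": 6, "StName": 70, "suftype": 6, "suffix": 5,
    },
}


def element_score(name: str, leaf_scores: Dict[str, float]) -> Optional[float]:
    """Return a 0-100 score for an element, or None if it is absent.

    Assumption: a missing element is skipped and the remaining weights are
    renormalized, so absence does not lower the parent's score.
    """
    children = WEIGHTS.get(name)
    if children is None:
        # Lowest-level element: its score is supplied by the caller
        # (hypothetical values; the real per-element scoring is proprietary).
        return leaf_scores.get(name)
    weighted_total = 0.0
    weight_sum = 0.0
    for child, weight in children.items():
        child_score = element_score(child, leaf_scores)
        if child_score is None:
            continue  # missing element: contributes nothing, costs nothing
        weighted_total += weight * child_score
        weight_sum += weight
    if weight_sum == 0:
        return None
    return weighted_total / weight_sum


# Hypothetical leaf scores: every supplied element matches perfectly.
perfect = {"House": 100, "StName": 100, "suftype": 100, "suffix": 100, "Zone": 100}
# Same input, but the street suffix (directional) is treated as a total mismatch.
wrong_suffix = dict(perfect, suffix=0)

print(element_score("FullNormalAddress", perfect))
print(round(element_score("FullNormalAddress", wrong_suffix), 2))
```

Because prefix and pretype are absent from the sample input, they are simply left out of the FullStreetName sum. The numbers this sketch produces should not be expected to reproduce real locator scores.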

Regards

NathanLowry
New Contributor III

Thank you, Bruce! I'll cross post to How are geocoding scores calculated in ArcGIS? at GIS Stack Exchange.

- Nathan

JianLiu
New Contributor III

Bruce,

Thanks for the insight, this is helpful! I have also been trying to understand the internal algorithm used for ESRI geocoding.

One thing I find strange is that when multiple candidates are found for an address, the geocoder picks whichever one comes first instead of the "best" one.

For example, a search for address "2725 30TH STREET SE, Washington DC" will get a match at "2725 30th St NE, Washington, District of Columbia, 20018" with a score of 92.43, when geocoded using ArcMap. The match is wrong, due to the street directional. Of course, a manual rematch will find a 100 match, but this won't be feasible for batch geocoding.

All candidates returned by the findAddressCandidates REST API are shown below, but geocodeAddresses returns only the first one, which is in fact wrong. The desktop geocoder behaves the same as geocodeAddresses. I imagine this is done for performance reasons, but it is not right. What can we do to avoid the wrong matches? One option would be to redo the filtering by picking the highest score when using findAddressCandidates. But is there anything for geocodeAddresses? Thanks, it's much appreciated!

Jian

Search for address "2725 30TH STREET SE, Washington DC", as returned by findAddressCandidates:

Address Candidates: (# address candidates : 6)

Score: 92.43
Address: 2725 30th St NE, Washington, District of Columbia, 20018
X: -76.9664981241894
Y: 38.92517585657543

Score: 100.0
Address: 2725 30th St SE, Washington, District of Columbia, 20020
X: -76.965586314324
Y: 38.85502946852702

Score: 92.43
Address: 2725 30th St NE, Washington, District of Columbia, 20018
X: -76.96649984530907
Y: 38.92482786588191

Score: 92.43
Address: 2725 30th St NW, Washington, District of Columbia, 20008
X: -77.06011761652377
Y: 38.92385066586513

Score: 83.35
Address: 2725 30th Pl NE, Washington, District of Columbia, 20018
X: -76.96636920394947
Y: 38.920916363073836

Score: 79.0
Address: 2726 30th St SE, Washington, District of Columbia, 20020
X: -76.96552627725934
Y: 38.85500159493757
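
The re-filtering idea mentioned in this post (pick the highest-scoring candidate rather than the first one) could be sketched in Python roughly as follows. The candidate data is copied from the listing above; the dictionary layout and the helper name `best_candidate` are hypothetical and do not mirror the actual REST response schema.

```python
from typing import Dict, List, Union

Candidate = Dict[str, Union[str, float]]

# Candidates in the order they were returned in this post (coordinates omitted).
candidates: List[Candidate] = [
    {"score": 92.43, "address": "2725 30th St NE, Washington, District of Columbia, 20018"},
    {"score": 100.0, "address": "2725 30th St SE, Washington, District of Columbia, 20020"},
    {"score": 92.43, "address": "2725 30th St NE, Washington, District of Columbia, 20018"},
    {"score": 92.43, "address": "2725 30th St NW, Washington, District of Columbia, 20008"},
    {"score": 83.35, "address": "2725 30th Pl NE, Washington, District of Columbia, 20018"},
    {"score": 79.0, "address": "2726 30th St SE, Washington, District of Columbia, 20020"},
]


def best_candidate(items: List[Candidate]) -> Candidate:
    """Return the candidate with the highest score.

    max() keeps the first item among equal scores, so the service's
    original ordering still breaks ties.
    """
    if not items:
        raise ValueError("no candidates to choose from")
    return max(items, key=lambda c: c["score"])


print(best_candidate(candidates)["address"])
```

This only reorders results on the client side; it does not change what geocodeAddresses itself returns.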

BruceHarold
Esri Regular Contributor

Hi

I can't reproduce what you're seeing, see here:

http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?SingleLine=...

Are you using StreetMap Premium by any chance?  It is possible the candidate order was influenced by the structure of the composite locator (a point address with an acceptable score elevated above a street address with a perfect score).

regards

JianLiu
New Contributor III

Bruce,

Yes, you are exactly right. Sorry I forgot to mention we are using a composite locator from Streetmap Premium. It's exactly that "Point address with acceptable score elevated above street address with perfect score" as you said.

How can we avoid that? The world locator also seems to use a "composite" locator and does a good job there. Maybe it ranks streetAddress over pointAddress, or does some extra work elsewhere? We do want pointAddress over streetAddress, though, since pointAddress is theoretically more accurate (or precise), although that is not the case here. Would you please share more insight on this? Thank you, much appreciated.

Jian

BradNiemand
Esri Regular Contributor

Jian,

There are a couple questions here but I will try to explain.

1. The online service has some additional sorting of candidates after the fact to make sure the best candidate is always first.  This is why you see the correct first result as a StreetAddress match as opposed to a PointAddress match.

2. What you can do is update the Minimum Candidate and Minimum Match Scores for the PointAddress locator to something a bit higher (I suggest 93).  This will allow for only high score candidates to get associated with the PointAddress locator.
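
The effect of raising the PointAddress threshold can be illustrated with a simplified Python model of a composite locator. This is an assumption-laden sketch, not how ArcGIS is implemented: it assumes the composite tries each participating locator in order and accepts the first one whose best candidate meets that locator's minimum match score. The starting threshold of 85 and all class and function names are hypothetical; only the suggested value of 93 and the two candidates come from this thread.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    address: str
    score: float


class Locator(NamedTuple):
    name: str
    min_match_score: float
    candidates: List[Candidate]


def composite_match(locators: List[Locator]) -> Optional[Tuple[str, Candidate]]:
    """Return (locator name, candidate) from the first locator that matches.

    Simplified model: locators are tried in order, and a locator "matches"
    when its best candidate reaches its own minimum match score.
    """
    for locator in locators:
        if not locator.candidates:
            continue
        best = max(locator.candidates, key=lambda c: c.score)
        if best.score >= locator.min_match_score:
            return locator.name, best
    return None


point_hit = Candidate("2725 30th St NE, Washington, District of Columbia, 20018", 92.43)
street_hit = Candidate("2725 30th St SE, Washington, District of Columbia, 20020", 100.0)


def build(point_threshold: float) -> List[Locator]:
    # PointAddress is listed first, so it takes priority when it matches.
    return [
        Locator("PointAddress", point_threshold, [point_hit]),
        Locator("StreetAddress", 85.0, [street_hit]),  # 85 is a made-up value
    ]


print(composite_match(build(85.0)))  # hypothetical lower threshold
print(composite_match(build(93.0)))  # threshold suggested in this post
```

In this model, the 92.43 point-address candidate wins under the lower threshold, while a threshold of 93 lets the request fall through to the street-address locator and its 100 score.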

Let me know if you have any additional questions.

Brad

JianLiu
New Contributor III

Brad,

Thanks for the reply!

1. That's good to know. So you mean the world geocoder's "findAddressCandidates" differs from our out-of-the-box "findAddressCandidates" (extra sorting after the fact)? Why can't we get the same? It would be even better if "geocodeAddresses" sorted after the fact too. But of course some users might then complain that it does not honor the priority of the first locator when both addresses are "acceptable". So I guess for now we just have to mimic the world geocoder and sort after the fact on our side?

2. I don't think I will do this since it's a batch geocoder we use to geocode thousands of addresses every day, and I don't want to tweak the setting just for these "special" cases at a risk of breaking others.

3. I guess we could give much more weight to the street directional in the .xml file? Then that's a lot of digging and testing there...

Thanks a lot for the input!

Jian 
