v10 match score algorithm seems less sensitive

03-01-2011 05:29 AM
KarynBackus
New Contributor
The match scoring algorithm in v10 is different from 9.x.  I'm not opposed to the change, but I was surprised to find that it did not approximate the 9.x versions.

In 9.x, 80% was a reasonable threshold for high quality. That threshold prevented matches to incorrect zones and required all elements to be present, but it still left some flexibility for matching addresses with differing street types (Rd vs Dr) and for spelling sensitivity.

In v10, I've had to raise my match threshold to 100% to prevent matches that are clearly incorrect but score in the mid- or high-90s. However, a 100% threshold negates the spelling sensitivity, which is a valuable asset for match success.
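To make the trade-off concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the candidate score is just a street-name similarity, and it uses difflib's SequenceMatcher as a stand-in for the locator's spelling sensitivity; Esri's actual string comparison is not described in this thread. The names street_similarity and accepted are hypothetical.

```python
from difflib import SequenceMatcher


def street_similarity(input_name: str, reference_name: str) -> float:
    """Return a 0-100 similarity between two street names.

    ASSUMPTION: difflib's ratio() stands in for the locator's spelling
    sensitivity; it is not Esri's comparison algorithm.
    """
    matcher = SequenceMatcher(None, input_name.upper(), reference_name.upper())
    return 100.0 * matcher.ratio()


def accepted(score: float, threshold: float) -> bool:
    """Keep a candidate only if its score meets the minimum match score."""
    return score >= threshold


exact = street_similarity("Main", "Main")
misspelled = street_similarity("Man", "Main")  # one dropped letter

for threshold in (100, 80):
    print(threshold, accepted(exact, threshold), accepted(misspelled, threshold))
```

A one-letter misspelling ("Man" for "Main") scores about 86 here, so a 100% threshold rejects it while a 9.x-style 80% threshold keeps it.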

Thus, I am finding I will need to make custom edits to the match algorithm.  Even though I have read the white paper on customizing the rule base for v10, I do not want to make these edits without guidance for fear of unforeseen consequences. 

Has anyone else successfully made edits to the scoring algorithm to meet their custom needs?

Karyn
KarynBackus
New Contributor
As a specific example: 110 N Main St matches 110 E Main St at 97%.

I do not want a N Main St to match to an E Main St and I would have thought the penalty for a different pre-directional would be greater.
TracyGarrison
New Contributor III
Karyn,

I am also studying how to adjust the scoring by editing the locator, but I have not modified any of the parameters yet. Have you gotten anywhere with this?

Thanks.
KarynBackus
New Contributor
I contacted ESRI tech support because I was hesitant to edit the .lot file myself. I needed ESRI to make several customizations to the .lot file to improve geocode success for CT addresses. As part of it, ESRI adjusted the score weights for some of the address elements. In looking at their changes, it appears that they doubled the score weights for the prefix, suftype, and suffix. This seems to have increased the penalty when these fields do not match.

Since these edits were made alongside several others, I can't say what their individual impact was. But after reviewing the match scores, I determined that a 90% match score now gives me the match quality I was looking for, where previously I needed closer to 95%. With the greater element weights, a 90% threshold works better for me because it still allows for the small penalties from spelling sensitivity that a 95% criterion excluded.

From the customized ESRI file:
<!--Street name elements and their assigned weights-->
<def name="FullStreetName">
  <alt>
    <elt ref="prefix" weight="10" stan_weight="11" pre_separator="required" post_separator="required"/>
    <elt ref="pre_type_no_sthwy" match_as="pretype" weight="6" stan_weight="1000" />
    <elt ref="StName" weight="70" stan_weight="10" pre_separator="required" post_separator="required"/>
    <elt ref="suftype" weight="14" stan_weight="1000"/>
    <elt ref="suffix" weight="10" stan_weight="15" pre_separator="required"/>
  </alt>
  <alt fallback="true">
    <elt ref="prefix" weight="10" stan_weight="11" pre_separator="required" post_separator="required"/>
    <elt ref="pre_type_sthwy" match_as="pretype" weight="6" stan_weight="2000" />
    <elt ref="OptHyphen" weight="0"/>
    <elt ref="StName" weight="70" stan_weight="10" pre_separator="optional" post_separator="required"/>
    <elt ref="suftype" weight="14" stan_weight="1000"/>
    <elt ref="suffix" weight="10" stan_weight="15" pre_separator="required"/>
  </alt>
</def>
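A rough Python sketch of why doubling these weights could make a directional mismatch cost more. It assumes the score is the share of weight carried by agreeing elements (a simplified model, not Esri's v10 code), and it infers the earlier prefix, suftype, and suffix weights (5, 7, 5) by halving the values above, per the "doubled" observation. House number, zone, and the stan_weight values are not modeled, so the absolute numbers will not reproduce the 97% score reported earlier; only the direction of the change is meaningful. The names weighted_score, CUSTOM_WEIGHTS, and HALVED_WEIGHTS are hypothetical.

```python
# Weights from the first <alt> of the customized FullStreetName definition.
CUSTOM_WEIGHTS = {"prefix": 10, "pretype": 6, "StName": 70, "suftype": 14, "suffix": 10}

# ASSUMPTION: "doubled" implies the earlier weights were half of these for
# prefix, suftype, and suffix; the original .lot values are not in the thread.
HALVED_WEIGHTS = dict(CUSTOM_WEIGHTS, prefix=5, suftype=7, suffix=5)


def weighted_score(candidate: dict, reference: dict, weights: dict) -> float:
    """Share (0-100) of the weight carried by elements that agree.

    ASSUMPTION: simplified weighted-fraction model, not Esri's v10 scoring.
    Elements empty on both sides are skipped, so they cost nothing.
    """
    earned = possible = 0
    for element, weight in weights.items():
        a = candidate.get(element, "").upper()
        b = reference.get(element, "").upper()
        if not a and not b:
            continue  # absent from both input and reference
        possible += weight
        if a == b:
            earned += weight
    return 100 * earned / possible if possible else 0.0


north = {"prefix": "N", "StName": "Main", "suftype": "St"}
east = {"prefix": "E", "StName": "Main", "suftype": "St"}

for label, weights in (("halved", HALVED_WEIGHTS), ("custom", CUSTOM_WEIGHTS)):
    print(label, round(weighted_score(north, east, weights), 1))
```

In this model, 110 N Main St vs. 110 E Main St drops from about 93.9 to about 89.4 when the weights are doubled, because the mismatched prefix now carries a larger share of the total weight.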

Hope this helps. -Karyn
AndrewDecker
New Contributor II
I'm just curious if I'm understanding right...

I didn't think there were any "penalties" in 10.  I thought scoring was only additive.  Missing attributes are not penalized and scoring weights are generated from address element matching.
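Whether the drop counts as a "penalty" may be mostly a matter of wording. In a simple weighted-fraction model (an assumption for illustration, not Esri's documented algorithm), adding up the weights of matched elements and subtracting the weights of mismatched elements from 100 give the same score, and an element absent from the comparison affects neither. The weights and function names below are hypothetical.

```python
# Hypothetical element weights for illustration only.
WEIGHTS = {"prefix": 10, "StName": 70, "suftype": 14}


def additive_score(matched: set, present: set, weights: dict) -> float:
    """Score built up from the weights of elements that matched."""
    total = sum(weights[e] for e in present)
    return 100 * sum(weights[e] for e in matched) / total


def penalty_score(matched: set, present: set, weights: dict) -> float:
    """Same score expressed as 100 minus the share of mismatched weight."""
    total = sum(weights[e] for e in present)
    missed = sum(weights[e] for e in present if e not in matched)
    return 100 - 100 * missed / total


present = {"prefix", "StName", "suftype"}  # populated in input and reference
matched = {"StName", "suftype"}            # prefix differs (N vs E)
print(additive_score(matched, present, WEIGHTS))
print(penalty_score(matched, present, WEIGHTS))
```

Dropping "prefix" from present (a directional missing on both sides) gives 100 under either view, which is consistent with missing attributes not being penalized.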