Geocoding postcodes with a space (Canada)

10-30-2012 05:29 AM
Jean-BernardGariépy
Occasional Contributor
Hi!

My goal is to geocode postcodes with or without an inner space: X0X0X0 or X0X 0X0.
I'm using ArcGIS 10.0.

The reference table on which I'm building the geocoder has its postcode field formatted without a space: X0X0X0.
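When the input can be cleaned before it reaches the locator (for example in a batch script or a client application), one workaround is to normalize it to the reference format. A minimal Python sketch; `normalize_postcode` is a hypothetical helper, not part of ArcGIS, and it does not help when raw user input is sent straight to a published service:

```python
import re


def normalize_postcode(raw: str) -> str:
    """Hypothetical helper: drop whitespace and hyphens, then upper-case.

    Assumes the reference table stores postcodes as six characters with no
    space (X0X0X0), as described above.
    """
    return re.sub(r"[\s-]", "", raw).upper()


for sample in ["X0X0X0", "X0X 0X0", " x0x-0x0 "]:
    print(repr(sample), "->", normalize_postcode(sample))
```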

I've tried four different approaches:

- 1 - XML, Zones section: the following geocodes X0X0X0 properly, but doesn't work with a space (X0X 0X0).

Based on http://forums.arcgis.com/threads/22799-Customize-address-locators-in-ArcGIS-10, I tried the following, without success:

    <def name="GenZIP">
      <alt ref="Postcode" />
    </def>
    <def name="Postcode">
      <alt>
        <elt ref="PC_A" weight="50"/>
        <elt ref="PC_B" pre_separator="optional" weight="50"/>
      </alt>
    </def>
    <def name="PC_A">
      <alt>`[a-zA-Z][0-9][a-zA-Z]`</alt>
    </def>
    <def name="PC_B">
      <alt>`[0-9][a-zA-Z][0-9]`</alt>
    </def>


- 2 - XML, Zones section: the following also geocodes X0X0X0 properly, but doesn't work with a space (X0X 0X0).
Note that the space between the brackets below is actually written as &#032; in the file.

    <def name="GenZIP">
      <alt ref="Postcode" />
    </def>
    <def name="Postcode">
      <alt>
        <alt>`[a-zA-Z][0-9][a-zA-Z][0-9][a-zA-Z][0-9]`</alt>
        <alt>`[a-zA-Z][0-9][a-zA-Z][ ][0-9][a-zA-Z][0-9]`</alt>
      </alt>
    </def>
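For context on the &#032; note: in standard XML, &#032; is a numeric character reference for a space (code point 32), and a conforming parser replaces it before the application ever reads the text. A small Python check with `xml.etree.ElementTree`, assuming (not confirmed in this thread) that the locator reads its style file with an equally conforming XML parser:

```python
import xml.etree.ElementTree as ET

# A conforming XML parser resolves the numeric character reference &#032;
# (decimal 32) to an ordinary space, so the application sees "[ ]".
snippet = "<alt>`[a-zA-Z][0-9][a-zA-Z][&#032;][0-9][a-zA-Z][0-9]`</alt>"
text = ET.fromstring(snippet).text
print(repr(text))
```

If that assumption holds, writing &#032; or a literal space inside the brackets should make no difference to what the locator receives.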


- 3 - Alias-name table

It works well... until you publish your service on ArcGIS Server 😕. Then it loses all references to any alias name table. Esri says you need to move your alias names into the XML when you publish it:
http://resources.arcgis.com/en/help/main/10.1/index.html#//00sq000000sr000000

- 4 - XML alias_list

Defining an alias_list in the Aliases section worked as well. The only major problem is that adding all the postal codes for one province made the XML file grow to about 12 MB.

Editing, saving, loading, creating the geocoder, etc.: every single task now takes forever because of the file size. And that's before adding our street-name and place-name alias tables to it.

This is absolutely not an option.

You would think that storing this kind of huge alias_list in a database would be common sense... but not with ArcGIS Server (ironic, isn't it? I guess not).

- 5 -

So I'm back to step - 1 -, trying to figure out how to geocode a postcode with a space using the XML and a regular expression. There must be a way to make it work; I just can't figure it out.

Any hint would be appreciated.
Accepted Solutions
Jean-BernardGariépy
Occasional Contributor
Good news!!

With the help of some skilled friends, we managed to come up with something that works perfectly!!
And without any alias list!!!

    <section desc="Zones">
      ...
      <def name="GenPostal">
        <alt ref="FSALDU" />
        <alt>
          <elt ref="Fsa"/>
          <elt ref="OptLdu"/>
        </alt>
      </def>

      <def name="Fsa">
        <alt>`[a-zA-Z][0-9][a-zA-Z]`</alt>
      </def>

      <def name="Ldu">
        <alt>`[0-9][a-zA-Z][0-9]`</alt>
        <alt>`[0-9][a-zA-Z]`</alt>
        <alt>`[0-9]`</alt>
      </def>

      <def name="FSALDU">
        <alt>
          <elt ref="Fsa" weight="100"/>
          <elt ref="OptHyphen" weight="0"/>
          <elt ref="Ldu" weight="50" pre_separator="optional"/>
        </alt>
      </def>

      <def name="OptLdu">
        <alt/>
        <alt fallback="true">
          <elt ref="OptHyphen" weight="0"/>
          <elt ref="Ldu" weight="50" pre_separator="optional"/>
        </alt>
        <alt fallback="true">
          <elt ref="word" pre_separator="none"/>
        </alt>
      </def>
    </section>

    <section desc="ZonesNoSearch">
      ...
      <def name="OptPostalNoSearch">
        <alt>
          <elt ref="GenPostalNoSearch"/>
        </alt>
        <alt/>
      </def>
      <def name="GenPostalNoSearch">
        <alt>
          <elt ref="FSANoSearch" match_as="Fsa"/>
          <elt ref="OptLdu"/>
        </alt>
        <alt fallback="true">
          <elt ref="FSANoSearch" match_as="Fsa" />
          <elt ref="word" pre_separator="none"/>
        </alt>
      </def>
      <def name="FSANoSearch">
        <alt>`[a-zA-Z][0-9][a-zA-Z]`</alt>
      </def>
    </section>
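To see roughly which inputs the grammar above accepts, here is a regular-expression analogue in Python. It is an approximation under stated assumptions, not how the locator engine evaluates `<def>`/`<alt>`/`<elt>`: `OptHyphen` is assumed to be an optional hyphen, `pre_separator="optional"` is modelled as an optional space, and the `word` fallback alternatives are not modelled. `parse_postal` is a hypothetical name.

```python
import re

# Fsa: letter-digit-letter (required).
FSA = r"[a-zA-Z][0-9][a-zA-Z]"
# Ldu: full (digit-letter-digit) or partial (digit-letter, or digit only).
LDU = r"[0-9](?:[a-zA-Z](?:[0-9])?)?"
# GenPostal: Fsa, then an optional Ldu preceded by an optional space or hyphen.
GEN_POSTAL = re.compile(rf"({FSA})(?:[ -]?({LDU}))?")


def parse_postal(text: str):
    """Return (fsa, ldu_or_None) for a full or partial postcode, else None."""
    m = GEN_POSTAL.fullmatch(text.strip())
    if m is None:
        return None
    fsa, ldu = m.group(1), m.group(2)
    return fsa.upper(), ldu.upper() if ldu else None


for s in ["X0X0X0", "X0X 0X0", "x0x-0x0", "X0X 0X", "X0X", "00000"]:
    print(repr(s), "->", parse_postal(s))
```

The key design point it mirrors is that the FSA and LDU are separate elements, so the separator between them is handled by the grammar rather than by a single regular expression containing a space.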

KimOllivier
Honored Contributor
What about this simple expression to include an optional space?

    <def name="GenZIP">
      <alt ref="Postcode" />
    </def>
    <def name="Postcode">
      <alt>
        `[a-zA-Z][0-9][a-zA-Z] ?[0-9][a-zA-Z][0-9]`
      </alt>
    </def>
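In a standard regex engine such as Python's `re`, this pattern accepts both forms, which is what the suggestion intends. A quick check; it says nothing about the locator itself, which, per a later reply in this thread, does not support `?`:

```python
import re

# " ?" is a literal space followed by the "?" quantifier: zero or one space.
PATTERN = re.compile(r"[a-zA-Z][0-9][a-zA-Z] ?[0-9][a-zA-Z][0-9]")

for candidate in ["X0X0X0", "X0X 0X0", "X0X  0X0"]:
    print(repr(candidate), bool(PATTERN.fullmatch(candidate)))
```

The first two samples print True; the double-space sample prints False.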


I am worried about the large alias problem for performance. What takes so long?
Are you using an include statement or are you actually pasting in the alternatives?
Do you duplicate the reference features to handle the same (only different by a space) postcodes?
The key point is that the reference data has to contain both forms; all the geocoder does is score them when found, based on the regular expression filter.

Placename alias tables are really meant for a set of vanity addresses: well-known landmarks used instead of a real address. An alternative way to handle these would be to create a composite locator with the placenames as points and keep the real address as an alternative field.
They are not really designed for parsing address components for matching and scoring.
Jean-BernardGariépy
Occasional Contributor
Thanks for the reply!

I tried your suggestion without success. It wasn't clear to me what you meant by " ?", so I tried " ?", "?", "&#032;" and "&#032;?".
Whatever I tried, it geocodes postcodes without a space but doesn't find anything with a space 😕

It feels as if any regular expression in my "Postcode" def is simply ignored by the geocoder...

Any ideas?


To answer your question: I wrote the alternatives as follows in the XML document, defining an alias_list in the Aliases section. I didn't include any external file. But as I mentioned earlier, it's not a solution: editing, saving, loading, creating the geocoder, etc. now takes forever because of the file size (12 MB).
          <alias_def>
            <alt>X0X0X0</alt>
            <alt>X0X 0X0</alt>
          </alias_def>
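An alias list of this size is normally generated rather than typed. A hypothetical Python sketch with `xml.etree.ElementTree` that emits one `<alias_def>` per postcode in the format above; only `<alias_def>` and `<alt>` come from the thread, while the `aliases` wrapper element is made up for printing and the real alias_list markup around it is not shown here:

```python
import xml.etree.ElementTree as ET


def alias_defs(postcodes):
    """Build one <alias_def> per postcode: unspaced form first, then spaced.

    The "aliases" root element is a hypothetical wrapper for printing only.
    """
    root = ET.Element("aliases")
    for pc in postcodes:
        compact = pc.replace(" ", "").upper()
        alias = ET.SubElement(root, "alias_def")
        ET.SubElement(alias, "alt").text = compact
        ET.SubElement(alias, "alt").text = f"{compact[:3]} {compact[3:]}"
    return root


print(ET.tostring(alias_defs(["X0X0X0"]), encoding="unicode"))
```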

How would I include an external file in the Aliases section? The overall slow-loading problem would remain the same, though, since ArcGIS still needs to parse the linked XML file. Don't you think?


Thanks for the alias table tip!
KimOllivier
Honored Contributor
The space is a valid character and the question mark is a metacharacter, so "<space>?" means an optional space in the regular expression. You cannot put an HTML-encoded string in a regular expression to mean a space, because only the metacharacters defined for regular expressions have a meaning other than their literal value: '&#032' means exactly those five literal characters.
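The point about literal characters can be checked in any standard regex engine. A Python sketch; it shows regex behaviour only, and whether an XML layer decodes a well-formed reference such as &#032; before the pattern is compiled is a separate question:

```python
import re

# Inside a regular expression, "&#032" has no special meaning: it matches
# the five literal characters &, #, 0, 3, 2 and never a space.
literal = re.compile(r"[a-zA-Z][0-9][a-zA-Z]&#032[0-9][a-zA-Z][0-9]")

print(bool(literal.fullmatch("X0X 0X0")))      # False
print(bool(literal.fullmatch("X0X&#0320X0")))  # True
```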

That single regular expression will match postcodes with or without an included space, but it won't score unless it finds the exact string in the reference point field.

To get a score, you would have to use aliases (and set up postcodes to have aliases) or simply duplicate every point with the alternative postcode format. You can include the generated alias XML file using an XML include statement, as described on page 74 of the white paper.
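The second option, duplicating every reference point, amounts to a preprocessing step on the reference table. A minimal Python sketch with a hypothetical in-memory table of `(postcode, x, y)` rows standing in for the real reference feature class; the dictionary lookup only models the exact-string requirement described above, not the locator's scoring:

```python
def with_both_formats(rows):
    """Hypothetical preprocessing: emit each (postcode, x, y) row twice,
    once as X0X0X0 and once as X0X 0X0, so both spellings exist."""
    out = []
    for postcode, x, y in rows:
        compact = postcode.replace(" ", "").upper()
        spaced = f"{compact[:3]} {compact[3:]}"
        out.append((compact, x, y))
        out.append((spaced, x, y))
    return out


# Hypothetical reference table: one point, stored without a space.
reference = [("X0X0X0", 1.0, 2.0)]

# Exact-string lookup against the original table misses the spaced form.
exact_only = {pc: (x, y) for pc, x, y in reference}
print(exact_only.get("X0X 0X0"))  # None

# After duplication, both spellings resolve to the same point.
expanded = with_both_formats(reference)
lookup = {pc: (x, y) for pc, x, y in expanded}
print(lookup.get("X0X 0X0"))  # (1.0, 2.0)
```

The trade-off is that the reference data doubles in size instead of the locator style file growing.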

Can't Canada decide on a standard postcode format? 8-)

It is very hard to debug the geocoder, but I have found that if you match a few records and then use the AddressInspector to rematch, you can see how the address is being parsed on the right-hand side. This has given me clues, such as street types in a name being parsed as units: e.g. Green Valley Road ended up as Green Valley, Unit Road.
KimOllivier
Honored Contributor
Correction: I note that the ? metacharacter is not supported in regular expressions in the locator.
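Without `?`, one option is to spell the optional space out as separate alternatives, which is the idea behind approach 2 in the question (two `<alt>` elements). A Python check that the two spellings accept the same strings in a standard engine; whether the locator behaves the same way is not established in this thread:

```python
import re

# Optional space via the "?" quantifier (not supported by the locator).
with_quantifier = re.compile(r"[a-zA-Z][0-9][a-zA-Z] ?[0-9][a-zA-Z][0-9]")
# The same language written as two explicit alternatives, no "?" needed.
spelled_out = re.compile(
    r"[a-zA-Z][0-9][a-zA-Z][0-9][a-zA-Z][0-9]"
    r"|[a-zA-Z][0-9][a-zA-Z] [0-9][a-zA-Z][0-9]"
)

samples = ["X0X0X0", "X0X 0X0", "X0X  0X0", "X0X-0X0", "x0x 0x0"]
for s in samples:
    assert bool(with_quantifier.fullmatch(s)) == bool(spelled_out.fullmatch(s))
print("both patterns agree on", len(samples), "samples")
```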