Locator Style XML Questions

DavidWarnock · ‎08-28-2014

I have some questions about writing/modifying locator styles. The first thing I want to say is that I have read the document "Customising Locators in ArcGIS 10" closely, and the answers I seek are not in there. To my knowledge, that is the only official documentation on ArcGIS 10 locator .lot.xml files (excluding the documentation in the schema files). I've also scoured the web for answers, to no avail.

So here are my questions.

The <std_elt> element is documented with "Not yet supported. Reserved for future use". What does this tag do? The fact that it is found in the <default_input> tags in the standard .lot.xml files makes me think it really does do something, and it's just undocumented.

TL;DR: What is the <std_elt> element for?

The search_context attribute has "TODO: Documentation" where the documentation should be. The document "Customising Locators in ArcGIS 10" has this to say:

"The content in braces is a hint that a particular search context applies for the element. The engine manages sets of tests for elements within search contexts; these are discussed later in this document."

Later in the document:

"The source style used the "ZIPSearch" search context for Postal, but we will use "PostalSearch." There is also a search context called "CitySearch." These search contexts are defined by the engine and manage a set of tests for the element."

This doesn't really explain what search contexts are available, or how they work. I'm also unclear what the relationship is between the search_context attribute and the <result> element.

TL;DR: What is the search_context attribute for, and how does it relate to the <result> element?

Which parts of the regular expression syntax are actually supported? "Customising Locators in ArcGIS 10" p.56 (see below) states that the expression syntax is limited to 6 items representing a very limited subset of standard regex syntax.

However, I've seen several examples that use functionality not present in this list. For example, page 60 of the same document shows the code snippet: <alt>`[0-9]{4}`</alt>, which shouldn't work if the information on p.52 is correct.

TL;DR: Is there a more accurate description of supported regex functionality for locator styles?

If I have a <ref_data_style> containing a <multiline_grammar>, does this fall back to the top-level <multiline_grammar> if no candidates/matches are found? Does a similar thing happen for <inputs><default_input>? I've had some odd results in my tests where I believe that even if I deliberately break my locator, it still finds things in my data that are exact textual matches, i.e. "AB C" matches "AB C" but not "ABC".

TL;DR: What other work is carried out when geocoding that isn't specified in the <inputs> and <multiline_grammar> elements of the <ref_data_style>?

In <mapping_schema><standardization> tags, are they only used when standardising addresses? "Customising Locators in ArcGIS 10" p.42 states that in order to be valid, a locator style must ensure that "Mapping Schema standardization includes all schema fields." I haven't done this for some of my input, and things seem to work OK.

TL;DR: If I don't plan on standardising address data, do I need to create the <standardization> element in my mapping schema?

Finally, I'm using Python to generate my locators and run tests (specifically using the GeocodeAddresses_geocoding() function). However, I can't see how to make a multiline query from Python, and I believe that this is what I actually want to test. According to this stack-exchange post, this isn't possible. Is this correct, or am I misunderstanding how multiline grammar works?

TL;DR: Can I perform a multiline query in Python?

Thanks in advance, please let me know if I need to clarify any of my questions.

-David

SergeyIvanenko · ‎09-03-2014

David,

The idea behind std_elt was to map standard-specific names (FGDC, OASIS) to locator inputs, so that addresses formatted according to those standards could be handled transparently. This is not yet supported.
search_context defines a named search context that elements of the corresponding text will be associated with; this context can then be used with a result tag. For instance:

<field_ref ref="Street"/>

<search_value ref="leftStreet" original_values="true"/>

</parameter>

<search_value ref="rightStreet" original_values="true"/>

</parameter>

</method>

<format_ref ref="format_intersections"/>

</result>
</alt>

- here is the grammar for intersection addresses (“Main St & 1st Ave”). The geocoding process needs to collect two sets of candidates and then find whether any of them intersect (have common nodes). The elements that are included in “FullStreetName” and affect the search context (i.e. StName) are associated with the “leftStreet” search context for the “left” part of the address, and with the “rightStreet” context for the “right” part. Then, after the parse is done, the result of this parse is calculated using the “intersection” method, which in turn receives the search_value (the result of the search using the named search_context) for “leftStreet” and “rightStreet”, respectively.
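
To make the "common nodes" step concrete, here is a minimal Python sketch of pairing two candidate sets by shared node. It is only an illustration of the idea, not the engine's actual implementation; the dictionary layout, the "name" and "nodes" keys, and the function name are all hypothetical.

```python
def intersecting_candidates(left_candidates, right_candidates):
    """Pair left/right street segments that share at least one node.

    Each candidate is a dict with hypothetical keys "name" and "nodes";
    the real locator engine's data structures are not documented here.
    """
    # Index the right-hand segments by node id for quick lookup.
    by_node = {}
    for segment in right_candidates:
        for node in segment["nodes"]:
            by_node.setdefault(node, []).append(segment)

    # Walk the left-hand segments and collect every shared node.
    matches = []
    for left in left_candidates:
        for node in left["nodes"]:
            for right in by_node.get(node, []):
                matches.append((left["name"], right["name"], node))
    return matches


# Illustrative data: two "Main St" segments and one "1st Ave" segment.
left = [{"name": "Main St", "nodes": (1, 2)},
        {"name": "Main St", "nodes": (2, 3)}]
right = [{"name": "1st Ave", "nodes": (3, 7)}]
print(intersecting_candidates(left, right))
```

In this sketch, "left" stands in for the candidates found through the “leftStreet” context and "right" for those found through “rightStreet”.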

<alt>`[0-9]{4}`</alt> is not supported; unfortunately, it’s an error in the whitepaper. Regular expressions should be used with caution, as sophisticated expressions tend to affect the parsing process in non-obvious ways (think of it as a parser within a parser).
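
One possible workaround for the missing counted repetition is to spell the repetition out by hand. The Python sketch below does that rewrite mechanically and checks it with Python's own re module. It assumes that bracketed character classes such as [0-9] are among the supported items, which should be checked against the list in the whitepaper; the helper name is hypothetical, and Python's regex engine is only a stand-in for the locator engine.

```python
import re

# An atom (bracketed class, escaped character, or single literal)
# followed by an exact count such as {4}.
_COUNTED = re.compile(r"(\[[^\]]+\]|\\.|[^\\\[\]{}])\{(\d+)\}")


def expand_counted_repetition(pattern):
    """Rewrite atom{n} as the atom written out n times."""
    return _COUNTED.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


expanded = expand_counted_repetition("[0-9]{4}")
print(expanded)
```

The expanded form is longer but uses only a character class and concatenation.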
If <ref_data_style> contains <multiline_grammar>, it replaces the top-level <multiline_grammar>. The same applies to <inputs> and <default_input>.
The <mapping_schema><standardization> section is only used for standardization. Standardization is invoked in two workflows:
- when the locator is built (reference data is processed as it is being submitted to the locator build processes); this is governed by the <build> subsection
- when the StandardizeAddresses geoprocessing tool is used (the <tool> subsection)
GeocodeAddresses_geocoding() works on a table of addresses. Single addresses are not supported in Python.
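
Since the tool takes a table, one way to test a single address from Python is to wrap it in a one-row table first. The sketch below is a rough outline under several assumptions: that the tool accepts a CSV file as its input table (a .dbf or geodatabase table may be needed instead), and that the field names "Address", "Zip", "Street" and "ZIP" match your locator, which they may not. The helper names are hypothetical. It is written for Python 3; under the Python 2.x that ships with ArcGIS 10, open the file with mode "wb" and drop the newline argument.

```python
import csv
import os
import tempfile


def write_single_address_table(address, folder=None):
    """Write a CSV with a header and one record; return its path."""
    folder = folder or tempfile.mkdtemp()
    path = os.path.join(folder, "single_address.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=sorted(address))
        writer.writeheader()
        writer.writerow(address)
    return path


def field_mapping(locator_to_table):
    """Build '<locator field> <table field>' pairs joined by ';'."""
    pairs = sorted(locator_to_table.items())
    return ";".join("{0} {1}".format(loc, col) for loc, col in pairs)


def geocode_single(address, locator, mapping, out_feature_class):
    """Geocode one address by passing a one-row table to the tool."""
    import arcpy  # only importable inside an ArcGIS Python environment
    table = write_single_address_table(address)
    arcpy.GeocodeAddresses_geocoding(table, locator,
                                     field_mapping(mapping),
                                     out_feature_class)
    return out_feature_class
```

A call might look like geocode_single({"Address": "380 New York St", "Zip": "92373"}, locator_path, {"Street": "Address", "ZIP": "Zip"}, output_path), where locator_path and output_path are placeholders for your own locator and output feature class.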

Sergey