Data Interop Ext - Parsing a String Name

NelsonDe_Miranda — Fri, 30 Jul 2010 12:35:36 GMT

We are building a series of spatial ETL tools to help us clean up some of the data we have been receiving. I have run into a problem where each road segment name takes the following form:

"1500 W Saint George St"
"1400 W Saint George St"
"1300 W Saint George St"

Using the string searcher, I am able to identify all the names that begin with a numeric character using ^[0-9]. The problem is that when the matched names (those starting with a numeric value) are returned, I am unable to keep the rest of the string that follows the leading number.
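One way to keep the remainder is to give it its own capture group instead of matching only the leading digits. Here is a minimal sketch in Python's `re` module, used outside FME purely to illustrate the pattern; `split_leading_number` is a hypothetical helper name:

```python
import re

# Group 1: the leading house number.
# Group 2: everything after the whitespace that follows it.
LEADING_NUMBER = re.compile(r"^([0-9]+)\s+(.*)$")


def split_leading_number(name):
    """Return (number, rest) if the name starts with digits, else None."""
    m = LEADING_NUMBER.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)


for name in ["1500 W Saint George St", "1400 W Saint George St", "1300 W Saint George St"]:
    print(split_leading_number(name))
```

Because the remainder is captured by `(.*)`, it comes back as a matched part rather than being discarded along with the rest of the match.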

In addition, the numbers are not always in the same format. For example, some street names are listed like this:

"-1 George St."
"500-600 George St"
"Nanaimo Ave"

My idea is to combine a series of string searchers to capture all the values that begin with symbols or numbers, and then use the space following those characters to separate out the name.

Unfortunately, I have been unsuccessful in doing so.
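The idea of peeling off a leading token of digits or symbols and keeping everything after the following space can also be written as a single pattern with an optional first group. This is a sketch in Python's `re` module, assuming the leading token only ever contains digits and hyphens (other symbols would need to be added to the character class); `parse_street_name` is a hypothetical helper:

```python
import re

# Optional leading token of digits and hyphens (e.g. "1500", "-1", "500-600")
# followed by whitespace; whatever remains is kept as the street name.
# Assumption: no other symbols appear in the leading token.
ADDRESS = re.compile(r"^(?:([-0-9]+)\s+)?(.*)$")


def parse_street_name(name):
    """Split a road segment name into (number_token_or_None, street_name)."""
    m = ADDRESS.match(name.strip())
    return m.group(1), m.group(2)


for name in ["1500 W Saint George St", "-1 George St.", "500-600 George St", "Nanaimo Ave"]:
    print(parse_street_name(name))
```

Names without a leading number, such as "Nanaimo Ave", skip the optional group and come back whole, so one pattern covers all three forms instead of a chain of separate searchers.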

Thanks in advance,

Nelson

Re: Data Interop Ext - Parsing a String Name

BruceHarold — Mon, 02 Aug 2010 15:10:55 GMT

Hi Nelson

Welcome to the arcane world of regular expressions!
You will need a more complex regular expression to pick up the address components. For example, this pattern parses the case "500-600 George St":

([0-9]+)(-*)([0-9]*) ([A-Za-z ]+)

You will then need to grab the parts from the resulting matched_parts list:

`_matched_parts{0}' has value `500'
`_matched_parts{1}' has value `-'
`_matched_parts{2}' has value `600'
`_matched_parts{3}' has value `George St'
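To check a pattern like this outside FME, Python's `re` module reports the same capture groups. This sketch is only an illustration. The name part is written as `[A-Za-z ]+` so the match does not rely on case-insensitive matching:

```python
import re

# Number, optional hyphen(s), optional second number, then the street name.
PATTERN = re.compile(r"([0-9]+)(-*)([0-9]*) ([A-Za-z ]+)")

m = PATTERN.search("500-600 George St")
parts = list(m.groups())

# Print the groups in the same style as the _matched_parts list above.
for i, value in enumerate(parts):
    print(f"_matched_parts{{{i}}} has value {value!r}")
```

Note that `[A-Za-z ]+` stops at punctuation, so for "-1 George St." the name part would come back as "George St" without the trailing period.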

Regards

Re: Data Interop Ext - Parsing a String Name

NelsonDe_Miranda — Tue, 03 Aug 2010 11:57:18 GMT

Perfect!

I tried some regular expressions and couldn't get them to return what I wanted. Now I see what I was doing wrong.

Thanks!

- Nelson