A better way to parse an address?

08-12-2015 08:38 AM
MVP Esteemed Contributor

I looked at the Standardize Addresses tool to see if it would do what I need. It works for most of my address data, but a number of addresses don't fit the perfect model (see Error Trapping).

At any rate, holding the rank of Hack Specialist .1 in the Python Legion, I'm running a series of scripts to parse out address components from a single string address in the form of

1234 S Main ST or 1234 E Olive Branch Dr or 1234 S 300 E

The house number, pre-dir, and suf-type/suf-dir aren't too bad, but teasing out the street name itself is a little more challenging, as it may be multiple words. What I've come up with is a series of splits and joins that gets the job done, but there has got to be a better way. Any pointers are appreciated.

Here is what I do:

def myStreetName(inString):
  a = inString.split(' ')
  b = a.pop()          # takes off suf
  c = ' '.join(a)      # put it back together
  d = c.split(' ')     # split it back out again
  e = d.pop(0)         # takes off housenum
  f = ' '.join(d)      # put it back together again
  g = f.split(' ')     # split again
  h = g.pop(0)         # get rid of pre-dir
  street = ' '.join(g) # leaves just the street
  return street        # home free

myStreetName(!fullAddress!)

Accepted Solutions
MVP Honored Contributor

Your split string ('a') already contains a usable list of all you need.

If your addresses always take the form:

- first element = house number

- second element = pre-dir

- last element = suf-type or suf-dir

- street name = everything between second and last element

... you can parse it apart like this:

>>> inString = '1234 E Olive Branch Dr'
>>> splitString = inString.split(' ')
>>> a = splitString[0]   # house number
>>> b = splitString[1]   # pre-dir
>>> c = ' '.join(splitString[2:-1]) # everything between 2nd and last element, joined back together with a space
>>> d = splitString[-1]  # suffix (last element)
>>> print (a,b,c,d)
('1234', 'E', 'Olive Branch', 'Dr')
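
Wrapped up as a function, the slicing approach above might look like the sketch below. The function name `parse_address` and the `ValueError` for short inputs are assumptions added for illustration, not part of the original answer. It also uses `split()` with no argument so that repeated spaces don't produce empty tokens.

```python
def parse_address(address):
    """Split 'housenum predir street ... suffix' into its four parts."""
    # split() with no argument also collapses runs of whitespace
    parts = address.split()
    if len(parts) < 4:
        # Assumption: anything shorter does not fit the four-part model
        raise ValueError("expected at least 4 tokens, got: %r" % address)
    house_num = parts[0]
    pre_dir = parts[1]
    street = " ".join(parts[2:-1])  # tokens after the pre-dir, up to but excluding the last
    suffix = parts[-1]              # suf-type or suf-dir
    return house_num, pre_dir, street, suffix


for sample in ("1234 S Main ST", "1234 E Olive Branch Dr", "1234 S 300 E"):
    print(parse_address(sample))
```

In the field calculator, the same function should be callable once per output field by indexing the result, for example `parse_address(!fullAddress!)[2]` for the street name.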


13 Replies
Esri Regular Contributor

Hi Joe

You have outed us on Standardize Addresses having issues at 10.x.

The easiest thing to do is download the 9.x styles and use Standardize Addresses with an old style.

http://www.arcgis.com/home/item.html?id=d36e3c27f12342d3b54f697048c71658

For interest, attached is a script I did for a tool for internal use - to find the parts of an address.  It is far from bulletproof.

Another, canonical, way to standardize addresses (worldwide) is to leverage the World Geocode Service, for example:

http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find?text=380%20new%20york%20st%2...

You grab the returned JSON and take it from there.

The only catch is if you're storing the results you must use the forStorage parameter and use a token.
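
As a rough sketch of what such a request could look like, the snippet below only builds the request URL and does not send it. The endpoint and the `text`, `forStorage`, and `token` parameters come from the post above; the helper name `build_find_url`, the `f=json` format parameter, and the sample address are assumptions for illustration. It is written for Python 3, and the current service documentation should be checked before relying on the `find` operation.

```python
from urllib.parse import urlencode

# Endpoint as given in the post above
FIND_URL = "http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find"


def build_find_url(address, token=None):
    """Return a request URL for a single-line address (hypothetical helper)."""
    params = {"text": address, "f": "json"}  # f=json asks for a JSON response
    if token is not None:
        # Per the post: stored results need forStorage plus a token
        params["forStorage"] = "true"
        params["token"] = token
    return FIND_URL + "?" + urlencode(params)


print(build_find_url("1234 E Olive Branch Dr"))
```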

Regards

MVP Esteemed Contributor

or

>>> inString = "1234 E Olive Branch Dr"
>>> a = (inString.strip()).split(" ")
>>> addr = a[0]
>>> NSEW = [i for i in a if len(i)==1][0]   # first single-character token, taken to be the pre-dir
>>> pre = "{} {}".format(addr,NSEW)
>>> street = inString.replace(pre,"").strip()
>>> street
'Olive Branch Dr'

e.g. 'Main ST', 'Olive Branch Dr', '300 E' for the three examples (not sure of the last one... just in case there was an E Street N)
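
Picking the direction by token length can misfire when the street name itself is a single letter. One possible variation, sketched below, checks the token after the house number against a set of known directionals and treats the pre-dir as optional. The `DIRECTIONS` set, the function name `street_name`, and the rule that the last token is always a suf-type or suf-dir are all assumptions for illustration.

```python
# Assumed list of directional abbreviations; extend as the data requires
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def street_name(address):
    """Return the street name only, tolerating a missing pre-dir."""
    parts = address.split()
    rest = parts[1:]  # drop the house number
    # Drop a leading directional only if a name and a suffix still remain,
    # so '1234 E ST' keeps 'E' as the street name.
    if len(rest) > 2 and rest[0].upper() in DIRECTIONS:
        rest = rest[1:]
    return " ".join(rest[:-1])  # drop the trailing suf-type or suf-dir


for sample in ("1234 S Main ST", "1234 E Olive Branch Dr", "1234 S 300 E", "1234 E ST"):
    print(sample, "->", street_name(sample))
```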

We really should quit playing around....

MVP Esteemed Contributor

Bruce: Been using 9.3 locators, but now I'm being 'forced' into the more modern world....

Darren: that's what I'm talking about! Thanks!

Dan: not sure I follow. MAIN, OLIVE BRANCH, and 300 are all the valid StreetNames for the examples.

Esri Regular Contributor

Standardize Addresses doesn't use a locator, it uses a style, so for the US you should be safe to install the old stuff and carry on.

MVP Esteemed Contributor

Ahhh, I had assumed 'Main ST' etc. would be valid names... oh well, another slice.

New Contributor

hello,

I am looking for a way to batch-process address parsing. Is there a Python script?

Mike H

MVP Esteemed Contributor

I just use the python code provided above and run it a few times to calculate the values of each field.  I'm sure some young, ambitious, aspiring GIS analyst could work that into a tool that is just a double click away from greatness.  But since I'm just a cranky, gray bearded, old school GIS analyst, I do it my way... 
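
For anyone who would rather make a single pass than calculate each field separately, a minimal sketch is below. It assumes every address follows the house number / pre-dir / street / suffix pattern discussed earlier in the thread; the field names (`HouseNum`, `PreDir`, `StreetName`, `Suffix`) and the sample list are hypothetical, and reading from or writing back to a table is left out.

```python
def split_address(address):
    """Return a dict of address parts, or None if the address doesn't fit."""
    parts = address.split()
    if len(parts) < 4:
        return None  # too short for the four-part model; flag for manual review
    return {
        "HouseNum": parts[0],
        "PreDir": parts[1],
        "StreetName": " ".join(parts[2:-1]),
        "Suffix": parts[-1],
    }


# Hypothetical input; in practice these would come from the address field
addresses = ["1234 S Main ST", "1234 E Olive Branch Dr", "1234 S 300 E", "PO Box 12"]

parsed = []   # one dict per address that fit the pattern
skipped = []  # addresses that need a closer look
for addr in addresses:
    row = split_address(addr)
    if row is None:
        skipped.append(addr)
    else:
        parsed.append(row)

for row in parsed:
    print(row)
print("skipped:", skipped)
```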

New Contributor

Joe,

Where is the python script? I cannot see it…

PS: I am new to the discussion groups.

I was able to download a Python script last year for exporting attribute tables to Excel spreadsheets.

Thanks,

Michael Hilstrom, R.G.

