A better way to parse an address?

08-12-2015 08:38 AM
JoeBorgione
MVP Emeritus

I looked at the Standardize Addresses tool to see if it would do what I need. It works for most of my address data, but a number of addresses don't fit the perfect model (see Error Trapping).

At any rate, holding the rank of Hack Specialist .1 in the Python Legion, I'm running a series of scripts to parse out address components from a single string address in the form of

1234 S Main ST or 1234 E Olive Branch Dr or 1234 S 300 E

The house number, pre-dir and suf-type/suf-dir aren't too bad, but teasing out the street itself is a little more challenging, as it may be multiple words. What I've come up with is a series of splits and joins that gets the job done, but there has got to be a better way. Any pointers are appreciated.

Here is what I do:

def myStreetName(inString):
  a = inString.split(' ')
  b = a.pop()                    #takes off suf
  c = ' '.join(a)                  # put it back together
  d = c.split(' ')                 #split it back out again
  e = d.pop(0)                 #takes off housenum
  f = ' '.join(d)                  #put it back together again
  g = f.split(' ')                 # split again
  h = g.pop(0)                 # get rid of pre-dir
  street = ' '.join(g)          #leaves just the street
  return street                 #home free

myStreetName(!fullAddress!)

That should just about do it....

Accepted Solutions
DarrenWiens2
MVP Honored Contributor

Your split string ('a') already contains a usable list of all you need.

If your addresses always take the form:

- first element = house number

- second element = pre-dir

- last element = suf-type or suf-dir

- street name = everything between second and last element

... you can parse it apart like this:

>>> inString = '1234 E Olive Branch Dr'
... splitString = inString.split(' ')
... a = splitString[0]
... b = splitString[1]
... c = ' '.join(splitString[2:-1]) # everything between 2nd and last element, joined back together with a space
... d = splitString[-1]
... print (a,b,c,d)
...
('1234', 'E', 'Olive Branch', 'Dr')
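The same slicing can be wrapped in a small function and checked against all three address forms from the question. This is only a sketch under the fixed-layout assumption listed above; the function name `parse_address` and the length check are illustrative additions, not part of the original answer.

```python
def parse_address(address):
    """Split a single-line address into (house number, pre-dir, street, suffix).

    Assumes the fixed layout described above: the first token is the house
    number, the second is the pre-direction, and the last is the suffix
    type or suffix direction. Everything in between is the street name.
    """
    # split() with no argument also collapses repeated spaces
    parts = address.split()
    if len(parts) < 4:
        raise ValueError("expected at least 4 parts, got: %r" % (address,))
    house_num = parts[0]
    pre_dir = parts[1]
    street = " ".join(parts[2:-1])  # everything between 2nd and last element
    suffix = parts[-1]
    return house_num, pre_dir, street, suffix


for addr in ("1234 S Main ST", "1234 E Olive Branch Dr", "1234 S 300 E"):
    print(parse_address(addr))
```

Addresses that lack a pre-dir or a suffix will not fit this layout, which is why the sketch raises an error instead of guessing.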

BruceHarold
Esri Regular Contributor

Hi Joe

You have outed us on Standardize Addresses having issues at 10.x.

The easiest thing to do is download the 9.x styles and use Standardize Addresses with an old style.

http://www.arcgis.com/home/item.html?id=d36e3c27f12342d3b54f697048c71658

For interest, attached is a script I did for a tool for internal use - to find the parts of an address.  It is far from bulletproof.

Another, canonical, way to standardize addresses (worldwide) is to leverage the World Geocode Service, for example:

http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find?text=380%20new%20york%20st%2...

You grab the returned JSON and take it from there.

The only catch is if you're storing the results you must use the forStorage parameter and use a token.
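As a rough illustration of the request side only, here is a sketch that builds such a URL in Python 3. The `text` parameter comes from the link above and `forStorage` and the token from this post; `f=json` and the exact parameter spellings and values are assumptions, and the helper name `build_find_url` is made up, so check the current service documentation before relying on any of it.

```python
from urllib.parse import quote, urlencode

# Endpoint as it appears in the link above (may have changed since this post).
FIND_URL = "http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find"


def build_find_url(address, token=None):
    """Build a request URL for a single-line address (hypothetical helper).

    Passing a token also sets forStorage, per the note about storing results.
    Parameter names and values other than 'text' are assumptions.
    """
    params = {"text": address, "f": "json"}
    if token is not None:
        params["forStorage"] = "true"
        params["token"] = token
    # quote_via=quote encodes spaces as %20, matching the style of the link above
    return FIND_URL + "?" + urlencode(params, quote_via=quote)


print(build_find_url("380 new york st"))
```

Fetching the URL and reading the returned JSON is left out here, since the response layout is not shown in this thread.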

Regards

DanPatterson_Retired
MVP Emeritus

Or, as an alternative to Darren's slicing approach:

>>> inString = "1234 E Olive Branch Dr"
>>> a = (inString.strip()).split(" ")
>>> addr = a[0]
>>> NSEW = [i for i in a if len(i)==1][0]
>>> pre = "{} {}".format(addr,NSEW)
>>> street = inString.replace(pre,"").strip()
>>> street
'Olive Branch Dr'

e.g. 'Main ST', 'Olive Branch Dr', '300 E' (not sure about the last one... just in case there was an E Street N)

We really should quit playing around....

JoeBorgione
MVP Emeritus

Bruce: Been using 9.3 locators, but now I'm being 'forced' into the more modern world....

Darren: that's what I'm talking about! Thanks!

Dan: not sure I follow. MAIN, OLIVE BRANCH, and 300 are the valid StreetNames for the examples.

BruceHarold
Esri Regular Contributor

Standardize Addresses doesn't use a locator, it uses a style, so for the US you should be safe to install the old stuff and carry on.

DanPatterson_Retired
MVP Emeritus

Ahhh, I had assumed 'main st' etc. would be valid names... oh well, another slice.

MichaelHilstrom
New Contributor

hello,

I am looking for a way to batch address parsing.  Is there a python script?

Mike H

JoeBorgione
MVP Emeritus

I just use the python code provided above and run it a few times to calculate the values of each field.  I'm sure some young, ambitious, aspiring GIS analyst could work that into a tool that is just a double click away from greatness.  But since I'm just a cranky, gray bearded, old school GIS analyst, I do it my way... 
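For anyone who wants the batch version, here is a minimal sketch that applies the same split-and-slice idea to a whole list of addresses and writes the parts out as CSV. It is plain Python with no arcpy; the column names in `FIELDS` and the function name `parse_batch` are made up for illustration, and it assumes every address follows the house number / pre-dir / street / suffix layout.

```python
import csv
import sys

# Hypothetical output column names; rename to match your own table.
FIELDS = ["FullAddress", "HouseNum", "PreDir", "StreetName", "Suffix"]


def parse_batch(addresses):
    """Parse each address string into a dict of components."""
    rows = []
    for address in addresses:
        parts = address.split()
        if len(parts) < 4:
            # Too short for the assumed layout: keep the original text
            # and leave the component fields blank for manual review.
            rows.append(dict(zip(FIELDS, [address, "", "", "", ""])))
            continue
        rows.append({
            "FullAddress": address,
            "HouseNum": parts[0],
            "PreDir": parts[1],
            "StreetName": " ".join(parts[2:-1]),
            "Suffix": parts[-1],
        })
    return rows


if __name__ == "__main__":
    sample = ["1234 S Main ST", "1234 E Olive Branch Dr", "1234 S 300 E"]
    writer = csv.DictWriter(sys.stdout, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(parse_batch(sample))
```

Rows that come back with blank components are the ones that don't fit the model and need a closer look.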

MichaelHilstrom
New Contributor

Joe,

Where is the python script? I cannot see it…

PS: I am new to the discussion groups.

I was able to download a Python script last year for exporting attribute tables to Excel spreadsheets.

Thanks,

Michael Hilstrom, R.G.

