A better way to parse an address?

JoeBorgione · ‎08-12-2015

I looked at the Standardize Addresses tool to see if it would do what I need, and it works for most of my address data, but a number of addresses don't fit the perfect model. (See Error Trapping)

At any rate, holding the rank of Hack Specialist .1 in the Python Legion, I'm running a series of scripts to parse out address components from a single string address in the form of

1234 S Main ST or 1234 E Olive Branch Dr or 1234 S 300 E.

The house number, pre-dir and suf-type/suf-dir aren't too bad, but teasing the street itself out is a little more challenging as it may be multiple words. What I've come up with is a series of splits and joins that get the job done, but there has got to be a better way. Any pointers are appreciated.

Here is what I do:

def myStreetName(inString):
  a = inString.split(' ')
  b = a.pop()                    #takes off suf
  c = ' '.join(a)                  # put it back together
  d = c.split(' ')                 #split it back out again
  e = d.pop(0)                 #takes off housenum
  f = ' '.join(d)                  #put it back together again
  g = f.split(' ')                 # split again
  h = g.pop(0)                 # get rid of pre-dir
  street = ' '.join(g)          #leaves just the street
  return street                 #home free

myStreetName(!fullAddress!)

That should just about do it....

DarrenWiens2 · ‎08-12-2015

Your split string ('a') already contains a usable list of all you need.

If your addresses always take the form:

- first element = house number

- second element = pre-dir

- last element = suf-type or suf-dir

- street name = everything between second and last element

... you can parse it apart like this:

>>> inString = '1234 E Olive Branch Dr'
... splitString = inString.split(' ')
... a = splitString[0]
... b = splitString[1]
... c = ' '.join(splitString[2:-1]) # everything between 2nd and last element, joined back together with a space
... d = splitString[-1]
... print (a,b,c,d)
...
('1234', 'E', 'Olive Branch', 'Dr')
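The same slicing can be wrapped in a function and run against all three address forms from the question. This is only a sketch: the name parse_address and the four-part minimum check are assumptions added for illustration, not part of the original reply.

```python
def parse_address(in_string):
    """Split 'housenum pre-dir street ... suffix' into its four parts."""
    # split() with no argument also collapses repeated spaces
    tokens = in_string.split()
    if len(tokens) < 4:
        # Assumption: anything shorter does not fit the pattern above
        raise ValueError("expected at least 4 parts: %r" % in_string)
    house_num = tokens[0]
    pre_dir = tokens[1]
    street = ' '.join(tokens[2:-1])  # everything between pre-dir and suffix
    suffix = tokens[-1]              # suf-type or suf-dir
    return house_num, pre_dir, street, suffix


for address in ('1234 S Main ST', '1234 E Olive Branch Dr', '1234 S 300 E'):
    print(parse_address(address))
```

Addresses with no pre-dir or with both a suf-type and a suf-dir would still be split incorrectly, so the length check is a first line of defense rather than real validation.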

BruceHarold · ‎08-12-2015

Hi Joe

You have outed us on Standardize Addresses having issues at 10.x.

The easiest thing to do is download the 9.x styles and use Standardize Addresses with an old style.

http://www.arcgis.com/home/item.html?id=d36e3c27f12342d3b54f697048c71658

For interest, attached is a script I did for a tool for internal use - to find the parts of an address. It is far from bulletproof.

Another, canonical, way to standardize addresses (worldwide) is to leverage the World Geocode Service, for example:

http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find?text=380%20new%20york%20st%2...

You grab the returned JSON and take it from there.

The only catch is that if you're storing the results, you must use the forStorage parameter and a token.
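As a rough sketch of what such a request might look like, the snippet below only builds the query URL; it does not send it. The base URL is the one quoted above. The helper name build_find_url, the example address, and the assumption that the endpoint accepts text, f, forStorage and token as query parameters are illustrative, so check them against the service documentation before relying on them.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

FIND_URL = 'http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find'


def build_find_url(address, token=None):
    """Build a request URL for a single-line address (hypothetical helper)."""
    params = [('text', address), ('f', 'json')]  # f=json asks for a JSON response
    if token is not None:
        # Assumption: results are being stored, so forStorage and a token are sent
        params.append(('forStorage', 'true'))
        params.append(('token', token))
    return FIND_URL + '?' + urlencode(params)


print(build_find_url('1234 E Olive Branch Dr'))
```

The returned JSON would then be fetched and parsed with whatever HTTP and JSON tools are at hand.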

Regards

DanPatterson_Retired · ‎08-12-2015

or

>>> inString = "1234 E Olive Branch Dr"
>>> a = (inString.strip()).split(" ")
>>> addr = a[0]
>>> NSEW = [i for i in a if len(i)==1][0]
>>> pre = "{} {}".format(addr,NSEW)
>>> street = inString.replace(pre,"").strip()
>>> street

e.g. 'Main ST', 'Olive Branch Dr', '300 E' (not sure of the last one... just in case there was an E street N)

We really should quit playing around....

JoeBorgione · ‎08-12-2015

Bruce: Been using 9.3 locators, but now I'm being 'forced' into the more modern world....

Darren- that's what I'm talking about! Thanks!

Dan- not sure I follow: MAIN, OLIVE BRANCH, and 300 are all the valid StreetNames for the examples.

BruceHarold · ‎08-12-2015

Standardize Addresses doesn't use a locator, it uses a style, so for the US you should be safe to install the old stuff and carry on.

DanPatterson_Retired · ‎08-12-2015

Ahhh, I had assumed 'main st' etc. would be valid names... oh well, another slice

MichaelHilstrom · ‎12-17-2015

hello,

I am looking for a way to batch-process address parsing. Is there a Python script?

Mike H

JoeBorgione · ‎12-17-2015

I just use the Python code provided above and run it a few times to calculate the values of each field. I'm sure some young, ambitious, aspiring GIS analyst could work that into a tool that is just a double click away from greatness. But since I'm just a cranky, gray-bearded, old-school GIS analyst, I do it my way...
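For anyone who would rather do it in one pass, here is a minimal batch sketch in plain Python. It assumes the same "housenum pre-dir street suffix" pattern discussed earlier in the thread; the function names, the dictionary keys and the sample list are made up for illustration, and the addresses would come from your own table or file.

```python
def split_address(full_address):
    """Return a dict of address parts, or None if the string doesn't fit."""
    tokens = full_address.split()
    if len(tokens) < 4:
        return None  # too short for housenum / pre-dir / street / suffix
    return {
        'house_num': tokens[0],
        'pre_dir': tokens[1],
        'street': ' '.join(tokens[2:-1]),
        'suffix': tokens[-1],
    }


def parse_batch(addresses):
    """Parse many addresses, keeping the misfits aside for manual review."""
    parsed, rejected = [], []
    for address in addresses:
        parts = split_address(address)
        if parts is None:
            rejected.append(address)
        else:
            parsed.append(parts)
    return parsed, rejected


# Hypothetical sample data; the last entry does not fit the pattern
sample = ['1234 S Main ST', '1234 E Olive Branch Dr', '1234 S 300 E', '55 Broadway']
parsed, rejected = parse_batch(sample)
print(len(parsed), rejected)
```

Setting the rejects aside instead of raising an error keeps one odd address from stopping the whole run.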

MichaelHilstrom · ‎12-17-2015

Joe,

Where is the Python script? I cannot see it…

PS: I am new to the discussion groups.

I was able to download a Python script last year for exporting attribute tables to Excel spreadsheets.

Thanks,

Michael Hilstrom, R.G.
