Trim strings at specific point using Field Calculator

6113
11
12-09-2014 11:08 AM
DerrickFrese2
New Contributor

I have a feature class which contains 40,000 mailing addresses. Each address contains the street address, city, state and zipcode separated by spaces.

Example 1: 123 Northwest Johnson St Cleveland Ohio 12345

Example 2: PO Box 3 Pine Springs Ohio 12345

I want to pull out just the street addresses. How do I say: trim off the string starting at the 3rd or 4th to last space? Thanks. Any help would be appreciated. I'm trying combinations of split, trim, etc. but can't get it right.

0 Kudos
11 Replies
XanderBakker
Esri Esteemed Contributor

Below an example of getting the address without the part from the last space (using the Python parser):

FieldCalc.png

Formula (with "Address" being the name of the field that contains the address string):

!Address![:!Address!.rfind(' ')]

The result:

Result.png

If you can detect the part you want to exclude, it is easy. However if you want to obtain "123 Northwest" or "PO Box 3", there is no easy way to detect the parts you need or those you want to exclude.

Kind regards, Xander

0 Kudos
derekthompson1
New Contributor

Try This

Mid( [Address],1,13 )     Get 123 Northwest

0 Kudos
ChrisDonohue__GISP
MVP Frequent Contributor

If you haven't already tried it, you might be able to accomplish part of what you want with VB Script and the Left, Mid, and Right string functions.  Whether this will be usable will depend on how your data is arranged; though from what you have depicted in your examples it would at least allow you to strip the zip code off.

Check out the ESRI help, VBScript string functions section (part way down the page):

Calculate Field Examples

ArcGIS Help 10.1

As to culling out the City, that looks a bit trickier, given the variability in name-length of a city and the presence of multiple words for some cities.  Off the wall idea - somehow tie in a search to a list of known cities that your data would include to then identify what to remove?

Chris Donohue, GISP

0 Kudos
JoeBorgione
MVP Esteemed Contributor

The problem is the variability in contents.  Even if you pop() the last two elements (Zip and State) you are still left with a variable length list.

You might want to try filtering the data through a query or queries.  One such query might be

[ADDRESS_FIELD] not like 'PO BOX %'.  Maybe look for street types as well.  These might give you an idea of what you need to do to tease out  the data you're after.  You need to get acquainted with the data to be able to do such filtering.  After all that you'll still have a ton of error trapping the figure out.  Sorry, not a lot of help...

That should just about do it....
0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor

Xander and Joe are on the right track...
Examine the address from the perspective of non-standard lengths brought about the uncertain number of spaces separating elements in the address.

>>> #---------------------------------------------------
>>> # parsing non-standard address...problem examination
>>>
>>> address = "123 Northwest Johnson St Cleveland Ohio 12345"
>>> bits = address.split(" ")          # split it into its components
>>> bits
['123', 'Northwest', 'Johnson', 'St', 'Cleveland', 'Ohio', '12345']
>>> for fiddly in range(len(bits)):    # now lets examine the fiddly bits
... print bits[:-fiddly], "step", fiddly
...
[] step 0
['123', 'Northwest', 'Johnson', 'St', 'Cleveland', 'Ohio'] step 1
['123', 'Northwest', 'Johnson', 'St', 'Cleveland'] step 2
['123', 'Northwest', 'Johnson', 'St'] step 3
['123', 'Northwest', 'Johnson'] step 4
['123', 'Northwest'] step 5
['123'] step 6
>>>
>>> # Now for problem address 2
>>> address = "PO Box 3 Pine Springs Ohio 12345"
>>> bits = address.split(" ")          # split it into its components
>>> bits
['PO', 'Box', '3', 'Pine', 'Springs', 'Ohio', '12345']
>>> for fiddly in range(len(bits)):    # now lets examine the fiddly bits
... print bits[:-fiddly], "step", fiddly
...
[] step 0
['PO', 'Box', '3', 'Pine', 'Springs', 'Ohio'] step 1
['PO', 'Box', '3', 'Pine', 'Springs'] step 2
['PO', 'Box', '3', 'Pine'] step 3
['PO', 'Box', '3'] step 4
['PO', 'Box'] step 5
['PO'] step 6
>>>
>>> # In address 1 it took until step 3 to get rid of everything
>>> # up to the city level.  in address 2 this was completed by step 2...
>>> # Conclusion?  Unless the address contain the same number of elements
>>> # it is not possible to parse data in one pass
>>>

A good reason to parse during data collection and entry.

Do note... that a screw will be thrown into the works when you start mixing state names which are multipart as well...consider

address = "123 Northwest Johnson St South New York New York 12345"    # Now What!!!
0 Kudos
ChrisDonohue__GISP
MVP Frequent Contributor

A different idea - how about this as a possible approach - find an existing dataset of already-parsed addresses that include all the ones you have, then match yours to the parsed ones.  Then output the parsed ones and pare down from there.  I don't know exactly where you would find the parsed dataset(s), but given the proliferation of address information, geocoding, and the like, I suspect it could be feasible.

Can anyone chime in on where specific sources of such already-parsed datasets could be found?

Also, Derek, to help with this - what States are your addresses in?

Chris Donohue, GISP

[Edit:  fixed some typos]

0 Kudos
derekthompson1
New Contributor

Cleveland, Ohio

0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor

Have you tried to geocode a sample?
Removind Cleveland Ohio ZIP  is going to narrow it down but you are still left with the problem of

'123', 'Northwest', 'Johnson', 'St', .....  being 4 positions in the list

'PO', 'Box', '3', 'Pine', 'Springs', ......     and, this being 5

0 Kudos
ChrisDonohue__GISP
MVP Frequent Contributor

Dan has a point - maybe try geocoding them.  I'm in no ways an expert on Geocoding, but I know there are quite a few geocoding services out there.  While you may not necessarily be interested in the coordinate location of your address, the address result may very well be parsed for you as part of the process.

Here's a site I remember people mentioning in a post here on GeoNet several weeks ago.

Texas A&M Geoservices

List of Online Geocoding Systems

There may also be local geocoding services publicly available from government entities in your area.  Maybe try the city and county governments of your area of interest.

Chris Donohue, GISP

0 Kudos