Select to view content in your preferred language

Trim strings at specific point using Field Calculator

7962
11
12-09-2014 11:08 AM
DerrickFrese2
Deactivated User

I have a feature class which contains 40,000 mailing addresses. Each address contains the street address, city, state and zipcode separated by spaces.

Example 1: 123 Northwest Johnson St Cleveland Ohio 12345

Example 2: PO Box 3 Pine Springs Ohio 12345

I want to pull out just the street addresses. How do I say: trim off the string starting at the 3rd or 4th to last space? Thanks. Any help would be appreciated. I'm trying combinations of split, trim, etc. but can't get it right.

0 Kudos
11 Replies
DerrickFrese2
Deactivated User

Thanks for all the input and help.  I was able to use 3 iterations of the !Address![:!Address!.rfind(' ')] formula to slice off the zip, state, and city.  This left me with just the street address, which is all I needed.  The tough part was that there were just spaces separating everything so it took some finagling to get the city's sorted out but I finally got it.  Leaving a comma between the address, city, state, and zip would have made it much easier.

0 Kudos
TedKowal
Honored Contributor

I just saw this post a little late ... but I had to do some cleaning of a text address field from another agency... I ran across a script that would parse many of the US type of addressing using pyparsing

-- I do not know where I got the script or to whom to give credit to but I have adapted the script for use in the field calculator and has proven to be an excellent workhorse for me....

You will of course have to adapt it for you own specific use.... the example below parses the streetname from the address..... you can retrieve also the house number, streetnumber, streettype, PO Box etc...

from pyparsing import *
def ParseAddress(address):
# define number as a set of words
    units = oneOf("Zero One Two Three Four Five Six Seven Eight Nine Ten"
              "Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen",
              caseless=True)
    tens = oneOf("Ten Twenty Thirty Forty Fourty Fifty Sixty Seventy Eighty Ninety",caseless=True)
    hundred = CaselessLiteral("Hundred")
    thousand = CaselessLiteral("Thousand")
    OPT_DASH = Optional("-")
    numberword = ((( units + OPT_DASH + Optional(thousand) + OPT_DASH + 
                      Optional(units + OPT_DASH + hundred) + OPT_DASH + 
                      Optional(tens)) ^ tens ) 
                   + OPT_DASH + Optional(units) )


    # number can be any of the forms 123, 21B, 222-A or 23 1/2
    housenumber = originalTextFor( numberword | Combine(Word(nums) + 
                        Optional(OPT_DASH + oneOf(list(alphas))+FollowedBy(White()))) + 
                        Optional(OPT_DASH + "1/2")
                        )
    numberSuffix = oneOf("st th nd rd").setName("numberSuffix")
    streetnumber = originalTextFor( Word(nums) + 
                     Optional(OPT_DASH + "1/2") +
                     Optional(numberSuffix) )


    # just a basic word of alpha characters, Maple, Main, etc.
    name = ~numberSuffix + Word(alphas)


    # types of streets - extend as desired
    type_ = Combine( MatchFirst(map(Keyword,"ST BLVD LN RD AVE PASS "
                            "TRL PATH PSGE WAY LANE "
                            "CIR DR PKWY CT SQ "
                            "LP TER TERR PL".split())) + Optional(".").suppress())


    # street name 
    nsew = Combine(oneOf("N S E W North South East West NW NE SW SE") + Optional("."))
    streetName = (Combine( Optional(nsew) + streetnumber + 
                            Optional("1/2") + 
                            Optional(numberSuffix), joinString=" ", adjacent=False )
                    ^ Combine(~numberSuffix + OneOrMore(~type_ + Combine(Word(alphas) + Optional("."))), joinString=" ", adjacent=False) 
                    ^ Combine("Avenue" + Word(alphas), joinString=" ", adjacent=False)).setName("streetName")


    # PO Box handling
    acronym = lambda s : Regex(r"\.?\s*".join(s)+r"\.?")
    poBoxRef = ((acronym("PO") | acronym("APO") | acronym("AFP")) + 
                 Optional(CaselessLiteral("BOX"))) + Word(alphanums)("boxnumber")


    # basic street address
    streetReference = streetName.setResultsName("name") + Optional(type_).setResultsName("type")
    direct = housenumber.setResultsName("number") + streetReference
    intersection = ( streetReference.setResultsName("crossStreet") + 
                     ( '@' | Keyword("and",caseless=True)) +
                     streetReference.setResultsName("street") )
    streetAddress = ( poBoxRef("street")
                      ^ direct.setResultsName("street")
                      ^ streetReference.setResultsName("street")
                      ^ intersection )


    # how to add Apt, Suite, etc.
    suiteRef = (
                oneOf("Suite Ste Apt Apartment Room Rm #", caseless=True) + 
                Optional(".") + 
                Word(alphanums+'-')("suitenumber"))
    streetAddress = streetAddress + Optional(Suppress(',') + suiteRef("suite"))


    return streetAddress.parseString(address, parseAll=True)




__esri_field_calculator_splitter__
ParseAddress( !PHY_ADDR1!).street.name
0 Kudos