error handling and cursors

3137
18
Jump to solution
11-27-2018 02:49 PM
JustinWolff
Occasional Contributor

I'm using the usaddress library to split single-line addresses into their various components. Below is the functioning script that I'm using for testing:

import arcpy
from arcpy import env
import os
import usaddress
from collections import OrderedDict

# Get input table through parameter
table = arcpy.GetParameterAsText(0)

# Get the field containing the single-line addresses through parameter
infield = arcpy.GetParameterAsText(1)

# List of fields to be updated
fields = ["addr", "addrNum", "stNm", "zip"]

with arcpy.da.UpdateCursor(table, fields) as cursor:
     for addr, addrNum, stNm, zip in cursor:
          try:
               parse = usaddress.tag(addr)[0]
               addrNum = parse.get("AddressNumber", "")
               stNm = parse.get("StreetName", "")
               zip = parse.get("ZipCode", "")
               cursor.updateRow([addr, addrNum, stNm, zip])
               continue
          except Exception:
               arcpy.AddMessage("Error")

Here's what the testing input table looks like before:

before

And after:

after

My goal is to write the records that error out to a separate table.  (Errors include situations where multiple areas of an address have the same label, like "5318 S 86 CT APT 3 APT 412, Omaha, NE 68137" - multiple APT entries, or "PO Box 291 PO BOX 291, Genoa, NE 68640" - multiple PO Box entries.)  I've reviewed the examples in the error handling page, but they seem to be very focused on geoprocessing.

Also, this is the first time I've used cursors in this manner, which prevents me from using something like the following:

     except Exception:
          arcpy.AddMessage(str(row.getValue("OBJECTID")) + " Error")     ‍‍‍‍‍‍‍‍‍‍

At least I can't figure it out so far.  Any suggestions or bump in the right direction are greatly appreciated.

Thanks

Justin

18 Replies
JustinWolff
Occasional Contributor

Josh,

You and Randy both helped and gave me correct answers to what really ended up being separate questions.  I just tossed a coin and will mark you as having the correct answer.

Thank you and Randy both for the help.  (I can't promise I won't be back asking more questions....)

I really appreciate the assistance.

Justin

0 Kudos
JustinWolff
Occasional Contributor

Josh - A quick question for you as you have experience with the usaddress library.  I have things working like I want using usaddress.tag, but I'd also like to see what the output is using the usaddress.parse.  However, when I switch 'tag' to 'parse' in the code below (line 28), all of the records are returned as exceptions, placed in my output error table.  I tried removing commas from the 'addr' input address field values and that didn't make a difference.  Can the 'tag' and 'parse' methods not simply be interchanged like this?  Thanks

Justin

# Created by Justin Wolff 11/26/2018
# Note: usaddress module will need installed from https://github.com/datamade/usaddress

import arcpy
from arcpy import env
import os
import usaddress
from collections import OrderedDict

arcpy.env.overwriteOutput = True

# Get input table
table = arcpy.GetParameterAsText(0)

# Get the field containing the single-line addresses
infield = arcpy.GetParameterAsText(1)

# Get the output table for errors
outErrors = arcpy.GetParameterAsText(2)

# List of fields
fields = ["OID@", "addr", "addrNum", "addrNumPre", "addrNumSuf", "buildNm", "corOf", "intSep", "lndMrkNm", "notAddr", "occTyp", "occId", "plcNm", "recip", "stNm", "strNm", "strNmPreDir", "strNmPreMod", "strNmPreTyp", "strNmPstDir", "strNmPstMod", "strNmPstTyp", "subAddrId", "subAddrTyp", "uspsBxGrpId", "uspsBxGrpTyp", "uspsBxId", "uspsBxTyp", "zip"]
fields1 = ["addr", "oid"]

with arcpy.da.UpdateCursor(table, fields) as cursor:
     for oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip in cursor:
          try:
               method = usaddress.parse(addr)[0]
               addrNum = method.get("AddressNumber", "")
               addrNumPre = method.get("AddressNumberPrefix", "")
               addrNumSuf = method.get("AddressNumberSuffix", "")
               buildNm = method.get("BuildingName", "")
               corOf = method.get("CornerOf", "")
               intSep = method.get("IntersectionSeparator", "")
               lndMrkNm = method.get("LandmarkName", "")
               notAddr = method.get("NotAddress", "")
               occTyp = method.get("OccupancyType", "")
               occId = method.get("OccupancyIdentifier", "")
               plcNm = method.get("PlaceName", "")
               recip = method.get("Recipient", "")
               stNm = method.get("StateName", "")
               strNm = method.get("StreetName", "")
               strNmPreDir = method.get("StreetNamePreDirectional", "")
               strNmPreMod = method.get("StreetNamePreModifier", "")
               strNmPreTyp = method.get("StreetNamePreType", "")
               strNmPstDir = method.get("StreetNamePostDirectional", "")
               strNmPstMod = method.get("StreetNamePostModifier", "")
               strNmPstTyp = method.get("StreetNamePostType", "")
               subAddrId = method.get("SubaddressIdentifier", "")
               subAddrTyp = method.get("SubaddressType", "")
               uspsBxGrpId = method.get("USPSBoxGroupID", "")
               uspsBxGrpTyp = method.get("USPSBoxGroupType", "")
               uspsBxId = method.get("USPSBoxID", "")
               uspsBxTyp = method.get("USPSBoxType", "")
               zip = method.get("ZipCode", "")
               cursor.updateRow([oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip])
          except Exception:
               cursor1 = arcpy.da.InsertCursor(outErrors, fields1)
               cursor1.insertRow([addr, oid])
               arcpy.AddMessage("Bad Input Address = : {}".format(addr))‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
0 Kudos
RandyBurton
MVP Alum

Parse returns a list of tuples with an address component and a label in each tuple.  Tag returns an ordered dictionary and an address type inside a tuple.  The documentation says the tag method tries to be a bit smarter - doing some merging, removing comas, etc.  Since tag returns a dictionary it is probably easier to work with.  The parse method will return both apartment numbers in the bad address in its tuple; the tag method will generate an exception. 

# Parse

adr = usaddress.parse("5318 S 86 CT APT 3 APT 412, OMAHA, NE 68137")
print adr

AddressNumber = adr[[x for x, y in enumerate(adr) if y[1] == 'AddressNumber'][0]][0]
StreetName = adr[[x for x, y in enumerate(adr) if y[1] == 'StreetName'][0]][0]
StreetNamePostType = adr[[x for x, y in enumerate(adr) if y[1] == 'StreetNamePostType'][0]][0]

print AddressNumber, StreetName, StreetNamePostType


# Tag

adr = usaddress.tag('123 Main St. Suite 100 Chicago, IL')[0] #ordered dictionary inside tuple
print adr

print adr.get("AddressNumber", "") # return address number or empty string

print adr["AddressNumber"] # return address number or key error exception


# Additional error processing

def some_special_instructions(a, b):
    d = {}
    print "Bad address: {}".format(b)
    for row in a:
        if row[1] not in d.keys():
            d[row[1]] = 1
        else:
            d[row[1]] += 1
    for k, v in d.items():
        if v > 1:
            print "  Repeated Label: {} ({} times)".format(k,v)
    
try:
    tagged_address, address_type = usaddress.tag("5318 S 86 CT APT 3 APT 412, OMAHA, NE 68137")
except usaddress.RepeatedLabelError as e :
    some_special_instructions(e.parsed_string, e.original_string)    
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
JustinWolff
Occasional Contributor

Thanks Randy.  We have this as far as we need it for now, onto other things.  I added some comments to help us remember how it works next time we need it.

# Note: usaddress module will need installed from https://github.com/datamade/usaddress

# This script is meant to be used with a toolbox within ArcGIS

# This script requires two (2) geodatabases, one for the input addresses, and one for the output errors.  

# This script requires the input address table to have a field named 'addr', containing the single-line addresses

# This script requires the input address table to have fields named: oid, addr, addrNum, addrNumPre, addrNumSuf, 
# buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, 
# strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip
# The fields listed above were created just for this script and like 'addr' can be changed to anything necessary.  (i.e. change to match Nebraska statewide parcel address schema)

import arcpy
from arcpy import env
import os
import usaddress
from collections import OrderedDict

arcpy.env.overwriteOutput = True

# Get input table
table = arcpy.GetParameterAsText(0)

# Get the field containing the single-line addresses
infield = arcpy.GetParameterAsText(1)

# Get the output table for errors (Note: output and input tables must be in different geodatabases)
outErrors = arcpy.GetParameterAsText(2)

# List of fields used for parsing/tagging input addresses (input table) in try section below
fields = ["OID@", "addr", "addrNum", "addrNumPre", "addrNumSuf", "buildNm", "corOf", "intSep", "lndMrkNm", "notAddr", "occTyp", "occId", "plcNm", "recip", "stNm", "strNm", "strNmPreDir", "strNmPreMod", "strNmPreTyp", "strNmPstDir", "strNmPstMod", "strNmPstTyp", "subAddrId", "subAddrTyp", "uspsBxGrpId", "uspsBxGrpTyp", "uspsBxId", "uspsBxTyp", "zip"]

# List of fields used for error reporting (output table) in except section below
fields1 = ["addr", "oid"]

with arcpy.da.UpdateCursor(table, fields) as cursor:
     for oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip in cursor:
          try: #The strings listed in the 'get' parenthesis below correspond to the usaddress output (i.e., "AddressNumber", etc.)
               method = usaddress.tag(addr)[0]
               addrNum = method.get("AddressNumber", "") #address number
               addrNumPre = method.get("AddressNumberPrefix", "") #a modifier before an address number, e.g. ‘Mile’, ‘#’
               addrNumSuf = method.get("AddressNumberSuffix", "") #a modifier after an address number, e.g ‘B’, ‘1/2’
               buildNm = method.get("BuildingName", "") #the name of a building, e.g. ‘Atlanta Financial Center’
               corOf = method.get("CornerOf", "") #words indicating that an address is a corner, e.g. ‘Junction’, ‘corner of’
               intSep = method.get("IntersectionSeparator", "") #a conjunction connecting parts of an intersection, e.g. ‘and’, ‘&’
               lndMrkNm = method.get("LandmarkName", "") #the name of a landmark, e.g. ‘Wrigley Field’, ‘Union Station’
               notAddr = method.get("NotAddress", "") #a non-address component that doesn’t refer to a recipient
               occTyp = method.get("OccupancyType", "") #a type of occupancy within a building, e.g. ‘Suite’, ‘Apt’, ‘Floor’
               occId = method.get("OccupancyIdentifier", "") #the identifier of an occupancy, often a number or letter
               plcNm = method.get("PlaceName", "") #city
               recip = method.get("Recipient", "") #a non-address recipient, e.g. the name of a person/organization
               stNm = method.get("StateName", "") #state
               strNm = method.get("StreetName", "") #street name, excluding type & direction
               strNmPreDir = method.get("StreetNamePreDirectional", "") #a direction before a street name, e.g. ‘North’, ‘S’
               strNmPreMod = method.get("StreetNamePreModifier", "") #a modifier before a street name that is not a direction, e.g. ‘Old’
               strNmPreTyp = method.get("StreetNamePreType", "") #a street type that comes before a street name, e.g. ‘Route’, ‘Ave’
               strNmPstDir = method.get("StreetNamePostDirectional", "") #a direction after a street name, e.g. ‘North’, ‘S’
               strNmPstMod = method.get("StreetNamePostModifier", "") #a modifier after a street name, e.g. ‘Ext’
               strNmPstTyp = method.get("StreetNamePostType", "") #a street type that comes after a street name, e.g. ‘Avenue’, ‘Rd’
               subAddrId = method.get("SubaddressIdentifier", "") #the name/identifier of a subaddress component
               subAddrTyp = method.get("SubaddressType", "") #a level of detail in an address that is not an occupancy within a building, e.g. ‘Building’, ‘Tower’
               uspsBxGrpId = method.get("USPSBoxGroupID", "") #the identifier of a USPS box group, usually a number
               uspsBxGrpTyp = method.get("USPSBoxGroupType", "") #a name for a group of USPS boxes, e.g. ‘RR’
               uspsBxId = method.get("USPSBoxID", "") #the identifier of a USPS box, usually a number
               uspsBxTyp = method.get("USPSBoxType", "") #a USPS box, e.g. ‘P.O. Box’
               zip = method.get("ZipCode", "") #zip code
               cursor.updateRow([oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip])
          except Exception:
               cursor1 = arcpy.da.InsertCursor(outErrors, fields1)
               cursor1.insertRow([addr, oid])
               arcpy.AddMessage("Bad Input Address = : {}".format(addr))

Thanks again for the help.

Justin

JoeBorgione
MVP Emeritus

Are the problem addresses as you describe them with PO boxes and APTs?  Are those 'special' words easy to find?  For example run a query where addr like '% PO %'    Maybe just getting them out that way might be easier?

That should just about do it....
JustinWolff
Occasional Contributor

Yes, the data owner may eventually use some methods like you've suggested for scrubbing their data, but for now our initial task is to help them identify which records (and how many) have issues.

Thanks

0 Kudos
JoeBorgione
MVP Emeritus

Addressing data is famous for funkiness; having scrubbed thousands of addresses for the past umpteen years, you have my empathy...

That should just about do it....
JustinWolff
Occasional Contributor

Joe - Yes, addresses are always funny.  Fortunately the data in question for this client seems decent.  The various needs around our organization keep things interesting; some need to see dots on the map, while others don't care about the location, they just need a clean mailing address.

Thanks

Justin

0 Kudos
RogerDunnGIS
Occasional Contributor II

Typically addMessage is for reporting things passively, or even successes.  You could use addError so that the tool colorizes itself upon the success of the tool.  addError is an arcpy method, but if called within a Python toolbox tool's execute method, you would use messages.addErrorMessage instead.

0 Kudos