I'm using the usaddress library to split single-line addresses into their various components. Below is the functioning script that I'm using for testing:
import arcpy
from arcpy import env
import os
import usaddress
from collections import OrderedDict
# Get input table through parameter
table = arcpy.GetParameterAsText(0)
# Get the field containing the single-line addresses through parameter
infield = arcpy.GetParameterAsText(1)
# List of fields to be updated
fields = ["addr", "addrNum", "stNm", "zip"]
with arcpy.da.UpdateCursor(table, fields) as cursor:
for addr, addrNum, stNm, zip in cursor:
try:
parse = usaddress.tag(addr)[0]
addrNum = parse.get("AddressNumber", "")
stNm = parse.get("StreetName", "")
zip = parse.get("ZipCode", "")
cursor.updateRow([addr, addrNum, stNm, zip])
continue
except Exception:
arcpy.AddMessage("Error")
Here's what the testing input table looks like before:
And after:
My goal is to write the records that error out to a separate table. (Errors include situations where multiple areas of an address have the same label, like "5318 S 86 CT APT 3 APT 412, Omaha, NE 68137" - multiple APT entries, or "PO Box 291 PO BOX 291, Genoa, NE 68640" - multiple PO Box entries.) I've reviewed the examples in the error handling page, but they seem to be very focused on geoprocessing.
Also, this is the first time I've used cursors in this manner, which prevents me from using something like the following:
except Exception:
arcpy.AddMessage(str(row.getValue("OBJECTID")) + " Error")
At least I can't figure it out so far. Any suggestions or bump in the right direction are greatly appreciated.
Thanks
Justin
Solved! Go to Solution.
Josh,
You and Randy both helped and gave me correct answers to what really ended up being separate questions. I just tossed a coin and will mark you as having the correct answer.
Thank you and Randy both for the help. (I can't promise I won't be back asking more questions....)
I really appreciate the assistance.
Justin
Josh - A quick question for you as you have experience with the usaddress library. I have things working like I want using usaddress.tag, but I'd also like to see what the output is using the usaddress.parse. However, when I switch 'tag' to 'parse' in the code below (line 28), all of the records are returned as exceptions, placed in my output error table. I tried removing commas from the 'addr' input address field values and that didn't make a difference. Can the 'tag' and 'parse' methods not simply be interchanged like this? Thanks
Justin
# Created by Justin Wolff 11/26/2018
# Note: usaddress module will need installed from https://github.com/datamade/usaddress
import arcpy
from arcpy import env
import os
import usaddress
from collections import OrderedDict
arcpy.env.overwriteOutput = True
# Get input table
table = arcpy.GetParameterAsText(0)
# Get the field containing the single-line addresses
infield = arcpy.GetParameterAsText(1)
# Get the output table for errors
outErrors = arcpy.GetParameterAsText(2)
# List of fields
fields = ["OID@", "addr", "addrNum", "addrNumPre", "addrNumSuf", "buildNm", "corOf", "intSep", "lndMrkNm", "notAddr", "occTyp", "occId", "plcNm", "recip", "stNm", "strNm", "strNmPreDir", "strNmPreMod", "strNmPreTyp", "strNmPstDir", "strNmPstMod", "strNmPstTyp", "subAddrId", "subAddrTyp", "uspsBxGrpId", "uspsBxGrpTyp", "uspsBxId", "uspsBxTyp", "zip"]
fields1 = ["addr", "oid"]
with arcpy.da.UpdateCursor(table, fields) as cursor:
for oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip in cursor:
try:
method = usaddress.parse(addr)[0]
addrNum = method.get("AddressNumber", "")
addrNumPre = method.get("AddressNumberPrefix", "")
addrNumSuf = method.get("AddressNumberSuffix", "")
buildNm = method.get("BuildingName", "")
corOf = method.get("CornerOf", "")
intSep = method.get("IntersectionSeparator", "")
lndMrkNm = method.get("LandmarkName", "")
notAddr = method.get("NotAddress", "")
occTyp = method.get("OccupancyType", "")
occId = method.get("OccupancyIdentifier", "")
plcNm = method.get("PlaceName", "")
recip = method.get("Recipient", "")
stNm = method.get("StateName", "")
strNm = method.get("StreetName", "")
strNmPreDir = method.get("StreetNamePreDirectional", "")
strNmPreMod = method.get("StreetNamePreModifier", "")
strNmPreTyp = method.get("StreetNamePreType", "")
strNmPstDir = method.get("StreetNamePostDirectional", "")
strNmPstMod = method.get("StreetNamePostModifier", "")
strNmPstTyp = method.get("StreetNamePostType", "")
subAddrId = method.get("SubaddressIdentifier", "")
subAddrTyp = method.get("SubaddressType", "")
uspsBxGrpId = method.get("USPSBoxGroupID", "")
uspsBxGrpTyp = method.get("USPSBoxGroupType", "")
uspsBxId = method.get("USPSBoxID", "")
uspsBxTyp = method.get("USPSBoxType", "")
zip = method.get("ZipCode", "")
cursor.updateRow([oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip])
except Exception:
cursor1 = arcpy.da.InsertCursor(outErrors, fields1)
cursor1.insertRow([addr, oid])
arcpy.AddMessage("Bad Input Address = : {}".format(addr))
Parse returns a list of tuples with an address component and a label in each tuple. Tag returns an ordered dictionary and an address type inside a tuple. The documentation says the tag method tries to be a bit smarter - doing some merging, removing comas, etc. Since tag returns a dictionary it is probably easier to work with. The parse method will return both apartment numbers in the bad address in its tuple; the tag method will generate an exception.
# Parse
adr = usaddress.parse("5318 S 86 CT APT 3 APT 412, OMAHA, NE 68137")
print adr
AddressNumber = adr[[x for x, y in enumerate(adr) if y[1] == 'AddressNumber'][0]][0]
StreetName = adr[[x for x, y in enumerate(adr) if y[1] == 'StreetName'][0]][0]
StreetNamePostType = adr[[x for x, y in enumerate(adr) if y[1] == 'StreetNamePostType'][0]][0]
print AddressNumber, StreetName, StreetNamePostType
# Tag
adr = usaddress.tag('123 Main St. Suite 100 Chicago, IL')[0] #ordered dictionary inside tuple
print adr
print adr.get("AddressNumber", "") # return address number or empty string
print adr["AddressNumber"] # return address number or key error exception
# Additional error processing
def some_special_instructions(a, b):
d = {}
print "Bad address: {}".format(b)
for row in a:
if row[1] not in d.keys():
d[row[1]] = 1
else:
d[row[1]] += 1
for k, v in d.items():
if v > 1:
print " Repeated Label: {} ({} times)".format(k,v)
try:
tagged_address, address_type = usaddress.tag("5318 S 86 CT APT 3 APT 412, OMAHA, NE 68137")
except usaddress.RepeatedLabelError as e :
some_special_instructions(e.parsed_string, e.original_string)
Thanks Randy. We have this as far as we need it for now, onto other things. I added some comments to help us remember how it works next time we need it.
# Note: usaddress module will need installed from https://github.com/datamade/usaddress
# This script is meant to be used with a toolbox within ArcGIS
# This script requires two (2) geodatabases, one for the input addresses, and one for the output errors.
# This script requires the input address table to have a field named 'addr', containing the single-line addresses
# This script requires the input address table to have fields named: oid, addr, addrNum, addrNumPre, addrNumSuf,
# buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod,
# strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip
# The fields listed above were created just for this script and like 'addr' can be changed to anything necessary. (i.e. change to match Nebraska statewide parcel address schema)
import arcpy
from arcpy import env
import os
import usaddress
from collections import OrderedDict
arcpy.env.overwriteOutput = True
# Get input table
table = arcpy.GetParameterAsText(0)
# Get the field containing the single-line addresses
infield = arcpy.GetParameterAsText(1)
# Get the output table for errors (Note: output and input tables must be in different geodatabases)
outErrors = arcpy.GetParameterAsText(2)
# List of fields used for parsing/tagging input addresses (input table) in try section below
fields = ["OID@", "addr", "addrNum", "addrNumPre", "addrNumSuf", "buildNm", "corOf", "intSep", "lndMrkNm", "notAddr", "occTyp", "occId", "plcNm", "recip", "stNm", "strNm", "strNmPreDir", "strNmPreMod", "strNmPreTyp", "strNmPstDir", "strNmPstMod", "strNmPstTyp", "subAddrId", "subAddrTyp", "uspsBxGrpId", "uspsBxGrpTyp", "uspsBxId", "uspsBxTyp", "zip"]
# List of fields used for error reporting (output table) in except section below
fields1 = ["addr", "oid"]
with arcpy.da.UpdateCursor(table, fields) as cursor:
for oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip in cursor:
try: #The strings listed in the 'get' parenthesis below correspond to the usaddress output (i.e., "AddressNumber", etc.)
method = usaddress.tag(addr)[0]
addrNum = method.get("AddressNumber", "") #address number
addrNumPre = method.get("AddressNumberPrefix", "") #a modifier before an address number, e.g. ‘Mile’, ‘#’
addrNumSuf = method.get("AddressNumberSuffix", "") #a modifier after an address number, e.g ‘B’, ‘1/2’
buildNm = method.get("BuildingName", "") #the name of a building, e.g. ‘Atlanta Financial Center’
corOf = method.get("CornerOf", "") #words indicating that an address is a corner, e.g. ‘Junction’, ‘corner of’
intSep = method.get("IntersectionSeparator", "") #a conjunction connecting parts of an intersection, e.g. ‘and’, ‘&’
lndMrkNm = method.get("LandmarkName", "") #the name of a landmark, e.g. ‘Wrigley Field’, ‘Union Station’
notAddr = method.get("NotAddress", "") #a non-address component that doesn’t refer to a recipient
occTyp = method.get("OccupancyType", "") #a type of occupancy within a building, e.g. ‘Suite’, ‘Apt’, ‘Floor’
occId = method.get("OccupancyIdentifier", "") #the identifier of an occupancy, often a number or letter
plcNm = method.get("PlaceName", "") #city
recip = method.get("Recipient", "") #a non-address recipient, e.g. the name of a person/organization
stNm = method.get("StateName", "") #state
strNm = method.get("StreetName", "") #street name, excluding type & direction
strNmPreDir = method.get("StreetNamePreDirectional", "") #a direction before a street name, e.g. ‘North’, ‘S’
strNmPreMod = method.get("StreetNamePreModifier", "") #a modifier before a street name that is not a direction, e.g. ‘Old’
strNmPreTyp = method.get("StreetNamePreType", "") #a street type that comes before a street name, e.g. ‘Route’, ‘Ave’
strNmPstDir = method.get("StreetNamePostDirectional", "") #a direction after a street name, e.g. ‘North’, ‘S’
strNmPstMod = method.get("StreetNamePostModifier", "") #a modifier after a street name, e.g. ‘Ext’
strNmPstTyp = method.get("StreetNamePostType", "") #a street type that comes after a street name, e.g. ‘Avenue’, ‘Rd’
subAddrId = method.get("SubaddressIdentifier", "") #the name/identifier of a subaddress component
subAddrTyp = method.get("SubaddressType", "") #a level of detail in an address that is not an occupancy within a building, e.g. ‘Building’, ‘Tower’
uspsBxGrpId = method.get("USPSBoxGroupID", "") #the identifier of a USPS box group, usually a number
uspsBxGrpTyp = method.get("USPSBoxGroupType", "") #a name for a group of USPS boxes, e.g. ‘RR’
uspsBxId = method.get("USPSBoxID", "") #the identifier of a USPS box, usually a number
uspsBxTyp = method.get("USPSBoxType", "") #a USPS box, e.g. ‘P.O. Box’
zip = method.get("ZipCode", "") #zip code
cursor.updateRow([oid, addr, addrNum, addrNumPre, addrNumSuf, buildNm, corOf, intSep, lndMrkNm, notAddr, occTyp, occId, plcNm, recip, stNm, strNm, strNmPreDir, strNmPreMod, strNmPreTyp, strNmPstDir, strNmPstMod, strNmPstTyp, subAddrId, subAddrTyp, uspsBxGrpId, uspsBxGrpTyp, uspsBxId, uspsBxTyp, zip])
except Exception:
cursor1 = arcpy.da.InsertCursor(outErrors, fields1)
cursor1.insertRow([addr, oid])
arcpy.AddMessage("Bad Input Address = : {}".format(addr))
Thanks again for the help.
Justin
Are the problem addresses as you describe them with PO boxes and APTs? Are those 'special' words easy to find? For example run a query where addr like '% PO %' Maybe just getting them out that way might be easier?
Yes, the data owner may eventually use some methods like you've suggested for scrubbing their data, but for now our initial task is to help them identify which records (and how many) have issues.
Thanks
Addressing data is famous for funkiness; having scrubbed thousands of addresses for the past umpteen years, you have my empathy...
Joe - Yes, addresses are always funny. Fortunately the data in question for this client seems decent. The various needs around our organization keep things interesting; some need to see dots on the map, while others don't care about the location, they just need a clean mailing address.
Thanks
Justin
Typically addMessage is for reporting things passively, or even successes. You could use addError so that the tool colorizes itself upon the success of the tool. addError is an arcpy method, but if called within a Python toolbox tool's execute method, you would use messages.addErrorMessage instead.