Find Values in Field With List Comprehension

JaredPilbeam2 · ‎10-08-2020

I was wondering if I was on the right track with this. I have a table with fields that contain addresses like this:

515 S WEBER RD APT 108 LOCKPORT, IL 60441‍‍‍‍

I'm interested in finding three types of values if they exist, and I'd like to update the value to a new field called 'UnitType'. Right now, this is printing empty brackets. I'm not too sure why.

fc = r'C:\Users\jpilbeam\Downloads\AddPts_AptUnitSte.gdb\Default.gdb\AddPts_AptUnitSte'
field = 'USPS_ADDRE'
values = ['Apt', 'Unit', 'Ste']

# Use SearchCursor with list comprehension to return a
# unique set of values in the specified field
v = [row[0] for row in arcpy.da.SearchCursor(fc, field)]
types = [i for i in v if i in values]
print(types)‍‍‍‍‍‍‍‍‍


>>>[]‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

RandyBurton · ‎10-08-2020

You might try this, and note case sensitivity:

fc = r'C:\Users\jpilbeam\Downloads\AddPts_AptUnitSte.gdb\Default.gdb\AddPts_AptUnitSte'
field = 'USPS_ADDRE'
values = ['APT', 'UNIT', 'STE']

addresses = [row[0] for row in arcpy.da.SearchCursor(fc, field)]

types = [v for v in values for address in addresses if v in address.upper()]

print(types)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I'd like to update the value to a new filed called 'UnitType'.

I'm not sure if list comprehension, at least in this form, will get you closer to your goal. I would use either the field calculator or an update cursor.

fc = r'C:\Users\jpilbeam\Downloads\AddPts_AptUnitSte.gdb\Default.gdb\AddPts_AptUnitSte'
fields = ['USPS_ADDRE','UnitType']
values = ['APT', 'UNIT', 'STE']

with arcpy.da.UpdateCursor(fc, fields) as cursor:
    for row in cursor:
        for v in values:
            if v in row[0].upper():
                row[1] = v
                cursor.updateRow(row)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

View solution in original post

DanPatterson · ‎10-08-2020

You need to reset your cursor after running it, otherwise you get [].

Using a two step/while look is safest

check_for = [8, 5]
with arcpy.da.SearchCursor(fc2, "PNT_COUNT") as cur:
    v = [row[0] for row in cur]
    is_in = [i for i in v if i in check_for]
    

is_in
Out[16]: [8.0, 5.0]‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

# ---- same as but more obtuse
with arcpy.da.SearchCursor(fc2, "PNT_COUNT") as cur:
    v = [r[0] for r in cur if r[0] in check_for]

v
[8.0, 5.0]‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

... sort of retired...

RandyBurton · ‎10-08-2020

You might try this, and note case sensitivity:

fc = r'C:\Users\jpilbeam\Downloads\AddPts_AptUnitSte.gdb\Default.gdb\AddPts_AptUnitSte'
field = 'USPS_ADDRE'
values = ['APT', 'UNIT', 'STE']

addresses = [row[0] for row in arcpy.da.SearchCursor(fc, field)]

types = [v for v in values for address in addresses if v in address.upper()]

print(types)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I'd like to update the value to a new filed called 'UnitType'.

I'm not sure if list comprehension, at least in this form, will get you closer to your goal. I would use either the field calculator or an update cursor.

fc = r'C:\Users\jpilbeam\Downloads\AddPts_AptUnitSte.gdb\Default.gdb\AddPts_AptUnitSte'
fields = ['USPS_ADDRE','UnitType']
values = ['APT', 'UNIT', 'STE']

with arcpy.da.UpdateCursor(fc, fields) as cursor:
    for row in cursor:
        for v in values:
            if v in row[0].upper():
                row[1] = v
                cursor.updateRow(row)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

JaredPilbeam2 · ‎10-09-2020

Thanks to both of you. I thought I'd try list comprehension on this, but I guess I don't grasp it enough. So, I used Randy's second answer. It worked great.

Bonus question: Now I have to get the number that follows the value and put it in a new field called 'UnitName'. So, for example the 108 in this field:

515 S WEBER RD APT 108 LOCKPORT, IL 60441‍

RandyBurton · ‎10-09-2020

Try using split:

fc = 'feature'
fields = ['USPS_ADDRE','Unit']
values = ['APT', 'UNIT', 'STE']

with arcpy.da.UpdateCursor(fc, fields) as cursor:
    for row in cursor:
        for v in values:
            if v in row[0].upper():
                # row[1] = v
                row[1] = "{} {}".format(v,row[0].split(v)[1].split(' ')[1])
                cursor.updateRow(row)
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

If you are doing some serious address parsing, look into the usaddress module.

import usaddress

adr = usaddress.parse("515 S WEBER RD APT 108 LOCKPORT, IL 60441")

print(adr)

# [(u'515', 'AddressNumber'), (u'S', 'StreetNamePreDirectional'), (u'WEBER', 'StreetName'), (u'RD', 'StreetNamePostType'), (u'APT', 'OccupancyType'), (u'108', 'OccupancyIdentifier'), (u'LOCKPORT,', 'PlaceName'), (u'IL', 'StateName'), (u'60441', 'ZipCode')]

‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

JaredPilbeam2 · ‎10-13-2020

Randy Burton‌

I was trying out the usaddress module. It seems ideal. But, it threw an error that I reported to the author on GitHub. UPDATE: I think the "CHECK" in the string is a note by the editor put in to remind him/her to come back to this particular address for whatever reason.

usaddress.RepeatedLabelError: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  CHECK S ST LOUIS ST  LOT 34 ELWOOD, IL 60421
PARSED TOKENS:    [('CHECK', 'StreetName'), ('S', 'StreetNamePostDirectional'), ('ST', 'PlaceName'), ('LOUIS', 'StreetName'), ('ST', 'StreetNamePostType'), ('LOT', 'OccupancyType'), ('34', 'OccupancyIdentifier'), ('ELWOOD,', 'PlaceName'), ('IL', 'StateName'), ('60421', 'ZipCode')]
UNCERTAIN LABEL:  StreetName‍‍‍‍‍‍‍‍‍‍

I put this in a try/except block before I realized you can't continue the code once the try block is interrupted. I'm wondering if you've ever ran into this?

RandyBurton · ‎10-13-2020

The "CHECK" is causing some of the problem. I also noticed on GitHub there is an unanswered question about updating the module for Python 3. I haven't done any testing with that version. However, I have some code that might give you some ideas for the "RepeatedLabelError" and how to deal with it.

import usaddress

def address_exception(a, b):
    d = {}
    print("Bad address: {}".format(b))
    print(a)
    for row in a:
        if row[1] not in d.keys():
            d[row[1]] = 1
        else:
            d[row[1]] += 1
    for k, v in d.items():
        if v > 1:
            print("  {} (repeated {} times)".format(k,v))

cursor = [ 
    ("CHECK S ST LOUIS ST  LOT 34 ELWOOD, IL 60421", "addrNum", "stNm", "zip", 1),
    ("S ST LOUIS ST  LOT 34 ELWOOD, IL 60421", "addrNum", "stNm", "zip", 2),
    ("5318 S 86 CT APT 3 APT 412, OMAHA, NE 68137", "addrNum", "stNm", "zip", 3),
    ("2765 HAZEL ST, OMAHA, NE 68105", "addrNum", "stNm", "zip", 4)
    ]

for addr, addrNum, stNm, zip, oid in cursor: # simulate an UpdateCursor
    try:
        parse = usaddress.tag(addr)[0]
        addrNum = parse.get("AddressNumber", "")
        stNm = parse.get("StreetName", "")
        zip = parse.get("ZipCode", "")
        print('ok', [addr, addrNum, stNm, zip, oid])

    except usaddress.RepeatedLabelError as e :        
        address_exception(e.parsed_string, e.original_string)

    except Exception as e:
        # print('ERROR: {}'.format(type(e))) # may provide some additional info
        print("Unknown Error: oid={}".format(oid))
        print('ERROR', [addr, addrNum, stNm, zip, oid])

''' output:
Bad address: CHECK S ST LOUIS ST  LOT 34 ELWOOD, IL 60421
[(u'CHECK', 'StreetName'), (u'S', 'StreetNamePostDirectional'), (u'ST', 'PlaceName'), (u'LOUIS', 'StreetName'), (u'ST', 'StreetNamePostType'), (u'LOT', 'OccupancyType'), (u'34', 'OccupancyIdentifier'), (u'ELWOOD,', 'PlaceName'), (u'IL', 'StateName'), (u'60421', 'ZipCode')]
  PlaceName (repeated 2 times)
  StreetName (repeated 2 times)
('ok', ['S ST LOUIS ST  LOT 34 ELWOOD, IL 60421', '', u'ST LOUIS', u'60421', 2])
Bad address: 5318 S 86 CT APT 3 APT 412, OMAHA, NE 68137
[(u'5318', 'AddressNumber'), (u'S', 'StreetNamePreDirectional'), (u'86', 'StreetName'), (u'CT', 'StreetNamePostType'), (u'APT', 'OccupancyType'), (u'3', 'OccupancyIdentifier'), (u'APT', 'OccupancyType'), (u'412,', 'OccupancyIdentifier'), (u'OMAHA,', 'PlaceName'), (u'NE', 'StateName'), (u'68137', 'ZipCode')]
  OccupancyIdentifier (repeated 2 times)
  OccupancyType (repeated 2 times)
('ok', ['2765 HAZEL ST, OMAHA, NE 68105', u'2765', u'HAZEL', u'68105', 4])
'''‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I have also used the regular expressions module along with dictionaries to do some pre-cleanup before passing the address to the usaddress module. These are mostly things like substituting 'STE' for 'Suite' or 'N' for 'North'. This basically helps standardize the address to USPS guidelines.

The usaddress module isn't perfect, but it can be very helpful.