I am trying to remove prefixes from street names in a table. I have the following code, but it removes more than just the street prefix. In my code I match E followed by a space ("E "). I need to be able to strip the prefixes so I can sort, then delete duplicates. How can I remove just the prefixes?
with arcpy.da.UpdateCursor(table1,'STREET') as cursor:
    for row in cursor:
        #print row[0]
        if row[0].startswith("S "):
            #print "Deleting S"
            row[0] = row[0].lstrip('S ')
        elif row[0].startswith("E "):
            #print "Deleting E"
            row[0] = row[0].lstrip('E ')
        elif row[0].startswith("W "):
            #print "Deleting W"
            row[0] = row[0].lstrip('W ')
        elif row[0].startswith("N "):
            #print "Deleting N"
            row[0] = row[0].lstrip('N ')
        cursor.updateRow(row)
del cursor
After the code runs, I get:
Before code | After code |
E Explorer | xplorer |
E Expedition | xpedition |
E Executive | xecutive |
E Exchange | xchange |
lstrip is treating "E " as being ["E", " "] and stripping all instances of those two characters from the left of the string, until it reaches a character other than those two.
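To see this outside of a cursor, here is a small sketch in plain Python (no arcpy needed). The sample street name comes from the question; the variable names are only for illustration.

```python
street = "E Explorer"

# lstrip() takes a SET of characters, not a prefix: it keeps removing
# 'E' and ' ' from the left until it reaches some other character.
stripped = street.lstrip("E ")

# Slicing off len(prefix) characters removes the prefix exactly once.
prefix = "E "
sliced = street[len(prefix):] if street.startswith(prefix) else street

print(repr(stripped))
print(repr(sliced))
```

The first value loses the leading "E" of "Explorer" as well, while the sliced value keeps it.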
Try this:
if (row[0].startswith('E ')
        or row[0].startswith('W ')
        or row[0].startswith('N ')
        or row[0].startswith('S ')
        ):
    # print('Deleting Cardinal')
    row[0] = row[0][2:]
### The two branches below are optional, if you have full-word cardinals on your prefixes.
elif (row[0].startswith('East ')
        or row[0].startswith('West ')
        ):
    # print('Deleting Cardinal')
    row[0] = row[0][5:]  # 4 letters plus the space
elif (row[0].startswith('North ')
        or row[0].startswith('South ')
        ):
    # print('Deleting Cardinal')
    row[0] = row[0][6:]  # 5 letters plus the space
cursor.updateRow(row)  # N.B. if you indent this INSIDE the if/elif branches,
                       # then it won't update any value it doesn't have
                       # to which can be handy if you track edits, since
                       # you didn't technically have anything to change
                       # on the ones without a cardinal direction
Since the single-letter prefixes are always 2 characters long (the letter plus the space), once startswith finds one you only need the remainder of the string after it. [2:] starts at index 2 (the character after your space) and gives you the rest of the string from there.
The optional elif branches look for 5- and 6-character prefixes ('East '/'West ' and 'North '/'South ', counting the space), so their slices change to [5:] and [6:], assuming you need them.
EDIT: Unrelated sidenote, it is surprisingly a pain to edit code block statements for typos on this forum. Apologies to anyone who read it before I caught them.
EDIT 2: An earlier version of this code block used [4:] and [5:] for the full-word prefixes, because I forgot to count the space. The correct slices are [5:] for 'East '/'West ' and [6:] for 'North '/'South '.
Thanks for the reply. I was coming back to my post to add code that worked for me.
# Update the street names in the table
with arcpy.da.UpdateCursor(table, "STREET") as cursor:
    for row in cursor:
        street_name = row[0]
        # List of common street prefixes to be removed
        prefixes = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']
        # Remove the first matching prefix from the street name
        for prefix in prefixes:
            if street_name.startswith(prefix):
                row[0] = street_name[len(prefix):].strip()
                cursor.updateRow(row)
                break
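For anyone who wants to try the prefix loop without a table handy, here is a minimal sketch that pulls the same logic into a plain function. The name strip_prefix and the sample street names are hypothetical, and the sort/de-duplicate step at the end is done on a Python list purely for illustration, not on the table itself.

```python
PREFIXES = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']

def strip_prefix(street_name, prefixes=PREFIXES):
    """Return street_name without its first matching prefix (hypothetical helper)."""
    for prefix in prefixes:
        if street_name.startswith(prefix):
            # Slice off exactly the prefix, then trim any stray whitespace.
            return street_name[len(prefix):].strip()
    # No prefix matched: leave the name unchanged.
    return street_name

# Made-up sample names, only to show the sort-then-dedupe goal.
names = ["E Explorer", "W Explorer", "N Main", "Main"]
unique_sorted = sorted(set(strip_prefix(name) for name in names))
print(unique_sorted)
```

Inside the cursor loop this would be used as row[0] = strip_prefix(row[0]).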
Your code did work though. Again, thanks for the reply!
Efficient! I like it.
Just for fun, I tried to see if I could condense this further. Here it is!
prefixes = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']
with arcpy.da.UpdateCursor(table, 'STREET') as cursor:
    for row in cursor:
        row[0] = min([row[0][len(prefix):] if row[0].startswith(prefix) else row[0] for prefix in prefixes], key=len)
        cursor.updateRow(row)
First, a list comprehension builds a list of candidates: your street name with each prefix removed where that prefix matches, and the unchanged street name where it doesn't. Then, it turns out the min() function accepts a key argument. If you give it the built-in len, it gives you the shortest entry from that list back as a single item.
Since, by definition, every item in our generated list is either the original street name or that street name minus a prefix, the shortest possible will always* be the street name minus any applicable prefix.
*An important caveat here: "N South 5th Street" would return as "South 5th Street", which may or may not be desired. But then, that's a problem of all of the code in this thread.
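Here is a quick sketch of that idea outside of a cursor, in case anyone wants to check the caveat for themselves. The function name shortest_candidate is made up for this example; the prefix list is the same one used above.

```python
prefixes = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']

def shortest_candidate(street_name):
    """Hypothetical helper: shortest of the name with each prefix tried once."""
    candidates = [
        street_name[len(prefix):] if street_name.startswith(prefix) else street_name
        for prefix in prefixes
    ]
    # min() with key=len returns the shortest string in the list.
    return min(candidates, key=len)

for name in ["E Explorer", "Elm St", "N South 5th Street"]:
    print(name, "->", shortest_candidate(name))
```

Only one prefix is ever removed per name, which is why the double-prefix case keeps its second cardinal.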
Also, just to be cheeky, here it is even more condensed. I saved a whole 3 lines! But I haven't tested it. And while I think the logic and syntax are all sound, it's the height of absurdity, anyway. Please don't do this to whoever has to read your code behind you. 😛
with arcpy.da.UpdateCursor(table, 'STREET') as cursor:
    [cursor.updateRow([min([row[0][len(prefix):] if row[0].startswith(prefix) else row[0] for prefix in ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']], key=len)]) for row in cursor]
As fun as code golf can be, list comprehensions were proposed and accepted "to create lists" (PEP 202 – List Comprehensions | peps.python.org). I think many would argue that using a list comprehension to perform a mapping function isn't very Pythonic.
Another approach would be to use regular expressions. Although regular expressions might be a bit overkill for this specific situation, they are much more flexible to handle more complex situations:
import re
with arcpy.da.UpdateCursor(table1,'STREET') as cursor:
    for row in cursor:
        row[0] = re.sub(r"^((?:N|S|E|W|North|South|East|West) )+", "", row[0])
        cursor.updateRow(row)
The above will result in "N South 5th Street" becoming "5th Street". If you don't want double prefixes to be removed, just take the "+" out of the regular expression pattern.
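The difference between the two patterns can be checked in plain Python without a cursor. This is only a sketch; the names REPEATED and SINGLE are invented for the comparison.

```python
import re

# With "+": strips every leading cardinal prefix in a row.
REPEATED = re.compile(r"^((?:N|S|E|W|North|South|East|West) )+")
# Without "+": strips at most one leading cardinal prefix.
SINGLE = re.compile(r"^((?:N|S|E|W|North|South|East|West) )")

name = "N South 5th Street"
all_removed = REPEATED.sub("", name)
one_removed = SINGLE.sub("", name)

print(all_removed)
print(one_removed)

# The space inside the group means a name such as "Northern Ave" is left alone.
untouched = REPEATED.sub("", "Northern Ave")
print(untouched)
```

Because the pattern is anchored with ^ and requires the trailing space, it only touches whole-word prefixes at the start of the name.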