Fix O' names using Python in Field Calculator

KimberlyGarbade · 04-13-2022

Very simply, if I have the following sentence:

"RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."

I want to get

"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."

I would like to accomplish this with Python in the Field Calculator.

I can get O' so close (get it?), but I keep getting it O' so wrong (last one, I promise).

Thank you for your time.


DanPatterson · 04-13-2022

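Normally you would use title(), but then you have to de-title the words that aren't names:

>>> names = ["RYAN O'NEAL", "JERRY O'CONNELL", "CONAN O'BRIEN"]
>>> "{} and {} were on {}.".format(*[i.title() for i in names])
"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."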

That leaves you building your sentence from a list of names and sentence fluff as you go.

BUT in the Field Calculator... not really. Just title() the string; at least it looks O'somuchbetter than all caps.

... sort of retired...
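For reference, this is what str.title() does to the whole sentence, together with a side effect worth knowing about: title() starts a new "word" after any non-letter character, which is why O'Neal comes out right but contractions and possessives do not. A Field Calculator expression along the lines of !Comments!.title() (the field name Comments is hypothetical) should behave the same way.

```python
# str.title() upper-cases the first letter after ANY non-letter character,
# so the apostrophe in O'NEAL helps, but the one in DIDN'T or BOB'S hurts.
sentence = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
print(sentence.title())
# Ryan O'Neal And Jerry O'Connell Were On Conan O'Brien.

print("THE PIPE DIDN'T LEAK AT BOB'S HOUSE".title())
# The Pipe Didn'T Leak At Bob'S House
```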

JoshuaBixby · 04-13-2022

What you are after is called true casing/truecasing, and it is a natural language processing problem. The Python built-in string type does not have any method for doing such handling of a string. There are quite a few Python packages that do truecasing, but you would have to install them and then write a fairly involved code block expression.

I am with Dan, str.title() is probably the best compromise in terms of effort to results.
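The simplest form of truecasing is a unigram model: learn the usual spelling of each word from text that is already correctly cased, then apply it to new text. The sketch below is a toy illustration of that idea, not one of the packages mentioned above; train_casing, truecase, and the reference sentences are hypothetical, and in practice the reference text would have to be something like comments that were already fixed by hand.

```python
import re
from collections import Counter, defaultdict

# A "word" is a run of letters, digits and apostrophes, so O'Neal stays one token.
WORD = re.compile(r"[A-Za-z0-9']+")


def train_casing(reference_texts):
    """Learn the most common spelling of each word from correctly cased text."""
    counts = defaultdict(Counter)
    for text in reference_texts:
        # Skip the first word: it is capitalized because of its position, not its identity.
        for word in WORD.findall(text)[1:]:
            counts[word.lower()][word] += 1
    return {key: forms.most_common(1)[0][0] for key, forms in counts.items()}


def truecase(text, casing):
    """Lower-case the text, restore learned spellings, then capitalize the first letter."""
    restored = WORD.sub(lambda m: casing.get(m.group(0), m.group(0)), text.lower())
    return restored[:1].upper() + restored[1:]


casing = train_casing([
    "Yesterday Ryan O'Neal called.",
    "Then Jerry O'Connell and Conan O'Brien were on the show.",
])
print(truecase("RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN.", casing))
# Ryan O'Neal and Jerry O'Connell were on Conan O'Brien.
```

Because it keeps a single spelling per word, a model like this cannot tell "Bill" from "water bill"; resolving that needs the surrounding context, which is part of what makes full truecasing an involved problem.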

KimberlyGarbade · 04-13-2022

I see the value of what you are saying.

I was hoping to use something like this, but it's running up against the limits of my understanding of regular expressions in Python.

I'm just so close I don't want to give up on it.

import re

instring = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
print("Original: " + instring)

instring = instring.capitalize()
print("Sentence Case: " + instring)

if re.search(r"\bO'[a-z]", instring, re.IGNORECASE):
    instring = re.sub(r"(^'O'|['])\s*([a-zA-Z])", lambda p: p.group(0).upper(), instring)

print("Correct O\' names: " + instring)

It gets me to "Ryan o'Neal and jerry o'Connell were on conan o'Brien." (so close, yet so far).
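One way to get the O' part working is to match the O and the letter after the apostrophe in a single pattern and rebuild both in upper case, instead of upper-casing only what follows the apostrophe. A minimal sketch, where O_NAME and fix_o_names are hypothetical names:

```python
import re

# "o'" or "O'" at the start of a word, followed by a lower-case letter.
# Group 1 captures the letter that should be upper-cased (the "n" in o'neal).
O_NAME = re.compile(r"\b[oO]'([a-z])")


def fix_o_names(text):
    """Sentence-case the text, then capitalize O' surnames such as O'Neal."""
    sentence = text.capitalize()
    return O_NAME.sub(lambda m: "O'" + m.group(1).upper(), sentence)


print(fix_o_names("RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."))
# Ryan O'Neal and jerry O'Connell were on conan O'Brien.
```

In the Field Calculator this would go in the code block, with an expression such as fix_o_names(!Comments!) (field name hypothetical). Note that the first names still come out lower case, since nothing in the pattern identifies them as names, and that "5 o'clock" would become "5 O'Clock", so this still needs an exception list alongside it.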

JoshuaBixby · 04-13-2022

This appears to be for an exercise, so does the solution only need to work for this one sentence, or does it need to be generally applicable to strings? If the former, then you can do something like:

>>> s = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
>>> lower_list = ["and", "were", "on"]
>>> " ".join(
...   i.lower() if i.lower() in lower_list else i.title()
...   for i in s.split()
... )
"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."
>>>

KimberlyGarbade · 04-13-2022

All O' names in this example, but ideally I'd like a solution that can be applied to all names that start with more than just a capital letter (e.g., Mc, O', St., etc.). I have thousands of comment fields that were traditionally all caps, and we are transitioning to sentence case. The comments have a lot of street and person names embedded, so we wind up with things like "June paton stated it was 100 ft of rpc along the west side of 1st st." Sure, you can read it, but it looks wrong. I've corrected a bunch with "exception" lists (basically search-and-replace pairs: if you find this, replace it with that), but names are proving difficult since Bill and water bill are two different things.
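The exception-list approach described above can be made a bit safer by matching whole words only, ignoring case, and allowing multi-word keys, so that a key like "bill smith" can be fixed without touching "water bill". A minimal sketch, assuming the exceptions live in a Python dictionary; EXCEPTIONS and sentence_case are hypothetical, and the entries are taken from the example comment above.

```python
import re

# Hypothetical exception list: lower-case phrase -> preferred spelling.
EXCEPTIONS = {
    "june paton": "June Paton",
    "1st st": "1st St",
    "rpc": "RPC",
}


def sentence_case(text, exceptions=EXCEPTIONS):
    """Sentence-case text, then apply whole-word, case-insensitive replacements."""
    result = text.capitalize()
    # Longest phrases first, so a long key is not broken up by a shorter one.
    for phrase in sorted(exceptions, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"   # whole words only
        replacement = exceptions[phrase]
        # A function replacement avoids re treating backslashes in it as escapes.
        result = re.sub(pattern, lambda m: replacement, result, flags=re.IGNORECASE)
    return result


print(sentence_case("JUNE PATON STATED IT WAS 100 FT OF RPC ALONG THE WEST SIDE OF 1ST ST."))
# June Paton stated it was 100 ft of RPC along the west side of 1st St.
print(sentence_case("BILL SMITH PAID THE WATER BILL", {"bill smith": "Bill Smith"}))
# Bill Smith paid the water bill
```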

RhettZufelt · 04-13-2022

I had issues like this with street names. Luckily, the list of names that don't play well with str.title() is fairly small in my case, so I made a dictionary of them.

This is the code I use for that. Maybe you can get an idea from it, or even come up with a way to populate the dictionary with Python (split at the ', title() the pieces, then join them back together title-cased, or something like that).

There is some extra code in here because it also makes sure that directional prefixes like N, S, etc. stay capitalized and that ordinal streets like 1st aren't written as 1ST (that is what upperWords and lowerWords are for).

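import arcpy

def CalcStreetLabels(infc):
    # infc: the street feature class to update
    fields = ['DirectPrefix', 'StreetName', 'StreetLabels']
    # Directionals and route prefixes that should stay upper case
    upperWords = ["N", "NW", "W", "SW", "S", "SE", "E", "NE", "US", "PR"]
    # Ordinals that str.title() would otherwise turn into "1St", "2Nd", ...
    lowerWords = ["1ST", "2ND", "3RD", "4TH", "5TH", "6TH", "7TH", "8TH", "9TH", "10TH", "11TH", "12TH", "13TH"]

    # Whole-name overrides for names that str.title() gets wrong
    subDict = {
        "MCMURRAY ST": "McMurray St",
        "MCMURRAY AVE": "McMurray Ave",
        "MCINTOSH CT": "McIntosh Ct",
        "MCEWAN DR": "McEwan Dr",
        "MCPHERSON AVE": "McPherson Ave",
        "MCCLELLAN ST": "McClellen St",
        "BY-PASS": "Bypass Hwy         SR 240",
        "SR 240": "SR 240",
        "O'CONNOR ST": "O'Connor St",
        "MCMURRAY": "McMurray",
        "MCINTOSH": "McIntosh",
        "MCEWAN": "McEwan",
        "MCPHERSON": "McPherson",
        "MCCLELLAN": "McClellen",
        "O'CONNOR": "O'Connor"
    }
    with arcpy.da.UpdateCursor(infc, fields) as uCur:
        for row in uCur:
            # Reset for every row so a null StreetName doesn't reuse the previous row's words
            words = []
            if row[1]:
                # Join the prefix (if any) and the street name, then split into words
                words = ' '.join(filter(None, [row[0], row[1]])).split(" ")
            labelWords = []
            for word in words:
                if word in upperWords:
                    labelWords.append(word.upper())
                elif word in lowerWords:
                    labelWords.append(word.lower())
                else:
                    labelWords.append(word.title())
            label = " ".join(labelWords)       # no trailing space
            # dict.has_key() was removed in Python 3; "in" works in Python 2 and 3
            if row[1] in subDict:
                label = subDict[row[1]]
            row[2] = label
            if label:
                uCur.updateRow(row)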

In case it helps any,

R_
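The idea above of populating the dictionary with Python could look something like the sketch below. It relies on the fact that str.title() already treats the apostrophe as a word break (O'CONNOR becomes O'Connor), so only the Mc prefix needs extra handling. smart_title and build_sub_dict are hypothetical helpers, and every word starting with MC gets the Mc treatment, so the generated dictionary is worth a quick review before use.

```python
def smart_title(word):
    """Title-case one word, re-capitalizing the letter after a leading Mc."""
    word = word.title()                             # "MCMURRAY" -> "Mcmurray", "O'CONNOR" -> "O'Connor"
    if word.startswith("Mc") and len(word) > 2:
        word = "Mc" + word[2].upper() + word[3:]    # "Mcmurray" -> "McMurray"
    return word


def build_sub_dict(names):
    """Map each all-caps street name to a title-cased label."""
    return {name: " ".join(smart_title(w) for w in name.split()) for name in names}


sub_dict = build_sub_dict(["MCMURRAY ST", "MCINTOSH CT", "O'CONNOR ST"])
print(sub_dict)
# {'MCMURRAY ST': 'McMurray St', 'MCINTOSH CT': 'McIntosh Ct', "O'CONNOR ST": "O'Connor St"}
```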

KimberlyGarbade · 04-13-2022

I have very similar code for the exceptions and the upper/lower-case lists, but I like your subDict.has_key approach better than my "for x in" loop approach, so I'm stealing that. 🙂

RhettZufelt · 04-13-2022

Not an re guy, but this seems to work with my street names:

import arcpy

fc = r'C:\_ESRI\GDB\Landbase.gdb\Centerlines'       # Feature class with table
field = 'StreetName'                                # Field I want to update

with arcpy.da.UpdateCursor(fc, field) as cursor:
    for row in cursor:
        name = row[0]
        if name and "'" in name:                    # skip nulls; only touch names with an apostrophe
            parts = name.split("'")                 # split the name at the apostrophe(s)
            # title() each piece and join them back together with the apostrophe in between
            row[0] = "'".join(part.title() for part in parts)
            cursor.updateRow(row)                   # update the name

Of course, this operates on a single field containing just the name, but it should be possible to modify it to iterate through the words in a sentence as well.

R_
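Extending the apostrophe approach above from a single name to a whole sentence could look like the sketch below. Because title() would also turn didn't into Didn'T, this hypothetical fix_apostrophe_names helper sentence-cases the whole string first and then only title-cases words whose part before the apostrophe is a single letter (O'Neal, D'Angelo). First names and other proper nouns still need an exception list.

```python
def fix_apostrophe_names(sentence):
    """Sentence-case the text, then title-case words like o'neal or d'angelo."""
    words = sentence.capitalize().split(" ")
    fixed = []
    for word in words:
        prefix = word.split("'")[0]
        # Only a one-letter prefix (O', D') counts as a name; didn't and bob's are left alone.
        if "'" in word and len(prefix) == 1:
            word = word.title()
        fixed.append(word)
    return " ".join(fixed)


print(fix_apostrophe_names("RYAN O'NEAL DIDN'T CALL CONAN O'BRIEN."))
# Ryan O'Neal didn't call conan O'Brien.
```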

Luke_Pinner · 04-14-2022

https://github.com/ppannuto/python-titlecase