Fix O' names using Python in Field Calculator

04-13-2022 01:25 PM
KimGarbade
Occasional Contributor III

Very simply if I have the following sentence:

"RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."

I want to get

"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."

I would like to accomplish this with Python in the Field Calculator....

I can get O' so close (get it?), but I keep getting it O' so wrong (last one I promise)

Thank you for your time.


Accepted Solutions
RhettZufelt
MVP Frequent Contributor

I had issues like this with street names.  Luckily, the list of names that don't play well with str.title() is fairly small in my case so I made a dictionary of them.

This is the code I use for that. Maybe you can get an idea from it, or even come up with a way to populate the dictionary with Python (split at the ', title() the splits, then join them back together as title case, or something like that).

There is some extra code in here because it also keeps directional prefixes such as N and S in upper case and keeps ordinal streets such as 1st in lower case (not 1ST); that is what upperWords and lowerWords are for.

import arcpy

def CalcStreetLabels(infc):
    # infc: path to the feature class or table to update
    fields = ['DirectPrefix', 'StreetName', 'StreetLabels']
    upperWords = ["N","NW","W","SW","S","SE","E","NE","US","PR"]
    lowerWords = ["1ST","2ND","3RD","4TH","5TH","6TH","7TH","8TH","9TH","10TH","11TH","12TH","13TH"]

    # Names that str.title() gets wrong, mapped to the label to use instead
    subDict = {
        "MCMURRAY ST": "McMurray St",
        "MCMURRAY AVE": "McMurray Ave",
        "MCINTOSH CT": "McIntosh Ct",
        "MCEWAN DR": "McEwan Dr",
        "MCPHERSON AVE": "McPherson Ave",
        "MCCLELLAN ST": "McClellen St",
        "BY-PASS": "Bypass Hwy         SR 240",
        "SR 240": "SR 240",
        "O'CONNOR ST": "O'Connor St",
        "MCMURRAY": "McMurray",
        "MCINTOSH": "McIntosh",
        "MCEWAN": "McEwan",
        "MCPHERSON": "McPherson",
        "MCCLELLAN": "McClellen",
        "O'CONNOR": "O'Connor"
    }
    with arcpy.da.UpdateCursor(infc, fields) as uCur:
        for row in uCur:
            label = ""
            words = []  # reset per row so a null StreetName does not reuse the previous row's words
            if row[1]:
                # join prefix and name (skipping nulls), then split into words
                words = ' '.join(filter(None, [row[0], row[1]])).split(" ")
            for word in words:
                if word in upperWords:
                    label += word.upper() + " "
                elif word in lowerWords:
                    label += word.lower() + " "
                else:
                    label += str(word).title() + " "
            if row[1] in subDict:  # dict.has_key() no longer exists in Python 3
                label = subDict[row[1]]
            row[2] = label
            if label:
                uCur.updateRow(row)

 

In case it helps any,

R_
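
The idea of populating the dictionary with Python, rather than typing each entry, might look roughly like the sketch below. This is not RhettZufelt's code: `title_around_apostrophe`, `fix_mc_prefix`, and `build_sub_dict` are hypothetical helper names, the input is assumed to be a list of upper-case street names, and the Mc rule assumes every word starting with "Mc" is a name.

```python
def title_around_apostrophe(name):
    """Split at each apostrophe, title-case each piece, and rejoin."""
    return "'".join(part.title() for part in name.split("'"))


def fix_mc_prefix(word):
    """Turn 'Mcmurray' into 'McMurray' (assumes every 'Mc' word is a name)."""
    if word.startswith("Mc") and len(word) > 2:
        return "Mc" + word[2:].capitalize()
    return word


def build_sub_dict(raw_names):
    """Map each raw upper-case name to a title-cased label (hypothetical helper)."""
    sub_dict = {}
    for raw in raw_names:
        titled = title_around_apostrophe(raw)
        sub_dict[raw] = " ".join(fix_mc_prefix(w) for w in titled.split(" "))
    return sub_dict


sub_dict = build_sub_dict(["O'CONNOR ST", "MCMURRAY AVE", "MCINTOSH"])
print(sub_dict)
```

Entries that follow no rule (such as the "BY-PASS" label) would still need to be added by hand.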

 


11 Replies
DanPatterson
MVP Esteemed Contributor

normally "title" but you have to de-title the non-names

 

"{} and {} were on {}".format(*[i.title() for i in names])
"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien"

 

 leaving you to build your sentence from a list of names and sentence fluff as you go 

BUT in the field calculator... not really, just "title" the string, at least it looks O'somuchbetter than the full caps
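
For reference, a minimal sketch of what str.title() does with the example sentence, and where it goes wrong. In the Field Calculator (Python parser) the expression would be something like `!Comments!.title()`, where `Comments` is a hypothetical field name.

```python
s = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."

# title() capitalizes the first letter after any non-letter,
# which is why the letter after the apostrophe comes out right
titled = s.title()
print(titled)  # Ryan O'Neal And Jerry O'Connell Were On Conan O'Brien.

# The same rule mangles contractions and ordinals
mangled = "DON'T PARK ON 1ST ST".title()
print(mangled)  # Don'T Park On 1St St
```

The O' names come out right, but every other word is capitalized too, which is the "de-title the non-names" step mentioned above.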


... sort of retired...
JoshuaBixby
MVP Esteemed Contributor

What you are after is called true casing/truecasing, and it is a natural language processing problem.   The Python built-in string type does not have any method for doing such handling of a string.  There are quite a few Python packages that do truecasing, but you would have to install them and then write a fairly involved code block expression.

I am with Dan, str.title() is probably the best compromise in terms of effort to results.

KimGarbade
Occasional Contributor III

I see the value of what you are saying. 

I was hoping to use something like this, but it's running up against the limits of my understanding of regular expressions in Python.

I'm just so close I don't want to give up on it.

import re

instring = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
print("Original: " + instring)

instring = instring.capitalize()
print("Sentence Case: " + instring)

if re.search(r"\b"+"O'"+'[a-z]', instring, re.IGNORECASE): instring = re.sub("(^'O'|['])\s*([a-zA-Z])", lambda p: p.group(0).upper(), instring)

print("Correct O\' names: " + instring)

 

It gets me here (so close but yet so far).

[Screenshot: output of the script above]
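
One possible reading of why the pattern above falls short: the first alternative, `^'O'`, looks for a literal `'O'` at the very start of the string, so in practice only the `[']` alternative matches and only the letter after the apostrophe is upper-cased, leaving the O lower-case. A minimal sketch that matches the O together with the following letter is shown here; `fix_o_names` is a hypothetical name, and the sketch deliberately handles O' names only.

```python
import re


def fix_o_names(text):
    """Sentence-case the text, then restore O'Xxx capitalization."""
    sentence = text.capitalize()
    # \b anchors the match at the start of a word; group 1 is the letter after O'
    return re.sub(r"\b[oO]'([a-z])", lambda m: "O'" + m.group(1).upper(), sentence)


result = fix_o_names("RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN.")
print(result)  # Ryan O'Neal and jerry O'Connell were on conan O'Brien.
```

The first names stay lower-case because capitalize() lowers everything after the first character; recognizing those as names is the harder truecasing problem.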

 

JoshuaBixby
MVP Esteemed Contributor

This appears to be for an exercise, so does the solution only need to work for this one sentence, or does it need to be generally applicable to strings?  If the former, then you can do something like:

>>> s = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
>>> lower_list = ["and", "were", "on"]
>>> " ".join(
...   i.lower() if i.lower() in lower_list else i.title()
...   for i in s.split()
... )
"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."
KimGarbade
Occasional Contributor III

All O' names in this example, but ideally I want a solution that applies to any name with more than just a leading capital letter (e.g., Mc, O', St.). I have thousands of comment fields that were traditionally all caps, and we are transitioning to sentence case. The comments have a lot of street and person names embedded, so we wind up with things like "June paton stated it was 100 ft of rpc along the west side of 1st st." Sure, you can read it, but it looks wrong. I've corrected a bunch with "exception" lists (basically search-and-replace pairs: if you find this, replace it with that), but names are proving difficult since Bill and a water bill are two different things.
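
The exception-list approach described here could be sketched as a single whole-word regex substitution. Everything in `EXCEPTIONS` is a hypothetical example pair (including the guess that "rpc" should be upper-cased), and `apply_exceptions` is an illustrative name, not code from this thread.

```python
import re

# Hypothetical exception pairs: lower-case form -> preferred form
EXCEPTIONS = {"1st st": "1st St", "rpc": "RPC", "june paton": "June Paton"}

# Longest keys first so multi-word phrases win over shorter ones;
# \b on both sides restricts matches to whole words
_pattern = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(EXCEPTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def apply_exceptions(text):
    """Sentence-case the text, then apply the exception pairs."""
    sentence = text.capitalize()
    return _pattern.sub(lambda m: EXCEPTIONS[m.group(0).lower()], sentence)


fixed = apply_exceptions(
    "JUNE PATON STATED IT WAS 100 FT OF RPC ALONG THE WEST SIDE OF 1ST ST."
)
print(fixed)  # June Paton stated it was 100 ft of RPC along the west side of 1st St.
```

Whole-word matching keeps a pair from rewriting text inside longer words, but it cannot tell the name Bill from a water bill; that still needs context.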


KimGarbade
Occasional Contributor III

I have very similar code for the exceptions and lower and upper, but I like your if subDict.has_key approach better than my for x in approach so I'm stealing that. 🙂  

RhettZufelt
MVP Frequent Contributor

Not an re guy, but this seems to work with my street names:

 

 

import arcpy

fc = r'C:\_ESRI\GDB\Landbase.gdb\Centerlines'       # Feature class with table
field = 'StreetName'                                # Field I want to update


with arcpy.da.UpdateCursor(fc, [field]) as cursor:
    for row in cursor:
        if row[0] and "'" in row[0]:                # skip nulls; see if apostrophe is in the name
            parts = row[0].split("'")               # split the name at the apostrophe(s)
            row[0] = "'".join(p.title() for p in parts)   # title() each piece and rejoin with the apostrophe
            cursor.updateRow(row)                   # update the name

 

 

Of course, this operates on a single field with the name.  But, should be able to modify to iterate through the words in a sentence also.
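
Iterating through the words of a sentence, as suggested, might look like the sketch below. `title_apostrophe_words` and the `CONTRACTIONS` skip list are hypothetical; the skip list is an assumption added here because ordinary contractions also contain an apostrophe.

```python
# Hypothetical skip list: contractions that should not be treated as names
CONTRACTIONS = {"don't", "can't", "won't", "it's"}


def title_apostrophe_words(sentence):
    """Title-case only the words that contain an apostrophe."""
    fixed = []
    for word in sentence.split(" "):
        if "'" in word and word.lower() not in CONTRACTIONS:
            # same split / title() / rejoin step as the cursor code above
            word = "'".join(part.title() for part in word.split("'"))
        fixed.append(word)
    return " ".join(fixed)


out = title_apostrophe_words("They don't know ryan o'neal or conan o'brien.")
print(out)  # They don't know ryan O'Neal or conan O'Brien.
```

A contraction with punctuation attached (for example "don't,") would slip past this simple skip list, so real comment text would need a little more care.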

R_

Luke_Pinner
MVP Regular Contributor