Very simply, if I have the following sentence:
"RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
I want to get:
"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."
I would like to accomplish this with Python in the Field Calculator.
I can get O' so close (get it?), but I keep getting it O' so wrong (last one, I promise).
Thank you for your time.
Normally you would "title" the string, but then you have to de-title the non-names. If you title only the names:
>>> names = ["RYAN O'NEAL", "JERRY O'CONNELL", "CONAN O'BRIEN"]
>>> "{} and {} were on {}".format(*[i.title() for i in names])
"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien"
That leaves you to build your sentence from a list of names and sentence fluff as you go.
BUT in the Field Calculator... not really. Just "title" the string; at least it looks O'somuchbetter than the full caps.
What you are after is called true casing/truecasing, and it is a natural language processing problem. The Python built-in string type does not have any method for doing such handling of a string. There are quite a few Python packages that do truecasing, but you would have to install them and then write a fairly involved code block expression.
I am with Dan, str.title() is probably the best compromise in terms of effort to results.
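To make the trade-off concrete, here is a small sketch of what str.title() does with the kinds of words that come up in this thread. The sample strings are made up for illustration; the point is only that title() capitalizes the first letter after any non-letter character, which helps with O' names but hurts with ordinals, contractions, and Mc names.

```python
# Illustrative inputs only; none of these come from a real table.
samples = ["RYAN O'NEAL", "MCMURRAY AVE", "1ST ST", "DON'T"]

# str.title() uppercases the first letter of every run of letters.
results = {s: s.title() for s in samples}

for original, titled in results.items():
    print(original, "->", titled)
# RYAN O'NEAL -> Ryan O'Neal
# MCMURRAY AVE -> Mcmurray Ave
# 1ST ST -> 1St St
# DON'T -> Don'T
```

So the apostrophe case happens to work, while "Mcmurray", "1St" and "Don'T" are the ones that need some kind of exception handling afterwards.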
I see the value of what you are saying.
I was hoping to use something like this, but it's running up against the limits of my understanding of regular expressions in Python.
I'm just so close I don't want to give up on it.
import re
instring = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
print("Original: " + instring)
# capitalize() lowercases everything except the first character
instring = instring.capitalize()
print("Sentence Case: " + instring)
# If the string contains an O' name, uppercase the letter after each apostrophe
if re.search(r"\bO'[a-z]", instring, re.IGNORECASE):
    instring = re.sub(r"(^'O'|['])\s*([a-zA-Z])", lambda p: p.group(0).upper(), instring)
print("Correct O' names: " + instring)
It gets me to "Ryan o'Neal and jerry o'Connell were on conan o'Brien." (so close but yet so far).
This appears to be for an exercise, so does the solution only need to work for this one sentence, or does it need to be generally applicable to strings? If the former, then you can do something like:
>>> s = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
>>> lower_list = ["and", "were", "on"]
>>> " ".join(
... i.lower() if i.lower() in lower_list else i.title()
... for i in s.split()
... )
"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."
>>>
All O' names in this example, but ideally I want a solution that can be applied to all names that start with more than just a capital letter (i.e. Mc, O', St., etc.). I have thousands of comment fields that were traditionally all caps, and we are transitioning to sentence case. The comments have a lot of street and person names embedded, so we wind up with things like "June paton stated it was 100 ft of rpc along the west side of 1st st." Sure, you can read it, but it looks wrong. I've corrected a bunch with "exception" lists (basically search-and-replace pairs: if you find this, replace it with that), but names are proving difficult since Bill and water bill are two different things.
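For what it's worth, the "sentence case, then fix prefixes, then apply an exception list" idea described above could be sketched roughly like this. Everything named here (EXCEPTIONS, PREFIX_RE, sentence_case, and the entries in the dictionary) is hypothetical and only for illustration. It does nothing for the Bill vs. water bill problem, and it will also capitalize ordinary words that happen to start with O' or Mc, such as o'clock.

```python
import re

# Hypothetical exception list: lowercase word -> spelling to force.
EXCEPTIONS = {"paton": "Paton", "st": "St"}

# Names with an O' or Mc prefix followed by more letters (O'Neal, McMurray).
PREFIX_RE = re.compile(r"\b(o'|mc)([a-z])([a-z]+)", re.IGNORECASE)


def fix_prefix(match):
    """Rebuild a prefixed name as prefix + capital letter + lowercase rest."""
    prefix, first, rest = match.groups()
    return prefix.capitalize() + first.upper() + rest.lower()


def sentence_case(text):
    # Lowercase everything except the first character of the string.
    text = text.capitalize()
    # Re-capitalize the O' and Mc names.
    text = PREFIX_RE.sub(fix_prefix, text)
    # Swap in whole-word exceptions; words glued to digits (1st) are skipped
    # because \b does not fall between a digit and a letter.
    return re.sub(
        r"\b[A-Za-z]+\b",
        lambda m: EXCEPTIONS.get(m.group(0).lower(), m.group(0)),
        text,
    )


print(sentence_case("JUNE PATON STATED IT WAS 100 FT OF RPC ALONG THE WEST SIDE OF 1ST ST."))
# June Paton stated it was 100 ft of rpc along the west side of 1st St.
print(sentence_case("O'CONNOR MET MCMURRAY ON 1ST ST."))
# O'Connor met McMurray on 1st St.
```

In the Field Calculator this would presumably go in the code block, with an expression along the lines of sentence_case(!Comments!), where Comments is a stand-in for the real field name.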
I had issues like this with street names. Luckily, the list of names that don't play well with str.title() is fairly small in my case, so I made a dictionary of them.
This is the code I use for that. Maybe you can get an idea from it, or maybe even come up with a way to populate the dictionary with Python (split at the ', title() the splits, then append them back together as title case, or something like that).
There is some extra code in here, as it also makes sure street prefixes like N, S, etc. stay capitalized and ordinals like 1st aren't written as 1ST (that is what upperWords and lowerWords are for).
import arcpy

def CalcStreetLabels():
    # infc (the feature class to update) is defined elsewhere in my script
    fields = ['DirectPrefix', 'StreetName', 'StreetLabels']
    upperWords = ["N","NW","W","SW","S","SE","E","NE","US","PR"]
    lowerWords = ["1ST","2ND","3RD","4TH","5TH","6TH","7TH","8TH","9TH","10TH","11TH","12TH","13TH"]
    # Street names that str.title() gets wrong, keyed by the all-caps value
    subDict = {
        "MCMURRAY ST": "McMurray St",
        "MCMURRAY AVE": "McMurray Ave",
        "MCINTOSH CT": "McIntosh Ct",
        "MCEWAN DR": "McEwan Dr",
        "MCPHERSON AVE": "McPherson Ave",
        "MCCLELLAN ST": "McClellan St",
        "BY-PASS": "Bypass Hwy SR 240",
        "SR 240": "SR 240",
        "O'CONNOR ST": "O'Connor St",
        "MCMURRAY": "McMurray",
        "MCINTOSH": "McIntosh",
        "MCEWAN": "McEwan",
        "MCPHERSON": "McPherson",
        "MCCLELLAN": "McClellan",
        "O'CONNOR": "O'Connor"
    }
    with arcpy.da.UpdateCursor(infc, fields) as uCur:
        for row in uCur:
            label = ""
            if row[1]:
                # Join prefix and name (skipping empty values), then split into words
                words = ' '.join(filter(None, [row[0], row[1]])).split(" ")
                for word in words:
                    if word in upperWords:
                        label += word.upper() + " "
                    elif word in lowerWords:
                        label += word.lower() + " "
                    else:
                        label += str(word).title() + " "
                # dict.has_key() is gone in Python 3; use "in" instead
                if row[1] in subDict:
                    label = subDict[row[1]]
            row[2] = label
            if label:
                uCur.updateRow(row)
In case it helps any,
R_
I have very similar code for the exceptions and lower and upper, but I like your if subDict.has_key approach better than my for x in approach so I'm stealing that. 🙂
Not an re guy, but this seems to work with my street names:
import arcpy
fc = r'C:\_ESRI\GDB\Landbase.gdb\Centerlines'  # Feature class with table
field = 'StreetName'  # Field I want to update
ll = []  # Empty list for holding name
with arcpy.da.UpdateCursor(fc, field) as cursor:
    for row in cursor:
        if row[0] and "'" in row[0]:  # skip nulls, then see if apostrophe is in the name
            ll.append(row[0])  # if so, append to empty list
            for w in ll:
                g = w.split("'")  # Split the name at the apostrophe
                row[0] = g[0].title() + "'" + g[1].title()  # concatenate them back together as title() with apostrophe in between
                cursor.updateRow(row)  # update the name
            ll = []  # empty the list for next value
Of course, this operates on a single field with the name. But, should be able to modify to iterate through the words in a sentence also.
R_
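As a rough sketch of that last idea, the same split-at-the-apostrophe trick can be applied word by word to a sentence that has already been sentence-cased. The function name title_apostrophe_words is made up for illustration, and there is no arcpy in it so it can be tried on a plain string first. Note that it treats every apostrophe the same way, so a contraction like don't would come out as Don'T unless it is filtered out beforehand.

```python
def title_apostrophe_words(sentence):
    """Title-case only the words that contain an apostrophe."""
    fixed = []
    for word in sentence.split(" "):
        if "'" in word:
            # Split at the apostrophe, title each piece, and rejoin.
            word = "'".join(part.title() for part in word.split("'"))
        fixed.append(word)
    return " ".join(fixed)


text = "Ryan o'neal and jerry o'connell were on conan o'brien."
print(title_apostrophe_words(text))
# Ryan O'Neal and jerry O'Connell were on conan O'Brien.
```

The first names still come out lowercase, since nothing in the string marks them as names; those would have to come from an exception list.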