Fix O' names using Python in Field Calculator

04-13-2022 01:25 PM
KimGarbade
Occasional Contributor III

Very simply, if I have the following sentence:

"RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."

I want to get

"Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."

I would like to accomplish this with Python in the Field Calculator....

I can get O' so close (get it?), but I keep getting it O' so wrong (last one, I promise).

Thank you for your time.

11 Replies
JoshuaBixby
MVP Esteemed Contributor

That Python package expands, a little, on the built-in str.title() functionality so I don't know if it really offers much here since the OP is after truecase conversion.
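
For reference, here is a quick sketch of what the built-in str.title() does with the sentence from the question. It only illustrates the gap between title case and the truecase result the OP wants; the variable names are just for this example.

```python
sentence = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."

# str.title() capitalizes the first letter of every run of letters,
# so the letter after each apostrophe is capitalized too
titled = sentence.title()
print(titled)
# prints "Ryan O'Neal And Jerry O'Connell Were On Conan O'Brien."

# The same rule mangles possessives
print("MARIO'S PIZZA".title())
# prints "Mario'S Pizza"
```

So the O' names come out right, but "And", "Were" and "On" are capitalized as well, and possessives need a follow-up fix.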

KimGarbade
Occasional Contributor III

I finally had a chance to circle back on this one.

This code works for "O'" names, but like so many have discovered before me, when you are dealing with hundreds of thousands of comments, 60,000 street names, and embedded organization acronyms and proper names, there are just too many exceptions to handle. I think the solution might just be title case and call it a day. Not optimum, but trying to achieve optimum is probably wasted effort in this case. At least I got to mess around with re and lambda functions.

import re

test = "RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."
# capitalize() upper-cases the first letter and lower-cases everything else
test2 = test.capitalize()

# Whole words that need their capital letter restored after capitalize()
wholeExceptions = [
    ("jerry", "Jerry"),
    ("ryan", "Ryan"),
    ("conan", "Conan"),
]


def subsmade(instring):
    # Handle exceptions that are whole words
    for old, new in wholeExceptions:
        instring = re.sub(r"\b" + re.escape(old) + r"\b", new, instring)

    # Upper-case the O and the letter after the apostrophe. The leading \b
    # keeps possessives such as "mario's" from being turned into "mariO'S".
    return re.sub(r"\b[oO]'[a-zA-Z]", lambda p: p.group(0).upper(), instring)


print(subsmade(test2))
# prints "Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."
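
For the "title case and call it a day" fallback, a rough sketch could look like the following. The function name title_fallback and the SMALL_WORDS list are made up for this example, and the list would need tuning against real data; it also assumes null field values reach the function as None.

```python
import re

# Hypothetical list of words to keep lowercase when they are not the first word
SMALL_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "were"}


def title_fallback(value):
    """Title-case a string, then lowercase small words and possessive 'S."""
    if value is None:  # assumed: null field values arrive as None
        return None
    text = value.title()
    # str.title() turns "MARIO'S" into "Mario'S"; restore the lowercase s
    text = re.sub(r"(?<=[a-z])'S\b", "'s", text)

    def lower_small(match):
        word = match.group(0)
        # Leave the first word of the string capitalized
        if match.start() > 0 and word.lower() in SMALL_WORDS:
            return word.lower()
        return word

    return re.sub(r"[A-Za-z]+", lower_small, text)


print(title_fallback("RYAN O'NEAL AND JERRY O'CONNELL WERE ON CONAN O'BRIEN."))
# prints "Ryan O'Neal and Jerry O'Connell were on Conan O'Brien."
```

In the Field Calculator the function would go in the Code Block, with an expression along the lines of title_fallback(!COMMENTS!), where COMMENTS stands in for whatever the field is actually called. Acronyms and other proper-name exceptions are still not handled.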

 
