I have a field that has multiple names for the same company, for several companies. I would like to build a script to replace the name for each company...
example: replace( [OPERATOR],"ANADARKO E & P ONSHORE LLC","ANADARKO")
Find | Replace |
ANADARKO E & P ONSHORE LLC | ANADARKO |
ANADARKO E&P ONSHORE | ANADARKO |
ANADARKO PET | ANADARKO |
ANADARKO PET CORP | ANADARKO |
ANADARKO PETROLEUM CORPORATION | ANADARKO |
CHESAPEAKE LIMITED PARTNERSHIP | CHESAPEAKE |
CHESAPEAKE OPERATING INC. | CHESAPEAKE |
CHESAPEAKE OPERATING INCORPORATED | CHESAPEAKE |
CHESAPEAKE OPERATING, INC. | CHESAPEAKE |
CHESAPEAKE OPERG INC | CHESAPEAKE |
How can I build a single script for all replacements?
Thanks!
So the data you attached really explains the situation you have. I assume you don't only what to correct 'ANADARKO' and 'CHESAPEAKE', but there is a large list of cases where multiple notations are used for the same operator.
I did a summarize on the operator field and found 748 unique values. How many operators are there? When I take the first word from the operator name and summarized again (including first and last operator and sum of cnt operator):
It returns 446 unique values (including things that don't make sense). There may be 8 different notations for DEVON, but maybe they are not all the same.
Do you have an official list of all the valid operators? In that case you could do something like determining the score with string comparison as explained here: python - Fuzzy String Comparison - Stack Overflow
The long manual way, but more secure method, would be to summarize on Operator and in the resulting table, add a field with the correct name (editing the 748 records!) and join the result back to the original table.
Question: how was this data generated? It would be good to revise the way the data was captures to avoid this work the next time by using for instance a domain with valid names for operators (although having a drop down list that has nearly 500 elements is no pleasure to work with).
visual guide...