I have a field in a .dbf file with a long text string. I need to delete x amount of characters WITHIN the text string. I am also not very good with programming so I haven't looked into creating a script.
Essentially I have this:
aaa bbb ccc ddd eee fff ggg
What I need is this:
aaa bbb ccc fff ggg
I'm attaching the .dbf in MS Excel format (.xls). I have had some luck with functions in MS Excel but I can't get the consistency I need.
The REPLACE function might work for this:
Excel formula: Remove text by position
Original text | Formula | New text |
---|---|---|
NAP NAP 012.039 DH 00.449 LINCOLN AVE OC 21-89 | =REPLACE(A1,4,12,"") | NAP DH 00.449 LINCOLN AVE OC 21-89 |
Stephen,
Thanks for your efforts but I have been married to MS Excel for the last 4 days!!!
=0
The main reason I was posting on the ArcGIS Forum is that I wanted to hope beyond all hope that maybe there is a VBA or Python script out there that would do what I’m trying to do in MS Excel?
Sheet 2 of Naplables0403 is closer to what I am wanting. I can do nested SUBSTITUTE functions to replace the text strings but I can’t get one set of SUBSTITUTE functions to repeat because somehow its text-string specific. i.e., DH 00.334 can only be deleted from the first text string, but even if I put an instance number of 2 (where I’m hoping the second string can be replaced), it doesn’t do that—it only replaces the text its referring to!
Perhaps a better way of putting this is how can I replace substrings of text of varying lengths with spaces between varying locations of spaces??
As in:
Aaa bbb ccc ddd eee fff
Would result in
Aaa bbb eee fff
But then
AAA BBB CCC DD EEE FF GGG
Could result in
AAA BBB CCC FF GG
Daniel T. Brenner
RESEARCH ANALYST I (GEOGRAPHIC INFORMATION SYSTEMS)
DIVISION OF RESEARCH, INNOVATION, and SYSTEM INFORMATION
Highway System Information and Performance
1120 N Street MS 38
Sacramento, CA 95814
(916) 657-4277
(510) 593-7547 cell
daniel.brenner@dot.ca.gov<mailto:daniel.brenner@dot.ca.gov>
“Consider the unconsidered.”
“Certainty is not a sure thing!”
Are there a lot of different substrings you want to replace? I'm thinking maybe you could use a Python dictionary, but that would require explicit definition of substrings and their replacements. Since the substrings aren't always replaced with the same text, you'd also have to have some sort of decision-making in the code.
Yes there are a lot of different values I need to replace of varying lengths (see the parts of the text strings beginning with ‘DH’ or the one with ‘E DH’….)
I haven’t worked with Python Dictionaries. Could you give me an example of what you are talking about?
Dictionaries are collections of key:value pairs, for example:
d = {"NAP NAP R": "NAP-029-R", "DH 00.334": ""}
Using this dictionary, d["NAP NAP R"] would give you the value "NAP-029-R", and d["00.334"] would give you an empty string. Using a dictionary you could loop through the rows and do replaces based on dictionary values.
Using the field calculator in ArcGIS Pro, for example:
Expression: mod = mod_substr(!orig!)
Code Block:
import re
def mod_substr(orig):
d = {"NAP NAP R": "NAP-029-R",
"DH 00.334": "",
"NAP NAP 0": "NAP-121-0",
"NAP YNTV 0": "NAP-121-0",
"NAP R 006": "NAP-121-R 006"}
tmp_str = orig
# Replace substrings if they exist as a key in the dictionary
for key in d.keys():
if key in tmp_str:
tmp_str = tmp_str.replace(key, d[key])
# Remove duplicate spaces
tmp_str = re.sub(' +', ' ', tmp_str)
return tmp_str
May not be feasible though, since you'd have to define each substring and its replacement.
Stephen,
Thanks big time for this; this stuff is huge but I will do what I can to follow through on it.
But yes, your last line does get at one of the issues that Excel does much better on. That was one of (if not THE) core issues, was that what would have represented X number of characters at varying lengths and varying places in a text string. Like you’re saying, to have to put in each one of the text strings that I wanted to conver would be kind of bulky.
But at the same time some of what you’re doing is what I came up with in terms of character substrings and how to replace them—See attached!
My last questions to you are how did you learn this and where did you get the opportunity to do so? Was it on the job or in a class?
Thanks again and again,
Daniel T. Brenner
RESEARCH ANALYST I (GEOGRAPHIC INFORMATION SYSTEMS)
DIVISION OF RESEARCH, INNOVATION, and SYSTEM INFORMATION
Highway System Information and Performance
1120 N Street MS 38
Sacramento, CA 95814
(916) 657-4277
(510) 593-7547 cell
daniel.brenner@dot.ca.gov<mailto:daniel.brenner@dot.ca.gov>
“Consider the unconsidered.”
“Certainty is not a sure thing!”
Looking at that attachment, regular expressions might be useful. If you're looking to get rid of all text matching the pattern "DH ##.###", for example:
What I know of Python has been self-taught, but I did have a baseline established from studying other programming languages. I took a handful of programming classes in college, then started tinkering with Python when I was studying GIS. I've been able to continue tinkering in my job, using it to automate processes and try to solve problems.
Lucky you.
I’ve been married to MS Excel and watch my GIS career gently weep.
More soon.
I don't understand what criteria/condition is driving the replacement text. For example, when
NAP NAP R 004.501 DH 00.334 VIA IMOLA AVE
gets changed to
NAP-121-R 004.501 DH 00.334 VIA IMOLA AVE
, " NAP " is getting changed to "-121-" .
However, when
NAP NAP 010.698 DH 00.501 OLD SONOMA RD OC 21 87
gets changed to
NAP-029-010.698 OLD SONOMA RD OC 21 87
, " NAP " is getting changed to "-029-".
How are you deciding when " NAP " gets changed to one value versus another?