Delete text substring from longer text

691
14
04-02-2019 04:32 PM
DanielBrenner
New Contributor III

I have a field in a .dbf file with a long text string.   I need to delete x amount of characters WITHIN the text string.    I am also not very good with programming so I haven't looked into creating a script.  

Essentially I have this:

aaa bbb ccc ddd eee fff ggg

What I need is this:

aaa bbb ccc fff ggg

I'm attaching the .dbf in MS Excel format (.xls).   I have had some luck with functions in MS Excel but I can't get the consistency I need.

0 Kudos
14 Replies
StephenM
Occasional Contributor II

The REPLACE function might work for this:

Excel formula: Remove text by position

Original textFormulaNew text
NAP NAP 012.039 DH 00.449 LINCOLN AVE OC 21-89=REPLACE(A1,4,12,"")NAP DH 00.449 LINCOLN AVE OC 21-89
0 Kudos
DanielBrenner
New Contributor III

Stephen,

Thanks for your efforts but I have been married to MS Excel for the last 4 days!!!

=0

The main reason I was posting on the ArcGIS Forum is that I wanted to hope beyond all hope that maybe there is a VBA or Python script out there that would do what I’m trying to do in MS Excel?

Sheet 2 of Naplables0403 is closer to what I am wanting. I can do nested SUBSTITUTE functions to replace the text strings but I can’t get one set of SUBSTITUTE functions to repeat because somehow its text-string specific. i.e., DH 00.334 can only be deleted from the first text string, but even if I put an instance number of 2 (where I’m hoping the second string can be replaced), it doesn’t do that—it only replaces the text its referring to!

Perhaps a better way of putting this is how can I replace substrings of text of varying lengths with spaces between varying locations of spaces??

As in:

Aaa bbb ccc ddd eee fff

Would result in

Aaa bbb eee fff

But then

AAA BBB CCC DD EEE FF GGG

Could result in

AAA BBB CCC FF GG

Daniel T. Brenner

RESEARCH ANALYST I (GEOGRAPHIC INFORMATION SYSTEMS)

DIVISION OF RESEARCH, INNOVATION, and SYSTEM INFORMATION

Highway System Information and Performance

1120 N Street MS 38

Sacramento, CA 95814

(916) 657-4277

(510) 593-7547 cell

daniel.brenner@dot.ca.gov<mailto:daniel.brenner@dot.ca.gov>

“Consider the unconsidered.”

“Certainty is not a sure thing!”

0 Kudos
StephenM
Occasional Contributor II

Are there a lot of different substrings you want to replace? I'm thinking maybe you could use a Python dictionary, but that would require explicit definition of substrings and their replacements. Since the substrings aren't always replaced with the same text, you'd also have to have some sort of decision-making in the code.

0 Kudos
DanielBrenner
New Contributor III

Yes there are a lot of different values I need to replace of varying lengths (see the parts of the text strings beginning with ‘DH’ or the one with ‘E DH’….)

I haven’t worked with Python Dictionaries. Could you give me an example of what you are talking about?

0 Kudos
StephenM
Occasional Contributor II

Dictionaries are collections of key:value pairs, for example:

d = {"NAP NAP R": "NAP-029-R", "DH 00.334": ""}

Using this dictionary, d["NAP NAP R"] would give you the value "NAP-029-R", and d["00.334"] would give you an empty string. Using a dictionary you could loop through the rows and do replaces based on dictionary values.

Using the field calculator in ArcGIS Pro, for example:

table showing original strings and strings after replacement

Expression: mod = mod_substr(!orig!)

Code Block:

import re

def mod_substr(orig):
    d = {"NAP NAP R": "NAP-029-R",
         "DH 00.334": "",
         "NAP NAP 0": "NAP-121-0",
         "NAP YNTV 0": "NAP-121-0",
         "NAP R 006": "NAP-121-R 006"}
    tmp_str = orig

    # Replace substrings if they exist as a key in the dictionary
    for key in d.keys():
        if key in tmp_str:
            tmp_str = tmp_str.replace(key, d[key])

    # Remove duplicate spaces
    tmp_str = re.sub(' +', ' ', tmp_str)

    return tmp_str

May not be feasible though, since you'd have to define each substring and its replacement.

DanielBrenner
New Contributor III

Stephen,

Thanks big time for this; this stuff is huge but I will do what I can to follow through on it.

But yes, your last line does get at one of the issues that Excel does much better on. That was one of (if not THE) core issues, was that what would have represented X number of characters at varying lengths and varying places in a text string. Like you’re saying, to have to put in each one of the text strings that I wanted to conver would be kind of bulky.

But at the same time some of what you’re doing is what I came up with in terms of character substrings and how to replace them—See attached!

My last questions to you are how did you learn this and where did you get the opportunity to do so? Was it on the job or in a class?

Thanks again and again,

Daniel T. Brenner

RESEARCH ANALYST I (GEOGRAPHIC INFORMATION SYSTEMS)

DIVISION OF RESEARCH, INNOVATION, and SYSTEM INFORMATION

Highway System Information and Performance

1120 N Street MS 38

Sacramento, CA 95814

(916) 657-4277

(510) 593-7547 cell

daniel.brenner@dot.ca.gov<mailto:daniel.brenner@dot.ca.gov>

“Consider the unconsidered.”

“Certainty is not a sure thing!”

0 Kudos
StephenM
Occasional Contributor II

Looking at that attachment, regular expressions might be useful. If you're looking to get rid of all text matching the pattern "DH ##.###", for example:

example of regular expressions in Python

What I know of Python has been self-taught, but I did have a baseline established from studying other programming languages. I took a handful of programming classes in college, then started tinkering with Python when I was studying GIS. I've been able to continue tinkering in my job, using it to automate processes and try to solve problems.

0 Kudos
DanielBrenner
New Contributor III

Lucky you.

I’ve been married to MS Excel and watch my GIS career gently weep.

More soon.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I don't understand what criteria/condition is driving the replacement text.  For example, when

NAP NAP R 004.501 DH 00.334 VIA IMOLA AVE

gets changed to

NAP-121-R 004.501 DH 00.334 VIA IMOLA AVE

, " NAP " is getting changed to "-121-" .

However, when

NAP NAP 010.698 DH 00.501 OLD SONOMA RD OC 21 87

gets changed to

NAP-029-010.698  OLD SONOMA RD OC 21 87

, " NAP " is getting changed to "-029-". 

How are you deciding when " NAP " gets changed to one value versus another?

0 Kudos