Delete text substring from longer text

2338
14
04-02-2019 04:32 PM
DanielBrenner
Occasional Contributor

I have a field in a .dbf file with a long text string.   I need to delete x amount of characters WITHIN the text string.    I am also not very good with programming so I haven't looked into creating a script.  

Essentially I have this:

aaa bbb ccc ddd eee fff ggg

What I need is this:

aaa bbb ccc fff ggg

I'm attaching the .dbf in MS Excel format (.xls).   I have had some luck with functions in MS Excel but I can't get the consistency I need.

0 Kudos
14 Replies
DanielBrenner
Occasional Contributor

Joshua,

See attached.

Its based on how the original text string describes a point location on a certain highway number network (like those little sticks along the side of the highway that have County-Route-Postmile stenciled on them) that’s in another field.

I did discuss the same matter with another respondent from a different forum and he was saying that the text strings couldn’t be created with out a logical scenario (IF location = x then label with Y) and I thought of using a drop down list of all the routes, and even joining them by another attribute, but all of that was going to be too much and given that I was going to be using the points layer’s dbf file in MS Excel I was going to be able to see the route number for the point location that needed the text string I’m trying to create and so it was just a matter of substituting when I had to.

Someone else emailed about using Data Dictionaries and I saw how they essentially did the same thing but the advantage for MS Excel is that I wouldn’t have had to include every record’s substring as part of one of them, and that the code to use was way simpler.

Hope this helps.

Daniel T. Brenner

RESEARCH ANALYST I (GEOGRAPHIC INFORMATION SYSTEMS)

DIVISION OF RESEARCH, INNOVATION, and SYSTEM INFORMATION

Highway System Information and Performance

1120 N Street MS 38

Sacramento, CA 95814

(916) 657-4277

(510) 593-7547 cell

daniel.brenner@dot.ca.gov<mailto:daniel.brenner@dot.ca.gov>

“Consider the unconsidered.”

“Certainty is not a sure thing!”

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I agree with the other respondent from the different forum.  The crux of your automation/scripting problem lies in:

Edited the "NAP" text string as need be to represent the correct character string (or spaces) as need be

You know what is correct "as need be," but you don't seem to be able to write out a rule set for it.  If you are changing your script/formula every few rows, and not relying on a per-defined rule set of what needs to get changed to what, then you might as well just do it by hand.

0 Kudos
DanielBrenner
Occasional Contributor

Guys,

See attached and also refer to the other stuff.

The deal is is that NAP ‘xxx’ can a 3-4 letter character string or a space between spaces. It’s the variability of what needs to be changed to the even bigger variability in terms of the number of text strings that could go in front of the space between the 3 -4 character string or the space AND the R or L or other space before a 6 digit string representing a number.

I did come up with a partial solution, where I created a text string in MS Excel with a dash before and after the route number; this text string would replace the 2nd set of characters, but again, the variability of the size of the string is a kicker.

How does any of what you guys are suggesting deal with this?

Yes it would cover for some of what the bigger formula was getting at with the “ -121- “ part, but again, not one string fits all. I even had to edit 29 to become 029 because it wasn’t in a consistent 3 digit format like the other route.

The second string is nearly out the window in terms of variability of starting position and numbers of characters. But again, I didn’t have a problem editing the characters and the text strings as need be.

Seriously I do thank you for this.

Daniel T. Brenner

RESEARCH ANALYST I (GEOGRAPHIC INFORMATION SYSTEMS)

DIVISION OF RESEARCH, INNOVATION, and SYSTEM INFORMATION

Highway System Information and Performance

1120 N Street MS 38

Sacramento, CA 95814

(916) 657-4277

(510) 593-7547 cell

daniel.brenner@dot.ca.gov<mailto:daniel.brenner@dot.ca.gov>

“Consider the unconsidered.”

“Certainty is not a sure thing!”

0 Kudos
StephenM
Occasional Contributor II

Maybe you could use regular expressions to replace substrings based on the character index of the first number in the ###.### sequence. Get the index of that character, then use string slices to replace all text between the third character (assuming all the rows start with "NAP") and that index.

Example in ArcGIS Pro using Python:

def replace_substr(s, rte):
    import re

    # Set pattern to replace beginning section. Optional
    # L or R and a space, then 3 numbers, a period, three
    # numbers, a space, 1 or 2 letters, and a space.
    ptn = re.compile("([LR] )?[0-9]{3}.[0-9]{3} [A-Z]{1,2} ")
    
    # Match the pattern in the string
    match = re.finditer(ptn, s)
    
    # Get the first match
    m = next(match)
    
    # Get the index where the match started
    idx = m.start()
    
    # Create new string by concatenating string slices and route
    new_str = s[:3] + rte + s[idx:]
    
    # Set pattern to replace the 'DH ##.###' section
    ptn = re.compile("DH [0-9]{2}.[0-9]{3}")
    
    # Replace the 'DH ##.###' section with nothing
    new_str = re.sub(" +", " ", re.sub(ptn, "", new_str))
    
    return new_str‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

table showing original strings and strings after replacement using regular expressions

DanielBrenner
Occasional Contributor

I wouldn’t say that solves the issue because it feels like too much and I haven’t tried it out either; although I have gotten a few things to work, just had to do some editing in Excel to do so.

0 Kudos