Splitting numbers and removing duplicates in a text string

01-09-2025 09:05 AM
SimonCrutchley
Frequent Contributor

Hi there,

Some time ago people helped me with a similar issue where I wanted to split textual data in a string

https://community.esri.com/t5/mapping-questions/splitting-data-in-an-attribute-table/m-p/1008011#M82...

I'm trying to do the same thing with figures relating to areas, but possibly because these are numeric they behave differently. Running the code on 285311,285311,11540,11540 gives me the answer 11540285311, where what I really want is 285311, 11540.

I'm sure there's a simple solution, but looking at the Calculate Field examples (https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/calculate-field-examples.htm) hasn't helped.

Thanks

SimonCrutchley
Frequent Contributor

On a related note, how would I go about running the previous code without sorting the results into alphabetical order? I assume it's a case of removing a bit of code, but I'm not sure which.

Thanks

DavidSolari
MVP Regular Contributor

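This code block function will preserve the original order:

def dedupe(item_str):
    # Null (None) or empty values return "" instead of raising an error.
    if not item_str:
        return ""
    cleaned = []
    # Split on commas and keep only the first occurrence of each item.
    for item in item_str.split(","):
        item = item.strip()  # ignore spaces around each item
        if item and item not in cleaned:
            cleaned.append(item)
    return ", ".join(cleaned)

To fit your data, change the "," passed to split, the ", " used by join and the "" returned for empty values.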
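To make "change the strings" concrete, here is a minimal sketch in plain Python (outside ArcGIS Pro) where the two separators become arguments. The name dedupe_with and its parameters split_on and join_with are hypothetical, made up for this illustration only:

```python
def dedupe_with(item_str, split_on=",", join_with=", "):
    """Remove duplicate items while keeping the order of first appearance."""
    if not item_str:
        return ""  # null or empty input
    cleaned = []
    for item in item_str.split(split_on):
        item = item.strip()
        if item and item not in cleaned:
            cleaned.append(item)
    return join_with.join(cleaned)


# The values from the original question.
print(dedupe_with("285311,285311,11540,11540"))
# A different input separator and output separator.
print(dedupe_with("a;b;a", split_on=";", join_with=" | "))
```

The first call prints 285311, 11540 and the second prints a | b.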

SimonCrutchley
Frequent Contributor

Hi David,

Thanks for that, but I'm very much a novice when it comes to coding. When you say 'change the strings' I'm not quite sure what you mean. When I try to select the fields in Field Calculator, it puts them in the top bit and it doesn't work. Assuming the field I want to use this on is called 'Area', how would this look?

Sorry.

DavidSolari
MVP Regular Contributor

To use the function, throw it in the "Code Block", then you can use this as your expression: dedupe(!Area!). When I said "change the strings" I meant you can replace things like "," or "" to make the function split and join the results differently.

For context: the calculator runs all of the Python code in the code block before doing anything else, then for every selected record the calculator:

  1. Grabs every field that has a pair of exclamation marks (!Area! in our example).
  2. Does any data conversions it has to (usually only relevant for date or geometry fields).
  3. Converts the expression you gave it into its complete state.
  4. Evaluates the expression.
  5. Stores that result in the field you're calculating.

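In our case, the expression will evaluate to dedupe("285311,285311,11540,11540"), dedupe("42"), dedupe(None) or whatever is in each record's Area attribute. The function calls split, so it expects text; a numeric value would need converting with str() first. Hope this helps you get a handle on how the field calculator runs.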
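The steps above can be imitated in plain Python. This is only a rough sketch of the idea, not how ArcGIS Pro actually implements Calculate Field; the records list is made-up sample data and the token substitution is an assumption for illustration:

```python
# Illustrative stand-in for the Calculate Field steps; this is not ArcGIS code.
code_block = '''
def dedupe(item_str):
    cleaned = []
    for item in (item_str or "").split(","):
        item = item.strip()
        if item and item not in cleaned:
            cleaned.append(item)
    return ", ".join(cleaned)
'''
expression = "dedupe(!Area!)"

# Hypothetical records; "Area" is a text field holding comma-separated values.
records = [
    {"Area": "285311,285311,11540,11540"},
    {"Area": "42"},
    {"Area": None},
]

namespace = {}
exec(code_block, namespace)  # the code block runs once, before any record

results = []
for record in records:
    # Steps 1-3: replace each !Field! token with that record's value.
    filled = expression
    for name, value in record.items():
        filled = filled.replace("!" + name + "!", repr(value))
    # Steps 4-5: evaluate the completed expression and store the result.
    results.append(eval(filled, namespace))

print(results)
```

For the three sample records this prints ['285311, 11540', '42', ''].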

JoshuaBixby
MVP Esteemed Contributor

The following should work in the Calculate Field expression without having to define a code block:

",".join(set(!fieldname!.split(",")))
DavidSolari
MVP Regular Contributor

Python sets are not guaranteed to preserve insertion order so this could break in a future version of Pro. But it should work as of Pro 3.4 and Python 3.11.
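A quick way to see this outside ArcGIS Pro, as a minimal sketch in plain Python using the values from the original question: the set removes the duplicates, but only its contents can be relied on, not the order in which they come back out.

```python
values = "285311,285311,11540,11540"

unique = set(values.split(","))
joined = ",".join(unique)

# The duplicates are gone, but the two items may appear in either order.
print(joined)
# Sorting gives a repeatable result, at the cost of the original order.
print(sorted(unique))
```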

JoshuaBixby
MVP Esteemed Contributor

True, but the OP never stated maintaining the order was a requirement. If maintaining order is a requirement, the expression can be changed to use a dict constructor instead of a set.

",".join({i:None for i in !fieldname!.split(",")}.keys())
DavidSolari
MVP Regular Contributor

Dictionaries are also not guaranteed to preserve insertion order and OrderedDict requires a code block so this isn't saving much. Although an OrderedDict would probably handle lists with thousands of items a bit faster.

JoshuaBixby
MVP Esteemed Contributor

That changed in Python 3.7. From "What’s New In Python 3.7" (Python 3.13.1 documentation):

Python data model improvements:

  • the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.
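As a small sketch of what that guarantee allows, assuming Python 3.7 or later and run as plain Python outside ArcGIS Pro: dict.fromkeys builds a dictionary whose keys are the unique items in order of first appearance, which is a shorter spelling of the dict comprehension above.

```python
values = "285311,285311,11540,11540"

# Keys keep insertion order, so the first occurrence of each item wins.
unique_in_order = list(dict.fromkeys(values.split(",")))
result = ", ".join(unique_in_order)

print(result)
```

This prints 285311, 11540, the output asked for in the original question.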