Hi
I have a feature class with a field of concatenated species codes. I would like to remove the duplicate values so there is just one label for each species, i.e. AO, BB, BMC, CC, CH...
Is there a way to do this?
Any help appreciated
Tina
Piece of cake; I just did this operation on some data this morning.
Here's some code for Field Calculator:
A. Function Call
RemoveDupes(!AC!)
B. Code Block
def RemoveDupes(old):
    # Watch out for NULL values, because they'll crash the rest of the code
    if old is None:
        return None
    # If you're still here, assume it's a comma-delimited string, and break
    # it up into parts.
    oldEntries = old.split(',')
    newEntries = []
    # Loop through the entries
    for entry in oldEntries:
        # Only save the entry if you haven't seen it already
        if entry not in newEntries:
            newEntries.append(entry)
    # Stitch the entries back into a string & write it to the field
    return ','.join(newEntries)
Results for the first row:
ANODON,BB,BMC,C,CAS,CC,CH,DC,DV,KO,L,LDC,LNC,LSU,MARFAL,MW,NSC,PK,RB,RSC,SP,SU,WF
This could also be done via an UpdateCursor if you're doing other things, but I gave you Field Calculator first, for the simpler option.
Also, for those familiar with Python: I could have used a set for newEntries, and it would have taken care of skipping the duplicates automatically. But sets in Python are unordered, which means the entries could come back scrambled. The longer approach in the code above preserves whatever order the entries occur in the field.
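If you want the brevity of a set but still need to preserve order, a dict can stand in for one. A quick sketch (the function name is mine, and it assumes Python 3.7+, where dicts keep insertion order):

```python
def remove_dupes_ordered(old):
    """Deduplicate a comma-delimited string, keeping first-seen order."""
    if old is None:
        return None
    # dict.fromkeys drops repeated keys while preserving insertion order
    return ','.join(dict.fromkeys(old.split(',')))

print(remove_dupes_ordered('BB,AO,BB,CH,AO'))  # BB,AO,CH
```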
Related, this code block may also work: Splitting numbers and removing duplicates in a text string
Or for a numpy def
import numpy as np

def dedup(vals):
    """Remove duplicate entries."""
    u = np.unique([i.strip() for i in vals.split(",")])
    return ", ".join(u)
Very efficient; love it!
import numpy as np

def dedup(vals):
    """Remove duplicate entries, with counts."""
    u, cnts = np.unique([i.strip() for i in vals.split(",")], return_counts=True)
    v = np.vstack((u, cnts)).T
    return ", ".join(["{}: {}".format(i[0], i[1]) for i in v])
b = 'AO, BB, BMC, CC, CH, AO, BB, BMC, CC, CH'
dedup(b)
'AO: 2, BB: 2, BMC: 2, CC: 2, CH: 2'
This can be simplified, but the verbose version demonstrates the principle.
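For comparison, the same counting can be sketched without NumPy using collections.Counter (the function name is mine). One behavioral difference worth flagging: unlike np.unique, Counter keeps entries in first-seen order rather than sorting them.

```python
from collections import Counter

def dedup_counts(vals):
    """Count how many times each entry occurs in a comma-delimited string."""
    counts = Counter(i.strip() for i in vals.split(","))
    return ", ".join("{}: {}".format(k, n) for k, n in counts.items())

print(dedup_counts('AO, BB, BMC, CC, CH, AO, BB, BMC, CC, CH'))
# AO: 2, BB: 2, BMC: 2, CC: 2, CH: 2
```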
Bonus solution for anyone with a similar case who wants to keep/display the duplicate entries, but wants a slightly easier-to-read value in the attribute table:
def CountDupes(old):
    if old is None:
        return None
    oldEntries = old.split(',')
    newEntries = {}
    for entry in oldEntries:
        if entry not in newEntries:
            newEntries[entry] = 0
        newEntries[entry] += 1
    return ','.join([f'{entry}:{count}' for entry, count in newEntries.items()])
Run on the sample data from the original post, this would be the result in the first row:
ANODON:1,BB:3,BMC:2,C:1,CAS:1,CC:3,CCG:1,CH:2,DC:2,DV:1,KO:2,L:1,LDC:1,LNC:1,LSU:1,MARFAL:1,MW:2,NSC:2,PK:1,RB:1,RSC:3,SP:1,SU:2,WF:1
Even cleaner would be displaying it without the singles:
def CountDupes(old):
    if old is None:
        return None
    oldEntries = old.split(',')
    newEntries = {}
    strEntries = []
    for entry in oldEntries:
        if entry not in newEntries:
            newEntries[entry] = 0
        newEntries[entry] += 1
    for entry, count in newEntries.items():
        if count == 1:
            strEntries.append(entry)
        else:
            strEntries.append(f'{entry}:{count}')
    return ','.join(strEntries)
And results:
ANODON,BB:3,BMC:2,C,CAS,CC:3,CCG,CH:2,DC:2,DV,KO:2,L,LDC,LNC,LSU,MARFAL,MW:2,NSC:2,PK,RB,RSC:3,SP,SU:2,WF
Since counting the number of species codes wasn't stated as a requirement, this can be done much more simply, without a code block. For the expression:
",".join(sorted(set(!field!.split(',')))) if !field! is not None else None
Worth noting: that one-liner presumes that alphanumeric sorting is the desired order. The example data happens to be alphabetical, but there's not enough information to tell whether that's intentional or just coincidental.
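A quick illustration of that sorting behavior, with !field! swapped for a plain sample string of my own:

```python
s = 'CH,AO,BB,AO'  # sample value standing in for !field!
# Duplicates are removed, but the result comes back alphabetically sorted,
# not in the order the codes appeared in the field
result = ",".join(sorted(set(s.split(',')))) if s is not None else None
print(result)  # AO,BB,CH
```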
By avoiding the cast to a set, the code-block approach maintains whatever order the values appear in the field, minus the duplicate entries.
(Also, the version that counts instances was just a fun 2-second self-challenge, more for the benefit of people landing on this page from a search in the future.)