Select to view content in your preferred language

How do I select the LAST of any number of duplicate records?

1786
13
05-23-2018 03:49 PM
Dean_Heather
New Contributor

I have a table with parcel numbers in one field and soil types in another. There are several soil types for each parcel, and therefore several records for each parcel. 

I am looking for a script (Python, SQL, VB, Arcade) that will iterate through each record grouped by APN, selecting (or outputting) the LAST record for each APN.

Example input table:

APNSOIL TYPE
1A
2A
2A, B
3A
3A, E
3A, E, G

Desired result:

APNSOIL TYPE
1A
2A, B
3A, E, G

It's a simple task that seems like it should be easy, but I can't seem to find a solution. I've been working in Excel, ModelBuilder inside of ArcGIS Pro, and Python in Jupyter Notebook, but I'm not very well versed in any one program or language; maybe there's a better route of attack.

Thank you for any suggestions you might be able to provide!

0 Kudos
13 Replies
DanPatterson_Retired
MVP Emeritus

Haven't checked all scenarios, but try this to get the index value for the elements of both

a = [1, 2, 2, 3, 3, 3]

b = []
def dif(a):
    """differencing"""
    for i in range(1, len(a)):
        if a[i-1] != a[i]:
            b.append(i-1)
    b.append(i)
    return b


dif(a)

[0, 2, 5]  # zero based indexing
XanderBakker
Esri Esteemed Contributor

If you create a dictionary and replace the value for the same key, you will end up with the last value for each key.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

If they are duplicates, how do you differentiate the first from last record, i.e., why not grab the first record?  It looks like you are trying to select the record with the most number of soil types per APN, not the last record of duplicates.  Depending on your answer, there may be a different approach to this.

JoeBorgione
MVP Emeritus

I've been wondering the same thing: first, last....  Based on what?

That should just about do it....
0 Kudos
DanPatterson_Retired
MVP Emeritus

If python 3.6+ is being used, then dictionary order is retained...

If python 2.7 is being used, the dictionary approach will fail.

In that simple example, there is only one set of sequences and the ID number is given by the APN field, which is assumed to be sequential.  If it were not an id field and the pattern represented classes, there could be duplicates in the sequence and the dictionary approach would fail again.

A blog is coming on sequences, patterns and replicants which will show how sequences and patterns of sequences and their representation and slicing (eg, start of sequence, sequence length and end of sequence) can all be obtained relatively simply and produce a summary table in ArcGIS PRO.

I will post back here hopefully within a day

XanderBakker
Esri Esteemed Contributor

IMHO a dictionary will work just fine with sequential access to the table rows. See snippet below:

table = [[1, "A"], [2, "A"], [2, "A, B"], [3, "A"], [3, "A, E"], [3, "A, E, G"]]
dct = {}

for row in table:
    apn = row[0]
    soil_type = row[1]
    dct[apn] = soil_type

for apn, soil_type in sorted(dct.items()):
    print("{} - {}".format(apn, soil_type))

This will yield:

1 - A
2 - A, B
3 - A, E, G
0 Kudos
DanPatterson_Retired
MVP Emeritus

If they are completely sequential... not just consisting of sequences... that is not fully clear from the question posed, it might be one of those expanding questions.

upcoming details soon

JoeBorgione
MVP Emeritus

Xander- I've been messing with this on and off all day.  I created a file gdb table just like the one in the o.p. and with a da.SearchCursor created a list.  That list was in the form of tuples so somewhat different than yours:

myList = [(1, "A"), (2, "A"), (2, "A, B"), (3, "A"), (3, "A, E"), (3, "A, E, G")]

I then created a dictionary from that list and since the keys are not unique, the last one wins and that produces the desired result as shown in the o.p.  But that's where things fall apart for me.  If I pop() the last element and then insert() it back into the list at the first location, the resulting dictionary is quite different. 3:'A,E' wins.....

Would line 9 in your example take care of that?

Xander Bakker

That should just about do it....
0 Kudos
XanderBakker
Esri Esteemed Contributor

That's exactly what "should" happen, since the last element for APN 3 will be "A, E".

Or that exactly what I understand as objective from the question that the OP asked. It's best if the OP clarifies if this is the objective or that the LAST element should be interpreted differently (like last when sorted alphabetically or the one with the most "complex" soil type). Point is that with the solution I suggested the order of the data will influence the result and this is often not what one would like to happen.