Select to view content in your preferred language

Search Cursor to sort, group data with itertools, then sort on groups in list on lowest OBID

2960
10
01-09-2019 01:07 PM
JamesRender
Emerging Contributor

Hi,

I have a challenge to group data based on an attribute, then sort the groups and assign the lowest object ID's global ID to another field for the items in that given group. This needs to be iterated as there will be dozens, if not hundreds of groups. This is what I've come up with so far as an outline. Any pointers on this would be greatly appreciated:

import arcpy, time, datetime, os, itertools

fields = ['OBJECTID', 'LegacyGUID', 'GlobalID', 'LEGACYID']
groups = []
ukeys = []


try:
     with arcpy.da.SearchCursor(fc, fields, sql_clause=('','ORDER BY LEGACYID')) as cursor:
          for k, g in itertools.groupby(cursor, lambda x: x[3]):
               groups.append(list(g))
               ukeys.append(k)          
except:
     e = sys.exc_info()[1]
     GroupFail = ('Grouping Operation Failed at ' + str(datetime.datetime.now()) + ' ' + (e.args[0]) + '\n')
     flog.write(GroupFail)

flog.write(str(groups))

#newgroups = sorted(groups, key=lambda x: x[0])‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
0 Kudos
10 Replies
JoshuaBixby
MVP Esteemed Contributor

Since I wrote the code to accept multiple case/grouping fields, I have to create the groupby grouping function dynamically.  Since I am grouping based on positions in a list, I used operator.itemgetter, which accepts multiple indexes.  To keep it simple, I put the case/grouping fields at the front end of the list of fields so I can use len(case_fields) to setup itemgetter.  So, len(case_fields) is being used to identify how many of the first list elements are related to grouping.  I am using range to create a sequence for itemgetter that starts at the lowest index in the list, zero, and counts up.  I put the star/asterisk in front of range to unpack the sequence when passing it to itemgetter.

In terms of the groupby key, you could also right:

case_func = lambda x: x[:len(case_fields)] 

When working with groupby, the iterable (cursor in this instance) needs to be sorted.  I used a compound SQL ORDER BY clause to create sorted groups for the cursor.  For this situation, my SQL statement looks like:

ORDER BY LegacyID, OBJECTID ASC

Since the rows are sorted by LegacyID and then OBJECTID, the first record within a group will contain the GlobalID you want to use for populated the other rows in the group.  That is why I manually move the group iterator forward by using next(), and then continue looping using a for statement.