Search Cursor to sort, group data with itertools, then sort on groups in list on lowest OBID

1852
10
01-09-2019 01:07 PM
JamesRender
New Contributor

Hi,

I have a challenge to group data based on an attribute, then sort the groups and assign the lowest object ID's global ID to another field for the items in that given group. This needs to be iterated as there will be dozens, if not hundreds of groups. This is what I've come up with so far as an outline. Any pointers on this would be greatly appreciated:

import arcpy, time, datetime, os, itertools

fields = ['OBJECTID', 'LegacyGUID', 'GlobalID', 'LEGACYID']
groups = []
ukeys = []


try:
     with arcpy.da.SearchCursor(fc, fields, sql_clause=('','ORDER BY LEGACYID')) as cursor:
          for k, g in itertools.groupby(cursor, lambda x: x[3]):
               groups.append(list(g))
               ukeys.append(k)          
except:
     e = sys.exc_info()[1]
     GroupFail = ('Grouping Operation Failed at ' + str(datetime.datetime.now()) + ' ' + (e.args[0]) + '\n')
     flog.write(GroupFail)

flog.write(str(groups))

#newgroups = sorted(groups, key=lambda x: x[0])‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
0 Kudos
10 Replies
DarrenWiens2
MVP Honored Contributor

Perhaps you've considered and passed on this idea, but you can get the minimum OID for each group using Summary Statistics, then read those values directly as you cursor through the features, or join all at once rather than cursoring at all.

0 Kudos
JamesRender
New Contributor

So create a table of the lowest object ID with it's corresponding legacy ID, and global ID for each group, then join the table to the original feature class and replace values with the UpdateCursor? 

That might be the easiest way, but thought there may be some other operators with PY or ArcPY that could get me their without leaving the script. 

0 Kudos
DarrenWiens2
MVP Honored Contributor

You can run almost all ArcGIS functionality from Python, like:
Add Join—Data Management toolbox | ArcGIS Desktop 

Summary Statistics—Help | ArcGIS for Desktop 

0 Kudos
JamesRender
New Contributor

I'm able to get these groups into a list, but now need to sort each group by object ID and complete the replacement of the other records legacyID, with the lowest object ID's global ID per group.

0 Kudos
DarrenWiens2
MVP Honored Contributor

With Summary Statistics, you can get a table with `Group, Min_OID_for_each_group`. Join this table to your original table by OID & Min_OID_for_each_group. Save the table or read to Python dictionary so you have `Group, Min_OID_for_each_group, GlobalID`, then either join to your original table by Group, or cursor through matching by group to update the LegacyID.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Do you want to update a feature class?  Your current code is using a search cursor, which obviously can't update the feature class or table in question. 

Where does LegacyGUID comes into play?  At first glance, I don't see where it is being used.

0 Kudos
JamesRender
New Contributor

Yes, clearly I'm not attempting to edit at this point in the code. But the end result  is to replace the legacyGlobal ID per group of the lowest ob id's global id. So will need to edit after I have a table or list to match to the groups legacyID. 

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

You can do it all in one pass using an update cursor, if I understand what you are trying to do.  I will post some code tomorrow.

UPDATE: In terms of specifics, there are still a couple of things I am unclear about.  For example, where does LegacyGUID come into play.  Also, what is the target field you want to update with the LegacyID of the lowest ObjectID of the group?

For the code below, I am using LegacyGUID as the target field to be updated, but it is simple to change the field.  In the code below, case_fields can be multiple fields if you want the grouping of data to be based on more than one field.

import arcpy
from itertools import groupby
from operator import itemgetter

fc = # path to feature class or table
case_fields = ["LegacyID"]
sort_field, sort_order = "OBJECTID", "ASC"
source_field = "GlobalID"
target_field = "LegacyGUID"

fields = case_fields + [sort_field, source_field, target_field]
sql_orderby = "ORDER BY {}, {} {}".format(
    ", ".join(case_fields), sort_field, sort_order
)

with arcpy.da.UpdateCursor(fc, fields, sql_clause=(None, sql_orderby)) as cur:
    case_func = itemgetter(*range(len(case_fields)))
    for key, group in groupby(cur, case_func):
        row = next(group)
        source = row[-2]
        row[-1] = source
        cur.updateRow(row)
        for row in group:
            row[-1] = source
            cur.updateRow(row)

del cur
JamesRender
New Contributor

Ah, not sure why i didn't think to just double sort in the sql clause...

To answer your question:

LegacyGUID is the field that is going to be updated with the GlobalID per group with the lowest Object ID's Global ID.  Maybe this will help

1) Sort by legacyID

2) Group all like LegacyID's together

3) For groups with more than 1 member, calculate into the LegacyGUID field the lowest Object ID's Global ID per group.

4) For groups with 1 member, simply calculate the GlobalID into the LegacyGUID

Regarding your code, can you explain:

Joshua Bixby wrote:


    case_func = itemgetter(*range(len(case_fields)))

 

More specifically, what *range(len(case_fields))) is doing as a function, then as the grouping argument? Where in your code is it pulling the lowest object ID's global ID of each group?

Thanks for your help.

0 Kudos