Search Cursor to sort, group data with itertools, then sort on groups in list on lowest OBID

JamesRender · ‎01-09-2019

Hi,

I have a challenge to group data based on an attribute, then sort the groups and assign the lowest object ID's global ID to another field for the items in that given group. This needs to be iterated as there will be dozens, if not hundreds of groups. This is what I've come up with so far as an outline. Any pointers on this would be greatly appreciated:

import arcpy, time, datetime, os, itertools

fields = ['OBJECTID', 'LegacyGUID', 'GlobalID', 'LEGACYID']
groups = []
ukeys = []


try:
     with arcpy.da.SearchCursor(fc, fields, sql_clause=('','ORDER BY LEGACYID')) as cursor:
          for k, g in itertools.groupby(cursor, lambda x: x[3]):
               groups.append(list(g))
               ukeys.append(k)          
except:
     e = sys.exc_info()[1]
     GroupFail = ('Grouping Operation Failed at ' + str(datetime.datetime.now()) + ' ' + (e.args[0]) + '\n')
     flog.write(GroupFail)

flog.write(str(groups))

#newgroups = sorted(groups, key=lambda x: x[0])‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

DarrenWiens2 · ‎01-09-2019

Perhaps you've considered and passed on this idea, but you can get the minimum OID for each group using Summary Statistics, then read those values directly as you cursor through the features, or join all at once rather than cursoring at all.

JamesRender · ‎01-09-2019

So create a table of the lowest object ID with it's corresponding legacy ID, and global ID for each group, then join the table to the original feature class and replace values with the UpdateCursor?

That might be the easiest way, but thought there may be some other operators with PY or ArcPY that could get me their without leaving the script.

DarrenWiens2 · ‎01-09-2019

You can run almost all ArcGIS functionality from Python, like:
Add Join—Data Management toolbox | ArcGIS Desktop

Summary Statistics—Help | ArcGIS for Desktop

JamesRender · ‎01-09-2019

I'm able to get these groups into a list, but now need to sort each group by object ID and complete the replacement of the other records legacyID, with the lowest object ID's global ID per group.

DarrenWiens2 · ‎01-09-2019

With Summary Statistics, you can get a table with `Group, Min_OID_for_each_group`. Join this table to your original table by OID & Min_OID_for_each_group. Save the table or read to Python dictionary so you have `Group, Min_OID_for_each_group, GlobalID`, then either join to your original table by Group, or cursor through matching by group to update the LegacyID.

JoshuaBixby · ‎01-09-2019

Do you want to update a feature class? Your current code is using a search cursor, which obviously can't update the feature class or table in question.

Where does LegacyGUID comes into play? At first glance, I don't see where it is being used.

JamesRender · ‎01-09-2019

Yes, clearly I'm not attempting to edit at this point in the code. But the end result is to replace the legacyGlobal ID per group of the lowest ob id's global id. So will need to edit after I have a table or list to match to the groups legacyID.

JoshuaBixby · ‎01-09-2019

You can do it all in one pass using an update cursor, if I understand what you are trying to do. I will post some code tomorrow.

UPDATE: In terms of specifics, there are still a couple of things I am unclear about. For example, where does LegacyGUID come into play. Also, what is the target field you want to update with the LegacyID of the lowest ObjectID of the group?

For the code below, I am using LegacyGUID as the target field to be updated, but it is simple to change the field. In the code below, case_fields can be multiple fields if you want the grouping of data to be based on more than one field.

import arcpy
from itertools import groupby
from operator import itemgetter

fc = # path to feature class or table
case_fields = ["LegacyID"]
sort_field, sort_order = "OBJECTID", "ASC"
source_field = "GlobalID"
target_field = "LegacyGUID"

fields = case_fields + [sort_field, source_field, target_field]
sql_orderby = "ORDER BY {}, {} {}".format(
    ", ".join(case_fields), sort_field, sort_order
)

with arcpy.da.UpdateCursor(fc, fields, sql_clause=(None, sql_orderby)) as cur:
    case_func = itemgetter(*range(len(case_fields)))
    for key, group in groupby(cur, case_func):
        row = next(group)
        source = row[-2]
        row[-1] = source
        cur.updateRow(row)
        for row in group:
            row[-1] = source
            cur.updateRow(row)

del cur‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

JamesRender · ‎01-10-2019

Ah, not sure why i didn't think to just double sort in the sql clause...

To answer your question:

LegacyGUID is the field that is going to be updated with the GlobalID per group with the lowest Object ID's Global ID. Maybe this will help

1) Sort by legacyID

2) Group all like LegacyID's together

3) For groups with more than 1 member, calculate into the LegacyGUID field the lowest Object ID's Global ID per group.

4) For groups with 1 member, simply calculate the GlobalID into the LegacyGUID

Regarding your code, can you explain:

Joshua Bixby wrote:


    case_func = itemgetter(*range(len(case_fields)))

More specifically, what *range(len(case_fields))) is doing as a function, then as the grouping argument? Where in your code is it pulling the lowest object ID's global ID of each group?

Thanks for your help.