Edit: here's the current code I've got so far:
itsct_nparr = arcpy.da.FeatureClassToNumPyArray(prItsct, ['FID_preproc','NAME','Shape_Area'])
#create a pandas DataFrame objects from the NumPy arrays
itsct_df = DataFrame(itsct_nparr, columns=['FID_preproc','NAME','Shape_Area'])
maxarea = itsct_df.groupby(['FID_preproc','NAME'], as_index=False).max()
maxarea.to_csv(csvout)
del itsct_nparr
I need to come up with a solution that allows me to summarize an input table, performing a GroupBy on 2 columns ("FID_preproc" and "Shape_Area") and keep all of the fields in the original table in the output/result. From there I'll be joining that result with another table and calculating a field using the NAME column.
From the example, I need to maintain the FID_preproc, and NAME fields (see the desired result). GroupBy on FID_preproc and MAX(Shape_Area).
Any method is acceptable (numpy, pandas, summarize table, da.SearchCursor, etc.) -- whatever you have I can use!
Input: | ||
FID_preproc | NAME | Shape_Area |
1340 | A | 25952.35775 |
1341 | A | 118099.5219 |
1341 | B | 305220.1244 |
1342 | A | 12053.13585 |
Desired Result: | ||
FID_preproc | NAME | Shape_Area |
1340 | A | 25952.35775 |
1341 | B | 305220.1244 |
1342 | A | 12053.13585 |
Solved! Go to Solution.
I was able to get your suggestion to run. However, it only outputs 2 columns and not what the desired result in my OP.
Can you post the code back that you are using? When I tested the code on my end, I was able to output all columns in the original table, including the original ObjectID. I am not sure why you would only get 2 columns.
Ah -- that was my goof. Yes, you are correct it generates the desired output.
Double award points for questioning my obvious flawed assessment!
As powerful as pandas is, I find getting data into and out of it cumbersome at times and unnecessary at other times. If the data is already stored where SQL can be used, like a geodatabase, I like pairing ArcPy Data Access cursors, itertools, and SQL to keep the data in place and do my analysis. My code from Deleting records from tabular intersect table based on 'PERCENTAGE' field can be easily modified to work here:
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> fc = # path to feature class or table
>>> case_fields = ["FID_preproc"]
>>> max_field = "Shape_Area"
>>> sql_orderby = "ORDER BY {}, {} DESC".format(", ".join(case_fields), max_field)
>>> with arcpy.da.UpdateCursor(fc, "*", sql_clause=(None, sql_orderby)) as cur:
... case_func = itemgetter(
... *(cur.fields.index(fld) for fld in case_fields)
... )
... for key, group in groupby(cur, case_func):
... next(group)
... for extra in group:
... cur.deleteRow()
...
>>> del cur
>>>
NOTE: the above code modifies the table in-place so make sure to test it on a copy.
I will be processing this feature class in the in_memory workspace -- not sure if that sql will work there. Your pandas solution above will work great.
This solution looks good too! But not sure if that ORDER BY clause will work outside of RDMBS sources.
I did a quick check using an in-memory table, and the code works the same as having the table in a file geodatabase. As I mentioned before, I am running 10.5.1, which shouldn't make a difference in terms of SQL support with in-memory workspaces, but one never knows. I know in-memory workspaces don't support as much SQL as file geodatabases, but ORDER BY appears to work fine.
Excellent! I'll give it a go as well.
If you want to create a table view of the table and select records, the following works without modifying the original table:
from itertools import groupby
from operator import itemgetter
tbl = # path to table
case_fields = ["FID_preproc"]
max_field = "Shape_Area"
sql_orderby = "ORDER BY {}, {} DESC".format(", ".join(case_fields), max_field)
vw = arcpy.MakeTableView_management(tbl, "tbl_vw").getOutput(0)
oidField = arcpy.Describe(vw).OIDFieldName
oids = []
with arcpy.da.SearchCursor(vw, "*", sql_clause=(None, sql_orderby)) as cur:
oidField = cur.fields.index(oidField)
case_func = itemgetter(
*(cur.fields.index(fld) for fld in case_fields)
)
for key, group in groupby(cur, case_func):
oids.append(next(group)[oidField])
continue
vw.setSelectionSet("NEW_SELECTION", oids)
I'll be using the grouped table to join back in with the feature class and calculating a field.