ArcPy: Find Maximum Value for each group.

6149
4
03-25-2014 02:53 PM
PeterWilson
Frequent Contributor
I'm new to python and and arcpy and would like to get some advice in generating a python script. I have a table with the following fields:

Field 01:  Quinary (txt)
Field 02:  Water Management Area (txt)
Field 03:  Percentage (Double)

The table represents the results based on the intersection of the quinary (polygon) feature class that was intersected with the Water Management Area (polygon) feature class. The Percentage Field represents the percentage area of the original Quinary Feature Class that falls within the Water Management Area.

The table has multiple records where Field 01 can have the same value, Field 02 can have the same value but Field 03 will be different. I need to identify the records based on Field 01 and Field 02 and identify the maximum value in Field 03.

ID  Field 01    Field 02    Field 03
01  Q1           W1          02.00
02  Q1           W1          48.00
03  Q1           W2          50.00
04  Q2           W3          90.00
05  Q2           W4          10.00


Results Reqiured:

ID  Field 01    Field 02    Field 03
03  Q1           W2          50.00
04  Q2           W3          90.00


Any help will be appreciated.

Regards
Tags (2)
0 Kudos
4 Replies
XanderBakker
Esri Esteemed Contributor
Hi Peter,

Looking at the desired result, you could use something like this:

import arcpy

# your data:
tbl = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb\aTable'
fld_quin = 'Quinary'
fld_wma = 'WMA' # Water Management Area
fld_perc = 'Percentage'
fld_oid = arcpy.Describe(tbl).OIDfieldname # or specify ID fieldname

# create empty dictionary to store results
dct = {}

# loop through table
flds = (fld_oid, fld_quin, fld_wma, fld_perc)
with arcpy.da.SearchCursor(tbl, flds) as curs:
    for row in curs:
        oid, perc = row[0], row[3]
        quin, wma = row[1], row[2]
        if quin in dct:
            max_perc = dct[quin][2]
            if perc > max_perc:
                dct[quin] = (oid, wma, perc)
        else:
            dct[quin] = (oid, wma, perc)

# print results
for quin, val in dct.items():
    print val[0], quin, val[1], val[2]


This produces the following output:
3 Q1 W2 50.0
4 Q2 W3 90.0


The dictionary holds the Quinary as key value and a tuple of OID, WMA and Percentage as value. In this case the maximum percentage is stored per Quinary.


In case you are interested in determining the maximum percentage per combination of Quinary and Water Management Area, the code would slightly change:

import arcpy

# your data:
tbl = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb\aTable'
fld_quin = 'Quinary'
fld_wma = 'WMA' # Water Management Area
fld_perc = 'Percentage'
fld_oid = arcpy.Describe(tbl).OIDfieldname

# create empty dictionary to store results
dct = {}

# loop through table
flds = (fld_oid, fld_quin, fld_wma, fld_perc)
with arcpy.da.SearchCursor(tbl, flds) as curs:
    for row in curs:
        oid, perc = row[0], row[3]
        quin, wma = row[1], row[2]
        key = (quin, wma)
        if key in dct:
            max_perc = dct[key][1]
            if perc > max_perc:
                dct[key] = (oid, perc)
        else:
            dct[key] = (oid, perc)

# print results
for key, val in dct.items():
    print val[0], key[0], key[1], val[1]


Kind regards,

Xander
0 Kudos
JamesCrandall
MVP Frequent Contributor
Alternative to ESRI stack is to use Pandas library for this kind of stuff.  Here I just pulled in your sample data into the Default.gdb as a table, but it could also be a feature class too -- doesn't really matter.  Or it could be just about any data source (.txt, .csv, cx_Oracle cursor, etc...)

Here's a GIS-centric solution using Pandas.

Source data sampdata.csv:
ID,Field01,Field02,Field03
01,Q1,W1,02.00
02,Q1,W1,48.00
03,Q1,W2,50.00
04,Q2,W3,90.00
05,Q2,W4,10.00



import pandas as pd
import numpy as np

#just pull in sample data into a gbd table
_dfsource = r'H:\Documents\ArcGIS\Default.gdb\sampdat'

#specify fields to use and convert the gdb tab to a numpy array
flds = ['ID', 'Field01', 'Field02', 'Field03']
_numparr = arcpy.da.TableToNumPyArray(_dfsource, flds)

#convert the numpy array to a pandas data frame for easy-peezy grouping functions
_dfarr = pd.DataFrame(_numparr, columns=['ID', 'Field01', 'Field02', 'Field03'])
_dfarrGrouped = _dfarr.groupby(['Field01'])['Field03'].max().reset_index()

print _dfarrGrouped



Result:

  Field01  Field03
0      Q1       50
1      Q2       90
0 Kudos
PeterWilson
Frequent Contributor
Hi Xander

The following is exactly what I'm trying to achieve. I've not worked with python dictionaries yet. Would you mind explaining to me how to convert the python dictionary into a File Geodatabase Table?

Thanks for all your help so far, its much appreciated.

Regards
0 Kudos
XanderBakker
Esri Esteemed Contributor
Hi Xander

The following is exactly what I'm trying to achieve. I've not worked with python dictionaries yet. Would you mind explaining to me how to convert the python dictionary into a File Geodatabase Table?

Thanks for all your help so far, its much appreciated.

Regards


Hi Peter,

It is nice to see that James Crandall has joined this thread with some panda and numpy magic. On previous occasions James has pointed out the strength of these modules. Since panda is not available by default on a ArcGIS system and numpy is, I will try to show you how you could write the dictionary to a table, by using numpy.  

If you want to use the arpy way of creating an empty table, adding field and using an insert cursor to fill the table, you can look at the thread below, where an example is shown:
http://forums.arcgis.com/threads/105303-Take-identical-fields-and-concatenate-additional-field?p=375...


Here the code to use numpy to store the table:

import arcpy, os, numpy
arcpy.env.overwriteOutput = True

# your data:
tbl = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb\aTable'
fld_quin = 'Quinary'
fld_wma = 'WMA' # Water Management Area
fld_perc = 'Percentage'
fld_oid = arcpy.Describe(tbl).OIDfieldname # or specify ID fieldname

# extra vars for output
fld_id = 'ID'
out_ws = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb'
tbl_name = 'aNewOutputTable'

# create empty dictionary to store results
dct = {}

# loop through table
flds = (fld_oid, fld_quin, fld_wma, fld_perc)
with arcpy.da.SearchCursor(tbl, flds) as curs:
    for row in curs:
        oid, perc = row[0], row[3]
        quin, wma = row[1], row[2]
        if quin in dct:
            max_perc = dct[quin][2]
            if perc > max_perc:
                dct[quin] = (oid, wma, perc)
        else:
            dct[quin] = (oid, wma, perc)

# create list for use in numpy
lst_out = [(val[0], quin, val[1], val[2]) for quin, val in dct.items()]

# create a numpy array
npa = numpy.array(lst_out, numpy.dtype([(fld_id, numpy.int32), (fld_quin, '|S8'), (fld_wma, '|S8'), (fld_perc, numpy.float)]))

# output list of fields
flds = (fld_id, fld_quin, fld_wma, fld_perc)

# store the table
arcpy.da.NumPyArrayToTable(npa, os.path.join(out_ws, tbl_name), flds)


Kind regards,

Xander
0 Kudos