Select MAX of a field by group

368
3
03-29-2023 12:38 PM
AlfredBaldenweck
MVP Regular Contributor

HI all,

I'm trying to figure out how to get the maximum value of a field within a group; e.g. I want the record corresponding to the most recent site visit.

ObjectIDSite_IDSpecCodeSStatusMon_Date
1A

Frog

OCCU

1/1/1999

2AFrogUNOC1/4/1999
3AFrogOCCU5/2/2008
4BToadUNOC3/6/2007
5BToadOCCU9/5/2014

In this case, I want to return the max date for each site, returning records #3 and #5.

I'd like to be able to do it in either select by attribute or a search cursor. 

I'm having trouble structuring my SQL clause(s) to make this happen.

How To: Select minimum and maximum values in the Select By Attributes window (esri.com)

This is my query I've tried for select by attribute:

(SpecCode = 'Toad' Or SpecCode = 'Frog') And 
(SStatus = 'OCCU') And 
(Mon_Date = (SELECT MAX(Mon_Date) from [Table Name] GROUP BY Site_ID))

I'd appreciate any pointers on this; I feel like it should be really easy and so far it hasn't been.

0 Kudos
3 Replies
by Anonymous User
Not applicable

Postfix groupings can get ugly and are not supported by some types of data storage formats.  I'd do it in a cursor if you could for portability and just do the max by comparison:

where = "(SpecCode = 'Toad' Or SpecCode = 'Frog') AND SStatus = 'OCCU'"

siteDict = {}
with arcpy.da.SearchCursor(tble, ['Site_ID', 'Mon_Date', 'SStatus', 'SpecCode'], where_clause=where) as sCur:
    for row in sCur:
        if not siteDict.get(row[0]):
            siteDict[row[0]] = row[1]
        else:
            if siteDict[row[0]] < row[1]:
                siteDict[row[0]] = row[1]

for k, v in siteDict.items():
    print(f'Site ID: {k} last visited: {v}')

 

0 Kudos
DavidPike
MVP Frequent Contributor

Trying a simple test I can only get a single selection even with a GROUP BY - which I guess is a scalar subquery result.  Trying to replicate your query with variations of ORs and ANDs just caused strange unexpected selections.  I don't think subqueries are the way to go with an FGDB.  Recommend Jeff's cursor.

0 Kudos
AlfredBaldenweck
MVP Regular Contributor

Doing more testing, I was able to mostly get it to work, with one issue remaining. The key was to use IN.

As it turns out, this data is not nearly as clean as it should be, and there are several cases of visit records with duplicate site# and Dates. I took care of that easily enough: 

 

OBJECTID In 
(select max(OBJECTID) 
from [table] 
WHERE Mon_Date In 
    (SELECT max(Mon_Date) 
     from [table] 
     Where Mon_Date is not null 
     Group BY Site_Id) 
GROUP BY Site_ID )

 

My issue now is that some of the dates are not correctly being evaluated by MAX(). Which means that this happens:

AlfredBaldenweck_1-1680293539585.png

I'm at a loss here.

 

 

0 Kudos