HI all,
I'm trying to figure out how to get the maximum value of a field within a group; e.g. I want the record corresponding to the most recent site visit.
ObjectID | Site_ID | SpecCode | SStatus | Mon_Date |
1 | A | Frog | OCCU | 1/1/1999 |
2 | A | Frog | UNOC | 1/4/1999 |
3 | A | Frog | OCCU | 5/2/2008 |
4 | B | Toad | UNOC | 3/6/2007 |
5 | B | Toad | OCCU | 9/5/2014 |
In this case, I want to return the max date for each site, returning records #3 and #5.
I'd like to be able to do it in either select by attribute or a search cursor.
I'm having trouble structuring my SQL clause(s) to make this happen.
How To: Select minimum and maximum values in the Select By Attributes window (esri.com)
This is my query I've tried for select by attribute:
(SpecCode = 'Toad' Or SpecCode = 'Frog') And
(SStatus = 'OCCU') And
(Mon_Date = (SELECT MAX(Mon_Date) from [Table Name] GROUP BY Site_ID))
I'd appreciate any pointers on this; I feel like it should be really easy and so far it hasn't been.
Postfix groupings can get ugly and are not supported by some types of data storage formats. I'd do it in a cursor if you could for portability and just do the max by comparison:
where = "(SpecCode = 'Toad' Or SpecCode = 'Frog') AND SStatus = 'OCCU'"
siteDict = {}
with arcpy.da.SearchCursor(tble, ['Site_ID', 'Mon_Date', 'SStatus', 'SpecCode'], where_clause=where) as sCur:
for row in sCur:
if not siteDict.get(row[0]):
siteDict[row[0]] = row[1]
else:
if siteDict[row[0]] < row[1]:
siteDict[row[0]] = row[1]
for k, v in siteDict.items():
print(f'Site ID: {k} last visited: {v}')
Trying a simple test I can only get a single selection even with a GROUP BY - which I guess is a scalar subquery result. Trying to replicate your query with variations of ORs and ANDs just caused strange unexpected selections. I don't think subqueries are the way to go with an FGDB. Recommend Jeff's cursor.
Doing more testing, I was able to mostly get it to work, with one issue remaining. The key was to use IN.
As it turns out, this data is not nearly as clean as it should be, and there are several cases of visit records with duplicate site# and Dates. I took care of that easily enough:
OBJECTID In
(select max(OBJECTID)
from [table]
WHERE Mon_Date In
(SELECT max(Mon_Date)
from [table]
Where Mon_Date is not null
Group BY Site_Id)
GROUP BY Site_ID )
My issue now is that some of the dates are not correctly being evaluated by MAX(). Which means that this happens:
I'm at a loss here.