Select records with most recent date for groups of points

13764
12
05-12-2017 02:49 PM
JasonFultz
New Contributor III

Hi all, 

I am working with a feature class (sample_Locations) in an ArcSDE Geodatabase and my table has a "Point" field and a "Date" field, the field types are string and date, respectively.  There are over 2000 features in the table and the point ID's are not unique, there are only ~100 point ID's, so there may be many points with the same Point ID.  I am trying to select records from the table with the most recent date for sets of values, where the field "Point" is the field that contains the values that define groups.

I have tried using the following subquery, but it did not select the correct records.  It selected a very random assortment of features.

[Date] in (SELECT max( [Date] ) FROM Sample_Locations GROUP BY [Point])

Can anyone provide help for performing this selection using a Python script?  Ideally, I would like to select the features and export these selected features to a standalone excel table.

Any help would be greatly appreciated!!

Jason

12 Replies
JoshuaBixby
MVP Esteemed Contributor

Is using Python an option or does it have to be SQL-based?  There are Python-based solutions to this question as well.

0 Kudos
JasonFultz
New Contributor III

Hi Joshua,  Thank you for the reply.  I would actually prefer to use a Python script as opposed to an SQL query, if possible. 

I will give the SQL query a try and let you know the result, as I am away from the data at the moment.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

There are several ways to go about this using Python, some are straight Python and others mix Python with some basic SQL for sorting the data set first.

One Python approach that avoids sorting the data first relies on using dictionaries to store and compare values:

lyr = # A layer or tableview object, usually retrieved using listing methods
      # in arcpy.mapping

from collections import defaultdict, namedtuple

defaultdate = datetime.datetime(1000, 1, 1)
record = namedtuple("record", "oid date")
max_date = defaultdict(lambda: record(None, defaultdate))

with arcpy.da.SearchCursor(lyr, ["OID@", "Point", "Date"]) as cur:
    for oid, point, date in cur:
        if max_date[point].date < (date or defaultdate):
            max_date[point] = record(oid, date)
            
oids = [rec.oid for rec in max_date.values() if rec.oid]
lyr.setSelectionSet("NEW", oids)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

defaultdicts are handy and make writing and reading code much easier in certain situations, like this one.  I find namedtuples are handy as well, and they make working with the code much easier.  For example, I find max_date[point].date much clearer than max_date[point][1].

Although your data appears to have date values for every record, the code above was written to handle records not having date values.  Unless you know the date field in the data set doesn't allow NULL, it is good to have your code accommodate the possibility in case one does show up.

If I find some time, I will show an example using straight Python that involves sorting the data set first.