Selecting the most recent records based on unique values in another field

SteveMasters · ‎01-02-2024

Hi,

I have built the model in the picture, which works fine. But the next step I want to do is for each of the new feature classes that are created, I want only the most recent record of each unique record from the species field. But I want all the associated fields to come through into the new feature class. See table picture below, for example I only want the most recent record of Sparrowhawk.

I am not sure how to do this and if I can add it into the model as part of the process?

Thanks

Bud · ‎01-02-2024

~~What kind of geodatabase is your data in? Have you considered using a database view instead?~~

Never mind. I didn't read the question close enough.

DavidPike · ‎01-03-2024

I think the database view is a good idea, would be some subquery with a GROUP BY species and MAX (date) as @RichardHowe suggests

SteveMasters · ‎01-04-2024

If I was to create a database view, just struggling with correct syntax to go through the feature class in the file geodatabase what do put after the FROM part see picture. Or am I completely way off!

Thanks

DavidPike · ‎01-04-2024

Ah I tried a FGDB view and couldn't subqueries with GROUP BY working. I'd really recommend what @RichardHowe suggested as the most practical way.

Bud · ‎01-12-2024

I haven't found a way to do it in a SQL expression. But I got a FGDB database view to work.

I thought I'd tried this approach previously without luck. But I guess I must have done something differently this time because it works now.

1. Load this sample spreadsheet into a FGDB:
https://community.esri.com/t5/arcgis-pro-ideas/fgdb-database-views-view-that-selects-from-an/idi-p/1...

2. Create this database view:

--species_records_latest_vw
select
    *
from
    species_records
inner join
    (select
        t_species,
        max(t_date) as t_date 
    from
        species_records 
    group by
        t_species) l
    on species_records.t_species = l.t_species 
       and species_records.t_date = l.t_date

For my notes, the SQL in brackets is a derived table, not a subquery. Is the SQL in brackets a subquery?

Result:

Unfortunately, the query doesn't break ties when there are multiple rows per species with the same date.

Video:

(view in My Videos)

To use the result in further analysis, it might be best to export the view as a table since FGDB views are buggy in ArcGIS Pro. For example, joining from a table to a view doesn't seem to work; the joined fields are all null. Idea: Join to a FGDB database view.

Update:

I got a FGDB database view working that breaks the ties.

Again, I thought I had tried this previously and failed. But it seems to be working now.

select
    *
from
    species_records s
inner join
    (
    select
        min(objectid) as objectid,
        min(t_unique_id) as t_unique_id,
        t_species,
        max(t_date) as t_date
    from
        (select
            *
        from
            species_records
        inner join
            (select
                t_species,
                max(t_date) as t_date 
            from
                species_records 
            group by
                t_species) l
            on species_records.t_species = l.t_species 
               and species_records.t_date = l.t_date
        )
    group by
        t_species
    ) l
    on s.t_unique_id = l.t_unique_id

Some of the fields have asterisks (*) beside the field names, indicating the field has an index.

I don't think I added those indexes intentionally. But maybe it happened when using a GP tool. I don't know if that helps these views work better or not. But it's something to keep in mind.

An observation:
Sometimes, SQL table aliases behave strangely in FGDB views. For example, SPECIES_RECORDS S and S.* . Field headers in the attribute table will occasionally show a S. prefix. Other times it won't show a prefix. Sometimes joined SQL fields with duplicate field names will show up in the attribute table, other times not. I don't know what the reasoning is behind that, but it tells me aliases and field names are buggy. If I'm having trouble creating a FGDB database view, then I'll fiddle with the aliases in the SQL. I'll change SELECT S.* to just SELECT * . Or something like that.
Maybe it helps, maybe it doesn't.

Esri Case 03526583 - FGDB aggregation database view with SQL field aliases: Field name in attribute table is incorrect when underlying table is added to map
Esri Case 03528630 - FGDB database view has joined fields with duplicate names; joined fields should not be excluded from attribute table
FGDB database view — Join fields missing from attribute table
BUG-000164420: Database View created in file geodatabase with Join definition does not create the required duplicate fields

Bud · ‎01-14-2024

One thing that has helped me when writing FGDB SQL is to do a parallel test in a different geodatabase type where I can use a SQL client, such as DBeaver for mobile GDBs or SQL Developer for Oracle EGDBs. I put exactly the same data in a mobile GDB, mock up the SQL in DBeaver, then try to get something similar working in FGDB SQL.

It really helps me to write SQL in a SQL client that has proper error messages; it lets me make sure my syntax is correct in a normal environment first. FGDB SQL is challenging enough as it is; typos on my part make it so much worse, so I want to fix them ahead of time before dealing with FGDB SQL.

SteveMasters · ‎01-26-2024

Great thank for this. The first SQL works perfectly, but as you say doesn't get rid of multiple species on same date record. In the second where does the t_unique_field come from? As its not in your original table.

Bud · ‎01-26-2024

I manually added the T_UNIQUE_ID field to the table after-the-fact. I think I just populated it using the field calculator by taking the OBJECTID and adding 99 (so that the first ID starts at 100). But you could use anything, including just copying the OBJECTID.

I honestly don't remember why I added the T_UNIQUE_ID field in the first place.

But, to my surprise, it seems to be necessary in order for the query to work in file geodatabases. If I were to replace the references to T_UNIQUE_ID in the query with OBJECTID, then the query would produce incorrect results. I'm not sure why that's happening since OBJECTID has unique values, similar to T_UNIQUE_ID. Both fields ought to be able to serve the same purpose in this scenario. To prove that point, using OBJECTID works fine in a mobile or enterprise (Oracle) geodatabase. It's just the file geodatabase that's returning the wrong rows.

So, I guess you'd need to use a unique ID column other than OBJECTID to make the query work. Having a unique ID column in a table other than OBJECTID is a good data management practice anyway, so maybe it's worth it.

The moral of the story here for me is that file geodatabase SQL is problematic. Use with caution.

SteveMasters · ‎01-29-2024

Thanks, just trying it and for some reason I keep getting the error message below. It didn't happen when I ran the first script. Not sure what is going on.