Hi,
I have built the model in the picture, which works fine. But the next step I want to do is for each of the new feature classes that are created, I want only the most recent record of each unique record from the species field. But I want all the associated fields to come through into the new feature class. See table picture below, for example I only want the most recent record of Sparrowhawk.
I am not sure how to do this and if I can add it into the model as part of the process?
Thanks
What kind of geodatabase is your data in? Have you considered using a database view instead?
Never mind. I didn't read the question close enough.
I think the database view is a good idea, would be some subquery with a GROUP BY species and MAX (date) as @RichardHowe suggests
If I was to create a database view, just struggling with correct syntax to go through the feature class in the file geodatabase what do put after the FROM part see picture. Or am I completely way off!
Thanks
Ah I tried a FGDB view and couldn't subqueries with GROUP BY working. I'd really recommend what @RichardHowe suggested as the most practical way.
I haven't found a way to do it in a SQL expression. But I got a FGDB database view to work.
I thought I'd tried this approach previously without luck. But I guess I must have done something differently this time because it works now.
1. Load this sample spreadsheet into a FGDB:
https://community.esri.com/t5/arcgis-pro-ideas/fgdb-database-views-view-that-selects-from-an/idi-p/1...
2. Create this database view:
--species_records_latest_vw select * from species_records inner join (select t_species, max(t_date) as t_date from species_records group by t_species) l on species_records.t_species = l.t_species and species_records.t_date = l.t_date
For my notes, the SQL in brackets is a derived table, not a subquery. Is the SQL in brackets a subquery?
Result:
Unfortunately, the query doesn't break ties when there are multiple rows per species with the same date.
Video:
To use the result in further analysis, it might be best to export the view as a table since FGDB views are buggy in ArcGIS Pro. For example, joining from a table to a view doesn't seem to work; the joined fields are all null. Idea: Join to a FGDB database view.
Update:
I got a FGDB database view working that breaks the ties.
Again, I thought I had tried this previously and failed. But it seems to be working now.
select * from species_records s inner join ( select min(objectid) as objectid, min(t_unique_id) as t_unique_id, t_species, max(t_date) as t_date from (select * from species_records inner join (select t_species, max(t_date) as t_date from species_records group by t_species) l on species_records.t_species = l.t_species and species_records.t_date = l.t_date ) group by t_species ) l on s.t_unique_id = l.t_unique_id
Some of the fields have asterisks (*) beside the field names, indicating the field has an index.
I don't think I added those indexes intentionally. But maybe it happened when using a GP tool. I don't know if that helps these views work better or not. But it's something to keep in mind.
An observation:
Sometimes, SQL table aliases behave strangely in FGDB views. For example, SPECIES_RECORDS S and S.* . Field headers in the attribute table will occasionally show a S. prefix. Other times it won't show a prefix. Sometimes joined SQL fields with duplicate field names will show up in the attribute table, other times not. I don't know what the reasoning is behind that, but it tells me aliases and field names are buggy. If I'm having trouble creating a FGDB database view, then I'll fiddle with the aliases in the SQL. I'll change SELECT S.* to just SELECT * . Or something like that.
Maybe it helps, maybe it doesn't.
Related:
BUG-000164420: Database View created in file geodatabase with Join definition does not create the required duplicate fields
One thing that has helped me when writing FGDB SQL is to do a parallel test in a different geodatabase type where I can use a SQL client, such as DBeaver for mobile GDBs or SQL Developer for Oracle EGDBs. I put exactly the same data in a mobile GDB, mock up the SQL in DBeaver, then try to get something similar working in FGDB SQL.
It really helps me to write SQL in a SQL client that has proper error messages; it lets me make sure my syntax is correct in a normal environment first. FGDB SQL is challenging enough as it is; typos on my part make it so much worse, so I want to fix them ahead of time before dealing with FGDB SQL.
Great thank for this. The first SQL works perfectly, but as you say doesn't get rid of multiple species on same date record. In the second where does the t_unique_field come from? As its not in your original table.
I manually added the T_UNIQUE_ID field to the table after-the-fact. I think I just populated it using the field calculator by taking the OBJECTID and adding 99 (so that the first ID starts at 100). But you could use anything, including just copying the OBJECTID.
I honestly don't remember why I added the T_UNIQUE_ID field in the first place.
But, to my surprise, it seems to be necessary in order for the query to work in file geodatabases. If I were to replace the references to T_UNIQUE_ID in the query with OBJECTID, then the query would produce incorrect results. I'm not sure why that's happening since OBJECTID has unique values, similar to T_UNIQUE_ID. Both fields ought to be able to serve the same purpose in this scenario. To prove that point, using OBJECTID works fine in a mobile or enterprise (Oracle) geodatabase. It's just the file geodatabase that's returning the wrong rows.
So, I guess you'd need to use a unique ID column other than OBJECTID to make the query work. Having a unique ID column in a table other than OBJECTID is a good data management practice anyway, so maybe it's worth it.
The moral of the story here for me is that file geodatabase SQL is problematic. Use with caution.