ArcGIS Pro 3.2.1; mobile geodatabase
I'm trying to learn about different ways to select the greatest n per group in ArcGIS Pro. I came across this old ArcMap article that talks about a possible technique:
Select minimum and maximum values in the Select By Attributes window in ArcMap
Use GROUP BY subquery clauses to return values from aggregate functions on sets of values.
Example #3:
Select the records from the table with the most recent date for sets of values, where <group_ID> is the field that contains the values that define the groups:
<date> = (SELECT MAX(date) FROM <table_name> GROUP BY <group_ID>)
I'll use the SQL expression above to try to select the latest row per species group. The result looks like this:
t_date = ( select max(t_date) from species_records group by t_species)
That's not the result I was looking for.
This would be the result I'm looking for:
t_date in ( select max(t_date) from species_records s2 where s2.t_species = species_records.t_species)
What am I missing in that article?
Notes:
Thanks.
Solved! Go to Solution.
Thanks! It's good to have solutions for both scenarios:
I'll follow up on your question and #2 in a separate reply.
But first, just to clarify regarding #1, are there any issues with the SQL expression I provided in the original post (the expression I wrote, not the one from the article).
t_date in ( select max(t_date) from species_records s2 where s2.t_species = species_records.t_species)
I've always had a bit of a mental block when it comes to subqueries in the WHERE clause. When I look at my solution, it seems like it shouldn't work. But maybe it does?
In other words, is the resultset from my expression any different than the resultset from yours?
Here's a mobile geodatabase technique for #2 - no ties.
species_records.objectid IN ( SELECT objectid FROM species_records r2 WHERE r2.t_species = species_records.t_species ORDER BY T_date DESC --, priority ASC LIMIT 1 )
And a similar technique that works for Oracle:
species_records.objectid IN ( SELECT objectid FROM species_records s2 WHERE s2.t_species = species_records.t_species ORDER BY t_date DESC --, priority ASC FETCH FIRST ROW ONLY )
For n=1, the results of your query and mine are the same. That said, there are differences. As I already mentioned, your query is only applicable for n=1 so it would not work in other n-cases. Your query relies on a correlated subquery whereas mine relies on a couple co-routine calls and a couple subquery scans. The correlated subquery is an expensive operation, but having more execution steps like in mine can add up as well. How the two approaches perform relative to data set size? I don't know.
Since SQLite supports a row value on the left-hand side of the IN operator, an alternative to n=1 that doesn't require a correlated subquery is:
(t_species, t_date) IN (
SELECT
t_species,
max(t_date)
FROM
species_records
GROUP BY
t_species
)
Since SQLite supports a row value on the left-hand side of the IN operator, an alternative to n=1 that doesn't require a correlated subquery is:
(t_species, t_date) IN ( SELECT t_species, max(t_date) FROM species_records GROUP BY t_species
)
Very interesting. Thanks.
Mobile Geodatabase/SQLite:
For my notes, I'm confirming that the approach works in a mobile geodatabase. Greatest 1 per group (with ties), as expected.
Oracle 18c:
I was pleasantly surprised to see it also works in Oracle 18c. I didn't think Oracle supported "a row value on the left-hand side of the IN operator". But it does.
Oracle 18c demo: https://dbfiddle.uk/GznqkwlQ
Edit: @JoshuaBixby, you might find this video on Oracle's KEEP clause interesting. The KEEP clause will KEEP your SQL queries SIMPLE!
File Geodatabase:
Unsurprisingly, it doesn't work in file geodatabases.
SQL Server:
I couldn't get it to work in SQL Server: https://dbfiddle.uk/dtiYKrf_
Postgres:
From a quick googling, it looks like that syntax isn't supported in PostgreSQL either.
It is supported in Postgres, it is called a row constructor on the left-hand side instead of row value, but the same syntax works for both:
https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-IN
row_constructor IN (subquery)The left-hand side of this form of IN is a row constructor, as described in Section 4.2.13
Do you figure it's logically impossible to get the greatest 1 per group (ties or no ties) in a FGDB SQL expression using current FGDB SQL functionality?
Since none of the following are supported:
--works in normal databases, but not file geodatabases, due to limited concatenation functionality
t_species||' '||t_date in ( select t_species||' '||max(t_date) from species_records group by t_species )
I wouldn't say it is logically impossible but technically impossible because of Esri's SQL implementation.
Concatenating text with a date is possible, in certain places, with file geodatabases, but the date must be CAST to text before concatenating: SQL reference for query expressions used in ArcGIS—ArcGIS Pro | Documentation
CAST function
The CAST() function converts a value or an expression from one data type to another specified data type. The syntax is as follows:
CAST (expression AS data_type(length))
When casting a date to text in a file geodatabase, the outputted date format is:
File geodatabases support the use of a time in the date field, so this can be added to the expression:
Datefield = timestamp 'yyyy-mm-dd hh:mm:ss'
So, 2/2/2021 3:00 PM casted to text is '2021-02-02 15:00:00'.
Although dates can be cast to text and concatenated with additional text, e.g., a species name, it apparently isn't allowed within a subquery, so that takes it off the table in this case. The following SQL works as a file geodatabase view
SELECT
MAX(t_species || CAST(t_date AS CHAR(20)))
FROM
species_records
GROUP BY
t_species
but embedding that same SQL into a subquery (even a subquery in another file geodatabase view) generates an error. Note: The MAX function needs to be applied to the concatenated result and not just the date field.
Interesting.
I couldn't even get CAST() to work on its own in the WHERE clause in a simple view.
select * from species_records where cast(t_date as char(50)) is not null
ERROR 160195: An invalid SQL statement was used.
But that query works fine in Oracle and in a mobile geodatabase.
I'll report it to support as a FGDB bug.
Side note:
I left some feedback recently on the FGDB SQL page: https://pro.arcgis.com/en/pro-app/latest/help/data/geodatabases/manage-file-gdb/sql-reporting-and-an...
Esri wrote back today and said, "The page you reference has been mostly rewritten for the upcoming release, so you will see many changes in it later this year..."
Fingers crossed those docs have been improved.
Rewriting the documentation and rewriting the SQL support are very different, and I suspect they only mean the former and not the latter. That said, the documentation can definitely be polished up.
Regarding not being able to create a database view with CAST in the WHERE clause, that is interesting because CAST works in the Select By Attributes tool, which of course is the WHERE clause:
CAST(t_date AS CHAR(20)) IS NOT NULL