Select to view content in your preferred language

Select maximum values in Select By Attributes (greatest n per group)

10044
24
Jump to solution
01-11-2024 07:04 PM
Bud
by
Esteemed Contributor

ArcGIS Pro 3.2.1; mobile geodatabase

I'm trying to learn about different ways to select the greatest n per group in ArcGIS Pro. I came across this old ArcMap article that talks about a possible technique:

Select minimum and maximum values in the Select By Attributes window in ArcMap

Use GROUP BY subquery clauses to return values from aggregate functions on sets of values.

Example #3:

Select the records from the table with the most recent date for sets of values, where <group_ID> is the field that contains the values that define the groups:

<date> = (SELECT MAX(date) FROM <table_name> GROUP BY <group_ID>)

I'll use the SQL expression above to try to select the latest row per species group. The result looks like this:

Bud_2-1705026308386.png

t_date = (
    select 
        max(t_date) 
    from 
        species_records
    group by
        t_species)

That's not the result I was looking for.

This would be the result I'm looking for:

Bud_0-1705060479030.png

t_date in (
    select
        max(t_date)
    from
        species_records s2 
    where
        s2.t_species = species_records.t_species)


What am I missing in that article?


Notes:

  • I'm aware that there will be duplicate rows selected per group if there are multiple rows per species that have the same date. Breaking the tie (arbitrarily) would be ideal. But I don't know if that's possible with this particular approach.
  • I've attached sample Excel data that can be loaded into a geodatabase.

Thanks.

24 Replies
Bud
by
Esteemed Contributor

Thanks! It's good to have solutions for both scenarios:

  1. Has ties
  2. Or, no ties

I'll follow up on your question and #2 in a separate reply.

But first, just to clarify regarding #1, are there any issues with the SQL expression I provided in the original post (the expression I wrote, not the one from the article).

t_date in (
    select
        max(t_date)
    from
        species_records s2 
    where
        s2.t_species = species_records.t_species)

I've always had a bit of a mental block when it comes to subqueries in the WHERE clause. When I look at my solution, it seems like it shouldn't work. But maybe it does?

In other words, is the resultset from my expression any different than the resultset from yours? 

0 Kudos
Bud
by
Esteemed Contributor

Here's a mobile geodatabase technique for #2 - no ties.

species_records.objectid IN (
         SELECT objectid
           FROM species_records r2
          WHERE r2.t_species = species_records.t_species
       ORDER BY T_date DESC  --, priority ASC
          LIMIT 1
                )

Bud_1-1705204060948.png

And a similar technique that works for Oracle:

species_records.objectid IN (
         SELECT objectid
           FROM species_records s2
          WHERE s2.t_species = species_records.t_species
       ORDER BY t_date DESC --, priority ASC
          FETCH FIRST ROW ONLY
                )
0 Kudos
JoshuaBixby
MVP Esteemed Contributor

For n=1, the results of your query and mine are the same.  That said, there are differences.  As I already mentioned, your query is only applicable for n=1 so it would not work in other n-cases.  Your query relies on a correlated subquery whereas mine relies on a couple co-routine calls and a couple subquery scans.  The correlated subquery is an expensive operation, but having more execution steps like in mine can add up as well.  How the two approaches perform relative to data set size?  I don't know. 

Since SQLite supports a row value on the left-hand side of the IN operator, an alternative to n=1 that doesn't require a correlated subquery is:

(t_species, t_date) IN (
  SELECT 
    t_species, 
    max(t_date) 
  FROM 
    species_records 
  GROUP BY 
    t_species
)
Bud
by
Esteemed Contributor

Since SQLite supports a row value on the left-hand side of the IN operator, an alternative to n=1 that doesn't require a correlated subquery is:

(t_species, t_date) IN (
  SELECT 
    t_species, 
    max(t_date) 
  FROM 
    species_records 
  GROUP BY 
    t_species
)

Very interesting. Thanks.

 

Mobile Geodatabase/SQLite:

For my notes, I'm confirming that the approach works in a mobile geodatabase. Greatest 1 per group (with ties), as expected.

Bud_0-1705266753825.png

 

Oracle 18c:

I was pleasantly surprised to see it also works in Oracle 18c. I didn't think Oracle supported "a row value on the left-hand side of the IN operator". But it does.

Bud_1-1705266960006.png

Oracle 18c demo: https://dbfiddle.uk/GznqkwlQ 

Edit: @JoshuaBixby, you might find this video on Oracle's KEEP clause interesting. The KEEP clause will KEEP your SQL queries SIMPLE!

 

File Geodatabase:

Unsurprisingly, it doesn't work in file geodatabases.

Bud_3-1705267190789.png


SQL Server:

I couldn't get it to work in SQL Server: https://dbfiddle.uk/dtiYKrf_ 


Postgres:

From a quick googling, it looks like that syntax isn't supported in PostgreSQL either.

 

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

It is supported in Postgres, it is called a row constructor on the left-hand side instead of row value, but the same syntax works for both:

https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-IN

row_constructor IN (subquery)

The left-hand side of this form of IN is a row constructor, as described in Section 4.2.13

 

 

Bud
by
Esteemed Contributor

Do you figure it's logically impossible to get the greatest 1 per group (ties or no ties) in a FGDB SQL expression using current FGDB SQL functionality?

Since none of the following are supported:

  1. Correlated subqueries
  2. A row value on the left-hand side of the IN operator
  3. Concatenating a string and a date into a poor man's multi-field join:
    --works in normal databases, but not file geodatabases, due to limited concatenation functionality
    t_species||' '||t_date in ( select t_species||' '||max(t_date) from species_records group by t_species )
0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I wouldn't say it is logically impossible but technically impossible because of Esri's SQL implementation.

Concatenating text with a date is possible, in certain places, with file geodatabases, but the date must be CAST to text before concatenating:  SQL reference for query expressions used in ArcGIS—ArcGIS Pro | Documentation

CAST function

The CAST() function converts a value or an expression from one data type to another specified data type. The syntax is as follows:

CAST (expression AS data_type(length))

When casting a date to text in a file geodatabase, the outputted date format is:

File geodatabases support the use of a time in the date field, so this can be added to the expression:

Datefield = timestamp 'yyyy-mm-dd hh:mm:ss'

So,  2/2/2021 3:00 PM casted to text is '2021-02-02 15:00:00'.

Although dates can be cast to text and concatenated with additional text, e.g., a species name, it apparently isn't allowed within a subquery, so that takes it off the table in this case.  The following SQL works as a file geodatabase view

 

SELECT 
  MAX(t_species || CAST(t_date AS CHAR(20))) 
FROM 
  species_records 
GROUP BY 
  t_species

 

but embedding that same SQL into a subquery (even a subquery in another file geodatabase view) generates an error.  Note:  The MAX function needs to be applied to the concatenated result and not just the date field.

Bud
by
Esteemed Contributor

Interesting.

I couldn't even get CAST() to work on its own in the WHERE clause in a simple view.

select
    *
from
    species_records
where
    cast(t_date as char(50)) is not null 

Bud_1-1705436912531.png

ERROR 160195: An invalid SQL statement was used.

But that query works fine in Oracle and in a mobile geodatabase.

I'll report it to support as a FGDB bug.

 

Side note:

I left some feedback recently on the FGDB SQL page: https://pro.arcgis.com/en/pro-app/latest/help/data/geodatabases/manage-file-gdb/sql-reporting-and-an...

Esri wrote back today and said, "The page you reference has been mostly rewritten for the upcoming release, so you will see many changes in it later this year..."

Fingers crossed those docs have been improved.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Rewriting the documentation and rewriting the SQL support are very different, and I suspect they only mean the former and not the latter.  That said, the documentation can definitely be polished up.

JoshuaBixby
MVP Esteemed Contributor

Regarding not being able to create a database view with CAST in the WHERE clause, that is interesting because CAST works in the Select By Attributes tool, which of course is the WHERE clause:

 

CAST(t_date AS CHAR(20)) IS NOT NULL