Select maximum values in Select By Attributes (greatest n per group)

2612
23
Jump to solution
01-11-2024 07:04 PM
Labels (1)
Bud
by
Notable Contributor

ArcGIS Pro 3.2.1; mobile geodatabase

I'm trying to learn about different ways to select the greatest n per group in ArcGIS Pro. I came across this old ArcMap article that talks about a possible technique:

Select minimum and maximum values in the Select By Attributes window in ArcMap

Use GROUP BY subquery clauses to return values from aggregate functions on sets of values.

Example #3:

Select the records from the table with the most recent date for sets of values, where <group_ID> is the field that contains the values that define the groups:

<date> = (SELECT MAX(date) FROM <table_name> GROUP BY <group_ID>)

I'll use the SQL expression above to try to select the latest row per species group. The result looks like this:

Bud_2-1705026308386.png

t_date = (
    select 
        max(t_date) 
    from 
        species_records
    group by
        t_species)

That's not the result I was looking for.

This would be the result I'm looking for:

Bud_0-1705060479030.png

t_date in (
    select
        max(t_date)
    from
        species_records s2 
    where
        s2.t_species = species_records.t_species)


What am I missing in that article?


Notes:

  • I'm aware that there will be duplicate rows selected per group if there are multiple rows per species that have the same date. Breaking the tie (arbitrarily) would be ideal. But I don't know if that's possible with this particular approach.
  • I've attached sample Excel data that can be loaded into a geodatabase.

Thanks.

1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

Wow, whoever wrote that Support article needs to brush up on their SQL.  The logic of that SQL query is flawed because it could select multiple records for a given set if a set had a non-maximal date that happened to be a maximal date for another set.  Usually when people are selecting maximum per group they want only the maximum selected (or multiple if maximum is shared), and not the maximum and maybe some others as well.

View solution in original post

23 Replies
JoshuaBixby
MVP Esteemed Contributor

Wow, whoever wrote that Support article needs to brush up on their SQL.  The logic of that SQL query is flawed because it could select multiple records for a given set if a set had a non-maximal date that happened to be a maximal date for another set.  Usually when people are selecting maximum per group they want only the maximum selected (or multiple if maximum is shared), and not the maximum and maybe some others as well.

Bud
by
Notable Contributor

Esri Case #03527950 - Article 000008936 Select minimum and maximum values in the Select By Attributes window in ArcMap — Example SQL is incorrect (submitted Jan 12, 2024)

As this documentation is retired the Esri Inc will not make changes to the documentation as this relates to currently unsupported versions which may reference retired coding workflows.
 
I did complete testing using the suggested arcade expression within the documentation and did have the expression provide inconsistent results. The expression did work correctly in a few scenarios. 
0 Kudos
Bud
by
Notable Contributor

Esri Support:

...we will not be able to move forward with this case as the documentation you have highlighted is for no longer supported versions of the software and therefore will not be updated. 

0 Kudos
DavidPike
MVP Frequent Contributor

I'm really struggling with subqueries in these FGDB views.  They seems temperamental with warnings about 'cannot be edited in clause mode' when I'm not in clause mode, or working then not working on a whim.  I thought I understood the limitation was only scalar results could be returned by the subquery, but  then I see otherwise.  

Bud
by
Notable Contributor

Yes, I’ve been struggling with FGDB views as well. I talk about the issues I’ve been having in my reply here:

Make Query Table — Use date field in join

Summary: Creating a “view on a view” (joining on multiple fields) in an attempt to get around the “no correlated subqueries” limitation didn’t work for me. Views on views don’t seem to work.

0 Kudos
Bud
by
Notable Contributor

One thing that has helped me when writing FGDB SQL is to do a parallel test in a different geodatabase type where I can use a SQL client, such as DBeaver for mobile GDBs or SQL Developer for Oracle EGDBs. I put exactly the same data in a mobile GDB, mock up the SQL in DBeaver, then try to get something similar working in FGDB SQL.

It really helps me to write SQL in a SQL client that has proper error messages; it lets me make sure my syntax is correct in a normal environment first. FGDB SQL is challenging enough as it is; typos on my part make it so much worse, so I want to fix them ahead of time before dealing with FGDB SQL.

0 Kudos
Bud
by
Notable Contributor

@DavidPike 

I'm really struggling with subqueries in these FGDB views.  They seems temperamental with warnings about 'cannot be edited in clause mode' when I'm not in clause mode

My understanding is that clause mode is the non-SQL mode. I think it's the mode where you construct a query using the UI pick lists.

If I'm writing a complex SQL query in SQL mode, then it makes sense to get the message, "The expression can't be edited in clause mode", since that complex SQL wouldn't be parsable into the UI pick lists. I just ignore that message.

 

Are you seeing something different?

0 Kudos
Bud
by
Notable Contributor

I'd like to ask a moderator to move this string of replies to our FGDB view post, since this current post is mostly about mobile geodatabases, not FGDBs. Is that ok with you?

Move to: Selecting the most recent records based on unique values in another field

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

The screenshot with the desired outcome appears to be greatest-n-per-group with "n" being 1 where the greatest has ties.  Are you interested in greatest-singular or greatest-n?  If the former, it simplifies the situation a bit.

UPDATE:  The following is a generalized greatest-n-per-group that supports ties in SQLite.  I can't speak to its performance, and it is a bit ugly with the nested subquery (nesting is forced with a window function), but it works and is a place to start:

ObjectID IN (
  SELECT
    ObjectID
  FROM
    (
      SELECT
        ObjectID,
        DENSE_RANK () OVER (
          PARTITION BY t_species
          ORDER BY t_date DESC
        ) AS date_rank
      FROM
        speciesrecords
    ) 
  WHERE 
    date_rank <= 1
)

 

 

The date_rank controls the "n" being selected.

 

0 Kudos