Geoprocessing tool to select greatest 1 per group with tiebreaker

488
4
12-09-2022 04:19 AM
Status: Open
Labels (1)
Bud
by
Notable Contributor

Idea: Need an OOTB geoprocessing tool that can be used to select the greatest 1 per group in a FGDB table.

For example, for a given ASSET_ID, select the road inspection that has the latest date.

Bud_0-1670588140311.png

I don’t want to create a new FC or create a database view. I want to make a selection in the attribute table of an existing FGDB table.

Related question: GP tool to select greatest 1 per group



Edit:

The tool should also let you break ties. So, if for a given asset there were multiple rows with the same date, then use a different field to determine what row to use (aka what row to break the tie).

Similar to what the SQL expression in this answer does: Using a WHERE clause subquery, select the greatest 1 per group with a tie-breaker

roadinsptable.objectid IN (
     SELECT objectid
       FROM roadinsptable r2
      WHERE r2.asset_id = roadinsptable.asset_id
   ORDER BY date_ DESC, condition DESC
      LIMIT 1
            )

That SQL expression (or similar) works for mobile GDBs and enterprise GDBs, but not for file GDBs. So we need a geoprocessing tool.


There are cases when the tiebreaker would be arbitrary. Any of the ties could be used since we don't always have a different field that can be used to break the tie.

But there are other cases where we do have an additional field that we want to use to break the tie (get the max or min of the tiebreaker field per group). So we need the option to use additional fields as tiebreakers.


The tool needs to work for standalone tables.

4 Comments
Bud
by

@DanLee and @SeanLim 

Is there any chance you'd be interested in mocking up a custom tool for this, similar to what we did with the Shrink Lines — GP Tool?

I'm currently knee-deep in trying to find a solution that works for file geodatabases. I imagine I'd be the right person to test such a tool. And the timing is good right now, especially since there's an active discussion on the topic in this thread: Selecting the most recent records based on unique values in another field. The folks in that post might be interested in contributing feedback too.

Thoughts? Thanks.


Sample Excel file that can be loaded into a geodatabase: https://community.esri.com/ccqpr47374/attachments/ccqpr47374/arcgis-pro-ideas/27986/1/species_record... 

DanLee
by Esri Regular Contributor

Hi Bud,

Thank you for thinking of me/us.  I did the following using existing tools:
 
1. I put your data into a gdb, added a new field MyDate (date type), and converted the information in t_date (a text field) into real date values into MyDate.
 
2. Run Summary Statistics to get the "max" date per species:
DanLee_0-1705425602731.png

3. Run Join Field to transfer MAX_MyDate back to the input table via the common field t_species:

DanLee_1-1705425676008.png

4. Run Select Layer By Attribute to get the expected selection:

DanLee_2-1705425860930.png

 

You can put the above into a model or a script.  Hope this gives you a lead.  Of course you can script without running the existing tools.  I won't have time to help you get that done, sorry.

Bud
by

Thanks Dan!

Notes:

  1. I had come up with a similar approach. Glad to see I was on the right track: 
    https://community.esri.com/t5/arcgis-pro-questions/selecting-the-most-recent-records-based-on-unique...
  2. For our records, your approach gets the greatest 1 per group (with ties). Whereas my approach gets the greatest 1 per group (no ties). Both scenarios are valid.
  3. You might find this related idea interesting: Summary Statistics fields — Tie fields to a primary field

Cheers.

DanLee
by Esri Regular Contributor

Yes, Bud I didn't address the tie breaker part.  It is possible to add analysis and a tie breaker field choice.  However, you would need to handle the ties when the tie breaker field contains ties.  If you already have a solution, that's wonderful.  

Thanks for sharing the links to other related topics. I will take a look when I have time. 🙂  

Cheers!