Select to view content in your preferred language

Join — Control what related record gets used

3253
20
12-03-2022 10:43 AM
Status: Open
Labels (1)
Bud
by
Honored Contributor

It would be helpful if we could control the logic used for getting the records on the many side of joins.

For example, choose an ORDER BY field: Grab the related record with the latest date, or something like that. Specify ascending or descending order.

Even better, make it a user-defined ORDER BY sql clause, so that we can use specialized logic like:


Related: Choose if join will be 1:1 or 1:Many

20 Comments
Bud
by

@BrantCarman 

I didn't even think to try your definition query. Any experience trying it with AGOL hosted Feature Layers? Or with setting the definition in a python script?

Sorry, no experience with either.

Edit: What kind of database is AGOL behind the scenes? PostgreSQL? If so, then the SQLite definition query I mentioned might work, since PostgreSQL seems to support the LIMIT clause, just like SQLite.

BrantCarman

Thanks @JonathanNeal I'll make a note to try this out, but unfortunatley entering the "busy season" and not much time to R&D on this project.

My hope is that this idea gets some more traction and ESRI implements sooner rather than later, because it's a great idea and would be an awesome functionality in ArcGIS Pro!

Bud
by

@BrantCarman Does this help you at all?

Use a SQL expression inside a Python or Arcade field calculation?

Interestingly, SQL is an option when using the field calculator on a hosted feature service.

I don't use hosted feature services (is that the same thing as AGOL?), so I don't know if it applies or not.

For example, could you do a SQL field calculation in ArcGIS Online, and use some SQL logic like I mentioned above to "grab the most recent record from the join table using ObjectID descending order"?

JonathanNeal

FYI, another option that is supported on file geodatabases are views:
https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/create-database-view.htm

Bud
by

@JonathanNeal I couldn't find a way to "control what related record is used" (such as get the latest inspection for each road) when using a FGDB database view, since FGDB SQL doesn't support correlated subqueries.

JonathanNeal

@Bud I group by the join Id and used an aggregate to choose my values of interest.

Select Name, max(Desc) as myD from JoinTable group by Name



Bud
by

@JonathanNeal 

There is a classic FGDB use case in this post: Selecting the most recent records based on unique values in another field.

I tried using your "group by in a database view" technique:

--My FGDB table is called: species_records.
--The FGDB view below will be called: latest_vw.

select t_species as v_species, max(t_date) as v_date from species_records group by t_species

The requirement is: Select the latest record for each species group. Include all fields from the original table.

When it comes to your group by/view technique, I don't know how to bring all the other fields into the query. In other words, I don't know how to link the original table to the group by view.

I've tried a number of different SQL techniques (see Joins based on multiple fields) and also a "view on a view" that work in a mobile geodatabase but don't work in a file geodatabase, due to various FGDB SQL bugs.

Any ideas? 

JonathanNeal

@Bud  (Untested) I didn't get to trying it yet but put comments on what I am hoping will happen.
SQL for reporting and analysis on file geodatabases—ArcGIS Pro | Documentation


--My FGDB table is called: species_records.
--The FGDB view below will be called: latest_vw.

select
    t_species as v_species,
    SUBSTRING(max(CONCAT(EXTRACT(YEAR FROM t_date), EXTRACT(MONTH FROM t_date), EXTRACT(DAY FROM t_date), OBJECTID)), 10, 10) as LastUpdatedOID 
    -- Cast date as text in the format YYYYMMDD
    -- Get Field of interest, OBJECTID in our case
    -- Skip the first 10 characters
from
    species_records
group by
    t_species



Bud
by

@JonathanNeal 

This is what I came up with. It's a FGDB database view.

Select the greatest 1 per group, with ties:

select
    *
from
    species_records
inner join
    (select
        t_species,
        max(t_date) as t_date 
    from
        species_records 
    group by
        t_species) l
    on species_records.t_species = l.t_species 
       and species_records.t_date = l.t_date

 

Or, select 1 per group, without ties:

select
    *
from
    species_records s
inner join
    (
    select
        min(objectid) as objectid,
        min(t_unique_id) as t_unique_id,
        t_species,
        max(t_date) as t_date
    from
        (select
            *
        from
            species_records
        inner join
            (select
                t_species,
                max(t_date) as t_date 
            from
                species_records 
            group by
                t_species) l
            on species_records.t_species = l.t_species 
               and species_records.t_date = l.t_date
        )
    group by
        t_species
    ) l
    on s.t_unique_id = l.t_unique_id


Source: Selecting the most recent records based on unique values in another field

I couldn't find a way to do it in a SQL expression. Only in a database view.

Sample Excel data: https://community.esri.com/t5/arcgis-pro-ideas/fgdb-database-views-view-that-selects-from-an/idi-p/1... 

Bud
by

I'm starting to wonder if my original subquery approach was flawed:

Oracle (FETCH):

roadinsptable.objectid IN (
     SELECT objectid
       FROM roadinsptable r2
      WHERE r2.asset_id = roadinsptable.asset_id
   ORDER BY date_ DESC, condition DESC
      FETCH FIRST ROW ONLY
            )

Reason:
In the join, Pro uses the subquery to pare down the entire query, not just pare down the join table. More info here: Add Join — Create independent definition query on join table (pre-filter)