Join — Control what related record gets used

2743
19
12-03-2022 10:43 AM
Status: Open
Labels (1)
Bud
by
Notable Contributor

It would be helpful if we could control the logic used for getting the records on the many side of joins.

For example, choose an ORDER BY field: Grab the related record with the latest date, or something like that. Specify ascending or descending order.

Even better, make it a user-defined ORDER BY sql clause, so that we can use specialized logic like:


Related: Choose if join will be 1:1 or 1:Many

19 Comments
JeffWard

I can get behind this.

ScottFedak2085

I feel like I've ran into other use cases via geoprocessing tools, python scripts, etc. where a functionality along these lines could be very useful. It's sometimes difficult to find documentation or troubleshoot what ESRI is actually utilizing to iterate over a collection of objects. You would think it'd be OBJECTID, but I feel like I've ran into situations where that didn't appear to be the case. Even if it were OBJECTID, that doesn't provide a ton of user-ability to control the iteration process.

I like the idea of a user defined clause, and I think that would be preferred as it could be integrated into automation. However, even a general radio button that would specify that a join, geoprocess, ArcPy function, etc. honors how a table or feature class is currently ordered would be a useful upgrade.

JonathanNeal

Very glad to see this idea come up here!
Currently in 3.1 we have added this ability for the Join Field tool.  The same matching code is used for Add Join.

We are considering/tracking this with a support issue for Add join to allow one to first joining.

We can easily implement a one-to-first joining that will use ObjectId to do the joining and require that both data be within the same workspace (on our near term radar). 

If there are enough upvotes and comments we may be able to get development time for adding the custom control to expose OrderBy clause as a control.

Bud
by

Here's a workaround for enterprise geodatabases and mobile geodatabases (not file geodatabases). It also works for Excel files since ArcGIS uses SQLite SQL for Excel file SQL expressions.

Create a definition query on the join table -- to pare down the rows so that there's only one row for each ID. Examples:

Oracle (FETCH):

roadinsptable.objectid IN (
     SELECT objectid
       FROM roadinsptable r2
      WHERE r2.asset_id = roadinsptable.asset_id
   ORDER BY date_ DESC, condition DESC
      FETCH FIRST ROW ONLY
            )

SQLite/mobile geodatabase and Excel files (LIMIT):

roadinsptable.objectid IN (
     SELECT objectid
       FROM roadinsptable r2
      WHERE r2.asset_id = roadinsptable.asset_id
   ORDER BY date_ DESC, condition DESC
      LIMIT 1
            )

(likely works for other databases too)

Those queries only return one row per ASSET_ID (the field I'll be joining on). If there is a tie, then it uses a different field, CONDITION, to break the tie.


Next, join to the table that has the definition query.

 

Result:

That lets us control what related record is used in the join. The join is now 1:1 instead of one-to-first or 1:M.

So that satisfies the original problem, but only for datatypes that have full SQL support. So it doesn't help us for file geodatabases or other datatypes that don't support correlated subqueries in SQL expressions. More info: Support correlated subqueries in file geodatabase SQL expressions.

 

Edit:

On closer inspection, there's a problem with this approach (ArcGIS Pro 3.0.3; mobile geodatabase).

When I try to create the join on the table with the definition query, I get an error:

Failed to refresh table. 
Error:
Underlying DBMS error [ambiguous column name:
objectid] [Roads][STATE_ID = 0]

Bud_0-1670784209383.png

I tried adding prefixes to all fields in the subquery, but that had no effect on the error:

objectid IN (
     SELECT r2.objectid
       FROM roadinsptable r2
      WHERE r2.asset_id = roadinsptable.asset_id
   ORDER BY r2.date_ DESC, r2.condition DESC
      LIMIT 1
            )


I'm not sure what the cause is.


I have the same problem in an Oracle 18c 10.7.1 enterprise gdb (ArcGIS Pro 2.6.8):

Failed to refresh table.
Error:
Underlying DBMS error [ORA-00918: column
ambiguously defined] [DBJoin1]

Bud_0-1670784956234.png

 

Solved:

I had to add a table prefix (aka "fully qualify the table name") for the objectid in the first line:

Before:

 objectid IN (

After:

 roadinsptable.objectid IN (

I've updated the original queries.

Idea: Join to table with definition query — column name should get prefixed (SQL expression with subquery)

BUG-000155819: Attribute table of joined layer in Mobile Geodatabase does not open when a definition query with a subquery is applied to the join table.

BenCunningham

I would love to see this as well, specifically the one-to-first for the latest date, as you said.

BrantCarman

I just found out that you can do this in ArcGIS Online join tool.  I thought I must have been missing something in Arc Pro all this time, but apparently not!  

I've been wishing we had this ability for years.  Can't believe it made it to AGOL before Pro!

My work around now is to create a hosted live Join View layer from join tool in map viewer classic, grabbing the most recent record from the join table using ObjectID descending order.  This gives me a live layer that I can symbolize from an attribute in my join table (from most recently created record).

I still have a need to get this attribute into my parent table, so now in Arc Pro I join the new live join layer from AGOL back to my parent layer/table and use field calculator to populate that most recent value back into my parent table.

Crazy round-about solution.

Bud
by

@BrantCarman Interesting. Yes, it's weird that you can do it in ArcGIS Online, but not ArcGIS Pro (at least not out of the box).

Out of curiosity, is there a reason why the definition query workaround I mentioned above wouldn't work for you?


SQLite/mobile geodatabase (likely works for other databases too):

roadinsptable.objectid IN (
     SELECT objectid
       FROM roadinsptable r2
      WHERE r2.asset_id = roadinsptable.asset_id
   ORDER BY date_ DESC, condition DESC
      LIMIT 1
            )

I thought that was kind of a neat solution. But it doesn't work for file geodatabases, since they don't have full SQL support.

BrantCarman

@Bud good question.  I'm working with Hosted Feature Layers primarily and plugging the layers into a Python script to automate some field calculations between join and input table.  I didn't even think to try your definition query.  Any experience trying it with AGOL hosted Feature Layers? Or with setting the definition in a python script?

JonathanNeal

@BrantCarman been working on this in 3.2 the definition queries transfering and updating is mostly correct/better than ArcMap.  Maybe my testing script below can be helpful to you in trying this approach.

 

lyr = arcpy.management.MakeFeatureLayer('country_geom', "lyr_with_def_query")[0]
lyr.updateDefinitionQueries([{'name': 'fc_query', 'sql': "country_geom.ISO_3DIGIT LIKE 'A%'", 'isActive': True}])
tab = arcpy.management.MakeTableView('country_tab', "tab_with_def_query", "ISO_3DIGIT LIKE '%B%'")[0]
tab.updateDefinitionQueries([{'name': 'tab_query', 'sql': "country_geom.ISO_3DIGIT LIKE '%B%'", 'isActive': True}]) #put your own subquery code here
result_with_and_2_def_query = arcpy.management.AddJoin(lyr, 'ISO_3DIGIT', tab, 'ISO_3DIGIT')[0]
dfqs = result_with_and_2_def_query.listDefinitionQueries()
print(dfqs)
result_less_join = arcpy.management.RemoveJoin(result_with_and_2_def_query)[0]
dfqs_remove = result_less_join.listDefinitionQueries() #Check to ensure that it is removed properly
print(dfqs_remove)

 

 

One TODO (Not fixed yet):
SubQueries still require manual intervention: https://community.esri.com/t5/arcgis-pro-ideas/join-to-table-with-definition-query-column-name/idi-p...