Select By Attributes on joined data — For rows that are 1:M, all rows in the join table get selected, despite selection criteria

Bud · ‎03-04-2024

Title edited. Originally "1:M Join — Select By Attributes handling nulls incorrectly."

ArcGIS Pro 3.2.2; File Geodatabase:

I've joined TABLE_A to TABLE_B via common ASSET_ID text fields using Add Join. The join is one-to-many: 1 row in TABLE_A to 4 rows in TABLE_B.

I want to select from TABLE_B.ROAD_NAME (text) where ROAD_NAME is null. However, Select By Attributes doesn't seem to work correctly. It selects all four rows, regardless if ROAD_NAME is null or not.

Video:

(view in My Videos)

Notes:

I have the same issue in ArcGIS Pro 2.6.8 in a file geodatabase and in an Oracle 18c 10.7.1 enterprise geodatabase.
I don't have this problem with definition queries in ArcGIS Pro 3.2.2, only with Select By Attributes. There is a separate issue in 2.6.8 with definition queries: Esri Case #03387420 - Definition query on join returns incorrect rows.

I don't understand how that issue has been around since ArcGIS Pro 2.6.8 (or earlier) and is also in 3.2.2. How could such a serious issue not have been fixed as of 3.2.2?

GeeteshSingh07 · ‎03-05-2024

@Bud , found this: https://support.esri.com/en-us/bug/the-select-by-attributes-tool-does-not-select-only-reco-bug-00015...

BUG-000154798: The 'Select By Attributes' tool does not select only records with specified field values added through a one-to-many join but instead selects all records with duplicate ObjectIDs.

--
Best Regards,
Geetesh
[email protected]

View solution in original post

GeeteshSingh07 · ‎03-04-2024

@Bud, that's strange.

I tried same workflow on 3.1.0 and couldn't reproduce this issue...I also used 'Add Join' tool..

What if you use 'Join Field' gp tool, does the issue occur?

What if before join you run 'Calculate Field' tool on <Null> fields and run the query:

Then, perform a join and run 'Select By Attribute' tool?

--
Best Regards,
Geetesh
[email protected]

GeeteshSingh07 · ‎03-04-2024

@Bud I realized that I performed 1:1 join, my bad!

Yes, the issue does occur with 1:M join:

It's should be logged as a BUG.

--
Best Regards,
Geetesh
[email protected]

GeeteshSingh07 · ‎03-05-2024

@Bud , found this: https://support.esri.com/en-us/bug/the-select-by-attributes-tool-does-not-select-only-reco-bug-00015...

BUG-000154798: The 'Select By Attributes' tool does not select only records with specified field values added through a one-to-many join but instead selects all records with duplicate ObjectIDs.

--
Best Regards,
Geetesh
[email protected]

Bud · ‎03-05-2024

Your testing in 3.1.0 and that bug are very helpful. Thanks! I’m investigating.

I noticed that bug was submitted in December 29, 2022. I’m surprised the issue hasn’t been fixed yet; I view it as high priority.

My users are starting to get the impression that ArcGIS Pro can’t be trusted when it comes to tabular analysis and are looking for alternative software solutions that can be trusted. Related issue: Add Join not keeping all target features and Esri Case #03387420 - Definition query on join returns incorrect rows.

If anyone from Esri sees this, can I suggest that this issue be made a priority? I imagine the last thing Esri wants is for users to not trust ArcGIS Pro when it comes to querying GIS data.

Bud · ‎03-06-2024

@GeeteshSingh07 As the bug suggests, the issue isn't specific to nulls. It applies to any value. The bug title described it fairly well.

The bug title states the problem is with the duplicate ObjectIDs, which seems accurate. ArcGIS Pro is very dependent on ObjectIDs, so best guess is: it's making the selection using a given ObjectID. Since there are duplicate ObjectIDs, it's selecting all rows that have that ObjectID, which is incorrect.

Here's an example of Select By Attributes working correctly on joined data, since for all rows, the join is 1:1:

GeeteshSingh07 · ‎03-07-2024

@Bud , I agree..and should be fixed!

--
Best Regards,
Geetesh
[email protected]

Bud · ‎05-07-2024

It turns out the issue doesn't only apply to Select By Attributes. It also applies to manual selections.

With a 1:m join, if I manually select one of the rows that is 1:M, then all rows in the 1:M group will get selected, which is incorrect.

Video (using different sample data):

(view in My Videos)

Bud · ‎04-30-2024

Bud · ‎05-07-2024

Since this issue is specific to 1:M joins, a possible workaround is to make the join 1:1 instead of 1:M:

Use the Make Query Table tool to hide the OBJECTID field in the join table. Join tables that don't have an OBJECTID field are automatically treated as 1:1 (one-to-first) joins.
https://community.esri.com/t5/arcgis-pro-ideas/choose-if-join-will-be-one-to-first-or-1-many/idc-p/1...
Or, possibly join to a query layer or an unregistered database view, since I assume a join to an unregistered item will be one-to-first, similar to joining to an unregistered table: https://community.esri.com/t5/arcgis-pro-ideas/1-m-join-using-unregistered-tables-query-layers/idi-p...
Alternatively, ArcGIS Pro 3.3.0 has an OOTB option to choose if the join will be 1:1 or 1:M. So in 3.3+, choose 1:1 when making the join. https://community.esri.com/t5/arcgis-pro-ideas/choose-if-join-will-be-one-to-first-or-1-many/idc-p/1...

Or, to control what related record gets used in a 1:1 join, use this as the SQL expression in a Make Query Table (screenshot) :

table_b.objectid IN (SELECT objectid FROM table_b t2 WHERE t2.b_id = table_b.B_id ORDER BY b_text_field ASC FETCH FIRST ROW ONLY)

SQL expression source: Join — Control what related record gets used

The Make Query Table should include the OBJECTID field, since the SQL expression above references the OBJECTID.

Then, join to the Make Query Table.

The benefit of this approach is that it avoids issues with joining to a table that has a definition query, such as Add Join not keeping all target features and other bugs. The Make Query Table hides/embeds the SQL expression within the layer (it doesn't create a definition query), so that's how the usual "joining to a table with definition query" issues are avoided.

Also, as mentioned in a previous comment, definition queries don't have this problem. The problem is specific to selections. So in some cases, maybe a definition query could be used instead of a selection.

Other alternatives:

Do the join using SQL in a query layer or database view (the input table won't be editable, unlike Add Join). Ensure there is a truly unique ID field, possibly by using something like Oracle’s ROWNUM pseudo column, or the equivalent in your database.
Or, pare down the related table in a query layer/database view, so that joining to the query layer/database view is a 1:1 join. Editing the input table will be possible. But the join will be slow for medium to large datasets.
Edit: With that said, performance might have improved in 3.3.0:
"The geoprocessing framework uses workspace caching to improve performance when running scripts or repeated processes. This improvement applies particularly to remote data sources, such as enterprise geodatabases."
Or, export the data to the same enterprise geodatabase, or to a file geodatabase or mobile geodatabase. Do the selections there.
Or, for standalone tables, switch from joining from TABLE_A to TABLE_B, to joining from TABLE_B to TABLE_A. That might change the join from being 1:M to M:1 (1:1), which would eliminate the selection issue. But editing might be a problem, since only the input table can be edited in a join, not the join table.