Select to view content in your preferred language

Is ArcGIS Pro the right tool for tabular/join-based geodatabase analysis?

449
5
04-28-2024 10:40 PM
Labels (2)
Bud
by
Notable Contributor

ArcGIS Pro 2.9.5; Oracle 18c 10.7.1 eGDB:

My unit has recently completed a months-long project that involved extensive tabular analysis of our EGDB data. The analysis involved complex queries, including non-spatial joins.

Ultimately, my coworkers found they couldn't use ArcGIS Pro for the analysis and ended up using Excel as a temporary workaround. Now that the project is complete, they've asked the following question:

Going forward, is ArcGIS Pro the right tool for tabular/join-based analysis of our enterprise geodatabase data?

Based on the issues listed below, we are coming to the conclusion that the answer is, unfortunately, no.

  1. Select By Attributes on joined data: For rows that are 1:M, all rows in the join table get selected, despite the selection SQL expression.
    1. BUG-000154798
  2. No virtual attribute functionality
    1. Whereas, tools like Excel can have real-time calculated fields (formulas) in true Excel tables (or Power Query) for ad hoc analysis.
  3. Trust issues with query layers: Query Layers omit rows from the underlying query without any warning.
    1. Here is a specific example, but there are other cases too: Idea - Query Layers — Include features with null shape in attribute table
  4. Slow joins to query layers
    1. BUG-000160320
    2. Esri Case: 03356933
  5. Definition query on join table will effectively result in an inner join
    1. Bug/Article-000007166, BUG-000153808, BUG-000131999, BUG-000129607, BUG-000153950, BUG-000164109
  6. Fixed in 3.x?
    1. Definition query on join returns incorrect rows
    2. Export join to FGDB: Join table ID field is null
      • Esri Case: 03616469
      • BUG-000146098
  7. Join with definition query: Export has more rows than input (definition query is ignored)
    1. BUG-000135505
    2. Esri Case: 03619440
  8. Can't sort on join table field in attribute table
    1. BUG-000157874
    2. Esri Case: 03616093
  9. Can’t create 1:M join using unregistered tables, query layers, and database views
  10. It’s not always clear why a join is one-to-first vs. one-to-many.
  11. There are likely other issues that I’ve missed.

It's unfortunate that we feel Pro isn't the right tool for tabular analysis of geodatabase data. We've been die-hard ArcGIS Pro users up to this point. But ultimately, users have developed trust issues and have come up against too many walls when doing tabular analysis in Pro.

What tools do experienced practitioners use for complex tabular analysis of enterprise geodatabase data?

The current use case is: analysis of construction project GIS data, with no room for error. And many other use cases.

5 Replies
Bud
by
Notable Contributor

Some quotes from colleagues from other organizations:

Biology:

It is this kind of thing that has led me to not trust joins in Arc, especially with bigger datasets. When I need to join tables, I typically use FME or R as they seem more stable with large datasets and produce more predicable results.

Public Works:

Pro can be flakey when you have joins. Lots of unexpected behaviour. I use SQL in the backend instead.

Engineering/Consulting:

I've come across similar issues and limitations. I'm researching alternatives as well.

I'm under the impression that there aren't any immediate plans to fix the issues mentioned in the post above (other than the issues in #6, which I think are fixed in 3.x).

If that's the case, then what would be a good alternative to Pro for tabular/join-based analysis of real-time EGDB data?

  • I'm comfortable using SQL in Oracle SQL Developer. But that doesn't help my coworkers, who aren't SQL specialists.
  • Excel (as a real-time front-end to the EGDB via ODBC) isn't appealing for obvious reasons, even with improved functionality like Power Query and the XLookup function.
  • MS Access might have been ok back in the day (as a real-time front-end to the EGDB via ODBC). But at this point, it is very dated, clunky, and limited. For example, IIF statements are very cumbersome in Access compared to normal SQL CASE functionality.
0 Kudos
Eugene_Adkins
Occasional Contributor III

I would say, as you’ve discovered, the answer is Pro until it isn’t. It’s tricky at times and frustrating because it seems as if when a bug gets addressed in one version, another bug is created.

I use Pro for temporary joins fairly often (usually weekly) but I have to keep the issues in mind and try to pay attention to results that do not make sense, which isn’t always easy to do. When I want to double check, my answer is Excel; and for some tasks I use Excel before I use Pro.

Bud
by
Notable Contributor

Thanks. This sounds very familiar. It helps to know we’re not the only ones facing these challenges.

0 Kudos
Bud
by
Notable Contributor

Related: Modern Data Analytics in Excel: Using Power Query, Power Pivot, and More for Enhanced Data Analytics by George Mount

0 Kudos
JonathanNeal
Esri Contributor

@Bud Yes don't do analysis on a joined layer.  You persist it to disk first.
When I wore an analyst hat...
1. Open my Project template with a Query layer to my enterprise database.
2. Copy the data locally with a date suffix
3. Do my work.

Details:

Agree about your list, looks just like my own, but has some checkboxes completed beyond 2.9.
 - Definition Query.  If those are needed you should move to 3.2.
  Note the seemingly inner join behavior is not something we addressed.  Definition queries are separate from joins.  We do not want to alter definition queries to guess at user intent.  What we did do is add an informational banner to inform users they may wish to update.
 - One-To-First vs One-To-Many, ArcMap vs Pro behavior.  Many Pro users came from ArcMap and did not want the default join to change their scripts.
 - Trust issue is still outstanding, but we did add a checkbox in 3.3 to rebuild the index.

Work to do for running "analysis" on a joined layer:

Tools work on a record.  Analysis tools keep things organized by an objectId.  Since the objectId is duplicated with a 1:many join, we need to make a decision on how to process these.  Summations are obvious (and work), we can bring those to a single objectId.  Intersections and other geometry tools need to have a unique record id to properly process the data; so instead of giving incorrect results we error quickly to avoid a lengthy weight for bad output.  Hence why we recommend persisting your data before you do analysis.

0 Kudos