ArcGIS Pro 2.9.5; Oracle 18c; EGDB version 10.7.1:
My unit has recently completed a months-long project that involved extensive tabular analysis of our EGDB data. The analysis involved complex queries, including non-spatial joins.
Ultimately, my coworkers found they couldn't use ArcGIS Pro for the analysis and ended up using Excel as a temporary workaround. Now that the project is complete, they've asked the following question:
Going forward, is ArcGIS Pro the right tool for tabular/join-based analysis of our enterprise geodatabase data?
Based on the issues listed below, we are coming to the conclusion that the answer is, unfortunately, no.
BUG-000171969: Inconsistent Join Operation behaviour when joining tables with definition query in ArcGIS Pro
It's unfortunate that we feel Pro isn't the right tool for tabular analysis of geodatabase data. We've been die-hard ArcGIS Pro users up to this point. But ultimately, users have developed trust issues and have come up against too many walls when doing tabular analysis in Pro.
What tools do experienced practitioners use for complex tabular analysis of enterprise geodatabase data?
The current use case is analysis of construction project GIS data, where there is no room for error; there are many other use cases as well.
Some quotes from colleagues from other organizations:
Biology:
It is this kind of thing that has led me to not trust joins in Arc, especially with bigger datasets. When I need to join tables, I typically use FME or R, as they seem more stable with large datasets and produce more predictable results.
Public Works:
Pro can be flaky when you have joins: lots of unexpected behaviour. I use SQL in the backend instead.
Engineering/Consulting:
I've come across similar issues and limitations. I'm researching alternatives as well.
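For context on the "SQL in the backend" approach mentioned above, here is a minimal sketch of pushing a join down to Oracle so that Pro's join machinery is never involved. The connection file path and the table and column names are hypothetical placeholders:

```python
import arcpy

# Hypothetical .sde connection file -- point this at your own EGDB.
egdb = arcpy.ArcSDESQLExecute(r"C:\connections\prod.sde")

# The join runs entirely inside Oracle, not in Pro.
sql = """
    SELECT a.ASSET_ID, a.STATUS, p.PROJECT_NAME
    FROM GISOWNER.ASSETS a
    JOIN GISOWNER.PROJECTS p
      ON a.PROJECT_ID = p.PROJECT_ID
    WHERE a.STATUS = 'ACTIVE'
"""

# execute() returns the result rows for a SELECT statement.
for row in egdb.execute(sql):
    print(row)
```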
I'm under the impression that there aren't any immediate plans to fix the issues mentioned in the post above (other than #6 and #11, which I think are fixed in 3.x).
If that's the case, then what would be a good alternative to Pro for tabular/join-based analysis of real-time EGDB data?
I would say, as you’ve discovered, the answer is Pro until it isn’t. It’s tricky and frustrating at times because it seems as if whenever a bug gets addressed in one version, another is introduced.
I use Pro for temporary joins fairly often (usually weekly), but I have to keep these issues in mind and watch for results that don’t make sense, which isn’t always easy to do. When I want to double-check, my answer is Excel; and for some tasks I use Excel before I use Pro.
Thanks. This sounds very familiar. It helps to know we’re not the only ones facing these challenges.
Related: Modern Data Analytics in Excel: Using Power Query, Power Pivot, and More for Enhanced Data Analytics by George Mount
@Bud Yes, don't do analysis on a joined layer; persist it to disk first.
When I wore an analyst hat, my routine was (a rough arcpy sketch follows the steps below):
1. Open my Project template with a Query layer to my enterprise database.
2. Copy the data locally with a date suffix
3. Do my work.
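A minimal sketch of those three steps, assuming a hypothetical .sde connection file, table name, and local scratch geodatabase; MakeQueryLayer needs a column that uniquely identifies each row:

```python
import arcpy
from datetime import date

# Step 1: query layer against the enterprise database (hypothetical names).
arcpy.management.MakeQueryLayer(
    input_database=r"C:\connections\prod.sde",
    out_layer_name="assets_qry",
    query="SELECT * FROM GISOWNER.ASSETS",
    oid_fields="ASSET_ID",  # must uniquely identify each row
)

# Step 2: copy the data locally with a date suffix, so the analysis runs
# against a static snapshot rather than live EGDB rows.
out_fc = r"C:\work\scratch.gdb\assets_{}".format(date.today().strftime("%Y%m%d"))
arcpy.management.CopyFeatures("assets_qry", out_fc)

# Step 3: do the work against the persisted local copy.
print(arcpy.management.GetCount(out_fc))
```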
Details:
Agreed about your list; it looks just like my own, but some of the items have been addressed in releases beyond 2.9.
- Definition queries: if you need those, you should move to 3.2.
Note that the seemingly inner-join behavior is not something we addressed: definition queries are separate from joins, and we do not want to alter definition queries to guess at user intent. What we did do is add an informational banner to inform users that they may wish to update the query.
- One-To-First vs. One-To-Many (ArcMap vs. Pro behavior): many Pro users came from ArcMap and did not want a change in the default join to break their scripts (see the illustration after this list).
- The trust issue is still outstanding, but we did add a checkbox in 3.3 to rebuild the index.
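To make the One-To-First vs. One-To-Many distinction concrete, here is a small pandas illustration of the two semantics; this is conceptual, not Esri code, and the table and field names are invented:

```python
import pandas as pd

parcels = pd.DataFrame({"parcel_id": [1, 2], "zone": ["R1", "C2"]})
permits = pd.DataFrame({"parcel_id": [1, 1, 2], "permit": ["A", "B", "C"]})

# One-To-Many: parcel 1 appears twice, once per matching permit (3 rows).
one_to_many = parcels.merge(permits, on="parcel_id", how="left")

# One-To-First: keep only the first matching permit per parcel, so the
# left table's row count is preserved (2 rows).
one_to_first = parcels.merge(
    permits.drop_duplicates("parcel_id", keep="first"),
    on="parcel_id", how="left",
)

print(one_to_many)
print(one_to_first)
```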
Work to do for running "analysis" on a joined layer:
Tools work on a record, and analysis tools keep things organized by ObjectID. Since ObjectIDs are duplicated in a one-to-many join, we need to decide how to process them. Summations are obvious (and work): we can aggregate those down to a single ObjectID. Intersections and other geometry tools need a unique record ID to process the data properly, so instead of producing incorrect results we error out quickly to avoid a lengthy wait for bad output. Hence the recommendation to persist your data before you run analysis.
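In practice, that recommendation might look like the sketch below: join, persist, then analyze. The dataset and field names are hypothetical; copying the joined layer materializes the one-to-many rows with fresh, unique ObjectIDs before a geometry tool runs:

```python
import arcpy

# Hypothetical local data for illustration.
arcpy.management.MakeFeatureLayer(r"C:\work\data.gdb\mains", "mains_lyr")

# Temporary join; with a 1:many match, ObjectIDs repeat in the joined view.
arcpy.management.AddJoin("mains_lyr", "PROJECT_ID",
                         r"C:\work\data.gdb\projects", "PROJECT_ID",
                         "KEEP_ALL")

# Persist first: the copy is written with new, unique ObjectIDs...
joined_fc = r"C:\work\data.gdb\mains_joined"
arcpy.management.CopyFeatures("mains_lyr", joined_fc)

# ...so geometry tools such as Intersect have a unique record ID to work with.
arcpy.analysis.Intersect([joined_fc, r"C:\work\data.gdb\easements"],
                         r"C:\work\data.gdb\mains_easements")
```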