Why are spatial SQL queries so much slower than Select By Location?

1294
4
03-31-2023 09:18 AM
Bud
by
Notable Contributor

ArcGIS Pro 2.6.8; Oracle 18c 10.7.1 EGDB

In the Spatial Definition Query ArcGIS Pro Idea, @JoshuaBixby said:

While I agree that such functionality would be handy, it does come with some risk. Spatial predicates are much more expensive than other predicates in DBMSs, and that is true when spatial indexes exist and are well tuned. If data sets either don't have spatial indexes, don't have them well-tuned, or the DBMS chooses not to use them; well, the performance can be downright miserable.

It's a common theme I've heard from multiple experts using a variety of databases. But I haven't found an explanation about why spatial predicates are so inefficient.

In a related post, I mentioned I did a test that demonstrated the issue: (a similar test here)

  1. ArcGIS Pro Select By Location tool (processes in RAM): 1 second
  2. SDO_GEOMETRY database query: 10 seconds
  3. SDE.ST_GEOMTERY database query: 50 seconds

That test was pretty clean. The tables were newly created (unversioned) using ArcGIS Pro. So the spatial indexes were created properly and I think the statistics are up-to-date too.

Question:

Why are spatial database queries so much slower than Select By Location in ArcGIS Pro?

As mentioned, spatial queries are relatively slow even when the database is set up correctly: when spatial indexes are being utilized in the query (as shown in the explain plan) and the table statistics are up to date.

I understand that ArcGIS Pro Select By Location likely has less overhead/bottlenecks to deal with. The application can do the analysis directly on the data in RAM -- all in one place. Whereas the database query might have more layers of complexity in terms of network mechanisms, security, etc. But it still doesn't quite add up to me. It still seems like databases are just bad at spatial queries.

In other words:
How can it be that my mediocre office computer is outperforming our enterprise database server? (in comparison, the DB is lightning-fast for other complex, large, non-spatial queries)
Is the Select By Location math really that much more efficient than database math? You'd think they'd be similar.

4 Replies
MattWilkie1
Occasional Contributor II

Yes please! This topic is worth exploring.

I'm also interested in whether this spatial queries have the same performance profile, in general, across DB vendor types. For example are PostGIS and SQL Spatial using their native spatial types equally slower than in-client location queries? Assuming they are similar, what exactly is happening to slow things down?

Bud
by
Notable Contributor
Bud
by
Notable Contributor

For anyone who's interested, this is the fastest spatial query for Oracle that I've found:

Query Optimization: Select polygons that intersect points

JoshuaBixby
MVP Esteemed Contributor

This information shared here so far isn't really sufficient to dive into a deeper discussion of what may be going on with your systems and these tests.  There are so many details that matter and aren't provided yet.  For example, what exactly is the workspace for the Pro test?  Are the datasets in a memory workspace?  How big are the datasets and what spatial types are involved.  Is the data in the enterprise databases part of a geodatabase?  What about versioning of the data if part of an enterprise geodatabase?  If versioned, what does the state tree look like?  What exactly is the spatial SQL you are using, and how are you executing it against the database?

Overall, spatial predicates are expensive on the database side compared to other operators and predicates because they are much more complex mathematically.  Do get an example of the complexity, check out jts/modules/core/src/main/java/org/locationtech/jts at master · locationtech/jts · GitHub to see the algorithms the JTS project uses to implement spatial operators.