Intersect — Get "first instance only" when using SQL spatial relational functions

Bud · ‎03-27-2023

I have polygons and points SDO_Geometry tables in an Oracle 18c 10.7.1 EGDB.

Fake sample data:

CREATE TABLE polygons (objectid NUMBER(4,0), shape SDO_GEOMETRY);
INSERT INTO polygons  (objectid,shape) 
       VALUES (1,SDO_GEOMETRY(2003, 26917, NULL, sdo_elem_info_array(1, 1003, 1), 
       sdo_ordinate_array(668754.6396, 4869279.7913, 668782.1453, 4869276.1585, 668790.9678, 4869344.6631, 668762.4242, 4869346.22, 668754.6396, 4869279.7913)));

CREATE TABLE points (objectid NUMBER(4,0), shape SDO_GEOMETRY);
INSERT INTO  points (objectid,shape) VALUES (1,SDO_GEOMETRY(2001, 26917, sdo_point_type(668768.133,  4869255.3995, NULL), NULL, NULL));
INSERT INTO  points (objectid,shape) VALUES (2,SDO_GEOMETRY(2001, 26917, sdo_point_type(668770.2088, 4869306.259,  NULL), NULL, NULL));
INSERT INTO  points (objectid,shape) VALUES (3,SDO_GEOMETRY(2001, 26917, sdo_point_type(668817.9545, 4869315.0815, NULL), NULL, NULL));
INSERT INTO  points (objectid,shape) VALUES (4,SDO_GEOMETRY(2001, 26917, sdo_point_type(668782.1134, 4869327.1634, NULL), NULL, NULL));

db<>fiddle

And I have a query that selects polygons that intersect at least one point.

  --SDO_Geometry
  SELECT poly_objectid,
         pnt_objectid
    FROM (
          SELECT poly.objectid as poly_objectid,
                 pnt.objectid  as pnt_objectid,
                 row_number() over(partition by poly.objectid order by null) rn
            FROM polygons poly
      CROSS JOIN points pnt
           WHERE sdo_anyinteract(poly.shape, pnt.shape) = 'TRUE'
         )
   WHERE rn = 1

   POLY_OBJECTID PNT_OBJECTID
   ------------- ------------
               1            2

The query only selects one row per polygon using row_number() and WHERE rn = 1 .

Reason:
I want to mimic what Select By Location does in ArcGIS Pro. Select By Location only selects a single row per input polygon feature. It doesn't propagate duplicate polygon rows for each point that intersects a given polygon. Whereas database queries do propagate duplicate polygon rows via the join. That's a typical difference between ArcGIS and full-blown SELECT queries with joins. All of which is expected.

So, like I said, the query above only selects one row per polygon.

The sample data and query above are for SDO_Geometry. But the same thing applies to SDE.ST_Geometry:

   --ST_Geometry
  SELECT poly_objectid,
         pnt_objectid
    FROM (
          SELECT poly.objectid as poly_objectid,
                 pnt.objectid as pnt_objectid,
                 row_number() over(partition by poly.objectid order by null) rn
            FROM polygons poly
      CROSS JOIN points pnt
           WHERE sde.st_intersects (poly.shape, pnt.shape) = 1
         )
   WHERE rn = 1

   POLY_OBJECTID PNT_OBJECTID
   ------------- ------------
               1            2

This is how the queries work:

1. Inner query: Selects multiple polygon rows; one for each intersecting point.

POLY_OBJECTID PNT_OBJECTID
------------- ------------
            1            2
            1            4

2. Outer query: Only keep one row per polygon. The tie-breaker is arbitrary; it doesn't matter what duplicate row is kept (see WHERE rn = 1 and order by null ). Related post here.

POLY_OBJECTID PNT_OBJECTID
------------- ------------
            1            2

Question:

While the above technique works, it's occurred to me that it might not be the most efficient way to do it. Why initially select multiple intersection matches, when the duplicates will be ignored anyway?

Is there a way to only get the first instance of an intersection, and then stop looking and move on to the next polygon?

Bud · ‎03-27-2023

For example, could EXISTS be used instead of a cross join/cartesian product?

IN vs. EXISTS
The IN clause scans all records fetched from the given subquery column, whereas EXISTS clause evaluates true or false, and the SQL engine quits the scanning process as soon as it has found a match.

To be honest, I don't know how EXISTS could be used in this case. And if it can be used, I'm skeptical that a subquery with EXISTS would perform better than a join, since joins are usually considered to be better performing than subqueries.

Any insight would be appreciated.

Bud · ‎03-29-2023

Related: Select polygons that intersect points (improve performance)