Performance on SDE 10.1 vs 9.3.1 (Oracle)

SimonLynch · ‎05-27-2013

Hello, Our GIS workgroup is testing a new spatial database which utilises SDE 10.1. Our GIS workgroup has found issues with the performance of redisplay of geospatial data on the test database â?? thus far. We have defaults install. Our aim is for at least similar display performance with the new database. Unfortunately, we have found redisplay times for spatial datasets are up to 2-3 times slower under the new database setup when compared to our â??oldâ?? (SDE 9.3.1) setup.

Environment:
ArcGIS 10.1 (build 3035) on Microsoft Win XP v2002 SP3. 3.3Ghz with 2 G RAM.
Oracle 11r2 on Sun Solaris (Geodatabase enabled SDE schema v 10.1)
Oracle 11r2 on Sun Solaris with SDE 9.3.1 (Note: This â??oldâ?? environment forms the â??baselineâ?? for performance testing against the new system/environment).

The performance tests:
â?¢ A bookmarked extent sourced from a PC (specs above) with ArcMAP 10.1 installed.
â?¢ Polygon datatypes ~400,000 spatial records:
o ST_Geometry (10.1)
o Low Resolution ESRI binary (9.3.1)

Find below a copy of the two queries (using OEM) as received at the database:

Query on Oracle-SDE 10.1 database:
SELECT 1 SHAPE, VEGGROUP, TEST_TV2_ST2.OBJECTID, TEST_TV2_ST2.SHAPE.points,TEST_TV2_ST2.SHAPE.numpts,TEST_TV2_ST2.SHAPE.entity,TEST_TV2_ST2.SHAPE.minx,TEST_TV2_ST2.SHAPE.miny,TEST_TV2_ST2.SHAPE.maxx,TEST_TV2_ST2.SHAPE.maxy,TEST_TV2_ST2.rowid
FROM BASE.TEST_TV2_ST2 TEST_TV2_ST2
WHERE SDE.ST_EnvIntersects(TEST_TV2_ST2.SHAPE,:1,:2,:3,:4) = 1

Query on Oracle-SDE 9.3.1 database:
SELECT /*+ LEADING INDEX(S_ S945_IX1) INDEX(SHAPE F945_UK1) INDEX(TASVEG_VEGETATION_ONLY A945_IX1) */ SHAPE, VEGGROUP ,S_.eminx,S_.eminy,S_.emaxx,S_.emaxy ,SHAPE.fid,SHAPE.numofpts,SHAPE.entity,SHAPE.points,SHAPE.rowid
FROM
(SELECT /*+ INDEX(SP_ S945_IX1) */ DISTINCT sp_fid, eminx, eminy, emaxx, emaxy
FROM SIPS_DBA.S945 SP_
WHERE ((SP_.gx >= :1 AND SP_.gx <= :2 AND SP_.gy >= :3 AND SP_.gy <= :4 ) OR (SP_.gx >= :5 AND SP_.gx <= :6 AND SP_.gy >= :7 AND SP_.gy <= :8)) AND SP_.eminx <= :9 AND SP_.eminy <= :10 AND SP_.emaxx >= :11 AND SP_.emaxy >= :12) S_ , SIPS_DBA.TASVEG_VEGETATION_ONLY , SIPS_DBA.F945 SHAPE
WHERE S_.sp_fid = SHAPE.fid AND S_.sp_fid = SIPS_DBA.TASVEG_VEGETATION_ONLY.SHAPE

Findings:
Query on SDE10.1:
â?¢ takes 70 seconds to redisplay in ArcMAP 10.1.
â?¢ Analysing the Explain Plan of the 10.1 SDE query shows it has a very high cost.
â?¢ took 58 seconds to process on the Server.
â?¢ utilises the layer geometry to perform the sub-selection.
Query on SDE9.3.1:
â?¢ takes 30 seconds to redisplay in ArcMAP 10.1
â?¢ Analysing the Explain Plan of the 9.3.1 SDE query shows it has a low cost.
â?¢ takes 4 seconds to process on the Server.
â?¢ utilises Oracle Optimiser Hints.
â?¢ uses the layers Spatial index.

We have some questions and concerns:
1. Why does the query hit the database with different syntax - when the Client application is the same (ArcGIS 10.1), as is the (bookmarked) extent?
a. Why are Oracle optimiser hints not being used in SDE 10.1?
b. Why is the query in SDE 10.1 utilising feature geometry layer instead of the spatial index to perform the query? (Note spatial index is available and built)

Looking forward to your insights on this matter.

Regards, Simon

VinceAngelo · ‎05-27-2013

Different geometry storage is queried differently. ST_GEOMETRY uses LOB, while
SDEGEOMETRY uses LONG RAW (which is basically your 2.5x difference right there).
Spatial index preference would be determined by information you haven't provided
(the size of the search window, the envelopes of the layers, and the number of
features returned by the query).

- V

SimonLynch · ‎05-29-2013

Different geometry storage is queried differently. ST_GEOMETRY uses LOB, while
SDEGEOMETRY uses LONG RAW (which is basically your 2.5x difference right there).

Hello, Thanks for you're response... it has prompted discussion... and continued testing.

You mention SDEGEOMETRY (ESRI Binary) uses Long Raw and therein lies the 2.5 times performance difference to ST_GEOMETRY. I have since tested another datatype - 'SDELOB' (also LOB): I found very 'similar' redisplay performance to the (depricated?) 'ESRI binary' (SDEGEOMETRY) format mentioned above (ie Long Raw = LOB). I am a little confused as to why such a significant redisplay difference might exist bw LOB formats.

Spatial index preference would be determined by information you haven't provided
(the size of the search window, the envelopes of the layers, and the number of
features returned by the query).

Re: Spatial Index Preference of ST_GEOMETRY: The search window size is exactly the same for all queries (its bookmarked). The query window (envelope?) is quite 'large' and returns 100,000 records (out of a possible 400,000). Incidently, the query at larger scales (ie zoomed in closer) DOES use the Spatial Index and performance is good. I am somewhat confused as to why SDE/ArcMAP v10.1 would NOT use the Spatial Index to search at smaller (zoomed out) scales? Makes me wonder whether the use of the Spatial Index configurable?

Thanks for your input.

VinceAngelo · ‎05-29-2013

s/SDEGEOMETRY/SDEBINARY/g (Doh!)

One quarter of the features is probably past the point where a spatial index
would be appropriate. The exact ratio may have been changed over time.
It's not "configurable", but if you toy with the layer envelope extent, you
might see a difference. You can also toy with the ATTRIBUTE_FIRST
query flag with scale-dependent layers to force full table scan simple
shape filtering.

I take steps to avoid ever needing to render 100k features from any table
in any one query, ever, so this benchmark isn't exactly unbiased.

- V

SimonLynch · ‎06-04-2013

Thanks Vince, our testing is continuing.

One quarter of the features is probably past the point where a spatial index
would be appropriate. The exact ratio may have been changed over time.

Re above: I am still a little confused as to what i expect would be going on with these spatial queries. As I would have thought the most efficient way to query a large polygon geodatabase is (almost always?) using a spatial index? Perhaps points and lines may not perform as well with a SI (intuitively)? Depends on many factors i suppose? But, it should be the preferrred way for polygons - almost always. But possibly this is only my lack of understanding shining through!

Incidentally, I reran the same spatial queries from ArcGIS 10.1 on the following datatypes :
* SDO geometry (SDE 10.1) uses its Spatial Index query - 65 secs to refresh display (~100k records).
* ST_Geometry (SDE 10.1) does not utilise the Spatial Index query - 90 secs to refresh display (~100k records)
* SDELOB (SDE 10.1) uses its Spatial Index query - 35 secs to refresh display (~60k records)
* SDE Binary (SDE 9.3.1) uses its Spatial Index query - 36 secs to refresh display (~60k records)

This runs counter to what i would expect -> ST-Geometry should be faster/est should it not?
Any ideas on what is going on or what i might have missed? -> Like i do wonder if this *new* database is fully 'configured'.
Simon

VinceAngelo · ‎06-04-2013

Re above: I am still a little confused as to what i expect would be going on with these spatial queries. As I would have thought the most efficient way to query a large polygon geodatabase is (almost always?) using a spatial index?

Why would you expect that? Index efficacy is an established science. There's nothing
special about polygons which would suddenly improve the cost/benefit relationship of
index I/O to full table scan I/O; in fact, I would expect polygons to be *more* expensive
than simpler features.

You've left out too many details to begin to evaluate why this one query performs in the
manner it does. As stated earlier, I try to avoid any query which returns a significant
fraction of a large table, so most of my efforts are spent optimizing for small random
searches.

- V

SimonLynch · ‎06-05-2013

Why would you expect that? Index efficacy is an established science. There's nothing
special about polygons which would suddenly improve the cost/benefit relationship of
index I/O to full table scan I/O; in fact, I would expect polygons to be *more* expensive
than simpler features.

You misunderstand a lot of what i have stated. I am not questioning the vurtues of Spatial Indexing? If you read my posting more closely I said "I would have thought the most efficient way to query a large polygon geodatabase is (almost always?) using a spatial index? Perhaps points and lines may not perform as well with a SI "! I am saying polygons are slower than SI. ie I *agree* with you.

I try to avoid any query which returns a significant
fraction of a large table, so most of my efforts are spent optimizing for small random
searches.

Our Organisation utilises a lot of large GIS data. Our business requires results of large queries displayed in GIS/applications. Optimising for small window searches is not required (performance is acceptable) - The performance of Large areal window searches is where the problem is. Hence this posting.

You've left out too many details to begin to evaluate why this one query performs in the
manner it does.

This one query is an example of the systematic poor and unsatisfactory performance we get with the ST_Geometry data type. This runs contrary to what we have read in these forums and elsewhere!?
Thanks for your attempts to assist. Has anyone else experienced poor spatial query of an SDE 10.1 db for larger queries with ST_Geometry? If so - i would appreciate hearing from you.
Regards, Simon

VinceAngelo · ‎06-05-2013

I *am* questioning the virtue of spatial indexes, especially with respect to large
result sets. It is not possible for an index to "almost always" be the most efficient
way to access data. The principles involved are the same for why an index on age
then gender would be more efficient than gender then age, but when the query is
"locate males aged 20 to 60" you'll be in for a wait.

Any spatial query returning 25% of 400k rows is likely to process over a million
index tile hits, most of which are redundant. And once the records are identified,
they'll still need to be extracted from the table, whose pages may be cluttered
with rows that don't match, or don't match yet. This is where the full impact of
spatial fragmentation rears its ugly head -- index query IO is compounded by
multiple reads of the same blocks which fail to cache.

The thing is, there isn't much you can do to change the horrific I/O cost. You
can reduce the storage precision, which can lop an order of magnitude off the
storage. You can spatially defragment the data. And you can increase the
grid size to reduce duplicates (at the cost of false positives). After that, you're
down to pinning the table in RAM (or using the standard techniques to avoid
large rowsets).

- V

SimonLynch · ‎06-06-2013

Our investigations back up some of what you say - V.

We have tested query formulation and performance on the database:
1) The 'ESRI binary' (SDE 9.3.1) datatype performs the query using Optimiser Hints and the Spatial Index (SI) - It has negligable 'performance cost to the db'.
2) The SDELOB (SDE 10.1) query too is very efficient/fast in terms of I/O. It formulates queries using Oracle Optimiser Hints incorporating the SI - very fast! Interesting! However,
3) the ST_Geometry datatype query uses 'ST_EnvIntersects' in the query and as a result must presumably do a full table scan to return results. It does *not* use any Oracle Optimiser Hints in the query formulation and has a *huge* (measured) database-cost to run.

All these queries are on the same geographic extent. Our tests reveal vast differences with the level of I/O for different data types/versions of SDE. Queries formulated using optimiser hints and SI are the most efficient *by far*. Backing up what Vince mentions in his previous post about query formulation.

I think (???) the problem i have may be related to this KB 38019. I note this KB artical does not seem to apply to SDE 10.1 though? But appears similar.

Our investigations seem to indicate the ST_Geometry is doing a full table scan to resolve "ST_EnvIntersects". The database-cost of this query is much, much greater than that of formulating the query using the using Optimser Hints and SI (for large returns).

ESRI - Why does ST_Geometry not invoke the Oracle Optimiser Hints in query formulation (as do earlier/other ESRI data types)?

Regards, Simon

VinceAngelo · ‎06-07-2013

ST_EnvIntersects normally *does* use an index. The way to ask Esri questions
is to start a Tech Support incident.

- V