Non-versioned data slowing way down

04-21-2014 06:21 AM
MattLane
Frequent Contributor
I have two unversioned feature classes in a feature dataset; our organization adds and deletes up to tens of thousands of records from them daily. One has over 3 million points and the other over 500,000 lines. Each has an organizational ID field with an index and a GlobalID field with an index. We will be versioning and replicating these in the near future. They are getting slower and slower over time, to the point where deleting just a couple of records can take 8 minutes and getting a feature count with the toolbox tool takes 3 minutes. Processing that used to take 1 to 2 hours now takes over 12.

Details:

  • SQL Server 2008 R2

  • SDE Release 101010

  • Desktop 10.1 sp1

  • Each spatial index is built by ESRI (16 cells per object, Medium on all levels)

  • Data is not in SDE schema, but a separate one

  • There isn't a lot of other data in this database, although there is a single versioned featureclass in a different dataset that has topology rules applied to it.


What I've tried:

  • I rebuild and reorganize all indexes in this database as needed according to Microsoft documentation

  • Checked the Bounding Boxes on the spatial indexes

  • Rebuilt statistics with the Esri tools (a scripted sketch of this step follows the list)

  • Compressed

  • Shrunk database (our DBA does this periodically)

  • Restarted server (memory doesn't seem to be the issue)
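
For reference, the Esri-side maintenance (statistics and the geodatabase-managed indexes) amounts to roughly the following arcpy calls -- a sketch only, with placeholder connection file and feature class names, run as the data owner:

import arcpy

sde = r"C:\connections\gisdb_owner.sde"            # hypothetical .sde connection file
fcs = ["myschema.Points3M", "myschema.Lines500K"]  # hypothetical feature class names

# Drop and re-create the indexes the geodatabase maintains on these feature classes
arcpy.RebuildIndexes_management(sde, "NO_SYSTEM", fcs, "ALL")

# Refresh geodatabase statistics on the base tables
arcpy.AnalyzeDatasets_management(sde, "NO_SYSTEM", fcs,
                                 "ANALYZE_BASE", "ANALYZE_DELTA", "ANALYZE_ARCHIVE")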

1 Solution

Accepted Solutions
George_Thompson
Esri Notable Contributor
Matt,

I wonder if the geoprocessing history for those two feature classes is getting very large. Every time you run a GP tool on them, an entry is added to their metadata, and that metadata is stored in a geodatabase system table.

You may want to look into this add-in: Delete Geoprocessing History from Metadata

Got the link from this post: http://forums.arcgis.com/threads/43399-metadata-geoprocessing-history

I would only do this after you have a verified backup of the geodatabase.

Try it on the two feature classes in question.
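
If you would rather script it than use the add-in, something along these lines should also work -- a sketch only, assuming the "remove geoprocessing history" stylesheet that ships with Desktop 10.1 (verify the path on your install; the feature class path is a placeholder):

import arcpy

fc = r"C:\connections\gisdb_owner.sde\myschema.Points3M"    # hypothetical feature class
xslt = (r"C:\Program Files (x86)\ArcGIS\Desktop10.1\Metadata\Stylesheets"
        r"\gpTools\remove geoprocessing history.xslt")      # assumed install path
cleaned = r"C:\temp\points_no_gp_history.xml"

# Export the metadata with the geoprocessing history stripped out...
arcpy.XSLTransform_conversion(fc, xslt, cleaned)
# ...then write the cleaned record back onto the feature class
arcpy.MetadataImporter_conversion(cleaned, fc)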

-George
--- George T.

16 Replies
MattLane
Frequent Contributor
This has happened in the past and the only way I could find to fix it was to export and recreate the two layers completely. That worked well for about 4 months. This option won't be desirable once we start replicating daily to other agencies.
WilliamCraft
MVP Regular Contributor
How are those features being added and deleted?  Is this being done inside an edit session or outside an edit session?  If you're deleting and adding tens of thousands of features on a daily basis, you might want to consider re-building the spatial indexes on those feature classes accordingly each time the number of records changes drastically.  The need for doing this is rare, but it can happen depending on how the features are being added/deleted.  I also believe this is more likely to happen when the changes are made in an edit session.  As an example, a spatial index calculated for 50,000 lines might look much different than a spatial index for 140,000 lines in terms of grid size.  Check out these articles to understand a bit more about spatial indexes:

http://resources.arcgis.com/en/help/main/10.1/index.html#//003n0000001r000000
http://resources.arcgis.com/en/help/main/10.1/index.html#//003n0000001s000000

So, your workflow might need to change to something like this at a high level (a rough arcpy sketch follows the list):

  1. Start edit session.

  2. Delete 50,000 lines.

  3. Recalculate the spatial index.

  4. After several hours, add 80,000 lines.

  5. Recalculate the spatial index. 
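
Roughly in arcpy, assuming the edits stay scripted (the paths, names, and the where clause below are all placeholders):

import arcpy

target = r"C:\connections\gisdb_owner.sde\myschema.Lines500K"   # hypothetical
updates = r"C:\data\daily_updates.gdb\new_lines"                # hypothetical

# 1-2. Remove the outgoing records (the filter is a placeholder)
outgoing = arcpy.MakeFeatureLayer_management(target, "outgoing_lines",
                                             "STATUS = 'RETIRED'")
arcpy.DeleteFeatures_management(outgoing)

# 3. Drop and recreate the spatial index after the large delete
arcpy.RemoveSpatialIndex_management(target)
arcpy.AddSpatialIndex_management(target)

# 4. Load the incoming records
arcpy.Append_management(updates, target, "NO_TEST")

# 5. Recreate the spatial index again now that the row count has changed
arcpy.RemoveSpatialIndex_management(target)
arcpy.AddSpatialIndex_management(target)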

MattLane
Frequent Contributor
Thanks for the response.

The records are being deleted with the delete tool and added with the append tool outside of an edit session (a Python scripted process).

Like I mentioned, I'm monitoring the indexes, and when a spatial index's fragmentation is > 30% I rebuild it; if the fragmentation is between 5% and 30% I reorganize it.

I've tried using ArcCatalog to recalculate the spatial index as well as to delete and recreate it. It seems to do what the SQL Server functions do, and neither improves access or editing times.
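
For reference, the fragmentation check and the rebuild/reorganize decision are scripted roughly like this (a sketch; the connection string and table names are placeholders):

import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server Native Client 10.0};"
                      "SERVER=gisdbserver;DATABASE=gisdb;Trusted_Connection=yes;",
                      autocommit=True)
cur = conn.cursor()

frag_sql = """
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(?), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE i.name IS NOT NULL
"""

for table in ("myschema.Points3M", "myschema.Lines500K"):      # hypothetical names
    for index_name, frag in cur.execute(frag_sql, table).fetchall():
        if frag > 30:
            action = "REBUILD"       # heavily fragmented: rebuild
        elif frag > 5:
            action = "REORGANIZE"    # lightly fragmented: reorganize
        else:
            continue
        cur.execute("ALTER INDEX [%s] ON %s %s" % (index_name, table, action))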
VinceAngelo
Esri Esteemed Contributor
A couple of things to consider: 

  • GUID columns, by their very nature, are always randomly distributed; this
    makes for a badly fragmented index, and poor performance if you query
    on them.

  • If the feature data has been gathered over time it is likely badly spatially
    fragmented. One of the hallmarks of spatial fragmentation is poor spatial
    query performance, no matter how artfully the spatial index has been tuned.
    There are many possible ways to defragment spatially, but the easiest is
    to do an ORDER BY query on some attribute that imposes spatial order
    (county code, etc.). More exotic queries are available if you can use the
    ST functions to group features into bands in X (or Y), then order by Y (or X).
    Note that spatial defragmentation will hurt join performance on other
    query constraints (like time), so that may need to be mitigated.

  • It's unlikely that the default spatial index tuning parameters are best for
    these datasets. You should spend some time getting the best performance
    you can from the defragmented data, even to the point of going through
    all tuning combinations if necessary (a rough harness for that is sketched
    below).
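
A rough harness for cycling through tuning combinations might look like the
following (a sketch only -- the index name, bounding box, SRID, test window,
and connection string are all placeholders for your data):

import itertools
import time
import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server Native Client 10.0};"
                      "SERVER=gisdbserver;DATABASE=gisdb;Trusted_Connection=yes;",
                      autocommit=True)
cur = conn.cursor()

TABLE = "myschema.Points3M"                    # hypothetical table
INDEX = "S23_idx1"                             # placeholder: name of the existing spatial index
BBOX = "(500000, 3300000, 800000, 3700000)"    # placeholder extent (xmin, ymin, xmax, ymax)
TEST_WKT = ("POLYGON((600000 3400000, 610000 3400000, 610000 3410000, "
            "600000 3410000, 600000 3400000))")   # placeholder query window

rebuild = """
CREATE SPATIAL INDEX {idx} ON {tab}(Shape)
USING GEOMETRY_GRID
WITH (BOUNDING_BOX = {bbox},
      GRIDS = (LEVEL_1 = {g}, LEVEL_2 = {g}, LEVEL_3 = {g}, LEVEL_4 = {g}),
      CELLS_PER_OBJECT = {cells},
      DROP_EXISTING = ON)
"""

probe = ("SELECT COUNT(*) FROM {tab} "
         "WHERE Shape.STIntersects(geometry::STGeomFromText('{wkt}', 2881)) = 1")

for grid, cells in itertools.product(("LOW", "MEDIUM", "HIGH"), (16, 64, 256)):
    cur.execute(rebuild.format(idx=INDEX, tab=TABLE, bbox=BBOX, g=grid, cells=cells))
    start = time.time()
    hits = cur.execute(probe.format(tab=TABLE, wkt=TEST_WKT)).fetchone()[0]
    print("%s / %d cells: %d features in %.2f s" % (grid, cells, hits, time.time() - start))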
- V
MattLane
Frequent Contributor


  • If the feature data has been gathered over time it is likely badly spatially
    fragmented. One of the hallmarks of spatial fragmentation is poor spatial
    query performance, no matter how artfully the spatial index has been tuned.
    There are many possible ways to defragment spatially, but the easiest is
    to do an ORDER BY query on some attribute that imposes spatial order
    (county code, etc.). More exotic queries are available if you can use the
    ST functions to group features into bands in X (or Y), then order by Y (or X).
    Note that spatial defragmentation will hurt join performance on other
    query constraints (like time), so that may need to be mitigated.


Thanks Vince, I'd like to look into how I can reorder the data without reloading it. How do I go about using an ORDER BY like you mentioned to defragment spatially? Are you referring to enforcing order during spatial index creation or just on queries to the db?
VinceAngelo
Esri Esteemed Contributor
You can't defragment without reloading -- it's the load order which makes it fragmented.

One possible methodology is to create a backup copy of the table with (SQL Server syntax)

SELECT *
INTO mytab_backup
FROM mytab


Then TRUNCATE the original and execute something like

INSERT INTO mytab
SELECT * FROM mytab_backup
ORDER BY county_fips
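
If you'd rather stay in arcpy for the reload, the Sort tool's spatial sort is a
rough equivalent (a sketch, assuming an Advanced license for the spatial sort
method; all names are placeholders):

import arcpy

src = r"C:\connections\gisdb_owner.sde\myschema.Points3M"   # hypothetical
tmp = r"C:\temp\staging.gdb\points_sorted"                  # hypothetical

# Write the features out ordered along a Peano (space-filling) curve
arcpy.Sort_management(src, tmp, [["Shape", "ASCENDING"]], "PEANO")

# Empty the original table and reload it in spatially clustered order
arcpy.TruncateTable_management(src)
arcpy.Append_management(tmp, src, "NO_TEST")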


- V
MattLane
Frequent Contributor
I'm still unable to get the two layers back to where they were a few weeks ago. They continue to get worse by the week 😞

I've followed your suggestion, reloading the geometries ordered by the districts they are in (verified with the TOP keyword in SQL Server). After rebuilding all the indexes, the issue persists.

The display of the data in ArcMap is actually really fast. But adding it to ArcMap takes minutes. Just selecting it in ArcCatalog freezes the application for minutes. Other layers in the database, even fairly large ones, are instantaneous. This all makes me think there is some cruft in the SDE tables, or at least something wrong in them.

I've tried copy/pasting them into a new layer or importing into a new layer (even without the GlobalID field) and the new layers are just as slow.
VinceAngelo
Esri Esteemed Contributor
Are you adding these layers as geodatabase feature classes or as Query Layers?

How many total features exist in the table?

- V
MattLane
Frequent Contributor
I have two unversioned feature classes in a feature dataset; our organization adds and deletes up to tens of thousands of records from them daily. One has over 3 million points and the other over 500,000 lines.

These were created in Esri software and are edited with Python tools outside of an edit session. They are not versioned (yet) but do have GlobalIDs. They will be replicated. They exist within a feature dataset, but exporting them out of the dataset to the geodatabase root does not help. They were not like this 3 or 4 weeks ago, and there haven't been significant increases in total record counts. Thanks for following up with me.