Database maintenance: correct usage of the Analyze Datasets tool against feature datasets and FCs?

12-09-2020 12:37 PM
Brownschuh
Occasional Contributor II

Our organization performs database maintenance against our enterprise geodatabase with a daily Python script (on workdays, at least).  While the script has been in place for a couple of years, a colleague and I decided to take another look at it to see whether we could improve performance or otherwise tighten it up.  In a nutshell, our script works like this:

Block connection to database > disconnect users > reconcile versions > analyze datasets > compress database > rebuild indexes > accept connections to database
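
At a high level, the relevant arcpy calls look something like this (a simplified sketch; the connection path and reconcile parameters are placeholders, not our exact settings):

```python
import arcpy

# Hypothetical geodatabase administrator connection file.
admin_sde = r"C:\connections\gdb_admin.sde"

arcpy.AcceptConnections(admin_sde, False)   # block new connections
arcpy.DisconnectUser(admin_sde, "ALL")      # disconnect current sessions

# Reconcile and post all versions to DEFAULT ahead of the compress.
arcpy.ReconcileVersions_management(
    admin_sde, "ALL_VERSIONS", "sde.DEFAULT",
    arcpy.ListVersions(admin_sde),
    "LOCK_ACQUIRED", "NO_ABORT", "BY_OBJECT",
    "FAVOR_TARGET_VERSION", "POST", "KEEP_VERSION")

# ... Analyze Datasets runs here (see the snippet further down) ...

arcpy.Compress_management(admin_sde)        # compress the geodatabase

# ... Rebuild Indexes runs here ...

arcpy.AcceptConnections(admin_sde, True)    # let users back in
```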

The issue we are having at the moment is the proper usage of the Analyze Datasets tool; admittedly, the correct usage of this tool has always baffled me.  In any event, the way we have it set up right now, it runs against the following: tables, feature classes that reside at the root level of the database, feature datasets, and all feature classes that reside within each feature dataset.
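
For reference, the part of the script that gathers those inputs and feeds them to the tool looks roughly like this (the connection path is a placeholder):

```python
import arcpy

# Placeholder data-owner connection file.
owner_sde = r"C:\connections\gdb_dataowner.sde"
arcpy.env.workspace = owner_sde

# Stand-alone tables and feature classes at the root of the geodatabase.
datasets = arcpy.ListTables() + arcpy.ListFeatureClasses()

# Feature datasets, plus every feature class inside each one.
for fds in arcpy.ListDatasets("", "Feature"):
    datasets.append(fds)
    datasets.extend(arcpy.ListFeatureClasses(feature_dataset=fds))

# Analyze base, delta, and archive tables for everything gathered above.
arcpy.AnalyzeDatasets_management(
    owner_sde, "NO_SYSTEM", datasets,
    "ANALYZE_BASE", "ANALYZE_DELTA", "ANALYZE_ARCHIVE")
```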

My main question is this: is it overkill to run Analyze Datasets against the feature datasets and then, separately, against the feature classes that reside within them?

For instance, say we simply run the tool against the feature datasets: does that also analyze all the feature classes that live within them?  Or is the way we have it set up the correct way?  Our current setup is definitely the slower way, so if what we are doing is in fact overkill, we would like to modify the script accordingly in the interest of time and cut out any superfluous tasks.

8 Replies
Robert_LeClair
Esri Notable Contributor

I would say Analyze Datasets is a critical database management tool for generating statistics on feature classes, tables, and the like using your data owner connection.  In addition, running it with your SDE connection to build statistics on the repository tables is just as critical for the performance of your eGDB, because versioning can exact a performance toll on the versioning tables.

The Analyze GP tool works on a per-FC or per-FDS basis rather than on a *.sde connection file, unlike Analyze Datasets (which takes the *.sde file for its Input Database Connection parameter).  If it were me, I'd leave this part in; otherwise you risk degrading your eGDB performance.  One thing I would add to your workflow is a second Analyze Datasets run AFTER the compress.  This will help with eGDB optimization.
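
As a minimal sketch of that system-table run (the connection path is just an example):

```python
import arcpy

# Connection file for the geodatabase administrator (sde) user.
sde_admin = r"C:\connections\gdb_sde.sde"

# The SYSTEM keyword analyzes the repository tables (states, state_lineages,
# etc.); only the geodatabase administrator can use it, and no dataset list
# is needed.
arcpy.AnalyzeDatasets_management(sde_admin, "SYSTEM")
```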

Brownschuh
Occasional Contributor II

Thank you for the reply.  I'm glad you mentioned switching the order of the Analyze Datasets step; after reading some resources over the past week, my colleague and I had already planned on moving the analyze portion to after the compress. 

And based on your response, it seems we are doing what you recommend: running analyze on all feature classes (regardless of where they live), feature datasets, and tables.  While this covers essentially everything in the database, one thing we've noticed is that it really affects how long the script runs. 

However, from what I understand, analyze (and even rebuilding indexes) should really only need to run on layers that have had edits performed on them.  For clarification, we have hundreds of layers in our database.  Some are edited heavily, with many edits daily, and other layers are never edited at all (basemaps, if you will).  Is there a way we could add some logic to our Python script to run analyze/rebuild indexes only on those feature classes/tables that have been edited since the last time the script ran?

While the goal of our nightly task is to be as thorough as possible, I also don't want to waste time and machine resources with a script that takes hours to run.

Robert_LeClair
Esri Notable Contributor

You're welcome - happy to assist.  With regard to workflow order, I'd recommend keeping Analyze Datasets both before AND after the Compress GP tool.  This is what I teach in my eGDB classes and it seems to be best practice.  You are correct that many of the database maintenance GP tools are best used on edited FCs, FDSs, and non-spatial tables; the non-edited ones really don't change much, as you mentioned.  There "could" be a way to set up "if-then-else" logic to run the script only on edited tables, but I'm not aware of a built-in one.  Do you run the script nightly?  And do you know how long it takes to complete?

Brownschuh
Occasional Contributor II

In the past the script would generally take about 45 minutes to an hour.  However, after examining the script more closely this week, my colleague and I determined that the loop for analyzing and rebuilding indexes was probably not working properly.  After making adjustments so that those tasks run on all FCs, FDSs, and non-spatial tables, the runtime was well over 5 hours (rebuilding the indexes was the task that took the longest).  Granted, the shorter runtimes before may have been because analyze and rebuild indexes were silently skipping some layers; perhaps we will (hopefully) see a much shorter runtime with tonight's testing.

I would greatly like to use 'if-then-else' logic to determine whether or not a layer has been edited, but like you mentioned, I am not aware of any examples of how to determine this.  In the meantime, one workaround I could deploy is to keep a list of the layers that are known to be edited regularly and run those tools against only that list.  Not the best solution if we miss the layers that get edited rarely, but it could still work.
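
The workaround would amount to something like this (the layer names are invented for illustration):

```python
import arcpy

owner_sde = r"C:\connections\gdb_dataowner.sde"  # placeholder connection

# Hardcoded list of the layers we know get edited regularly.
edited_datasets = ["gdb.owner.Parcels", "gdb.owner.Roads", "gdb.owner.WorkOrders"]

arcpy.AnalyzeDatasets_management(
    owner_sde, "NO_SYSTEM", edited_datasets,
    "ANALYZE_BASE", "ANALYZE_DELTA", "ANALYZE_ARCHIVE")
arcpy.RebuildIndexes_management(owner_sde, "NO_SYSTEM", edited_datasets, "ALL")
```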

Robert_LeClair
Esri Notable Contributor

Thank you for the details.  I wonder why rebuilding indexes takes the longest; I presume you're using the Rebuild Indexes GP tool for that task.  One test to consider is to use the Remove Spatial Index tool followed by the Add Spatial Index tool to delete and re-add the spatial grid indexes, to see if that speeds things up.  If you wanted to go a step further, you could test attribute index performance by using the Remove Attribute Index tool followed by the Add Attribute Index tool.  It does add more steps to your script, but it would be interesting to see whether it improves or degrades total processing time.
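
A quick test on a single feature class might look like this (the feature class, field, and index names are placeholders):

```python
import arcpy

fc = r"C:\connections\gdb_dataowner.sde\gdb.owner.Parcels"  # placeholder

# Drop and recreate the spatial index (grid sizes of 0 let the software
# calculate them).
arcpy.RemoveSpatialIndex_management(fc)
arcpy.AddSpatialIndex_management(fc, 0, 0, 0)

# Drop and recreate an attribute index on a frequently queried field.
arcpy.RemoveIndex_management(fc, "ParcelID_IDX")
arcpy.AddIndex_management(fc, "PARCEL_ID", "ParcelID_IDX",
                          "NON_UNIQUE", "NON_ASCENDING")
```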

Brownschuh
Occasional Contributor II

We ran the script again last night; it took 20 minutes longer than the previous night's run.  We definitely need to look into a way of optimizing the task.

To address some items you mentioned: we do indeed use the Rebuild Indexes tool.  As for testing Remove Spatial Index and Add Spatial Index, plus Remove Attribute Index and Add Attribute Index, we simply haven't had time to test those tools yet.

At the moment, my colleague and I are exploring the idea of using editor tracking to determine whether a feature class needs to be analyzed or have its indexes rebuilt.  The only caveat is that a decent number of our feature classes get updated nightly via Python script, and those don't have editor tracking enabled; we wouldn't want to overlook those layers.  I suppose a quick solution to that problem would be to add them to a list of layers to always run the analyze/rebuild index tools against.
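
The check we're considering would be along these lines (a rough sketch; the field name assumes editor tracking's default last_edited_date, and the path is a placeholder):

```python
import arcpy
import datetime

def edited_since(fc, cutoff, field="last_edited_date"):
    # Scan the editor-tracking date field and return True on the first row
    # newer than the cutoff. (A where_clause would be faster, but date-literal
    # syntax varies by DBMS, so this sketch keeps it portable.)
    with arcpy.da.SearchCursor(fc, [field]) as cursor:
        for (edited,) in cursor:
            if edited and edited > cutoff:
                return True
    return False

cutoff = datetime.datetime.now() - datetime.timedelta(days=1)
fc = r"C:\connections\gdb_dataowner.sde\gdb.owner.Parcels"  # placeholder
if edited_since(fc, cutoff):
    print("queue {} for analyze/rebuild indexes".format(fc))
```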


In any event, I'll keep this thread updated with our progress; it may be beneficial to other organizations facing similar issues.

Robert_LeClair
Esri Notable Contributor

Scott - do you have any database views created in your eGDB?  There's been some chatter about DBVs causing slowness in the Analyze Datasets GP tool.  In one case, the analyst and the customer used SDEINTERCEPT to diagnose the issue and determined that three DBVs were processing very slowly.  The user deleted the database views, ran Analyze Datasets, and recreated the DBVs after the tool completed; the customer saw a marked increase in performance.

Brownschuh
Occasional Contributor II

Apologies for the delayed response; what with the holidays and other projects going on, we got somewhat sidetracked. 

To answer your question, we don't have any database views in our eGDB, so we can rule that out.  In fact, we went ahead and implemented the 'list approach': a list of feature classes/datasets/tables hardcoded in the script that the analyze/rebuild index tools run against.  While admittedly not the best approach, it did cut back on the nightly processing time.  One of the main issues is keeping that list current; it's a bit of a guessing game with some of the less frequently edited layers.  But I'm thinking something like this database summary dashboard may assist us with that task.
