Proper use of Rebuild Indexes and Analyze Datasets tools

07-22-2015 09:14 AM
BlakeTerhune
MVP Regular Contributor

I'm trying to build some SDE maintenance Python scripts and found this help document very informative.

ArcGIS Help 10.2 - Using Python scripting to batch reconcile and post versions

However, I noticed that the detailed section on Rebuild Indexes and Analyze Datasets does not use the NO_SYSTEM option for include_system, but the full example code at the bottom does. Could someone clarify exactly what the include_system option does and which is the recommended way to use it? Maybe a Python expert can help?

20 Replies
WesMiller
Regular Contributor III

I don't know if this helps or not

>>> import arcpy
>>> help(arcpy.RebuildIndexes_management)
Help on function RebuildIndexes in module arcpy.management:

RebuildIndexes(input_database=None, include_system=None, in_datasets=None, delta_only=None)
    RebuildIndexes_management(input_database, include_system, {in_datasets;in_datasets...}, {delta_only})

       Updates indexes of datasets and system tables stored in an enterprise
       geodatabase. This tool is used in enterprise geodatabases to rebuild existing
       attribute or spatial indexes. Out-of-date indexes can lead to poor geodatabase
       performance.

    INPUTS:
     input_database (Workspace):
         The enterprise database that contains the data to be updated.
     include_system (Boolean):
         Indicates whether indexes will be rebuilt on the states and state lineages
         tables. You must be the geodatabase administrator for this option to be
         executed successfully. This option only applies to geodatabases. If the
         input workspace is a database, this option will be ignored.

         * NO_SYSTEM—Indexes will not be rebuilt on the states and state lineages
         tables. This is the default.

         * SYSTEM—Indexes will be rebuilt on the states and state lineages tables.
     in_datasets {String}:
         Names of the datasets that will have their indexes rebuilt. Dataset names use
         paths relative to the input workspace; full paths are not accepted as input.
     delta_only {Boolean}:
         Indicates how the indexes will be rebuilt on the selected datasets. This option
         has no effect if in_datasets is empty. This option only applies to
         geodatabases. If the input workspace is a database, this option will be ignored.

         * ALL—Indexes will be rebuilt on all indexes for the selected datasets. This
         includes spatial indexes as well as user-created attribute indexes and any
         geodatabase-maintained indexes for the dataset.

         * ONLY_DELTAS—Indexes will only be rebuilt for the delta tables of the selected
         datasets. This option can be used for cases where the business tables for the
         selected datasets are not updated often and there is a high volume of edits in
         the delta tables. This is the default.
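
If it helps, a bare-bones call for each case might look something like this (the connection file paths are just placeholders):

import arcpy

## As a data owner: skip the system tables (the default).
arcpy.RebuildIndexes_management(r"C:\GISConnections\OWNER@GTEST.sde", "NO_SYSTEM")

## As the geodatabase administrator: also rebuild the states and
## state lineages tables.
arcpy.RebuildIndexes_management(r"C:\GISConnections\SDE@GTEST.sde", "SYSTEM")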

BlakeTerhune
MVP Regular Contributor

I appreciate the reply, Wes, but I think that's all the same info I've been able to find in the online Help documentation on these tools. I'm really looking for advice from the community on what settings you use with these tools as part of regularly scheduled (nightly) SDE maintenance with a compress operation.

BlakeTerhune
MVP Regular Contributor

Here's what I've come up with so far for Rebuild Indexes and Analyze Datasets.

  1. Looks at SDE.Table_Registry to get all schema owners and compiles them into a Python list.
  2. Creates a temporary directory in which to create a temporary connection file for each schema owner. If the owner is SDE, it uses an existing connection and sets the include_system parameter to SYSTEM so that the states and state lineages tables are included (which can only be done as the geodatabase administrator). The other owners use the NO_SYSTEM option.
  3. Once a connection file is made, it gathers the names of all visible tables for that owner into a Python list.
  4. Then it runs Rebuild Indexes and Analyze Datasets using the visible tables list as input for the in_datasets parameter.

import arcpy
from contextlib import contextmanager
import os
import shutil
import tempfile

def main():
    try:
        sdeInstance = "GTEST"

        # Get all schema owners from SDE table registry
        sde_sdeconn = r"C:\GISConnections\SDE@{}.sde".format(sdeInstance)
        in_table = os.path.join(sde_sdeconn, "SDE.TABLE_REGISTRY")
        sql_prefix = "DISTINCT OWNER"
        schemaOwners = [
            row[0] for row in arcpy.da.SearchCursor(
                in_table,
                "OWNER",
                sql_clause=(sql_prefix, None)
            )
        ]

        # Rebuild indexes and update statistics for all objects under each schema owner
        with makeTempDir() as temp_dir:
            for owner in sorted(schemaOwners):
                print "\n---\n"
                ## Set passwords, options, and workspace with owner
                if owner.upper() == "SDE":
                    ## Must be geodatabase administrator to gather statistics on
                    ## the states and state lineages tables (include_system option).
                    include_system = "SYSTEM"
                    arcpy.env.workspace = sde_sdeconn
                else:
                    ## Create connection file with owner
                    arcpy.CreateDatabaseConnection_management(
                        temp_dir,  ## out_folder_path
                        owner+".sde",  ## out_name
                        "ORACLE",  ## database_platform
                        sdeInstance+".WORLD",  ## instance
                        "DATABASE_AUTH",  ## account_authentication
                        owner,  ## username
                        ownerPass,  ## password (looked up per owner elsewhere; not shown here)
                    )
                    print arcpy.GetMessages()
                    include_system = "NO_SYSTEM"
                    arcpy.env.workspace = os.path.join(temp_dir, owner+".sde")

                # Get a list of all the objects owner has access to
                workspace = arcpy.env.workspace
                tables = arcpy.ListTables(owner+'.*')
                tables += arcpy.ListFeatureClasses(owner+'.*')
                tables += arcpy.ListRasters(owner+'.*')
                for dataset in arcpy.ListDatasets(owner+'.*'):
                    tables += arcpy.ListFeatureClasses(feature_dataset=dataset)

                try:  # Rebuild Indexes
                    arcpy.RebuildIndexes_management(
                        workspace,  ## input_database
                        include_system,
                        tables,
                        "ALL"
                    )
                except Exception as err:
                    ## Print exception error only if it's not from Esri
                    if not arcpy.GetMessages(2):
                        print err
                finally:
                    if arcpy.GetMessages():
                        print arcpy.GetMessages()

                try:  # Analyze Datasets
                    arcpy.AnalyzeDatasets_management(
                        workspace,  ## input_database
                        include_system,
                        tables,
                        "ANALYZE_BASE",
                        "ANALYZE_DELTA",
                        "ANALYZE_ARCHIVE"
                    )
                except Exception as err:
                    ## Print exception error only if it's not from Esri
                    if not arcpy.GetMessages(2):
                        print err
                finally:
                    ## Print the full details of the last Esri message
                    if arcpy.GetMessages():
                        print arcpy.GetMessages()
                arcpy.ClearWorkspaceCache_management()
    except Exception as err:
        if err.message in arcpy.GetMessages(2):
            print arcpy.GetMessages()
        else:
            print unicode(err).encode("utf-8")
    finally:
        # Cleanup
        arcpy.ClearWorkspaceCache_management()


@contextmanager
def makeTempDir():
    temp_dir = tempfile.mkdtemp()
    try:
        yield temp_dir
    finally:
        shutil.rmtree(temp_dir)


if __name__ == '__main__':
    main()

It seems to be doing its job, but I don't see any entries in the SDE.SDE_LAYER_STATS table; should I? Eventually this will get rolled into our existing SDE maintenance script that disconnects users then reconciles and compresses.

BenVan_Kesteren1
Occasional Contributor III

Hi Blake,

Thanks very much for sharing your code. I have a somewhat similar system automated in our environment (nowhere near as clean-cut as yours; mine is very 'backyard' code).

I would like to ask an additional question: I see you run Analyze Datasets on all your tables; can someone clarify if this is necessary? I understand it is handy right after compressing the database because it does something (I am unsure what) to the system tables, but I am unsure what it does with regard to standard feature classes. So I'm not sure it is necessary for me to run it every week on each of the feature classes in my SDE database. Can you comment on why you run it on your entire system?

One other question I just thought of: I notice you run Rebuild Indexes BEFORE you run Analyze Datasets; is there a reason you decided to run them in this order? I can't find it now, but I believe I read in the past that they should be run in the other order, directly after compressing the database. I'm hoping someone can set me straight on that one too...

BlakeTerhune
MVP Regular Contributor
I see you run Analyze Datasets on all your tables; can someone clarify if this is necessary?

I talked to Esri support to confirm I was on the right track with my process here. I explained very clearly that I'm doing the rebuild and analyze on every table and feature class. They confirmed I was doing things correctly but noted that it could take a long time to do everything. However, they didn't say I should not do it.

I notice you run Rebuild Indexes BEFORE you run Analyze Datasets; is there a reason you decided to run them in this order?

I think I did it in that order only because that's the way the original documentation did it. Somewhere in the documentation it also mentioned possibly running one (or both of these) before the compress as well as after, so it's a little confusing in which scenarios you would do something like that.

EDIT:

I just stumbled across this post that recommends running analyze datasets before reconcile/post/compress.
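
If I follow that advice, the nightly order would look roughly like this (just a sketch; the connection path is from my environment and the keyword choices would need adjusting to yours):

import arcpy

sde_admin = r"C:\GISConnections\SDE@GTEST.sde"

## 1) Update statistics before reconciling (in a real run you would also
##    pass the dataset list, as in the script above).
arcpy.AnalyzeDatasets_management(sde_admin, "SYSTEM")

## 2) Reconcile and post all versions to DEFAULT, then remove them.
arcpy.ReconcileVersions_management(
    sde_admin, "ALL_VERSIONS", "sde.DEFAULT", arcpy.ListVersions(sde_admin),
    "LOCK_ACQUIRED", "NO_ABORT", "BY_OBJECT",
    "FAVOR_TARGET_VERSION", "POST", "DELETE_VERSION"
)

## 3) Compress, then rebuild indexes and gather statistics again.
arcpy.Compress_management(sde_admin)
arcpy.RebuildIndexes_management(sde_admin, "SYSTEM")
arcpy.AnalyzeDatasets_management(sde_admin, "SYSTEM")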

While this script has been running successfully for a while now, I'm noticing that it takes about 12 minutes longer to complete every night. This has been a steady increase and it now takes hours to finish where it originally took less than 30 minutes. My plan is to adjust the script so it will only rebuild and analyze the tables that were modified. Here is a sample query (Oracle) I will adjust and simplify to find the modified tables.

SELECT TR.OWNER || '.' || TR.TABLE_NAME AS LAYER_NAME,
    MM.REGISTRATION_ID,
    COUNT(DISTINCT S.STATE_ID) AS AFFECTED_STATES,
    COUNT(SL.LINEAGE_ID) AS LINEAGE_LENGTH
FROM SDE.STATES S
    INNER JOIN SDE.MVTABLES_MODIFIED MM
        ON S.STATE_ID = MM.STATE_ID
    JOIN SDE.TABLE_REGISTRY TR
        ON MM.REGISTRATION_ID = TR.REGISTRATION_ID
    JOIN SDE.STATE_LINEAGES SL
        ON S.LINEAGE_NAME = SL.LINEAGE_NAME
GROUP BY TR.OWNER, TR.TABLE_NAME, MM.REGISTRATION_ID
ORDER BY LAYER_NAME;
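
The rough idea would be to run a trimmed-down version of that query through arcpy.ArcSDESQLExecute and feed only the returned tables to the two tools. An untested sketch:

import arcpy

sde_sdeconn = r"C:\GISConnections\SDE@GTEST.sde"

## Simplified version of the query above: every registered table that has
## rows in the versioned-edit tracking table (mvtables_modified).
modified_tables_sql = """
    SELECT DISTINCT tr.owner, tr.table_name
    FROM sde.mvtables_modified mm
    INNER JOIN sde.table_registry tr
        ON mm.registration_id = tr.registration_id
"""

sde_conn = arcpy.ArcSDESQLExecute(sde_sdeconn)
result = sde_conn.execute(modified_tables_sql)

## execute() returns a list of rows for multi-row results; a lone value or
## True otherwise, so check before iterating.
if isinstance(result, list):
    modified_tables = ["{}.{}".format(row[0], row[1]) for row in result]
    ## In practice these would still be grouped per schema owner and run
    ## through the per-owner connections, as in the full script above.
    arcpy.RebuildIndexes_management(sde_sdeconn, "NO_SYSTEM", modified_tables, "ALL")
    arcpy.AnalyzeDatasets_management(
        sde_sdeconn, "NO_SYSTEM", modified_tables,
        "ANALYZE_BASE", "ANALYZE_DELTA", "ANALYZE_ARCHIVE"
    )
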
BlakeTerhune
MVP Regular Contributor

Here's the query I'm using to identify which tables have edits (and who made the edits).

SELECT DISTINCT
    tr.owner as TABLE_OWNER, tr.table_name,
    COUNT (mm.registration_id) AS STATE_COUNT,
    s.owner AS STATE_OWNER
FROM sde.mvtables_modified mm
    LEFT OUTER JOIN  sde.states s
        ON mm.state_id = s.state_id
    LEFT OUTER JOIN sde.table_registry tr
        ON mm.registration_id = tr.registration_id
GROUP BY tr.owner, tr.table_name, s.owner
ORDER BY tr.owner, tr.table_name, s.owner

Turns out the slowness I was experiencing with Rebuild Indexes and Analyze Datasets was from excessive geoprocessing history logged in the geodatabase metadata. Using arcpy.SetLogHistory(False) will prevent the geoprocessing history generated by the script from being logged. Here's the script I used to clean out the existing geoprocessing history from the geodatabase. Once the geoprocessing history was cleaned up, our maintenance script ran very quickly again.

import arcpy
from contextlib import contextmanager
import os
import shutil
import tempfile

def main():
    sde_sdeconn = r"C:\GISConnections\SDE@GPRO.sde"
    xslt_path = r"C:\arcgis\Desktop10.2\Metadata\Stylesheets\gpTools\remove geoprocessing history.xslt"
    try:
        arcpy.ClearWorkspaceCache_management()
        if os.path.exists(xslt_path):
            with makeTempDir() as temp_dir:
                ## Get unique temporary XML file name
                with tempfile.NamedTemporaryFile(
                    dir=temp_dir, suffix=".xml", delete=True
                ) as temporary_file:
                    temporary_file_name = temporary_file.name
                ## Export metadata without geoprocessing history
                arcpy.XSLTransform_conversion(
                    sde_sdeconn,  ## source
                    xslt_path,  ## xslt
                    temporary_file_name  ## output
                )
                print arcpy.GetMessages()
                print ""
                ## Import metadata that has geoprocessing history removed
                arcpy.MetadataImporter_conversion(
                    temporary_file_name,  ## source
                    sde_sdeconn  ## target
                )
                print arcpy.GetMessages()
        else:
            raise IOError("File not found.\n{}".format(xslt_path))
    except Exception as err:
        if err.message in arcpy.GetMessages(2):
            ## All of the messages returned by the last ArcPy tool
            displayErr = arcpy.GetMessages()
        else:
            ## Non-ArcPy error message
            displayErr = unicode(err).encode("utf-8")
        print displayErr
    finally:
        # Cleanup
        arcpy.ClearWorkspaceCache_management()


@contextmanager
def makeTempDir():
    """Creates a temporary folder and returns the full path name.
    Use in with statement to delete the folder and all contents on exit.
    Requires contextlib contextmanager, shutil, and tempfile modules.
    """
    temp_dir = tempfile.mkdtemp()
    try:
        yield temp_dir
    finally:
        shutil.rmtree(temp_dir)


if __name__ == '__main__':
    main()
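
As mentioned above, preventing new history from piling up again is just one line near the top of the nightly maintenance script; a minimal illustration:

import arcpy

## Keep geoprocessing history out of the geodatabase (and item) metadata
## for everything this script runs from here on.
arcpy.SetLogHistory(False)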

Geoprocessing history is also logged in the metadata of feature datasets, feature classes, and tables. You can modify this script to walk the geodatabase and remove the geoprocessing history for each of those items if you like (a rough sketch follows the links below). Here is some further reading from Esri Support:

Editing metadata for many ArcGIS items

http://resources.arcgis.com/en/help/main/10.1/index.html#//003t00000026000000

XSLT Transformation (Conversion)

http://resources.arcgis.com/en/help/main/10.1/index.html#//001200000017000000

Automate the process of deleting geoprocessing history

http://support.esri.com/en/knowledgebase/techarticles/detail/41026
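
If you do want to walk the whole geodatabase, a rough (untested) sketch using arcpy.da.Walk and the same XSLT stylesheet as above might look like this:

import os
import tempfile

import arcpy

sde_sdeconn = r"C:\GISConnections\SDE@GPRO.sde"
xslt_path = r"C:\arcgis\Desktop10.2\Metadata\Stylesheets\gpTools\remove geoprocessing history.xslt"

for dirpath, dirnames, filenames in arcpy.da.Walk(sde_sdeconn):
    ## dirnames holds feature datasets; filenames holds feature classes,
    ## tables, and rasters at the current level.
    for name in dirnames + filenames:
        item = os.path.join(dirpath, name)
        temp_xml = os.path.join(tempfile.gettempdir(), "{}.xml".format(name))
        ## Export the item's metadata with the geoprocessing history stripped,
        ## then import it back over the original metadata.
        arcpy.XSLTransform_conversion(item, xslt_path, temp_xml)
        arcpy.MetadataImporter_conversion(temp_xml, item)
        os.remove(temp_xml)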

AndresCastillo
MVP Regular Contributor

Hi Blake,

how come you didn't use the script mentioned here?

https://support.esri.com/en/technical-article/000011751

BlakeTerhune
MVP Regular Contributor

The Esri support tech I spoke with did not show me this technical article. This looks like a better solution, thank you for posting.

EDIT:

I just tested the isNotSpatialView() function for Oracle and it isn't reliable, at least in our 10.5.1 enterprise geodatabase on Oracle 12c. Almost none of our spatial views are actually registered with the database in the user_views table, so it doesn't validate correctly.

MonicaRuiz
New Contributor

Hi Blake! Question: which user are you using to run the analyze tool? I tried to run it with the owner and did not have problems, but I could not run it with the sde user. Thanks!
