crrose

Processing time for Python Script using pythonwin vs. Script Tool

Discussion created by crrose on May 25, 2016
Latest reply on May 25, 2016 by crrose

I am wondering if anyone has experienced a dramatic difference in the time it takes to execute a Python script as a script tool vs. directly in Python or Pythonwin IDE? I run my script in Pythonwin (Python 2.7.8, 32-bit) and it takes about 17 minutes but when I run the same script as a script tool in ArcCatalog (version 10.3.1) Elapsed Time: 7 hours 39 minutes 44 seconds. That is an incredible difference and I'm wondering what the issue is. This is in our Citrix environment so the programs are not running on my desktop PC but on a Citrix Blade. What I thought was a script that provides a workable solution is not if when I distribute it in a toolbox, it takes magnitudes longer to run (like all day).

 

I have a script that does the following (I can post the fullcode if that might be helpful, though it is fairly long and this is a general question):

Summary: Creates one-way replicas to a file gdb from our corporate Enterprise (Oracle) SDE instance to our data center filing system and then pushes it out to our Network Attached Storage device using Robocopy and then sends an email of the log file.

Steps:

Initiates logging module to logfile

logging.basicConfig(level=logging.INFO,

                        format='    %(asctime)s|%(levelname)-8s|%(message)s',

                        datefmt='%Y-%m-%d %H:%M:%S',

                        filename=LOGFILE_NAME,

                        filemode='w')

Checks SDE inputs for replication requirements (datasets are versioned and feature classes/tables contain GlobalIDs)

     if schema == "S_R10_CNF" and str(child.isVersioned) == 'False':

            logging.info("Child {0} is not Registered as versioned".format(child.Name))

            logging.info("Attempting to register...")

            try:

                arcpy.RegisterAsVersioned_management(child.Name, "NO_EDITS_TO_BASE")

                logging.info("Registered {0} as Versioned successfully".format(child.Name))

            except Exception as e:

                logging.warning("Unable to Register As Versioned due to Exception: {0}".format(e.message))

                logging.warning("{0} is NOT registered as versioned and will NOT be replicated".format(child.Name))

and

    from collections import defaultdict

    myDict = defaultdict(list)

        for fc in arcpy.ListFeatureClasses("","",ds):

            gid = arcpy.ListFields(fc,"",field_type="GlobalID")

            if not gid:   

                myDict[ds].append(fc)

    for tb in arcpy.ListTables('S_R10_CNF.*'):

        gid = arcpy.ListFields(tb,"",field_type="GlobalID")

        if not gid:

            print('{0} is missing GLOBALID in Table {1}'.format(tb))

            myDict['TABLE'].append(tb)

    for i in myDict:

        logging.info("{0} is missing GlobalIDs! Attempting to add them...".format(i))

        try:

            if i == "TABLE":

                arcpy.AddGlobalIDs_management([myDict[i]])

            else:

                arcpy.AddGlobalIDs_management(i)

            logging.info("Added GlobalIDs to Dataset: {0}".format(i))

         except Exception as e:

            logging.warn("Unable to add GlobalIDs to dataset {0} \n  due to Exception: {1}".format(i, e.message))

            logging.warn("The following do not have GlobalIDs and will not be replicated: {0}".format(i))

            for val in myDict[i]:

                printmsg(val)

                logging.warn(val)

Creates the output file GDB if it doesn't already exist

if not arcpy.Exists(gdbName):

        arcpy.CreateFileGDB_management(outPath, gdbName)

        logging.info("Created target child GDB " + gdbName)

Generates a name for the Replica based on the dataset name

replicaName = replicaBase + "_NAS1Way"

    truncLen = 32 - len("_NAS1Way")

    if len(replicaName) > 32:   #replica name can only be 32 characters or less or replica will fail with a vague error message

        replicaName = "{0}_NAS1Way".format(replicaBase[0:truncLen])

Checks if the replica exists already, if so it synchronizes and copies metadata, if not, it creates the replica

# Check replica names in both parent and child

    replicaList = [x.name.partition(".")[2] for x in arcpy.da.ListReplicas()]

    replicaListGDB = [x.name for x in arcpy.da.ListReplicas(outGDB)]

    if replicaName in replicaList:

        try:

            if replicaName in replicaListGDB:

                arcpy.SynchronizeChanges_management(outGDB, "DBO."+replicaName, arcpy.env.workspace, "FROM_GEODATABASE2_TO_1")

                logging.info("Synchronized One Way Replica for: {1}".format(replicaName, DS))

            else:

                logging.warn("Replica Name {0} found in parent SDE but not in child GDB".format(replicaName))

                logging.warn("     Unregister replica {0}, delete child data, and run again!".format(replicaName))

        except Exception as e:

            logging.error("Unable to synchronize replica {0}\n   due to Exception: {1}".format(replicaName,e.message))

        try:

            arcpy.env.workspace = outGDB

            status='Synchronizing metadata for {0}'.format(DS)

            logging.info(status)

            if DS == "CNFTables":

                for item in features:

                    arcpy.MetadataImporter_conversion(os.path.join(parentGDB,item), item.split(".")[1])

            else:

                for item in features:

                    arcpy.MetadataImporter_conversion(os.path.join(parentGDB,DS,item), os.path.join(DS.split(".")[1],item.split(".")[1]))

        except Exception as e:

            logging.error("Unable to synchronize metadata {0}\n   due to Exception: {1}".format(replicaName,e.message))

           

    else:

        try:

            arcpy.env.workspace = parentGDB

            status='Creating 1 Way Replica for {0}'.format(DS)

            arcpy.CreateReplica_management(featureList, "ONE_WAY_REPLICA", outGDB, replicaName, "FULL", "", "ALL_ROWS", "", "GET_RELATED")

            logging.info("Created One Way Replica for: {0}".format(DS))

            logging.info("    Replica is named: {0}".format(replicaName))

            logging.info("    With data: ")

            for f in features:

                logging.info("         {0}".format(f))

        except Exception as e:

            logging.error("Unable to create replica for dataset {0}\n   due to Exception: {1}".format(DS,e.message))

Calls Robocopy through subprocess module to copy the file GDB over to the NAS drive

    subprocess.call(r'NET USE Z: /del')

    subprocess.call(r'NET USE Z: {0} /user:<username> pwd'.format(destPath))

     status='Pushing replica GDB changes to NAS Drive '.format(destPath)

     logging.info(status)

    subprocess.call(["robocopy", gdbPath, "Z:", "*.*", "/e", "/z", "/dcopy:T", "/purge", "/xo", "/xf", "*.lock", "{0}".format(os.path.join(gdbPath, "robocopy_log.txt")), "/fft", "/log:{0}".format(os.path.join(gdbPath, "robocopy_log.txt")), "/tee"])

Sends an email of the log file

     mailserver.starttls()

    mailserver.ehlo()

    mailserver.sendmail(fromEmail, toEmailList, msg)

    mailserver.close()

 

 

 

The slowest parts of the script seem to be running arcpy.SynchronizeChanges_management when it's in a Toolbox Script Tool, even when there are NO changes to synchronize! The box is checked to Run Python Script in Process.

I am looking into using multiprocessing as there are 16 cores on the Citrix server in order to run some of the checks and creating replicas to speed things up. Does anyone have any idea why the same script takes ridiculously longer as a Script Tool than directly in Python without the ArcGIS overhead?

 

Any tips would be greatly appreciated!

Outcomes