Processing time for Python Script using pythonwin vs. Script Tool

2733
6
05-25-2016 10:26 AM
ChristalHigdon
New Contributor III

I am wondering if anyone has experienced a dramatic difference in the time it takes to execute a Python script as a script tool vs. directly in Python or Pythonwin IDE? I run my script in Pythonwin (Python 2.7.8, 32-bit) and it takes about 17 minutes but when I run the same script as a script tool in ArcCatalog (version 10.3.1) Elapsed Time: 7 hours 39 minutes 44 seconds. That is an incredible difference and I'm wondering what the issue is. This is in our Citrix environment so the programs are not running on my desktop PC but on a Citrix Blade. What I thought was a script that provides a workable solution is not if when I distribute it in a toolbox, it takes magnitudes longer to run (like all day).

I have a script that does the following (I can post the fullcode if that might be helpful, though it is fairly long and this is a general question):

Summary: Creates one-way replicas to a file gdb from our corporate Enterprise (Oracle) SDE instance to our data center filing system and then pushes it out to our Network Attached Storage device using Robocopy and then sends an email of the log file.

Steps:

Initiates logging module to logfile

logging.basicConfig(level=logging.INFO,

                        format='    %(asctime)s|%(levelname)-8s|%(message)s',

                        datefmt='%Y-%m-%d %H:%M:%S',

                        filename=LOGFILE_NAME,

                        filemode='w')

Checks SDE inputs for replication requirements (datasets are versioned and feature classes/tables contain GlobalIDs)

     if schema == "S_R10_CNF" and str(child.isVersioned) == 'False':

            logging.info("Child {0} is not Registered as versioned".format(child.Name))

            logging.info("Attempting to register...")

            try:

                arcpy.RegisterAsVersioned_management(child.Name, "NO_EDITS_TO_BASE")

                logging.info("Registered {0} as Versioned successfully".format(child.Name))

            except Exception as e:

                logging.warning("Unable to Register As Versioned due to Exception: {0}".format(e.message))

                logging.warning("{0} is NOT registered as versioned and will NOT be replicated".format(child.Name))

and

    from collections import defaultdict

    myDict = defaultdict(list)

        for fc in arcpy.ListFeatureClasses("","",ds):

            gid = arcpy.ListFields(fc,"",field_type="GlobalID")

            if not gid:   

                myDict[ds].append(fc)

    for tb in arcpy.ListTables('S_R10_CNF.*'):

        gid = arcpy.ListFields(tb,"",field_type="GlobalID")

        if not gid:

            print('{0} is missing GLOBALID in Table {1}'.format(tb))

            myDict['TABLE'].append(tb)

    for i in myDict:

        logging.info("{0} is missing GlobalIDs! Attempting to add them...".format(i))

        try:

            if i == "TABLE":

                arcpy.AddGlobalIDs_management([myDict])

            else:

                arcpy.AddGlobalIDs_management(i)

            logging.info("Added GlobalIDs to Dataset: {0}".format(i))

         except Exception as e:

            logging.warn("Unable to add GlobalIDs to dataset {0} \n  due to Exception: {1}".format(i, e.message))

            logging.warn("The following do not have GlobalIDs and will not be replicated: {0}".format(i))

            for val in myDict:

                printmsg(val)

                logging.warn(val)

Creates the output file GDB if it doesn't already exist

if not arcpy.Exists(gdbName):

        arcpy.CreateFileGDB_management(outPath, gdbName)

        logging.info("Created target child GDB " + gdbName)

Generates a name for the Replica based on the dataset name

replicaName = replicaBase + "_NAS1Way"

    truncLen = 32 - len("_NAS1Way")

    if len(replicaName) > 32:   #replica name can only be 32 characters or less or replica will fail with a vague error message

        replicaName = "{0}_NAS1Way".format(replicaBase[0:truncLen])

Checks if the replica exists already, if so it synchronizes and copies metadata, if not, it creates the replica

# Check replica names in both parent and child

    replicaList = [x.name.partition(".")[2] for x in arcpy.da.ListReplicas()]

    replicaListGDB = [x.name for x in arcpy.da.ListReplicas(outGDB)]

    if replicaName in replicaList:

        try:

            if replicaName in replicaListGDB:

                arcpy.SynchronizeChanges_management(outGDB, "DBO."+replicaName, arcpy.env.workspace, "FROM_GEODATABASE2_TO_1")

                logging.info("Synchronized One Way Replica for: {1}".format(replicaName, DS))

            else:

                logging.warn("Replica Name {0} found in parent SDE but not in child GDB".format(replicaName))

                logging.warn("     Unregister replica {0}, delete child data, and run again!".format(replicaName))

        except Exception as e:

            logging.error("Unable to synchronize replica {0}\n   due to Exception: {1}".format(replicaName,e.message))

        try:

            arcpy.env.workspace = outGDB

            status='Synchronizing metadata for {0}'.format(DS)

            logging.info(status)

            if DS == "CNFTables":

                for item in features:

                    arcpy.MetadataImporter_conversion(os.path.join(parentGDB,item), item.split(".")[1])

            else:

                for item in features:

                    arcpy.MetadataImporter_conversion(os.path.join(parentGDB,DS,item), os.path.join(DS.split(".")[1],item.split(".")[1]))

        except Exception as e:

            logging.error("Unable to synchronize metadata {0}\n   due to Exception: {1}".format(replicaName,e.message))

           

    else:

        try:

            arcpy.env.workspace = parentGDB

            status='Creating 1 Way Replica for {0}'.format(DS)

            arcpy.CreateReplica_management(featureList, "ONE_WAY_REPLICA", outGDB, replicaName, "FULL", "", "ALL_ROWS", "", "GET_RELATED")

            logging.info("Created One Way Replica for: {0}".format(DS))

            logging.info("    Replica is named: {0}".format(replicaName))

            logging.info("    With data: ")

            for f in features:

                logging.info("         {0}".format(f))

        except Exception as e:

            logging.error("Unable to create replica for dataset {0}\n   due to Exception: {1}".format(DS,e.message))

Calls Robocopy through subprocess module to copy the file GDB over to the NAS drive

    subprocess.call(r'NET USE Z: /del')

    subprocess.call(r'NET USE Z: {0} /user:<username> pwd'.format(destPath))

     status='Pushing replica GDB changes to NAS Drive '.format(destPath)

     logging.info(status)

    subprocess.call(["robocopy", gdbPath, "Z:", "*.*", "/e", "/z", "/dcopy:T", "/purge", "/xo", "/xf", "*.lock", "{0}".format(os.path.join(gdbPath, "robocopy_log.txt")), "/fft", "/log:{0}".format(os.path.join(gdbPath, "robocopy_log.txt")), "/tee"])

Sends an email of the log file

     mailserver.starttls()

    mailserver.ehlo()

    mailserver.sendmail(fromEmail, toEmailList, msg)

    mailserver.close()

The slowest parts of the script seem to be running arcpy.SynchronizeChanges_management when it's in a Toolbox Script Tool, even when there are NO changes to synchronize! The box is checked to Run Python Script in Process.

I am looking into using multiprocessing as there are 16 cores on the Citrix server in order to run some of the checks and creating replicas to speed things up. Does anyone have any idea why the same script takes ridiculously longer as a Script Tool than directly in Python without the ArcGIS overhead?

Any tips would be greatly appreciated!

6 Replies
DanPatterson_Retired
MVP Emeritus

Running the script directly will always take less time.  This example is really mind-blowing...have you profiled or timed the various blocks to see where the bottle necks are?  It might be worthwhile to throw some time markers within your script and collect the results.

import time

t0 = time.perf_counter() # best for windows, check the docs for other operating systems

do stuff

t1 = time.perf_counter()

print("\nTime...{}".format(t1-t0)) # print or log the results

ChristalHigdon
New Contributor III

Thanks, my logging tracks the time for each step as follows:

    2016-05-24 09:53:51|INFO    |Target child GDB ChugachData.gdb already exists

    2016-05-24 09:53:51|INFO    |

    2016-05-24 09:53:51|INFO    |SKIPPING dataset S_R10_CNF.Cultural. It will NOT be replicated

    2016-05-24 09:53:51|INFO    |SKIPPING dataset S_R10_CNF.SUP. It will NOT be replicated

    2016-05-24 09:53:51|INFO    |

    2016-05-24 10:13:25|INFO    |Synchronized One Way Replica for: S_R10_CNF.Hydrogaphy

    2016-05-24 10:13:25|INFO    |Synchronizing metadata for S_R10_CNF.Hydrogaphy

    2016-05-24 10:20:56|INFO    |Synchronized One Way Replica for: S_R10_CNF.Off_Forest

    2016-05-24 10:20:56|INFO    |Synchronizing metadata for S_R10_CNF.Off_Forest

    2016-05-24 10:33:03|INFO    |Synchronized One Way Replica for: S_R10_CNF.LRMP_2002

    2016-05-24 10:33:03|INFO    |Synchronizing metadata for S_R10_CNF.LRMP_2002

    2016-05-24 10:47:40|INFO    |Synchronized One Way Replica for: S_R10_CNF.Landnet

    2016-05-24 10:47:40|INFO    |Synchronizing metadata for S_R10_CNF.Landnet

    2016-05-24 10:57:31|INFO    |Synchronized One Way Replica for: S_R10_CNF.Activities

    2016-05-24 10:57:31|INFO    |Synchronizing metadata for S_R10_CNF.Activities

    2016-05-24 11:19:23|INFO    |Synchronized One Way Replica for: S_R10_CNF.CNF_Boundary

    2016-05-24 11:19:23|INFO    |Synchronizing metadata for S_R10_CNF.CNF_Boundary

    2016-05-24 11:31:34|INFO    |Synchronized One Way Replica for: S_R10_CNF.CNF_Vegetation

    2016-05-24 11:31:34|INFO    |Synchronizing metadata for S_R10_CNF.CNF_Vegetation

    2016-05-24 12:00:41|INFO    |Synchronized One Way Replica for: S_R10_CNF.Landstat

    2016-05-24 12:00:41|INFO    |Synchronizing metadata for S_R10_CNF.Landstat

    2016-05-24 12:08:06|INFO    |Synchronized One Way Replica for: S_R10_CNF.Legacy_Water_Features

    2016-05-24 12:08:06|INFO    |Synchronizing metadata for S_R10_CNF.Legacy_Water_Features

    2016-05-24 12:32:17|INFO    |Synchronized One Way Replica for: S_R10_CNF.Recreation

    2016-05-24 12:32:17|INFO    |Synchronizing metadata for S_R10_CNF.Recreation

    2016-05-24 12:44:28|INFO    |Synchronized One Way Replica for: S_R10_CNF.Reference

    2016-05-24 12:44:28|INFO    |Synchronizing metadata for S_R10_CNF.Reference

    2016-05-24 12:49:26|INFO    |Synchronized One Way Replica for: S_R10_CNF.Resource_Protection_Areas

    2016-05-24 12:49:26|INFO    |Synchronizing metadata for S_R10_CNF.Resource_Protection_Areas

    2016-05-24 12:59:11|INFO    |Synchronized One Way Replica for: S_R10_CNF.Infrastructure

    2016-05-24 12:59:11|INFO    |Synchronizing metadata for S_R10_CNF.Infrastructure

    2016-05-24 13:57:37|INFO    |Synchronized One Way Replica for: S_R10_CNF.KenaiPeninsulaFeatureClasses

    2016-05-24 13:57:37|INFO    |Synchronizing metadata for S_R10_CNF.KenaiPeninsulaFeatureClasses

    2016-05-24 14:12:42|INFO    |Synchronized One Way Replica for: S_R10_CNF.Other_Transportation_Features

    2016-05-24 14:12:42|INFO    |Synchronizing metadata for S_R10_CNF.Other_Transportation_Features

    2016-05-24 14:15:39|INFO    |Synchronized One Way Replica for: S_R10_CNF.Contour

    2016-05-24 14:15:39|INFO    |Synchronizing metadata for S_R10_CNF.Contour

    2016-05-24 14:20:59|INFO    |Synchronized One Way Replica for: S_R10_CNF.Terrestrial

    2016-05-24 14:20:59|INFO    |Synchronizing metadata for S_R10_CNF.Terrestrial

    2016-05-24 14:28:52|INFO    |Synchronized One Way Replica for: S_R10_CNF.Fire

    2016-05-24 14:28:52|INFO    |Synchronizing metadata for S_R10_CNF.Fire

    2016-05-24 14:39:19|INFO    |Synchronized One Way Replica for: S_R10_CNF.Geology

    2016-05-24 14:39:19|INFO    |Synchronizing metadata for S_R10_CNF.Geology

    2016-05-24 14:52:47|INFO    |Synchronized One Way Replica for: S_R10_CNF.SceneryManagementSystem

    2016-05-24 14:52:47|INFO    |Synchronizing metadata for S_R10_CNF.SceneryManagementSystem

    2016-05-24 15:54:25|INFO    |Synchronized One Way Replica for: S_R10_CNF.Wildlife

    2016-05-24 15:54:25|INFO    |Synchronizing metadata for S_R10_CNF.Wildlife

    2016-05-24 16:33:15|INFO    |Synchronized One Way Replica for: S_R10_CNF.Special_Administration

    2016-05-24 16:33:15|INFO    |Synchronizing metadata for S_R10_CNF.Special_Administration

    2016-05-24 16:45:27|INFO    |Synchronized One Way Replica for: S_R10_CNF.TimberType

    2016-05-24 16:45:27|INFO    |Synchronizing metadata for S_R10_CNF.TimberType

    2016-05-24 17:19:18|INFO    |Synchronized One Way Replica for: S_R10_CNF.Transportation

    2016-05-24 17:19:18|INFO    |Synchronizing metadata for S_R10_CNF.Transportation

    2016-05-24 17:30:48|INFO    |Synchronized One Way Replica for: CNFTables

    2016-05-24 17:30:48|INFO    |Synchronizing metadata for CNFTables

    2016-05-24 17:31:05|INFO    |Pushing replica GDB changes to NAS Drive

  2016-05-24 17:33:33|INFO    |NAS Drive copying complete. For more information, see the robocopy_log.txt

I do remember reading about the cProfile module and will try that out, but the code runs just fine directly in Python... it seems ArcMap is somehow causing the slow down.

0 Kudos
DanPatterson_Retired
MVP Emeritus

Next time you run it from arcmap, that is where to see the time differences, You can use the profiling tools, but that is only really useful if you are looking at calls to functions etc.  If you are interested in blocks of time, a log file or the time options works well.  I use python decorator functions for this purpose if you are interested, ... let me know, they are on my blog

0 Kudos
ChristalHigdon
New Contributor III

I found it, looks neat. I will check it out. Thanks!

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

When running the Python script as a tool, does it reside in a regular Toolbox (TBX) or a Python Toolbox (PYT)?  If the former, does it make a difference if you move it into the latter?

0 Kudos
ChristalHigdon
New Contributor III

It's a regular Toolbox. I am trying a pyt now and it's .........running. Hasn't moved past the check if the output file GDB exists (which it does) in over 20 minutes, so I'm not very hopeful that a Python Toolbox would make a difference.

0 Kudos