Processing time for Python Script using pythonwin vs. Script Tool

ChristalHigdon · ‎05-25-2016

I am wondering if anyone has experienced a dramatic difference in the time it takes to execute a Python script as a script tool vs. directly in Python or Pythonwin IDE? I run my script in Pythonwin (Python 2.7.8, 32-bit) and it takes about 17 minutes but when I run the same script as a script tool in ArcCatalog (version 10.3.1) Elapsed Time: 7 hours 39 minutes 44 seconds. That is an incredible difference and I'm wondering what the issue is. This is in our Citrix environment so the programs are not running on my desktop PC but on a Citrix Blade. What I thought was a script that provides a workable solution is not if when I distribute it in a toolbox, it takes magnitudes longer to run (like all day).

I have a script that does the following (I can post the fullcode if that might be helpful, though it is fairly long and this is a general question):

Summary: Creates one-way replicas to a file gdb from our corporate Enterprise (Oracle) SDE instance to our data center filing system and then pushes it out to our Network Attached Storage device using Robocopy and then sends an email of the log file.

Steps:

Initiates logging module to logfile

logging.basicConfig(level=logging.INFO,

format=' %(asctime)s|%(levelname)-8s|%(message)s',

datefmt='%Y-%m-%d %H:%M:%S',

filename=LOGFILE_NAME,

filemode='w')

Checks SDE inputs for replication requirements (datasets are versioned and feature classes/tables contain GlobalIDs)

if schema == "S_R10_CNF" and str(child.isVersioned) == 'False':

logging.info("Child {0} is not Registered as versioned".format(child.Name))

logging.info("Attempting to register...")

try:

arcpy.RegisterAsVersioned_management(child.Name, "NO_EDITS_TO_BASE")

logging.info("Registered {0} as Versioned successfully".format(child.Name))

except Exception as e:

logging.warning("Unable to Register As Versioned due to Exception: {0}".format(e.message))

logging.warning("{0} is NOT registered as versioned and will NOT be replicated".format(child.Name))

and

from collections import defaultdict

myDict = defaultdict(list)

for fc in arcpy.ListFeatureClasses("","",ds):

gid = arcpy.ListFields(fc,"",field_type="GlobalID")

if not gid:

myDict[ds].append(fc)

for tb in arcpy.ListTables('S_R10_CNF.*'):

gid = arcpy.ListFields(tb,"",field_type="GlobalID")

if not gid:

print('{0} is missing GLOBALID in Table {1}'.format(tb))

myDict['TABLE'].append(tb)

for i in myDict:

logging.info("{0} is missing GlobalIDs! Attempting to add them...".format(i))

try:

if i == "TABLE":

arcpy.AddGlobalIDs_management([myDict])

else:

arcpy.AddGlobalIDs_management(i)

logging.info("Added GlobalIDs to Dataset: {0}".format(i))

except Exception as e:

logging.warn("Unable to add GlobalIDs to dataset {0} \n due to Exception: {1}".format(i, e.message))

logging.warn("The following do not have GlobalIDs and will not be replicated: {0}".format(i))

for val in myDict:

printmsg(val)

logging.warn(val)

Creates the output file GDB if it doesn't already exist

if not arcpy.Exists(gdbName):

arcpy.CreateFileGDB_management(outPath, gdbName)

logging.info("Created target child GDB " + gdbName)

Generates a name for the Replica based on the dataset name

replicaName = replicaBase + "_NAS1Way"

truncLen = 32 - len("_NAS1Way")

if len(replicaName) > 32: #replica name can only be 32 characters or less or replica will fail with a vague error message

replicaName = "{0}_NAS1Way".format(replicaBase[0:truncLen])

Checks if the replica exists already, if so it synchronizes and copies metadata, if not, it creates the replica

# Check replica names in both parent and child

replicaList = [x.name.partition(".")[2] for x in arcpy.da.ListReplicas()]

replicaListGDB = [x.name for x in arcpy.da.ListReplicas(outGDB)]

if replicaName in replicaList:

try:

if replicaName in replicaListGDB:

arcpy.SynchronizeChanges_management(outGDB, "DBO."+replicaName, arcpy.env.workspace, "FROM_GEODATABASE2_TO_1")

logging.info("Synchronized One Way Replica for: {1}".format(replicaName, DS))

else:

logging.warn("Replica Name {0} found in parent SDE but not in child GDB".format(replicaName))

logging.warn(" Unregister replica {0}, delete child data, and run again!".format(replicaName))

except Exception as e:

logging.error("Unable to synchronize replica {0}\n due to Exception: {1}".format(replicaName,e.message))

try:

arcpy.env.workspace = outGDB

status='Synchronizing metadata for {0}'.format(DS)

logging.info(status)

if DS == "CNFTables":

for item in features:

arcpy.MetadataImporter_conversion(os.path.join(parentGDB,item), item.split(".")[1])

else:

for item in features:

arcpy.MetadataImporter_conversion(os.path.join(parentGDB,DS,item), os.path.join(DS.split(".")[1],item.split(".")[1]))

except Exception as e:

logging.error("Unable to synchronize metadata {0}\n due to Exception: {1}".format(replicaName,e.message))

else:

try:

arcpy.env.workspace = parentGDB

status='Creating 1 Way Replica for {0}'.format(DS)

arcpy.CreateReplica_management(featureList, "ONE_WAY_REPLICA", outGDB, replicaName, "FULL", "", "ALL_ROWS", "", "GET_RELATED")

logging.info("Created One Way Replica for: {0}".format(DS))

logging.info(" Replica is named: {0}".format(replicaName))

logging.info(" With data: ")

for f in features:

logging.info(" {0}".format(f))

except Exception as e:

logging.error("Unable to create replica for dataset {0}\n due to Exception: {1}".format(DS,e.message))

Calls Robocopy through subprocess module to copy the file GDB over to the NAS drive

subprocess.call(r'NET USE Z: /del')

subprocess.call(r'NET USE Z: {0} /user:<username> pwd'.format(destPath))

status='Pushing replica GDB changes to NAS Drive '.format(destPath)

logging.info(status)

subprocess.call(["robocopy", gdbPath, "Z:", "*.*", "/e", "/z", "/dcopy:T", "/purge", "/xo", "/xf", "*.lock", "{0}".format(os.path.join(gdbPath, "robocopy_log.txt")), "/fft", "/log:{0}".format(os.path.join(gdbPath, "robocopy_log.txt")), "/tee"])

Sends an email of the log file

mailserver.starttls()

mailserver.ehlo()

mailserver.sendmail(fromEmail, toEmailList, msg)

mailserver.close()

The slowest parts of the script seem to be running arcpy.SynchronizeChanges_management when it's in a Toolbox Script Tool, even when there are NO changes to synchronize! The box is checked to Run Python Script in Process.

I am looking into using multiprocessing as there are 16 cores on the Citrix server in order to run some of the checks and creating replicas to speed things up. Does anyone have any idea why the same script takes ridiculously longer as a Script Tool than directly in Python without the ArcGIS overhead?

Any tips would be greatly appreciated!

DanPatterson_Retired · ‎05-25-2016

Running the script directly will always take less time. This example is really mind-blowing...have you profiled or timed the various blocks to see where the bottle necks are? It might be worthwhile to throw some time markers within your script and collect the results.

import time

t0 = time.perf_counter() # best for windows, check the docs for other operating systems

do stuff

t1 = time.perf_counter()

print("\nTime...{}".format(t1-t0)) # print or log the results

ChristalHigdon · ‎05-25-2016

Thanks, my logging tracks the time for each step as follows:

2016-05-24 09:53:51|INFO |Target child GDB ChugachData.gdb already exists

2016-05-24 09:53:51|INFO |

2016-05-24 09:53:51|INFO |SKIPPING dataset S_R10_CNF.Cultural. It will NOT be replicated

2016-05-24 09:53:51|INFO |SKIPPING dataset S_R10_CNF.SUP. It will NOT be replicated

2016-05-24 09:53:51|INFO |

2016-05-24 10:13:25|INFO |Synchronized One Way Replica for: S_R10_CNF.Hydrogaphy

2016-05-24 10:13:25|INFO |Synchronizing metadata for S_R10_CNF.Hydrogaphy

2016-05-24 10:20:56|INFO |Synchronized One Way Replica for: S_R10_CNF.Off_Forest

2016-05-24 10:20:56|INFO |Synchronizing metadata for S_R10_CNF.Off_Forest

2016-05-24 10:33:03|INFO |Synchronized One Way Replica for: S_R10_CNF.LRMP_2002

2016-05-24 10:33:03|INFO |Synchronizing metadata for S_R10_CNF.LRMP_2002

2016-05-24 10:47:40|INFO |Synchronized One Way Replica for: S_R10_CNF.Landnet

2016-05-24 10:47:40|INFO |Synchronizing metadata for S_R10_CNF.Landnet

2016-05-24 10:57:31|INFO |Synchronized One Way Replica for: S_R10_CNF.Activities

2016-05-24 10:57:31|INFO |Synchronizing metadata for S_R10_CNF.Activities

2016-05-24 11:19:23|INFO |Synchronized One Way Replica for: S_R10_CNF.CNF_Boundary

2016-05-24 11:19:23|INFO |Synchronizing metadata for S_R10_CNF.CNF_Boundary

2016-05-24 11:31:34|INFO |Synchronized One Way Replica for: S_R10_CNF.CNF_Vegetation

2016-05-24 11:31:34|INFO |Synchronizing metadata for S_R10_CNF.CNF_Vegetation

2016-05-24 12:00:41|INFO |Synchronized One Way Replica for: S_R10_CNF.Landstat

2016-05-24 12:00:41|INFO |Synchronizing metadata for S_R10_CNF.Landstat

2016-05-24 12:08:06|INFO |Synchronized One Way Replica for: S_R10_CNF.Legacy_Water_Features

2016-05-24 12:08:06|INFO |Synchronizing metadata for S_R10_CNF.Legacy_Water_Features

2016-05-24 12:32:17|INFO |Synchronized One Way Replica for: S_R10_CNF.Recreation

2016-05-24 12:32:17|INFO |Synchronizing metadata for S_R10_CNF.Recreation

2016-05-24 12:44:28|INFO |Synchronized One Way Replica for: S_R10_CNF.Reference

2016-05-24 12:44:28|INFO |Synchronizing metadata for S_R10_CNF.Reference

2016-05-24 12:49:26|INFO |Synchronized One Way Replica for: S_R10_CNF.Resource_Protection_Areas

2016-05-24 12:49:26|INFO |Synchronizing metadata for S_R10_CNF.Resource_Protection_Areas

2016-05-24 12:59:11|INFO |Synchronized One Way Replica for: S_R10_CNF.Infrastructure

2016-05-24 12:59:11|INFO |Synchronizing metadata for S_R10_CNF.Infrastructure

2016-05-24 13:57:37|INFO |Synchronized One Way Replica for: S_R10_CNF.KenaiPeninsulaFeatureClasses

2016-05-24 13:57:37|INFO |Synchronizing metadata for S_R10_CNF.KenaiPeninsulaFeatureClasses

2016-05-24 14:12:42|INFO |Synchronized One Way Replica for: S_R10_CNF.Other_Transportation_Features

2016-05-24 14:12:42|INFO |Synchronizing metadata for S_R10_CNF.Other_Transportation_Features

2016-05-24 14:15:39|INFO |Synchronized One Way Replica for: S_R10_CNF.Contour

2016-05-24 14:15:39|INFO |Synchronizing metadata for S_R10_CNF.Contour

2016-05-24 14:20:59|INFO |Synchronized One Way Replica for: S_R10_CNF.Terrestrial

2016-05-24 14:20:59|INFO |Synchronizing metadata for S_R10_CNF.Terrestrial

2016-05-24 14:28:52|INFO |Synchronized One Way Replica for: S_R10_CNF.Fire

2016-05-24 14:28:52|INFO |Synchronizing metadata for S_R10_CNF.Fire

2016-05-24 14:39:19|INFO |Synchronized One Way Replica for: S_R10_CNF.Geology

2016-05-24 14:39:19|INFO |Synchronizing metadata for S_R10_CNF.Geology

2016-05-24 14:52:47|INFO |Synchronized One Way Replica for: S_R10_CNF.SceneryManagementSystem

2016-05-24 14:52:47|INFO |Synchronizing metadata for S_R10_CNF.SceneryManagementSystem

2016-05-24 15:54:25|INFO |Synchronized One Way Replica for: S_R10_CNF.Wildlife

2016-05-24 15:54:25|INFO |Synchronizing metadata for S_R10_CNF.Wildlife

2016-05-24 16:33:15|INFO |Synchronized One Way Replica for: S_R10_CNF.Special_Administration

2016-05-24 16:33:15|INFO |Synchronizing metadata for S_R10_CNF.Special_Administration

2016-05-24 16:45:27|INFO |Synchronized One Way Replica for: S_R10_CNF.TimberType

2016-05-24 16:45:27|INFO |Synchronizing metadata for S_R10_CNF.TimberType

2016-05-24 17:19:18|INFO |Synchronized One Way Replica for: S_R10_CNF.Transportation

2016-05-24 17:19:18|INFO |Synchronizing metadata for S_R10_CNF.Transportation

2016-05-24 17:30:48|INFO |Synchronized One Way Replica for: CNFTables

2016-05-24 17:30:48|INFO |Synchronizing metadata for CNFTables

2016-05-24 17:31:05|INFO |Pushing replica GDB changes to NAS Drive

2016-05-24 17:33:33|INFO |NAS Drive copying complete. For more information, see the robocopy_log.txt

I do remember reading about the cProfile module and will try that out, but the code runs just fine directly in Python... it seems ArcMap is somehow causing the slow down.

DanPatterson_Retired · ‎05-25-2016

Next time you run it from arcmap, that is where to see the time differences, You can use the profiling tools, but that is only really useful if you are looking at calls to functions etc. If you are interested in blocks of time, a log file or the time options works well. I use python decorator functions for this purpose if you are interested, ... let me know, they are on my blog

ChristalHigdon · ‎05-25-2016

I found it, looks neat. I will check it out. Thanks!

JoshuaBixby · ‎05-25-2016

When running the Python script as a tool, does it reside in a regular Toolbox (TBX) or a Python Toolbox (PYT)? If the former, does it make a difference if you move it into the latter?

ChristalHigdon · ‎05-25-2016

It's a regular Toolbox. I am trying a pyt now and it's .........running. Hasn't moved past the check if the output file GDB exists (which it does) in over 20 minutes, so I'm not very hopeful that a Python Toolbox would make a difference.