Advice on running multiple python scripts each night

B-L · ‎01-21-2022

Hi all!

My Question:

Any advice on running multiple python scripts each evening to automate geoprocessing tools?

My Problem:

I am trying to improve my organizations python scripts. We have multiple SDEs and multiple scripts connecting to these SDEs all running nightly via a batch file and windows task scheduler. We have arrived at a point where we do not have any more time in the evening to add more scripts.

Each script establishes a lock with one or more SDEs, so only one script can run at a time. This seems really inefficient to me, but i am not sure how to do things otherwise. We would receive errors almost every night relating to 'lock' issues before we added the portion in the script that disconnects other users and denies connections to the databases (until the script ends).

We have arcPro, and i have heard that you can create models in there and have them run nightly as well, but i am uncertain if this would be more efficient or just a lateral move.

The complexity of each script varies but they all follow a general outline which is:

1. import modules

2. create log

3. disconnect users from sde to establish lock

4. run geoprocessing logic

5. allow connections again.

here is some sample code:

import arcpy
import logging
import datetime
import time
import sys
import os


# create log
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(message)s")
file_handler = logging.FileHandler("C:....<location of log file>logFile.log")
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

arcpy.env.overwriteOutput = True

# Current Day
Day = time.strftime("%m-%d-%Y", time.localtime())
# Current Time
Time = time.strftime("%I:%M:%S %p", time.localtime())


# Local variables:
sde_1 = r"C:..<path to sde>...FIRST_SDE.sde"
sde_2 = r"C:..<path to sde>...SECOND_SDE.sde"

input_fc = r"C:..<path to sde>...FIRST_SDE.sde\FeatureClassInput"
output_fc = r"C:..<path to sde>...FIRST_SDE.sde\FeatureClassOutput"


logger.info("\n"*4)

try:

    # Get a list of connected users to sde_1 and sde_2
    userList_sde_1 = arcpy.ListUsers(sde_1)
    userList_sde_2 = arcpy.ListUsers(sde_2)

    # Pull names of current connections
    userNames_sde_1 = [uL.Name for uL in userList_sde_1]
    logger.info("These users are connected to sde_1.sde: " +
                str(userNames_sde_1))

    # Pull names of current connections
    userNames_sde_2 = [uL.Name for uL in userList_sde_2]
    logger.info("These users are connected to sde_2.sde: " +
                str(userNames_sde_2))

    # No longer accept connections to the databases

    arcpy.AcceptConnections(sde_1, False)
    logger.info("sde_1.sde is no longer accepting connections")

    arcpy.AcceptConnections(sde_2, False)
    logger.info("sde_2.sde is no longer accepting connections")

    # Disconnect all users from the databases
    arcpy.DisconnectUser(sde_1, "ALL")
    print("disconnecting sde_1 users")
    logger.info("Disconnecting sde_1 users" + str(userNames_sde_1))

    arcpy.DisconnectUser(sde_2, "ALL")
    print("disconnecting sde_2 users")
    logger.info("Disconnecting sde_2 users" + str(userNames_sde_2))


except:
    logger.exception(
        "There is an error with disconnecting users from databases")


try:
    # Start Time
    logger.info('Process Started at ' + str(Day) + " " + str(Time))

    # arcpy buffer tool
    arcpy.Buffer_analysis(input_fc, output_fc, "50 METERS")
    logger.info("Select Parcels \n")

    #other geoprocessing tools go here
    #....
    #....
    #....
    #....
    #....
    #....


except:

    logger.exception("There is an error with the arcpy portion of the script")

    #Check to see if it was a lock issue....
    
    userList_sde_1_afterDisconnect = arcpy.ListUsers(sde_1)
    userList_sde_2_afterDisconnect = arcpy.ListUsers(sde_2)

    # Check connections again for sde_1
    userNames_sde_1 = [uL.Name for uL in userList_sde_1_afterDisconnect]
    logger.info(
        "These users are connected to sde_1 despite disconnect code: " + str(userNames_sde_1))

    # Check connections again for sde_2
    userNames_sde_2 = [uL.Name for uL in userList_sde_2_afterDisconnect]
    logger.info(
        "These users are connected to sde_2 despite disconnect code: " + str(userNames_sde_2))

try:
    # Accept new connections to the database

    arcpy.AcceptConnections(sde_1, True)
    logger.info("sde_1 is accepting connections")

    arcpy.AcceptConnections(sde_2, True)
    logger.info("sde_2 is accepting connections")


except:
    logger.exception(
        "There is an error with allowing the databases to accept connections again")


logger.info("End of Script")

file_handler.close()

Kara_Shindle · ‎01-21-2022

We started with a 12 hour hour script that was designed for once a month but needed to be done nightly. I shaved 5 hours of by moving some date conversion to the SQL side. I also reviewed all of it and found ways to use multiprocessing. We generate a few thousand PDFs nightly with map series and multiprocessing makes that possible in less then 5 hours usually. I'm also planning to implement some attribute rules that will shave some time off of the script as well. I'd recommend a review of everything to make sure it is al still necessary.

I would also recommend against the Pro model shift. It's a lateral move - our original model broke ArcMap model Builder and they told me it had to be moved to Python. Upgrading the script from 2.7 to 3.x also helped with some speed issues. I'm planning to move to at least one service like suggested up above myself.

View solution in original post

jcarlson · ‎01-21-2022

Is it necessary to run these against a direct connection? We have our own collection of nightly scripts, but since they run against published services, they don't run into lock issues, and we can run them concurrently if necessary.

- Josh Carlson
Kendall County GIS

B-L · ‎01-21-2022

Hi Josh,

Thanks for the fast reply! That is a great workflow that I had not considered. I will certainly need to investigate which scripts this will work for. I know that there are several that utilize non-published FCs exclusively and some scripts that utilize both published services and non-published FCs. I imagine that I will still need to use the direct SDE connection for those. Consolidating the others though (and removing the Lock code) should improve things greatly.

Anonymous User · ‎01-21-2022

We run anywhere from 1-19 tasks (depending on daily, weekly, monthly triggers) on our environment. One thing that helped with time was checking if the tasks needs to be ran where possible. If there were no changes to source datasets, there is no need to rebuild xyz dataset. I am not sure if your SDE's are all on the same server, but we also have specific tasks separated out and ran independently. Like at 7 pm its safe to upload any changes made during the day that other tasks will source when they run later at night.

B-L · ‎01-21-2022

I really like the workflow of breaking things out into daily, weekly, monthly triggers. Thank you JeffK!

Kara_Shindle · ‎01-21-2022

We started with a 12 hour hour script that was designed for once a month but needed to be done nightly. I shaved 5 hours of by moving some date conversion to the SQL side. I also reviewed all of it and found ways to use multiprocessing. We generate a few thousand PDFs nightly with map series and multiprocessing makes that possible in less then 5 hours usually. I'm also planning to implement some attribute rules that will shave some time off of the script as well. I'd recommend a review of everything to make sure it is al still necessary.

I would also recommend against the Pro model shift. It's a lateral move - our original model broke ArcMap model Builder and they told me it had to be moved to Python. Upgrading the script from 2.7 to 3.x also helped with some speed issues. I'm planning to move to at least one service like suggested up above myself.

B-L · ‎01-21-2022

Thank you @Kara_Shindle this is really helpful! I really appreciate you sharing your experience with model builder and the insight that it is a lateral move. We just finished a long process of upgrading our scripts to python 3 (just the syntax differences, we did not change any logic to boost performance at the time, although hindsight is 20/20) and I believe that did speed things up a bit for us. Offloading some of the processes like date conversions to SQL and utilizing attribute rules are two ideas i had not considered. I will definitely see where I can implement those tactics. I am unfamiliar with multiprocessing, do you happen to have any example code that you would feel comfortable sharing or perhaps any articles on the subject that were helpful to you?

Kara_Shindle · ‎01-24-2022

@B-L multiprocessing is a specific-use kind of thing - but you basically take lots of tasks and divide them up to run on multiple processors at one time. In my case, I use them to generate and stitch together several thousand PDFs of parcels based off of several map series every night.

Googling multiprocessing will get you lots of information, here is one: Multiprocessing

Penn State University geography dept puts some of their classes online, but I had the privilege of taking this course, and the Lesson 1.6.3.2 material on Multiprocessing &GIS is worth a read. They also provide a list of resources at the end.

Kara_Shindle · ‎01-24-2022

I learned how to do this a specific way, which is what I am describing. I'm sure someone can come up with a much better way then me, but here is what I have.

First, I have a function set up in a separate file called my "Worker". This is the function that will get passed to my pool of workers. This particular function is for creating a one page PDF based off of a map series of parcels. I used this function in lots of other scripts.

import os, sys
import arcpy
arcpy.env.overwriteOutput = True

#(Path to your MXD, Path you want the document to be extracted to, name of the page used in the map series)
def publicWorker(mxdPath, outFolder, name):
	"""
		This is the function that gets called and does the work of exporting the DDP for the UPI specified.
		Note that this function does not try to write to arcpy.AddMessage() as nothing is ever displayed.  If the cexport succeeds then it returns TRUE else FALSE.
	"""
	try:
		mxdFile = arcpy.mp.ArcGISProject(mxdPath)
		cleanName = name.replace("/", "_") #Removes the slash due to it being an escape character
		l = mxdFile.listLayouts()[0]
		if not l.mapSeries is None:
			ms = l.mapSeries
			if ms.enabled:
				ms.currentPageNumber = ms.getPageNumberFromName(name)
				file = os.path.join(outFolder, cleanName)
				ms.exportToPDF(file, "CURRENT", resolution=300)
				ms.refresh()

		del mxdFile
		return True
	except arcpy.ExecuteError:
		# Geoprocessor threw an error
		arcpy.AddError(arcpy.GetMessages(2))
		report.write("Execute Error:", arcpy.ExecuteError)

	except Exception as e:
		tb = sys.exc_info()[2]
		report.write("Failed at Line %i \n" % tb.tb_lineno)
		report.write("Error: {} \n".format(e))
	return False

Then I have the main multiprocessing script that creates the pool of workers and assigns them my tasks.

Here is a portion of the code - Before line 8, my code creates a list of the variables called 'idList' from a feature class that has all of the parcels that were edited and need their PDFs updated. That list is used to create the jobs for the worker pool.

import arcpy
import multiprocessing
from PublicWorker import publicWorker

mxdPath = 
outFolder = 
def createmaps_handler():
		jobs = []
		for name in idList:
			jobs.append((mxdPath,outFolder,name))                  
		arcpy.AddMessage("Public job list has " + str(len(jobs)) + " elements.")
		# Create and run multiprocessing pool.
		multiprocessing.set_executable(*installpath*, 'pythonw.exe')) # make sure Python environment is used for running processes, even when this is run as a script tool

		arcpy.AddMessage("Sending to pool")

		cpuNum = multiprocessing.cpu_count()  # determine number of cores to use
		print("there are: " + str(cpuNum) + " cpu cores on this machine")

		with multiprocessing.Pool(processes=cpuNum) as pool: 
			res = pool.starmap(publicWorker, jobs)  # run jobs in job list; res is a list with return values of the worker function
		pool.close()
		pool.join()
		# If an error has occurred report it		 # count how many times False appears in the list with the return values
		failed = res.count(False) # count how many times False appears in the list with the return values
		if failed > 0:
			arcpy.AddError("{} workers failed!".format(failed))
1))
		#makes sure code of the if-statement only executed for main process we start by running script, not the suprocesses created when using multiprocessing in this file

	except arcpy.ExecuteError:
		# Geoprocessor threw an error
		arcpy.AddError(arcpy.GetMessages(2))
		print("Execute Error:", arcpy.ExecuteError)
	except Exception as e:
		# Capture all other errors
		arcpy.AddError(str(e))
		print("Exception:", e)

if __name__ == '__main__':
	createmaps_handler()

Eventually, I'd like to improve the error reporting in this, if possible, and I also have been working on a section for reporting how long it takes to run a batch of jobs. Hope this helps! A lot of this was adapted from lessons I learned during the class I have listed above.

B-L · ‎01-25-2022

Thank you so much @Kara_Shindle ! This is extremely helpful