I'm trying to implement a fairly straightforward multiprocessing script that takes every polygon in a shapefile, tessellates it into a grid, takes the center point of each grid square, and then clips the points by the original feature (since the tessellation covers the feature's minimum bounding rectangle). The tessellation feature class is never created, so the next step (FeatureToPoint) fails. I don't understand why. Can someone help?
Even if I comment out all the arcpy commands except GenerateTessellation, the output is never actually created.
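For reference, the geometry side of the workflow described above (square grid over the bounding rectangle, cell centers, clip to the polygon) can be sketched in plain Python. This is only an illustration under stated assumptions: planar coordinates in meters, a single polygon ring with no holes, and a ray-casting point-in-polygon test standing in for arcpy's Clip. The function names are hypothetical, and this is not a replacement for GenerateTessellation, FeatureToPoint or Clip.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the polygon ring.

    ring is a list of (x, y) vertices; the closing vertex may be omitted.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_centers_in_polygon(ring, cell_area):
    """Square grid over the ring's bounding rectangle, keeping centers inside the ring."""
    side = math.sqrt(cell_area)  # 900 square meters -> 30 m cells
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
    ncols = math.ceil((xmax - xmin) / side)
    nrows = math.ceil((ymax - ymin) / side)
    centers = []
    for row in range(nrows):
        for col in range(ncols):
            cx = xmin + (col + 0.5) * side
            cy = ymin + (row + 0.5) * side
            # "Clip" step: keep only cell centers that fall inside the polygon
            if point_in_polygon(cx, cy, ring):
                centers.append((cx, cy))
    return centers
```

For example, `grid_centers_in_polygon([(0, 0), (100, 0), (0, 100)], 900)` lays a 4 by 4 grid of 30 m cells over the triangle's bounding box and keeps the six centers that fall inside it.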
```python
import multiprocessing
import os
import time

import arcpy

startTime = time.time()
basedir = r"C:\Users\Jay\Documents\ArcGIS\Projects\DataPrep"
scratchGDB = r"C:\Jay\data\data.gdb"
output = os.path.join(basedir, "grids.gdb")
features = os.path.join(basedir, "features.shp")


def multifunction(ft):
    import arcpy

    # Each item is a (Name, geometry) row from the SearchCursor
    name = ft[0]
    extent = ft[1].extent
    print('Working on {}'.format(name))
    fl = arcpy.management.MakeFeatureLayer(features, "{}fl".format(name), where_clause=f"Name='{name}'")
    gdb = os.path.join(basedir, "{}.gdb".format(name))
    arcpy.management.CreateFileGDB(basedir, "{}.gdb".format(name))
    # Separate variable so the global "output" (grids.gdb) is not shadowed
    gridPath = os.path.join(gdb, "{}grid".format(name))
    grid = arcpy.management.GenerateTessellation(gridPath, Extent=extent, Shape_Type="SQUARE", Size="900 SquareMeters")
    print('tessellation')
    pointgrid = arcpy.management.FeatureToPoint(grid, os.path.join(scratchGDB, "{}pointgrid".format(name)))
    arcpy.analysis.Clip(pointgrid, fl, os.path.join(output, f"{name}Points"))
    arcpy.management.Delete(fl)
    arcpy.management.Delete(grid)
    arcpy.management.Delete(pointgrid)


def main():
    processList = [feature for feature in arcpy.da.SearchCursor(features, ['Name', 'SHAPE@'])]
    pool = multiprocessing.Pool(1)
    pool.map(multifunction, processList[0:10])
    pool.close()
    pool.join()


if __name__ == "__main__":
    print("Running")
    main()
    executionTime = (time.time() - startTime)
    print('Execution time in seconds: ' + str(executionTime))
```
Hello Jay,
Try putting double backslashes in your basedir and removing the 'r' at the beginning; for your scratchGDB, make sure it follows the same format, with double backslashes and double quotes around the path.
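As a side note, the raw-string and doubled-backslash spellings produce identical strings, so swapping one for the other should not change what arcpy receives. The original scratchGDB value also happens to work, because `\d` is not an escape sequence (recent Python versions warn about it, though). The spelling that actually breaks a path is a single backslash before a letter that *is* an escape, such as `\t`. A quick check in plain Python; `ntpath` applies Windows path rules on any OS, and on Windows `os.path` is `ntpath`:

```python
import ntpath  # Windows path rules, usable on any OS

# Both spellings produce the same string.
raw = r"C:\Users\Jay\Documents\ArcGIS\Projects\DataPrep"
escaped = "C:\\Users\\Jay\\Documents\\ArcGIS\\Projects\\DataPrep"
print(raw == escaped)  # True

# A single backslash in a normal string can silently become a control character:
# "\t" is a tab, so this path loses its backslash and a letter.
broken = "C:\temp"
fixed = r"C:\temp"
print(len(broken), len(fixed))  # 6 7

# Joining adds exactly one backslash between the parts.
features = ntpath.join(raw, "features.shp")
print(features)
```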
Thanks for the tip - just did that, but it still didn't fix the issue.
For the output and features variables, use the full paths instead of joining them to basedir.
Even if I change those lines to the following, it still doesn't work:
```python
output = "C:\\Users\\Jay\\{}.shp".format(name)
print(output)
grid = arcpy.management.GenerateTessellation(output, Extent=extent, Shape_Type="SQUARE", Size="900 SquareMeters", Spatial_Reference=arcpy.SpatialReference(4326))
```
Check your extent and area parameters first with a print statement.
Extent—ArcGIS Pro | Documentation
Generate Tessellation (Data Management)—ArcGIS Pro | Documentation
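To go a step beyond printing, the numbers can be sanity-checked in plain Python before calling the tool. This is a minimal sketch, assuming you pass in `extent.XMin`, `extent.YMin`, `extent.XMax` and `extent.YMax` from the arcpy Extent and that those coordinates are in meters. The function name and the degree heuristic are my own, not part of arcpy. It estimates the cell count as ceil(width / side) * ceil(height / side), where side = sqrt(900) = 30 m.

```python
import math


def check_tessellation_inputs(xmin, ymin, xmax, ymax, cell_area):
    """Hypothetical sanity check for a square tessellation.

    Assumes the extent coordinates and cell_area share a linear unit
    (e.g. meters and square meters). Returns (estimated_cell_count, warnings).
    """
    width, height = xmax - xmin, ymax - ymin
    if width <= 0 or height <= 0:
        return 0, ["extent is empty or inverted (min >= max)"]
    warnings = []
    side = math.sqrt(cell_area)  # a 900 square-unit cell has 30-unit sides
    if width < side or height < side:
        warnings.append("extent is narrower than a single cell")
    # Heuristic only: values within +/-180 and +/-90 often mean degrees, not meters.
    if max(abs(xmin), abs(xmax)) <= 180 and max(abs(ymin), abs(ymax)) <= 90:
        warnings.append("coordinates look like degrees; check the extent's spatial reference")
    count = math.ceil(width / side) * math.ceil(height / side)
    return count, warnings


if __name__ == "__main__":
    # A 3 km by 2 km projected extent: 100 columns * 67 rows of 30 m cells
    print(check_tessellation_inputs(500000, 4000000, 503000, 4002000, 900))
    # A small extent in degrees triggers both warnings
    print(check_tessellation_inputs(-97.1, 32.7, -97.0, 32.8, 900))
```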
You also didn't say whether the process works for a single feature when you skip the multiprocessing.
Correct me if I'm wrong, but I don't think file geodatabases support concurrent writing. I, at least, have never been able to do it with multiprocessing, and in my experience the worker fails at that point. Writing to a folder works concurrently. It should at least write the first output to the gdb, and maybe another if the lock is released in time for another worker to write. Another possible issue is whether the feature you are passing (specifically the geometry) can be pickled. Wrapping the code in a try/except can provide some clues.
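One way to test the pickling theory is to run each item of `processList` through `pickle` before handing it to the pool, since multiprocessing pickles the arguments it sends to workers. A minimal sketch; `is_picklable` is a hypothetical helper, and the sample values are made up. If the geometry does turn out to be the problem, one option (an assumption, not something confirmed in this thread) is to send plain values such as the name plus `extent.XMin`, `YMin`, `XMax`, `YMax` and rebuild an `arcpy.Extent` inside the worker, keeping in mind that the spatial reference would then need to be supplied separately.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be pickled and unpickled.

    multiprocessing pickles the arguments it sends to worker processes,
    so this is a quick pre-flight check for items passed to Pool.map.
    """
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


if __name__ == "__main__":
    # A plain tuple of a name and extent numbers pickles fine...
    print(is_picklable(("example", (0.0, 0.0, 1000.0, 1000.0))))  # True
    # ...while a lambda does not.
    print(is_picklable(lambda x: x))  # False
```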
A single run may give false positives because of the file geodatabase limitation, so I would set some breakpoints and debug it. I would also move the worker to a new file and wrap the process in a try/except that returns either success or the exception to the pool's results list. Then you can print the results and see what is failing in your processes.
When you map to a function located in the same script, each worker loads the whole script again. Keeping the worker in a separate file means only that function is loaded, without going through your global-scope code and the `if __name__ == "__main__":` check again.
This is an example of some processes that I have set up to run with multiprocessing. The script checks whether each dataset needs to be exported (source date > destination date), and the worker returns a dictionary of basic messages. The worker lives in a separate script:
```python
# IMPORTS
from arcpy import CopyFeatures_management
from arcpy import env


def export_to_shp(fcname, fc, shp):
    """
    Function to copy features to shapefile
    """
    try:
        env.overwriteOutput = True
        CopyFeatures_management(fc, shp)
        return {'fcname': f'{fcname}', 'status': True, 'exmessage': 'was exported!'}
    except Exception as ex:
        return {'fcname': f'{fcname}', 'status': False, 'exmessage': f'{ex}'}
```
The main script looks like this. I removed a lot of the variables that are in-house to us, so this won't be copy-and-run, but it should give an idea of the structure and of how you can return useful information from the multiprocessing.
```python
# IMPORTS
import datetime as dt
import os
import sys
import time as t
from multiprocessing import Pool

import arcpy

from multiprocesses import ExportToShp

# ---------------------------------------------------------------------------


def weekly_export(sde_view):
    """
    Function to export featureclasses to shapefiles if they were changed in the past 7 days
    :param sde_view: connection class to connect to sde and output directories
    """
    arcpy.env.overwriteOutput = True
    startTime = t.time()
    try:
        # Featureclasses variables:
        themesDir = r'path to output folder'
        fcDict = {
            'zoning': [sde_view.fc_connection("admin_CLC", "zoning"),
                       os.path.join(themesDir, 'zoning.shp')],
            'row': [sde_view.fc_connection("property", "row"),
                    os.path.join(themesDir, 'row.shp')],
            'physftr': [sde_view.fc_connection("property", "physftr"),
                        os.path.join(themesDir, 'physftr.shp')],
            'streets': [sde_view.fc_connection("transportation", "streets"),
                        os.path.join(themesDir, 'streets.shp')]
        }
        fcList = []
        for fc, fcPaths in fcDict.items():
            # mt is an in-house maintenance package; it's just checking dates and feature counts here.
            if any([mt.last_edited_date_check(fcPaths[0], "last_edited_date", 7),
                    mt.get_number_of_features(fcPaths[1]) > mt.get_number_of_features(fcPaths[0])]):
                fcList.append((fc, fcPaths[0], fcPaths[1]))

        # If more than one dataset needs to be exported, do it in parallel
        if len(fcList) > 1:
            with Pool(processes=4) as pool:
                result = pool.starmap(ExportToShp.export_to_shp, fcList)
            for res in result:
                print(f'{res["fcname"]} {res["exmessage"]}')
                if not res['status']:
                    print(f'{res["fcname"]} {res["exmessage"]}')
        elif len(fcList) == 1:
            result = ExportToShp.export_to_shp(fcList[0][0], fcList[0][1], fcList[0][2])
            print(f'{result["fcname"]} {result["exmessage"]}')
            if not result['status']:
                print(f'{result["fcname"]} {result["exmessage"]}')
        else:
            print('No updates were made and nothing was exported')
    except Exception as err:
        print(
            f'{sys._getframe().f_code.co_name} - FAILED in {dt.timedelta(seconds=t.time() - startTime)}! See error log...')
        print(f'{sys._getframe().f_code.co_name} FAIL! - {err}')


if __name__ == '__main__':
    mTime = t.time()
    try:
        weekly_export(con)
        print(f'Export Weekly - COMPLETED in {dt.timedelta(seconds=t.time() - mTime)} ...')
    except Exception as err:
        print(
            f'{sys._getframe().f_code.co_name} - FAILED in {dt.timedelta(seconds=t.time() - mTime)}! See error log...')
        print(f'{sys._getframe().f_code.co_name} FAIL! - {err}')
```
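A runnable, stdlib-only sketch of the same pattern, for trying it outside ArcGIS. The worker name, its inputs and the failing condition are hypothetical stand-ins for the arcpy calls. One small variation: it returns `traceback.format_exc()` instead of only the exception message, so the parent process also sees the line where the worker failed. That can matter because, in some environments, `print` output from child processes may not reach your console at all.

```python
import traceback
from multiprocessing import Pool


def run_safely(item):
    """Hypothetical worker: never raises, always returns a status dict."""
    name, value = item
    try:
        if value < 0:  # stand-in for a geoprocessing call that fails
            raise ValueError(f"negative input for {name}")
        return {"name": name, "status": True, "message": f"processed {value}"}
    except Exception:
        # Full traceback text travels back to the parent with the result
        return {"name": name, "status": False, "message": traceback.format_exc()}


if __name__ == "__main__":
    items = [("first", 1), ("second", -1)]
    with Pool(processes=2) as pool:
        results = pool.map(run_safely, items)
    for res in results:
        print(res["name"], "OK" if res["status"] else "FAILED")
        if not res["status"]:
            print(res["message"])
```

Because the worker catches everything itself, one failing item does not abort `pool.map`; every item comes back with a status you can inspect.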
I have simplified everything to minimize the number of lines of code, split my multiprocessing function out into a separate file, tested without multiprocessing, and tried catching errors, and I still can't get it to work. Could this have something to do with the GenerateTessellation method?
First, my file (func.py) containing the function to be passed into the multiprocessing Pool:
```python
def multifunction(airport):
    from arcpy import GenerateTessellation_management
    import os

    apt = airport[0]
    extent = airport[1].extent
    print('Working on {}'.format(apt))
    outputDir = "C:\\Jay\\data\\{}".format(apt)
    # exist_ok=True avoids a FileExistsError when the script is re-run
    os.makedirs(outputDir, exist_ok=True)
    output = "C:\\Jay\\data\\{}\\{}.shp".format(apt, apt)
    print(output)
    try:
        GenerateTessellation_management(output, Extent=extent, Shape_Type="SQUARE", Size="900 SquareMeters")
    except Exception as err:
        print("error", err)
    return output
```
Next, the main file:
```python
import multiprocessing
import os
import time

import arcpy

from func import multifunction

startTime = time.time()
basedir = "C:\\Users\\jay\\Documents\\ArcGIS\\Projects\\DataPrep"
features = os.path.join(basedir, "features.shp")


def main():
    processList = [ft for ft in arcpy.da.SearchCursor(features, ['Name', 'SHAPE@'])]
    # THESE LINES DO WHAT I WANT
    # for row in processList[0:10]:
    #     multifunction(row)
    pool = multiprocessing.Pool()
    pool.map(multifunction, processList[0:10])
    pool.close()
    pool.join()


if __name__ == "__main__":
    print("Running")
    main()
    executionTime = (time.time() - startTime)
    print('Execution time in seconds: ' + str(executionTime))
```
If I comment out the GenerateTessellation line and replace it with