
Multiprocessing won't create feature classes

04-01-2022 01:12 PM
Jay_Gregory
Regular Contributor

I'm trying to implement a fairly straightforward multiprocessing script that takes every polygon in a shapefile, tessellates it into a grid, takes the center point of each grid square, and then clips the points by the original feature (since the tessellation is generated over the feature's minimum bounding rectangle). The tessellation feature class is never created, so the next step (FeatureToPoint) fails. I don't understand - can someone help?

Even if I comment out all the arcpy commands except GenerateTessellation, the output is never actually created.  

 

import multiprocessing
import os
import time
import arcpy

startTime = time.time()

basedir = r"C:\Users\Jay\Documents\ArcGIS\Projects\DataPrep"
scratchGDB = 'C:\\Jay\\data\data.gdb'
output = os.path.join(basedir, "grids.gdb")
features = os.path.join(basedir,"features.shp")

def multifunction(ft):
    import arcpy
    name = ft[0]
    extent = ft[1].extent
    print('Working on {}'.format(name))
    fl = arcpy.management.MakeFeatureLayer(features, "{}fl".format(name), where_clause=f"Name='{name}'")
    gdb = os.path.join(basedir, "{}.gdb".format(name))
    arcpy.management.CreateFileGDB(basedir, "{}.gdb".format(name))
    output=os.path.join(gdb, "{}grid".format(name))
    grid = arcpy.management.GenerateTessellation(output, Extent=extent, Shape_Type="SQUARE", Size="900 SquareMeters")
    print('tessellation')
    pointgrid = arcpy.management.FeatureToPoint(grid, os.path.join(scratchGDB, "{}pointgrid".format(name)))
    arcpy.analysis.Clip(pointgrid, fl, os.path.join(gdb, f"{name}Points"))
    arcpy.management.Delete(fl)
    arcpy.management.Delete(grid)
    arcpy.management.Delete(pointgrid)

def main():
    processList = [feature for feature in arcpy.da.SearchCursor(features, ['Name', 'SHAPE@'])]
    pool = multiprocessing.Pool(1)
    pool.map(multifunction, processList[0:10])
    pool.close()
    pool.join()

if __name__ == "__main__":
    print("Running")
    main()
    executionTime = (time.time() - startTime)
    print('Execution time in seconds: ' + str(executionTime))

 

25 Replies
ABishop
MVP Regular Contributor

Hello Jay,

Try putting double backslashes in your basedir, and add the raw-string prefix 'r' at the beginning of your scratchGDB, or make sure it follows the same format with double backslashes and double quotes around the path.
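
For example, using the path values from your post (either form works; the point is to avoid the stray "\d" escape sequence in the original scratchGDB string):

# Escaped backslashes throughout
basedir = "C:\\Users\\Jay\\Documents\\ArcGIS\\Projects\\DataPrep"
# Or a raw string, which leaves backslashes alone
scratchGDB = r"C:\Jay\data\data.gdb"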

Amanda Bishop, GISP
Jay_Gregory
Regular Contributor

Thanks for the tip - just did that, but it still didn't fix the issue.  

ABishop
MVP Regular Contributor

For the output and features variables, use the full path from basedir.

Amanda Bishop, GISP
Jay_Gregory
Regular Contributor

Even if I change those lines to the following, it still doesn't work:

 

output = "C:\\Users\\Jay\\{}.shp".format(name)
print(output)
grid = arcpy.management.GenerateTessellation(output, Extent=extent, Shape_Type="SQUARE", Size="900 SquareMeters", Spatial_Reference=arcpy.SpatialReference(4326))

 

DanPatterson
MVP Esteemed Contributor

Check your extent and area parameters first with a print statement

Extent—ArcGIS Pro | Documentation

Generate Tessellation (Data Management)—ArcGIS Pro | Documentation

And you didn't indicate whether the process works for just one feature when you skip the multiprocessing entirely.
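
Something like this (a quick sketch using the paths and fields from your post) would show whether each extent is what you expect before it ever reaches the tool:

import arcpy

features = r"C:\Users\Jay\Documents\ArcGIS\Projects\DataPrep\features.shp"

# Print each feature's extent and coordinate system before calling GenerateTessellation
for name, shape in arcpy.da.SearchCursor(features, ['Name', 'SHAPE@']):
    ext = shape.extent
    print(name, ext.XMin, ext.YMin, ext.XMax, ext.YMax, ext.spatialReference.name)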


... sort of retired...
by Anonymous User
Not applicable

Correct me if I am wrong, but I don't think file geodatabases support concurrent writing. I, at least, have never been able to do it with multiprocessing, and in my experience the worker fails at that point. Writing to a directory works concurrently. It should at least write the first one to the gdb, and maybe another if the lock is released in time for another worker to write. Another thing that may be an issue is whether the feature (specifically the geometry) you are passing can be pickled. Wrapping the code in a try/except can provide some clues.
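
If the geometry does turn out to be the problem, one workaround (just a sketch, not tested against your data) is to pass only plain Python values to the workers, for example the name plus a space-delimited extent string, since I believe arcpy extent parameters generally accept an "XMin YMin XMax YMax" string. The worker would then need to take two arguments (name and extent string) instead of one cursor row:

# Sketch: build a process list of plain, picklable tuples instead of geometry objects.
# Assumes the features path and imports from the original script.
def main():
    processList = []
    with arcpy.da.SearchCursor(features, ['Name', 'SHAPE@']) as cursor:
        for name, shape in cursor:
            ext = shape.extent
            processList.append((name, f"{ext.XMin} {ext.YMin} {ext.XMax} {ext.YMax}"))

    with multiprocessing.Pool() as pool:
        # starmap unpacks each (name, extent_string) tuple into the worker's arguments
        pool.starmap(multifunction, processList[0:10])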

Running it once may provide false positives due to the file geodatabase limitation, so I would set some breakpoints and debug it. I would move the worker to a new file and wrap the process in a try/except that returns either success or the exception to the pool results list. Then you can print out the results and see what is failing in your processes.
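
On the calling side that can be as simple as the following (assuming the worker is wrapped in a try/except and returns either its output path or the exception text):

# Print whatever each worker returned so a failure in a child process is visible in the parent
results = pool.map(multifunction, processList[0:10])
pool.close()
pool.join()
for r in results:
    print(r)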

When you map to a function that is located in the same script, it loads the whole script again, so keeping your worker in a separate file means each process just loads the function rather than running through the global-scope code and the if __name__ == "__main__" check again.

by Anonymous User
Not applicable

This is an example of some processes that I have set up to work with multiprocessing. The script checks whether each dataset needs to be exported (source date > destination date) and returns a dictionary of basic messages. The worker sits in a separate script:

# IMPORTS
from arcpy import CopyFeatures_management
from arcpy import env

def export_to_shp(fcname, fc, shp):
    """
    Function to copy features to shapefile
    """
    try:
        env.overwriteOutput = True
        CopyFeatures_management(fc, shp)
        return {'fcname': f'{fcname}', 'status': True, 'exmessage': 'was exported!'}

    except Exception as ex:
        return {'fcname': f'{fcname}', 'status': False, 'exmessage': f'{ex}'}

The main script looks like this. I removed a lot of the variables that are in-house to us, so this won't run as a straight copy, but it should give an idea of the structure and how you can return useful information from the multiprocessing.

# IMPORTS
import datetime as dt
import os
import sys
import time as t
from multiprocessing import Pool
import arcpy
from multiprocesses import ExportToShp

# ---------------------------------------------------------------------------

def weekly_export(sde_view):
    """
    Function to export featureclasses to shapefiles if they were changed in the past 7 days
    :param sde_view: connection class to connect to sde and output directories
    """
    arcpy.env.overwriteOutput = True
    startTime = t.time()
    try:
        # Featureclasses variables:
        themesDir = r'path to output folder'

        fcDict = {
            'zoning': [sde_view.fc_connection("admin_CLC", "zoning"),
                       os.path.join(themesDir, 'zoning.shp')],
            'row': [sde_view.fc_connection("property", "row"),
                    os.path.join(themesDir, 'row.shp')],
            'physftr': [sde_view.fc_connection("property", "physftr"),
                        os.path.join(themesDir, 'physftr.shp')],
            'streets': [sde_view.fc_connection("transportation", "streets"),
                        os.path.join(themesDir, 'streets.shp')]
        }

        fcList = []
        for fc, fcPaths in fcDict.items():
            # mt is an in house maintenance package, its just checking dates and feature counts here.
            if any([mt.last_edited_date_check(fcPaths[0], "last_edited_date", 7),
                    mt.get_number_of_features(fcPaths[1]) > mt.get_number_of_features(fcPaths[0])]):
                fcList.append((fc, fcPaths[0], fcPaths[1]))

        # If more than one dataset needs to be exported, do it in parallel
        if len(fcList) > 1:
            with Pool(processes=4) as pool:
                result = pool.starmap(ExportToShp.export_to_shp, fcList)

            for res in result:
                print(f'{res["fcname"]} {res["exmessage"]}')
                if not res['status']:
                    print(f'{res["fcname"]} {res["exmessage"]}')

        elif len(fcList) == 1:
            result = ExportToShp.export_to_shp(fcList[0][0], fcList[0][1], fcList[0][2])
            print(f'{result["fcname"]} {result["exmessage"]}')
            if not result['status']:
                print(f'{result["fcname"]} {result["exmessage"]}')

        else:
            print('No updates were made and nothing was exported')


    except Exception as err:
        print(
            f'{sys._getframe().f_code.co_name} - FAILED in {dt.timedelta(seconds=t.time() - startTime)}! See error log...')
        print(f'{sys._getframe().f_code.co_name} FAIL! - {err}')


if __name__ == '__main__':
    mTime = t.time()
    try:
        weekly_export(con)

        print(f'Export Weekly - COMPLETED in {dt.timedelta(seconds=t.time() - mTime)} ...')

    except Exception as err:
        print(
            f'{sys._getframe().f_code.co_name} - FAILED in {dt.timedelta(seconds=t.time() - mTime)}! See error log...')
        print(f'{sys._getframe().f_code.co_name} FAIL! - {err}')

 

Jay_Gregory
Regular Contributor

I have simplified everything to minimize the number of lines of code, split my multiprocessing function out into a separate file, tested without multiprocessing, and added error handling, and I still can't get it to work. Could this have something to do with the GenerateTessellation method?

First my file containing the function to be passed into the multiprocessing Pool: 

def multifunction(airport):
    from arcpy import GenerateTessellation_management
    import os
    apt = airport[0]
    extent = airport[1].extent
    print('Working on {}'.format(apt))
    outputDir = "C:\\Jay\\data\\{}".format(apt)
    os.mkdir(outputDir)
    output = "C:\\Jay\\data\\{}\\{}.shp".format(apt, apt)
    print(output)
    try:
        GenerateTessellation_management(output, Extent=extent, Shape_Type="SQUARE", Size="900 SquareMeters")
    except Exception as err:
        print("error", err)
    return output

 Next the main file:

import multiprocessing
import os
import time
import arcpy
from func import multifunction

startTime = time.time()

basedir = "C:\\Users\\jay\\Documents\\ArcGIS\\Projects\\DataPrep"
features = os.path.join(basedir,"features.shp")


def main():
    processList = [ft for ft in arcpy.da.SearchCursor(features, ['Name', 'SHAPE@'])]
    #THESE LINES DO WHAT I WANT
    # for row in processList[0:10]:
    #      multifunction(row)

    pool = multiprocessing.Pool()
    pool.map(multifunction, processList[0:10])
    pool.close()
    pool.join()


if __name__ == "__main__":
    print("Running")
    main()
    executionTime = (time.time() - startTime)
    print('Execution time in seconds: ' + str(executionTime))
Jay_Gregory
Regular Contributor

If I comment out the GenerateTessellation line and replace it with 

CreateFeatureclass_management(outputDir, "{}.shp".format(apt), geometry_type="POINT", spatial_reference=SpatialReference(4326))
 
the multiprocessing works. 
I'm not getting any pickling errors, but could this have something to do with the Extent portion? I have no idea 😞
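
One way I could check that directly (a quick sketch to run in the parent process before creating the pool) is to round-trip one of the cursor rows through pickle myself, since that is what multiprocessing does to every argument it sends to a worker:

import pickle

# Sketch: verify that a (Name, geometry) row from the SearchCursor survives pickling
row = processList[0]
try:
    pickle.loads(pickle.dumps(row))
    print('row pickles cleanly')
except Exception as err:
    print('row does not pickle:', err)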