TIP: Simple code to make Identity, Erase, or other overlay tools much faster

BillMiller (New Contributor), 01-15-2014 09:12 AM
I'm not sure if this is a good idea to post, but maybe it will help someone.
Sometimes ArcMap takes a very long time to complete an Identity.
EXAMPLE:
  Streams - about five million lines (all single part and of reasonable length)
  Lakes - about ten thousand polygons (all single part and of reasonable area)

This Identity ran for over a day before I stopped it:
arcpy.Identity_analysis("Streams", "Lakes", "StreamWithLakeId")


Using the following code reduced this time to about ten minutes.
import arcpy

def FastIdentity(inFL, idFL, outFC):  # inputs must be feature layers
    # Select only the features that intersect the identity features,
    # so Identity runs on a small subset instead of the whole dataset
    arcpy.SelectLayerByLocation_management(inFL, 'INTERSECT', idFL)
    arcpy.Identity_analysis(inFL, idFL, "in_memory/Flow0")  # or Erase, etc.
    arcpy.MultipartToSinglepart_management("in_memory/Flow0", outFC)  # OPTIONAL
    arcpy.Delete_management("in_memory/Flow0")
    # Everything left over never touches the identity features,
    # so append it to the output unchanged
    arcpy.SelectLayerByAttribute_management(inFL, "SWITCH_SELECTION")
    arcpy.Append_management(inFL, outFC, "NO_TEST")

arcpy.env.workspace = "C:/somepath.gdb"
arcpy.MakeFeatureLayer_management("Lakes", "LakeLayer")
arcpy.MakeFeatureLayer_management("Streams", "StreamLayer")

FastIdentity("StreamLayer", "LakeLayer", "C:/out.gdb/OutStreams")


Check the following before using this method:

1) Set the join_attributes parameter on Identity; the same thing could also be done with field mapping.
2) Maybe add a cluster tolerance to Identity.
3) Erase and the other analysis overlay tools can also be made faster with the same kind of code; see the sketch after this list.
4) Skip the MultipartToSinglepart step if multipart features are wanted (in that case, write the Identity output directly to outFC).
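
As a sketch of point 3, here is roughly what an Erase version would look like. I have not run this one (FastErase and the in_memory name are just placeholders I made up), but it follows the same pattern:

def FastErase(inFL, eraseFL, outFC):  # inputs must be feature layers
    # Run Erase only on the features that actually intersect the erase features
    arcpy.SelectLayerByLocation_management(inFL, 'INTERSECT', eraseFL)
    arcpy.Erase_analysis(inFL, eraseFL, "in_memory/Erase0")
    arcpy.MultipartToSinglepart_management("in_memory/Erase0", outFC)  # OPTIONAL, as above
    arcpy.Delete_management("in_memory/Erase0")
    # Features that never touch the erase features pass through unchanged
    arcpy.SelectLayerByAttribute_management(inFL, "SWITCH_SELECTION")
    arcpy.Append_management(inFL, outFC, "NO_TEST")

FastErase("StreamLayer", "LakeLayer", "C:/out.gdb/ErasedStreams")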

NOTE: I only have access to 10.1, so this might be better in 10.2, or maybe I'm doing something wrong.
Also, I never tried using "in_memory" as the output of the original Identity, so that alone might have fixed the problem with less code.
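For reference, that untried variant would be something like the following (a guess on my part, not something I have timed):

# Untried: let Identity write straight to in_memory, then copy the result out
arcpy.Identity_analysis("Streams", "Lakes", "in_memory/StreamWithLakeId")
arcpy.CopyFeatures_management("in_memory/StreamWithLakeId", "C:/out.gdb/OutStreams")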
1 Reply
ChrisSnyder (Regular Contributor III)
I also sometimes add this optimization step (a select by location, sometimes a complex series of them) to many of my scripts that use overlay tools.

It doesn't seem to help much for smaller datasets, but it does a lot for some larger ones. I theorize that this method can often whittle the input features down enough to keep the 'LargeOverlayTiles' background process from firing up. That overlay-tiles mechanism (well intentioned for sure, and pretty much hidden from view) can add unexpectedly and unreasonably long processing times to overlay tasks.
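
A series like that might look something like this (the layer names here are made up for illustration):

# Start with features that intersect the lakes, then widen the net
arcpy.SelectLayerByLocation_management("StreamLayer", "INTERSECT", "LakeLayer")
arcpy.SelectLayerByLocation_management("StreamLayer", "WITHIN_A_DISTANCE", "WetlandLayer",
                                       "100 Meters", "ADD_TO_SELECTION")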