Solution to: Dissolve tool does not fully dissolve medium-size dataset

01-27-2014 01:43 AM
FilipKrál
Occasional Contributor III
Hi, this is a solution to a problem with the Dissolve (management) tool in Python.

Problem description:
I needed to dissolve a single-part polygon File Geodatabase feature class with a little over 200,000 features.

(Details:
Feature areas: 0.00156 to 725.1 sq km.
Vertex counts: 6 to 2983 per feature, which I didn't consider to be Godzillas; the expected output should not contain massive features either. Besides, I didn't want to Dice the features because I didn't want any extra vertices in the output.
Extent width and height: 521550.0 m and 967850.0 m.
Check Geometry and Repair Geometry didn't find or repair any errors.)

The operation I needed was:
arcpy.Dissolve_management(in_fc, out_fc, myField, "#", "MULTI_PART", "DISSOLVE_LINES")


This command was part of a larger script which used the dissolve tool several times.
Well, the result of this command was a feature class of polygons that were dissolved only within contiguous groups, as if the MULTI_PART parameter had been set to SINGLE_PART!
The strange thing was that running the tool from ArcMap with exactly the same settings worked fine. When I copied the geoprocessing result as a Python snippet and ran it from my script, it produced the incorrect result described above. No error or warning message was raised.

Several forum topics and blog posts describe various difficulties with the Dissolve tool, but nothing I found quite matched my situation. The closest was perhaps this blog post suggesting that it might be related to bug NIM079373: Running a large number of features through the Dissolve or Buffer with dissolve option, hangs during process.

What I tried and did not work

  • Playing with all parameters of the tool and with arcpy.env.XYTolerance and arcpy.env.XYResolution.

  • I built a model with the Dissolve tool, saved the model in a toolbox, added the toolbox to my script, and ran the model as a geoprocessing tool from Python.

  • I wrote a function that looped through distinct combinations of values in the dissolve fields, dissolved the features of one combination at a time, and then merged the results into a final feature class. I had some success with this approach, but it did not work when used as part of my larger script.

  • I wrote a function that split the input into chunks using another polygon feature class, dissolved the chunks, and merged them into a final feature class.


What seems to work, so far:
I wrote a script with the Dissolve tool and "exposed" its parameters using sys.argv. Then I called this script from my larger script using subprocess.Popen.
My lines in the larger script look like this:

import subprocess

def dissolveInSubprocess(in_fc, out_fc, dissolve_fields, stat_fields='#', multi_part='MULTI_PART', unsplit_lines='DISSOLVE_LINES', xytol='#'):
    """Execute the Dissolve tool in a subprocess."""
    opts = (in_fc, out_fc, dissolve_fields, stat_fields, multi_part, unsplit_lines, xytol)
    cmd = r"C:\Python27\ArcGIS10.1\python.exe C:\Scripts\dissolver.py " + " ".join(map(str, opts))
    fn.msg("Executing subprocess with " + str(cmd))  # fn is my own logging helper module
    chld = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    r = chld.communicate()
    print r
    if chld.returncode != 0:
        raise Exception("Error while dissolving in subprocess: " + str(chld.returncode))
    else:
        fn.msg("Finished subprocess with " + str(cmd))
    return chld.returncode

# ... then I call the function

dissolveInSubprocess(in_fc, out_fc, ";".join(list_of_cols), stat_fields='#', multi_part='MULTI_PART', unsplit_lines='DISSOLVE_LINES', xytol='#')
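(A side note, not from the original post: a command assembled by plain string joining, as above, will break if any path contains spaces. Passing subprocess.Popen a list of arguments instead lets it handle the quoting. A minimal, self-contained sketch using a throwaway python -c command rather than dissolver.py:)

```python
import subprocess
import sys

# Build the command as a list so each argument (even one containing
# spaces) is passed through intact, with no manual quoting needed.
args = [sys.executable, "-c", "import sys; print(sys.argv[1])", "two words"]
child = subprocess.Popen(args, stdout=subprocess.PIPE)
out, err = child.communicate()
print(out)
```

The same idea applies to the dissolver.py call: build a list of the interpreter path, the script path, and the tool parameters, and pass that list to Popen.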


And the script that is called (dissolver.py) looks like this:
"""Script to execute dissolve in a separate subprocess."""
def main(in_fc, out_fc, dissolve_fields, stat_fields, multi_part, unsplit_lines, xytol):
    import arcpy
    if xytol[0].isdigit():
        arcpy.env.XYTolerance = xytol
    try:
        estuarycatchmentsRaw = arcpy.Dissolve_management(in_fc, out_fc, dissolve_fields, stat_fields , multi_part, unsplit_lines).getOutput(0)
    except Exception as ex:
        print ex.message
    return 0

import sys
args = sys.argv
print args
script, in_fc, out_fc, dissolve_fields, stat_fields, multi_part, unsplit_lines, xytol = args

main(in_fc, out_fc, dissolve_fields, stat_fields, multi_part, unsplit_lines, xytol)


I cannot fully explain why this works when all of the previous approaches didn't, but it seems that arcpy.Dissolve_management is more reliable when it runs in a separate process. Perhaps running tools in a separate process might also help when the Analysis toolbox tools (Intersect etc.) exhibit inappropriate behaviour.

My system is ArcGIS 10.1 SP1 build 3143 on Windows 7 SP1 64-bit, Dell Precision T3600, 3.6 GHz 8-core Intel Xeon CPU and 16 GB RAM. After some experiments with ArcGIS 10.1 SP1 for Desktop Background Geoprocessing (64-bit), I stick to the 32-bit Python that came with ArcGIS originally.

Unfortunately I cannot provide the dataset due to licensing restrictions.

I would appreciate your experiences and comments.
Filip.
6 Replies
MathewCoyle
Frequent Contributor
Dissolve has some memory-management issues, especially with 64-bit geoprocessing, where it becomes very unstable with large datasets. With the standard 32-bit geoprocessing it just skips features that would consume too much memory to dissolve, as opposed to outright crashing. That has been my experience, anyway.
FilipKrál
Occasional Contributor III

Today I learned a new fact relevant to dissolving or intersecting large datasets.

ArcGIS Pro has a Parallel Overlay toolset with Parallel Dissolve and Parallel Intersect tools that I hope address this issue.

I haven't tried these tools yet but the help looks promising.

Filip.

AlexMackie
New Contributor

Thank you very much for taking the time to write this answer. I found the same bug in Dissolve_management and it wasted a lot of my employer's time. The insidious semi-random nature of these memory management bugs makes them extremely difficult to debug because setting up a test case that fails reliably is difficult.

Your solution of running in a separate process is the only thing I found that worked (and like you I tried many other things). For me there is an overhead of about 6 seconds per extra process call while the other process imports arcpy, but it is at least a solution!
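(An editor's aside on that per-call overhead, sketched here rather than taken from this thread: one way to pay the arcpy import cost only once is to keep a single worker process alive and feed it one task per line on stdin. Below, the worker is a hypothetical stub that just echoes each task; a real worker would import arcpy at the top and run one geoprocessing call per line.)

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical worker script: in real use it would "import arcpy" once
# at the top, then run one geoprocessing task per input line.
worker_src = """\
import sys
for line in sys.stdin:
    task = line.strip()
    if not task:
        continue
    # ... call the geoprocessing tool for this task here ...
    sys.stdout.write("done " + task + "\\n")
    sys.stdout.flush()
"""

fd, worker_path = tempfile.mkstemp(suffix=".py")
os.close(fd)
with open(worker_path, "w") as f:
    f.write(worker_src)

# One process start serves many tasks, amortizing the import overhead.
child = subprocess.Popen([sys.executable, worker_path],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, err = child.communicate(b"task1\ntask2\n")
os.remove(worker_path)
print(out)
```

Closing the worker's stdin (which communicate does) ends its loop, so the process exits cleanly when all tasks are done.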

I also found 32-bit background geoprocessing more reliable than 64-bit (which is simply broken for me). Now I just call all geoprocessing tasks in their own separate processes (like clip and erase). You can't use Feature Layers this way so it takes a little longer to write anything into a Feature Class but I choose reliability over speed any time.

This sort of bug can really hurt your professional reputation if you are required to perform some ostensibly simple task on a tight deadline. These bugs should not still be in the wild.

curtvprice
MVP Esteemed Contributor

I was seeing similar behavior dissolving large (but not that large) Raster To Polygon outputs. The tool would simply hang, and then many hours later crash with an obviously "broke"-type error message.

Some of the features were discontiguous areas of the same value, and many were just one per value. My solution was to split the data into parts: 1) count all the polygons by GRID_CODE and copy the 'single-part' polygons (frequency = 1) to another dataset, 2) dissolve the 'multi-part' polygons in a loop, 5000 at a time, and 3) append them all together. It's a kludge, but it does (usually) work.
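(The chunking in step 2 might look like this in outline; this is pure Python with the actual dissolve and append calls omitted, and the GRID_CODE values are stand-ins.)

```python
def chunks(values, size=5000):
    """Yield successive lists of at most `size` values."""
    for i in range(0, len(values), size):
        yield values[i:i + size]

# Stand-in for the distinct GRID_CODE values of the multi-part polygons.
grid_codes = list(range(12001))
batches = list(chunks(grid_codes))
# Each batch would be dissolved separately, then all outputs appended.
print(len(batches))
```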

Thanks, Filip and Mathew, for the additional insight and maybe a better approach. For my particular problem (this is a script that will be run a LOT) I will keep my current approach, but putting the dissolve chunks in multiprocessing might make it run fast and consistently complete.

Alex, I second your crankiness. That said, I strongly urge you all to send these problematic datasets to Esri Support so they can be added to the test datasets Esri uses in their next development cycle. We are the best source of test datasets for them because we work with real data every day. The problem with this one is that sometimes they work and sometimes they don't -- it depends on the memory state when you launch the tool!

curtvprice
MVP Esteemed Contributor

Just a side note: you can get the path in a version-independent way by replacing this:

cmd = r"C:\Python27\ArcGIS10.1\python.exe"

with

cmd = os.path.join(sys.prefix, "python.exe")

or, if you want it to run 32-bit even when launched from arcpy x64:

cmd = os.path.join(sys.prefix.replace("x64",""), "python.exe")

curtvprice
MVP Esteemed Contributor

Filip, here's how that would work using multiprocessing.

http://stackoverflow.com/a/2046630/2234229

Unfortunately, I have been unsuccessful using multiprocessing in a script launched from Desktop. For a tool to be really useful I want to be able to run it from there (as well as from a standalone script). I currently have a case going with Esri on this that clearly has one of their best people on the case... when I get an answer I will put a blog post together.
