Dissolving Large Data

11-13-2017 11:10 AM
DaveTenney
Regular Contributor

All,

  Desktop Version 10.2.2

I was curious if anyone else has run into this particular issue before...

We have about 50 layers within a file geodatabase that we run a compare on (let's call it "old vs. new"), and we only want to point out areas that have new or updated data. We only care about changes and nothing else. We create a feature envelope around each feature that is considered "new", then take those 50 layers and merge them into one.

Within that newly created merged layer we have approximately 800k features... you guessed it, time to dissolve. Unfortunately, from time to time the dissolve process appears to fail. We have logging in place, but we never get any notification that the process has failed; we just go days (we are currently sitting at 2 days) with the last line in our log stating: dissolving "merged layer name".

The features themselves are not large, but the area that encompasses all the features is statewide, and not all of the features are in close proximity to one another. Is there a more efficient way of dissolving all these features?

Any thoughts?

dave

6 Replies
DanPatterson_Retired
MVP Esteemed Contributor

Perhaps assist the software by dicing (Dice) your large area into manageable chunks, or try it in Pro (at least it will use your available memory).

DaveTenney
Regular Contributor

Dan

  This may be my misunderstanding of the tool, but wouldn't that really only be beneficial for large features, e.g. political boundaries or large water features?

The features are not large themselves; the total area that the features encompass is large. So I'm not sure whether the logic under the hood for Dissolve just doesn't like the fact that I have 800k "small" features spread out over a large area. Take fire hydrants, for example: not large features, but there are a lot of them and they are spread out all over the state. You'll find most of them in the big cities, but they are everywhere.

I'm just not sure the logic in Dissolve is built to handle that efficiently, or I guess one could somehow group them into small subsets in order to create smaller dissolve tasks first.
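
To make the "group them into small subsets" idea concrete, here is a rough sketch in plain Python. It is only an illustration, not anything Dissolve does internally: it assumes each envelope is available as an (xmin, ymin, xmax, ymax) tuple, and `group_by_tile` and `tile_size` are made-up names. Each envelope is assigned to the grid tile that contains its center, so each tile could then be dissolved as its own smaller task.

```python
from collections import defaultdict


def group_by_tile(envelopes, tile_size):
    """Group envelope indices by the grid tile containing each envelope's center.

    envelopes: iterable of (xmin, ymin, xmax, ymax) tuples (assumed layout).
    tile_size: edge length of a square tile, in the same units as the data.
    """
    groups = defaultdict(list)
    for i, (xmin, ymin, xmax, ymax) in enumerate(envelopes):
        cx = (xmin + xmax) / 2.0
        cy = (ymin + ymax) / 2.0
        # Floor division maps the center to integer tile coordinates.
        tile = (int(cx // tile_size), int(cy // tile_size))
        groups[tile].append(i)
    return dict(groups)


# Two overlapping envelopes near the origin and one far away.
envelopes = [(0, 0, 10, 10), (5, 5, 15, 15), (1000, 1000, 1010, 1010)]
groups = group_by_tile(envelopes, tile_size=100)
print(groups)
```

Envelopes that straddle a tile edge end up in only one tile, so the per-tile results would still need a final dissolve pass across tile boundaries.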

thanks

Dave

DanPatterson_Retired
MVP Esteemed Contributor

The number of vertices would obviously be the key. I have worked with shapes that have an exceptionally large area but relatively few vertices per feature. It is the number of checks needed during the dissolve process that matters, not necessarily the areal size that the features cover.

KenHartling
Esri Contributor

Would I be able to get a copy of the data to see what the real problem may be?

Thanks. Ken (ESRI GP Product Engineer)

DaveTenney
Regular Contributor

Ken,

   What would be the best way to send you the gdb?

Dave

KenHartling
Esri Contributor

Looking at Dave's data, I found many places where there are thousands and thousands of overlapping polygons. This is a particular challenge for topological tools. The underlying engine, the Topology Engine (TE), is designed to create a topological fabric across the entire dataset and then work out all the topological relationships between all the features. With thousands upon thousands of polygons interacting over most of the dataset in question, there will be a fair resource requirement in order to complete the operation. On 32-bit, Dave's case would page badly. At the same time, the engine gets bogged down in figuring out how every polygon interacts with each and every other polygon.

In ArcGIS Pro we developed new tools that can help: the Pairwise tools (Pro only). For data like Dave's we released the PairwiseDissolve tool. With Dave's data, PairwiseDissolve took just under 1 minute on my machine, as opposed to many hours using the Dissolve tool. PairwiseDissolve is a geometry operation that is more lightweight than the TE, and it runs in parallel mode, taking advantage of all the cores on your machine. Data precision in both the PairwiseDissolve and TE-generated output is extremely good.

 

Although the DM/Generalization/Dissolve tool is extremely efficient for most cases you throw at it, cases like this with extreme overlap can cause it to run slowly, and in 32-bit environments failure is possible. For these cases there is often nothing that will get a Dissolve operation using the DM/Generalization/Dissolve tool to perform anywhere near the PairwiseDissolve tool. We built the Pairwise tools to take care of these scenarios.
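
For anyone scripting this, a minimal sketch of calling PairwiseDissolve from Python might look like the following. It assumes the ArcGIS Pro Python environment, where `arcpy.analysis.PairwiseDissolve` is available; the wrapper name `pairwise_dissolve`, the `tool` argument (there only so the wrapper can be dry-run without arcpy), and the geodatabase paths in the comment are all hypothetical.

```python
def pairwise_dissolve(in_fc, out_fc, multi_part="MULTI_PART", tool=None):
    """Dissolve in_fc into out_fc using PairwiseDissolve (ArcGIS Pro only).

    tool is a hypothetical hook for dry runs; when omitted, the real
    geoprocessing tool is imported from arcpy.
    """
    if tool is None:
        # arcpy is only importable from the Python that ships with ArcGIS.
        import arcpy
        tool = arcpy.analysis.PairwiseDissolve
    # No dissolve field and no statistics fields: everything that overlaps
    # or touches is dissolved together.
    return tool(in_fc, out_fc, None, None, multi_part)


# Example call (hypothetical paths), to be run from the Pro Python environment:
# pairwise_dissolve(r"C:\data\changes.gdb\merged_envelopes",
#                   r"C:\data\changes.gdb\merged_dissolved",
#                   multi_part="SINGLE_PART")
```

Passing "SINGLE_PART" instead of the default "MULTI_PART" would write each dissolved area as its own feature rather than one multipart feature, which may suit the "areas that changed" use case better.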

If you are performing any operations of size or complexity, I would highly recommend you only run using 64-bit Python (which comes with Pro) on a machine with more than the minimum system requirements for Pro. For overlay operations, investigate running the Pairwise tools or the Analysis tools that run in parallel mode. In ArcGIS Pro 2.1 (coming soon), many of the tools under the Analysis/Overlay toolbox support a parallel option for area-area, area-line and area-point overlays. Pro 2.0 overlay tools only support parallel for area-area overlays.

Note: For the Pairwise tools, be sure to look very carefully at the documentation for PairwiseIntersect; its output is quite different from that of Analysis/Overlay/Intersect.

 

As a reminder...

Pro can be installed and run on the same machine (provided it's 64-bit Windows) at the same time as the other ArcGIS applications.  More info - http://pro.arcgis.com/en/pro-app/get-started/about-licensing.htm 

 

To run Python scripts, make sure to run them with the Python install that belongs to the ArcGIS product you wish to run them with (ArcMap, Pro...) - http://pro.arcgis.com/en/pro-app/arcpy/get-started/installing-python-for-arcgis-pro.htm
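
If you are unsure which Python install a script is actually running under, a quick check along these lines can be dropped at the top of the script. This is just a sketch using the standard library (nothing ArcGIS-specific), and `interpreter_info` is a made-up helper name.

```python
import struct
import sys


def interpreter_info():
    """Report which Python is running and whether it is 32-bit or 64-bit."""
    return {
        "executable": sys.executable,         # path of the running interpreter
        "version": sys.version.split()[0],    # e.g. a 2.7.x or 3.x version string
        "bits": struct.calcsize("P") * 8,     # pointer size in bits: 32 or 64
    }


info = interpreter_info()
print(info)
```

Writing this to your log before the dissolve starts makes it easy to confirm that the job is running under the 64-bit interpreter you intended.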
