How can I speed up a simple dissolve?

KimOllivier · ‎02-13-2011

I have a singlepart featureclass of polygons (2.5M) with an indexed key that I want to convert to multipart polygons. Fewer than 0.5% (12,480) are actually multipart; the rest are single parts. They are all reasonable multiparts: nothing silly that spans the whole dataset, no universal polygon, and no unreasonably large polygon such as a huge road casing.

Since there are only a few to process I thought: "Aha, select the polygons with multiple keys, process these and merge them back with the remaining singles". That dissolve of the subset does complete in human time.
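A minimal sketch of the "find the keys that occur more than once" step, in plain Python. It assumes the key values have already been read into a list (for example with a search cursor); the function name and the sample values are hypothetical, for illustration only.

```python
from collections import Counter

def split_keys_by_count(keys):
    """Return (multi, single): keys seen more than once, and keys seen exactly once."""
    counts = Counter(keys)
    multi = {k for k, n in counts.items() if n > 1}
    single = {k for k, n in counts.items() if n == 1}
    return multi, single

# Hypothetical key values, one per singlepart polygon.
sample_keys = [101, 102, 102, 103, 104, 104, 104]
multi, single = split_keys_by_count(sample_keys)
print(sorted(multi))   # keys whose polygons need dissolving
print(sorted(single))  # keys that can be copied across unchanged
```

Only the polygons whose key is in `multi` need to go through the dissolve; the rest can be copied as they are.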

But I can't complete the process because merging single and multipart features is not permitted. Who knew that? I thought multiparts were to be avoided for analysis, now I know.

Is there a way of merging multiparts without rewriting the entire tool in Python? I didn't think that multipart was a different feature type from single part. I had thought that a single part feature was just a multipart with one part. Is that not so?

The simple way (Dissolve_management) on the whole featureclass takes hours for sample sets (and fails to complete when it runs out of memory on the full set), giving me time to write this... ah here it is: 'unable to read memory, crashed'.

I should expect this by now, nearly any process on a large dataset seems to run out of memory far too often. Just a simple relate with a selection (keyfile select) takes hours just to select and fails to write a subset, compared to Workstation which takes seconds. I have tried using a Python set and creating an SQL query as a layer but this is also very slow when it comes to writing out the records.

Maybe I should just write out the records when running a cursor to test the key against the set and write them out manually? But that is rewriting the tool which I hoped to avoid. I am using local file geodatabases and 9.3, Duo Quad (8 processors) and 4 GB of memory, TB disk.

Any other ideas?

ChrisSnyder · ‎02-14-2011

merging single and multipart features is not permitted

Kim, I don't seem to have any issues creating a FC with mixed states of single and multipart (made a singlepart dissolve, and then a multipart dissolve, and merge together - no error). What error do you get? I think maybe I'm not grasping your issue?

My understanding is that the stupid ArcGIS Dissolve tool ALWAYS makes a multipart FC, and then depending on what you want, will break it into singlepart as a post process. Unfortunately I've found that that doesn't even work sometimes, and that you need to run the MultipartToSinglepart tool after you run the Dissolve (singlepart) tool to fully break it up. There is some logged bug about this... Coverages were easier as everything was always singlepart - no surprises!

Relates: I found a pretty slick way of doing large relates in a FGDB... The cool thing is you can pass EXTREMELY large SQL statements to a FGDB - like millions of characters long... I never did find a limit... So instead of relying on the ArcGIS Relate "tool", I just gather the "key" field values in a search cursor, and use them to build a SQL expression to select the relate table with. It's even more efficient when using the OBJECTID fields because you can use the .fidset describe property to gather the selected OIDs.
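A rough sketch of the expression-building step, in plain Python. It assumes the key values have already been gathered from a search cursor into a list; `build_in_clause` and the field names are hypothetical, and the exact field delimiters a given workspace expects are not handled here.

```python
def build_in_clause(field_name, values):
    """Build a 'FIELD IN (...)' where clause from a collection of key values."""
    unique = sorted(set(values))  # drop repeats; assumes all values share one type
    if not unique:
        return "1 = 0"  # nothing to select: a clause that matches no rows
    literals = []
    for value in unique:
        if isinstance(value, str):
            # Quote text values and double any embedded single quotes.
            literals.append("'" + value.replace("'", "''") + "'")
        else:
            literals.append(str(value))
    return "{} IN ({})".format(field_name, ", ".join(literals))

# Hypothetical key values gathered from the selected rows.
where = build_in_clause("PARCEL_ID", [3, 1, 3, 2])
print(where)
```

The resulting string is then used as the where clause when selecting from the related table.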

KathrinSchulte-Braucks · ‎02-16-2011

You're using a multicore processor. I made the interesting observation that dissolving a huge shapefile (several million features) takes days on my Multicore PC while it takes only an hour on an old Pentium 4 Single Core. Maybe you could try that out, too?

ChrisSnyder · ‎02-16-2011

Hmmm... I remember a while back that processor affinity was an issue: http://resources.arcgis.com/content/nimbus-bug?bugID=TklNMDM0MTgx.

KathrinSchulte-Braucks · ‎02-16-2011

Hm, unfortunately in my case setting the affinity did not solve my problem. Another tip is to configure msconfig so that only one core is used.

KimOllivier · ‎02-24-2011

Kim, I don't seem to have any issues creating a FC with mixed states of single and multipart (made a singlepart dissolve, and then a multipart dissolve, and merge together - no error). What error do you get? I think maybe I'm not grasping your issue?

Polygons are inherently multipart so the dissolved and undissolved subsets merged successfully.

The restriction applies to points. Point and Multipoint are separate feature types, and they cannot be merged.
I now face a difficult decision: have all parcel labels as multiparts, or create two layers, which makes searching, joins and counting more difficult than with a single table. I could abandon points and store the attributes on the polygons directly, but that has other inconveniences, such as polygon label placement, and text rotation is not possible.

ChrisSnyder · ‎02-24-2011

parcel labels as multiparts

I don't understand why they have to be multipart... Is it because the parcels themselves are (sometimes) multipart polygons? Why not just first break the parcels into singleparts (MultipartToSinglePart tool) and then get rid of any resulting labelpoints that share the same PARCEL_ID?
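The "get rid of labelpoints that share the same PARCEL_ID" step could be sketched like this in plain Python. It assumes the label points have been read as (OBJECTID, PARCEL_ID) pairs, and it keeps the first point found for each parcel, which is an arbitrary choice; the function name and sample rows are hypothetical.

```python
def duplicate_label_oids(rows):
    """Return the OIDs of label points whose PARCEL_ID has already been seen."""
    seen = set()
    duplicates = []
    for oid, parcel_id in rows:
        if parcel_id in seen:
            duplicates.append(oid)  # a later point for the same parcel
        else:
            seen.add(parcel_id)     # first point for this parcel is kept
    return duplicates

# Hypothetical (OBJECTID, PARCEL_ID) pairs read from the label point table.
label_rows = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B")]
to_delete = duplicate_label_oids(label_rows)
print(to_delete)
```

The returned OIDs are the ones to select and delete, leaving one label point per parcel.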