Processing Large Datasets

MikeMacRae · 09-03-2014

I'd like to start an ongoing discussion on how to handle large datasets. I use a number of tools in ArcGIS (Clip, Intersect, selections, etc.) and have found some of the information out there very useful. However, some of it targets specific issues and some of it is a bit old, and I believe there may be newer processes that add more flexibility. So far, this is what I have found:

This thread at GIS Stacked, the most comprehensive discussion I have found, provides some very good solutions for processing large datasets. It also includes a link to the ArcGIS help page Performance Tips for Geoprocessing Services.

There is the ESRI Help tip for Tiled Processing of Large Datasets, which provides information on the performance and scalability of feature overlay tools (Intersect, Union, etc.).

There is another thread from GIS Stacked, about 3 years old, discussing ArcScripting, with a good explanation from a former ESRI employee of his thoughts on processing large datasets.

Another ESRI PowerPoint reinforcing some of the techniques for processing large datasets.

Another ESRI list of ways to successfully overlay large, complex datasets in geoprocessing.

For what it's worth, I wrote a script a month or so ago to avoid a crash that occurred when processing a dataset locally from a laptop hard drive. The idea was to intersect 2 very large datasets (both were the size of the state of Oregon). The available memory couldn't handle the size of the datasets, so I took one dataset, converted it to a layer, and incrementally changed the definition query to pull in groups of a thousand or so records at a time. This allowed me to intersect the 2 datasets without crashing. Mind you, the script took about 8-9 hours to run, but it was successful!
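The batching idea can be sketched roughly as below. This is not the original script, just a minimal illustration in plain Python of how the per-batch query strings might be built. The function name batch_where_clauses, the OBJECTID field name, and the batch size of 1000 are assumptions for the example. It also assumes the IDs passed in are all the IDs present in the dataset, so a range filter picks up only the records in that batch.

```python
def batch_where_clauses(object_ids, batch_size=1000, id_field="OBJECTID"):
    """Yield one SQL where clause per batch of IDs (hypothetical helper).

    object_ids is assumed to be the complete set of IDs in the dataset.
    """
    ids = sorted(set(object_ids))
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # Each batch is a contiguous slice of the sorted ID list, so a simple
        # range filter selects exactly those records, even if the IDs have gaps.
        yield "{0} >= {1} AND {0} <= {2}".format(id_field, batch[0], batch[-1])


# Build the clauses for a dataset with 2500 records.
clauses = list(batch_where_clauses(range(1, 2501)))
for clause in clauses:
    # In ArcGIS, each clause would be applied to the layer here (as a
    # definition query or a where clause on a feature layer) before running
    # the intersect on that subset and writing the partial output.
    print(clause)
```

Each clause would then drive one pass of the intersect, with the partial outputs merged at the end. The batch size is a trade-off: smaller batches use less memory per pass but add per-call overhead.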

Please post any additional thoughts, process flows, and performance improvements you have come across. I am particularly interested in processing speed.

MartinAmeskamp · 09-18-2014

Hi, rather than processing large datasets, we frequently have to move large datasets from one database to another (e.g. in data migration). Mostly this takes place on Oracle ArcSDE platforms, and typically we have to move data into an existing schema. Also, most of my experience is with ST_GEOMETRY.

We have come up with a couple of rules that help us deal with this:

Strip the target table or feature class as far as you can: no indexes (spatial or otherwise), no class extensions, no relationship classes, no geometric networks, no topologies, no versioning.
Old-style sdeexport/sdeimport is the fastest we have yet seen.
Work as close to the database as you can: latency can be a serious problem. For big projects, we set up dedicated machines (typically high-end desktops with fast Intel quad-core CPUs and SSDs) that have both Oracle and ArcGIS Desktop installed. Works wonders for GlobalID creation!
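
The first rule (load into a stripped table, rebuild afterwards) can be illustrated with a small generic sketch. This is not Oracle or ArcSDE code: it uses Python's built-in sqlite3 module as a stand-in, and the table name parcels, the index name idx_parcels_owner, and the function bulk_load are all made up for the example. The point is only the order of operations: drop the index, load, then recreate it once.

```python
import sqlite3


def bulk_load(conn, rows):
    """Load rows with the index dropped, then rebuild it (hypothetical helper)."""
    cur = conn.cursor()
    # Drop the index first so it is not updated for every inserted row.
    cur.execute("DROP INDEX IF EXISTS idx_parcels_owner")
    cur.executemany("INSERT INTO parcels (id, owner) VALUES (?, ?)", rows)
    # Recreate the index once, after all rows are in place.
    cur.execute("CREATE INDEX idx_parcels_owner ON parcels (owner)")
    conn.commit()


# An in-memory database stands in for the real target schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("CREATE INDEX idx_parcels_owner ON parcels (owner)")

bulk_load(conn, [(i, "owner_%d" % (i % 100)) for i in range(10000)])
row_count = conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]
print(row_count)
```

On a real geodatabase the same pattern would extend to spatial indexes and the other items in the list, using whatever tools the platform provides for removing and re-creating them.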

Martin