I am doing my Master's thesis with GIS data and have to process a large amount of it. I have more than 2 million rows, which I must intersect with another map to get the common attribute table that I then use in other software to get the required results. The processing itself is taking hours. Yesterday I started it in ArcMap and it ran for 18 hours but only reported "drawing features 36,000", at which point I thought it would take months to complete the operation, so I installed Pro after seeing this thread: Dissolving Large Data
The problem persists in Pro, though. I waited for 3 hours, at which point Pro returned an error and started processing again. Can you suggest how to get through an intersect on data this large?
Creating a feature class from the shapefile will improve things (I assume it should be in the same geodatabase as the other input).
Prior to doing any intersections, make sure that you have made the feature class with the join permanent (i.e. exported it to a new feature class).
Limit your area of interest prior to doing any intersection. In other words, are there areas in both files that you know won't intersect? You want to avoid checking geometries against geometries that won't participate in an intersection at all. This can be simplified by doing a Select By Location to limit both inputs to only the overlapping areas.
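For anyone who prefers to script it, the three steps above could look roughly like this in arcpy. This is only a sketch under assumptions: the function name, the layer names, the "input_a"/"input_b" feature class names and all paths are placeholders, and it assumes an ArcGIS Pro Python environment. If the input is a layer that carries a join, copying it out is what makes the join permanent.

```python
import os


def intersect_overlap_only(src_a, src_b, gdb, out_name="intersect_result"):
    """Copy both inputs into one file geodatabase, keep only the features
    that overlap, then intersect. All names here are placeholders."""
    import arcpy  # imported lazily; needs the ArcGIS Pro Python environment

    # Steps 1 and 2: write each input (shapefile or joined layer) out as a
    # native feature class in the same geodatabase.
    fc_a = os.path.join(gdb, "input_a")
    fc_b = os.path.join(gdb, "input_b")
    arcpy.management.CopyFeatures(src_a, fc_a)
    arcpy.management.CopyFeatures(src_b, fc_b)

    # Step 3: select only the features in each input that touch the other,
    # so the overlay never tests geometries that cannot intersect.
    arcpy.management.MakeFeatureLayer(fc_a, "lyr_a")
    arcpy.management.MakeFeatureLayer(fc_b, "lyr_b")
    arcpy.management.SelectLayerByLocation("lyr_a", "INTERSECT", "lyr_b")
    arcpy.management.SelectLayerByLocation("lyr_b", "INTERSECT", "lyr_a")

    # The overlay tools honor the layer selections made above.
    out_fc = os.path.join(gdb, out_name)
    arcpy.analysis.PairwiseIntersect(["lyr_a", "lyr_b"], out_fc)
    return out_fc
```

Swapping `arcpy.analysis.PairwiseIntersect` for `arcpy.analysis.Intersect` runs the classic tool on the same inputs, if you want to compare the two.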
Thank you, it works for me now. In ArcGIS Pro, Pairwise Intersect took just under 2 minutes and the normal Intersect took around 15 minutes. I went through all your suggestions 3-4 times and followed them carefully, and it worked well.
I agree with Dan: work in the native file formats for the best performance. Working with shapefiles, CSV files, and Excel files while doing joins and geoprocessing is a recipe for disaster. We have come across several geoprocessing operations in Pro that take 10x longer with shapefiles than with native file formats.
Two million rows isn't all that much for modern computers. I regularly process 20m x 60k Intersect operations, and they rarely run more than an hour. 8 GB of RAM isn't all that much either; 16 GB is a modern low-end RAM allocation.
But more important than RAM is the type and speed of your disk. A laptop with a clunky old HDD with 40 ms seek times would take a compute-year or more to do what a hot new SSD with <1 ms access times can do in minutes.
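To put the seek-time point in rough numbers, here is a back-of-envelope calculation. It is purely illustrative and rests on a loud assumption that is not from the post: exactly one random disk read per row, for the 2 million rows in the original question. Real overlay jobs do many more reads per feature, so only the ratio is meaningful, not the absolute times.

```python
def total_seek_time_s(random_reads, latency_ms):
    """Time spent waiting on the disk if every read pays the full latency."""
    return random_reads * latency_ms / 1000.0


reads = 2_000_000  # assumption: one random read per row

hdd_s = total_seek_time_s(reads, 40)  # 40 ms HDD seek
ssd_s = total_seek_time_s(reads, 1)   # 1 ms, the upper bound for the SSD

print(f"HDD: {hdd_s / 3600:.1f} h, SSD: {ssd_s / 60:.1f} min, "
      f"ratio: {hdd_s / ssd_s:.0f}x")
# prints "HDD: 22.2 h, SSD: 33.3 min, ratio: 40x"
```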
That's solid... I want the insider's secret.
I'm running intersects in Pro on an NVMe SSD, the best there is, and Pro would 'Shirley' crash before it ever completed a 20m point to 60k polygon intersect.
Alteryx is the fastest analytics software with a GUI I've worked with, and it took 2 days for us to intersect 30 million points with 1k drive time polygons.
Please help us by sending us the case where you see Pro 'Shirley' crashing so we can take a look, either via your support contact or we can arrange for you to send it to me directly. It would also be good to know your machine specs.
I'd love to see your '2 days for us to intersect 30 million points with 1k drive time polygons' case as well. 2 days seems a little too long to me unless the machine it is being run on is pretty slow or doesn't have adequate resources.
Pro has had the ability to run most of the overlay tools in parallel for some time (set arcpy.env.parallelProcessingFactor = 100 before running the tool). If you have a machine that can handle it, you may be able to get more performance out of many overlay operations.
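As a rough sketch of that setting in a script: the helper name and its arguments below are made up for illustration, and it assumes an ArcGIS Pro Python environment. The post writes the value as 100; the sketch uses the percentage string "100%" instead, on the assumption that this is the documented way to ask for all available cores, so check the environment setting's help page for your version.

```python
def run_parallel_intersect(in_features, out_fc, factor="100%"):
    """Run Pairwise Intersect with the parallel processing factor set,
    then put the previous setting back. Names are placeholders."""
    import arcpy  # imported lazily; needs the ArcGIS Pro Python environment

    previous = arcpy.env.parallelProcessingFactor
    arcpy.env.parallelProcessingFactor = factor  # assumed "all cores" form
    try:
        return arcpy.analysis.PairwiseIntersect(in_features, out_fc)
    finally:
        # Restore the old value so later tools are not affected.
        arcpy.env.parallelProcessingFactor = previous
```

Whether this helps depends on the machine; on a box with few cores or a slow disk the extra processes may not buy much.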