Optimizing the performance of a heavy-load overlay analysis task

02-27-2024 01:13 AM
NH33
New Contributor

I am seeking guidance on optimizing the performance of processing tasks in ArcGIS Pro 3.2. My focus is a processing bottleneck that appears when running a task with a heavy load.

My project involves identifying the agricultural land polygons that are free of steep slopes, meaning they do not intersect any polygon designated as steep terrain (a technique known as Boolean overlay analysis). There are about 50 million farmland polygons and 10 million steep-slope polygons on a 100 m mesh. The agricultural land and steep-slope polygons are stored as separate shapefiles (vector format) on the SSD of the C drive. For this purpose, I used the "Select Layer By Location" tool from the "Data Management Tools" toolbox, set to select agricultural polygons that intersect steep-slope polygons, with "Invert spatial relationship" enabled.

I expected the process to be time-consuming. However, I was surprised to find that CPU, GPU, and memory utilization on my computer mostly stayed under 10% and seldom reached 20%, despite the prolonged processing time. If these components had been nearly maxed out and causing the delay, I would consider optimizing the workflow or upgrading the hardware. Since the bottleneck seems to stem from the software, it appears more prudent to tackle that first. My computer, whose specifications are listed below, meets the recommended specifications. My hope is to address the bottleneck on the software side and speed up my task.

I would appreciate suggestions on:
Methods to address the bottleneck and use hardware performance efficiently.
- Modifications to ArcGIS Pro, Windows or other settings
- Alternatives to the current toolbox utilized
- Adjustments to improve hardware compatibility with ArcGIS Pro ...etc.

I am not looking for the following kinds of solutions at this time:
Methods that speed up or streamline the task without addressing the bottleneck directly.
- Simplifying polygons
- Splitting files and automating the runs (ModelBuilder, ArcPy)
- Employing raster geoprocessing tools ...etc.
My primary goal is to fully leverage my PC's capabilities to enhance processing speed in ArcGIS Pro.

・PC specifications
- Model: ASUS G35DX-R9R2080TI
- Operating System: Windows 10
- CPU: AMD Ryzen 9 3950X (16 cores, 32 threads)
- GPU: NVIDIA GeForce RTX 2080 Ti (11GB)
- Memory: DDR4-3400 64GB
- Storage: SSD (NVMe PCI Express 3.0 x4) 512GB + HDD 2TB (with at least 100GB of free space on the SSD)
- Power Options: High-Performance mode

2 Solutions

Accepted Solutions
DanPatterson
MVP Esteemed Contributor
DuncanHornby
MVP Notable Contributor

I'm surprised you did not observe a reduction in processing time after moving the data into a file geodatabase. Maybe the shapefiles already had a spatial index built? You can check this by opening the attribute table: if you see a * next to the shape field header, then you have a spatial index.

Having seen your screenshot, and knowing you are talking about millions of polygons, which suggests your dataset covers a nation, another thing that can cripple performance is having large MULTIPART features that cover the majority of the extent of your data. If you have such features in your data, another simple way to boost performance is to convert the data to SINGLEPART and then run your selection tool on the converted datasets. Just an idea: it won't make your PC use all its cores, but it makes the query significantly more efficient.
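To illustrate the multipart point, here is a minimal pure-Python sketch (not ArcGIS code). It assumes, as is typical for spatial indexes, that the first filtering pass compares bounding boxes; the helper names `envelope`, `overlaps` and `candidates` and the coordinates are made up for the example. A multipart feature with two small parts at opposite corners of the extent gets one huge envelope, so almost every query picks it up as a candidate for the expensive exact test.

```python
def envelope(parts):
    """Bounding box (xmin, ymin, xmax, ymax) around a list of part boxes."""
    return (
        min(p[0] for p in parts),
        min(p[1] for p in parts),
        max(p[2] for p in parts),
        max(p[3] for p in parts),
    )


def overlaps(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidates(query, envelopes):
    """Indices of envelopes that pass the cheap bounding-box filter."""
    return [i for i, env in enumerate(envelopes) if overlaps(query, env)]


# Hypothetical multipart feature: two small parts far apart (metres).
parts = [(0, 0, 100, 100), (900_000, 900_000, 900_100, 900_100)]

# Hypothetical small farmland parcel in the middle of the extent.
query = (450_000, 450_000, 450_050, 450_050)

# Stored as ONE multipart feature: a single envelope spanning everything.
multipart_hits = candidates(query, [envelope(parts)])

# Stored as TWO singlepart features: one small envelope per part.
singlepart_hits = candidates(query, [envelope([p]) for p in parts])

print(multipart_hits)   # the multipart feature is a (false) candidate
print(singlepart_hits)  # no candidates, nothing left to test exactly
```

After conversion to singlepart, the same parcel produces no candidates at all, which is where the gain in query efficiency would come from.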


4 Replies
DanPatterson
MVP Esteemed Contributor

related and unresolved

Hardware recommendations for heavy ArcGIS Pro usag... - Esri Community


DuncanHornby
MVP Notable Contributor

Can you explain your data a little more? You seem to be saying that both your farmland and steep terrain polygons are "100m mesh polygons". What do you mean by that? A picture would be helpful. You also say your data is stored as shapefiles. These are an old format, and a spatial index is something you need to add to the data yourself. As you are talking about millions of polygons, I would move the data into a file geodatabase; you will get an instant boost in performance. Your PC hardware is pretty high spec, so I don't think you can improve upon that.
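For readers wondering why a spatial index matters so much at this scale, below is a rough pure-Python sketch of the idea, not of how ArcGIS implements it. It assumes a simple grid index with 100 m cells and uses bounding boxes as stand-ins for the real polygons; the function names and sample coordinates are invented for the example. Each farmland box is compared only with the steep-terrain boxes registered in the grid cells it touches, instead of with all 10 million of them.

```python
from collections import defaultdict

CELL = 100.0  # index cell size in metres (assumed, matches the 100 m mesh)


def cells_for_bbox(bbox, cell=CELL):
    """Yield the (col, row) key of every index cell a bounding box touches."""
    xmin, ymin, xmax, ymax = bbox
    for col in range(int(xmin // cell), int(xmax // cell) + 1):
        for row in range(int(ymin // cell), int(ymax // cell) + 1):
            yield (col, row)


def build_index(bboxes, cell=CELL):
    """Map each grid cell to the indices of the boxes that touch it."""
    index = defaultdict(list)
    for i, bbox in enumerate(bboxes):
        for key in cells_for_bbox(bbox, cell):
            index[key].append(i)
    return index


def overlaps(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_not_intersecting(farm_bboxes, steep_bboxes, cell=CELL):
    """Indices of farm boxes that intersect no steep box (inverted selection)."""
    index = build_index(steep_bboxes, cell)
    selected = []
    for i, farm in enumerate(farm_bboxes):
        hit = any(
            overlaps(farm, steep_bboxes[j])
            for key in cells_for_bbox(farm, cell)
            for j in index.get(key, ())
        )
        if not hit:
            selected.append(i)
    return selected


# Hypothetical sample data: two steep mesh cells and three farmland boxes.
steep = [(0, 0, 100, 100), (300, 300, 400, 400)]
farms = [(50, 50, 80, 80), (150, 150, 250, 250), (390, 390, 450, 450)]

result = select_not_intersecting(farms, steep)
print(result)  # indices of farmland boxes free of steep terrain
```

Without the index, every farmland polygon would have to be tested against every steep polygon, which at 50 million by 10 million features is not feasible. A real tool would follow the bounding-box filter with an exact geometry test.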

NH33
New Contributor

Thank you for your replies. I'm including a visual sample to better explain the "100m mesh polygons." In the attached image, the yellow areas represent the steep terrain polygons, each approximately 100m in length and width. The blue areas denote the farmland polygons. For simplicity, the image showcases a smaller section, but in total, the dataset comprises 50 million blue polygons (farmland) and 10 million yellow squares (steep terrain).

I have also taken your advice into consideration and attempted to migrate the data to a file geodatabase. Unfortunately, this did not lead to an increase in CPU utilization or overall performance improvement.

Upon further investigation, I've found that only a limited set of ArcGIS Pro tools support parallel processing, as mentioned on the help page (Parallel processing with Spatial Analyst). A discussion linked by @DanPatterson also pointed out that Esri seems to prioritize clock speed over multi-core processing capabilities. This might be the crux of the bottleneck, as the full potential of my 16-core CPU isn't being utilized.

Sample.png
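As a quick sanity check on the utilization figures reported in this thread, here is a small worked calculation. It assumes that the overall CPU percentage shown by the operating system is averaged over all logical processors, and that the tool keeps only a few threads busy; the function name is made up for the example.

```python
# Ryzen 9 3950X from the specifications in the question: 16 cores, 32 threads.
LOGICAL_PROCESSORS = 32


def max_total_utilization(busy_threads, logical=LOGICAL_PROCESSORS):
    """Overall CPU % when `busy_threads` threads are each fully busy."""
    return 100.0 * min(busy_threads, logical) / logical


one_thread = max_total_utilization(1)
three_threads = max_total_utilization(3)

print(one_thread)     # 3.125
print(three_threads)  # 9.375
```

Under these assumptions, a task that is effectively single-threaded can never show more than about 3% total CPU on this machine, and even three busy threads stay below 10%, which is consistent with the readings described above.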
