Zonal Statistics takes 1.5 hours in AGPro, in Python the same code takes a few days

MichaelLedwith · ‎03-02-2022

Hi,

I've been pulling out my hair trying to find a solution to this enigma. Zonal Statistics as Table takes less than two hours if I run it using the AG Pro 2.9.2 tool. If I copy the python code and run it in a 64-bit IDLE shell (3.7.11), it takes almost two days.

I've tested this on my workstation and three different virtual desktops. All are fairly well equipped Win10 environments with plenty of disk space, RAM, graphic card or graphic support, etc. There's plenty of available cpu- and gpu-resources during the process.

Here's the long and short of the situation:

1. The in_zone_data is comprised of slightly more than 4000 polygons that have been converted to a raster with 10 m resolution and a unique ID. The raster has a defined coordinate system and is approximately 5000 X 5000 pixels, unsigned 16-bit;

2. The in_value_raster is the result of two pre-processes. First, the red band is extracted from a RGB-image via Raster Functions (in AG Pro) or arcpy.ia.ExtractBand (in Python). Focal Statistics is then run on the red band via Raster Functions (in AG Pro) or arcpy.ia.FocalStatistics (in Python). The two processes are done almost instantaneously in both AG Pro and Python. The raster has a defined coordinate system, 0.16 m resolution, 625 000 X 625 000 pixels, unsigned 8-bit. In AG Pro, the raster layer source for the Focal Stats raster is on a separate drive (D:\Temp\ArcGISProTemp9836\);

3. Zonal Statistics as Table. Running this tool in AG Pro on the Raster Layers takes about one hour and forty-five minutes. Running the same process (arcpy.ia.ZonalStatisticsAsTable) in Python takes days.

4. I've set up the environments so that they are identical (workspace, scratch, extent, etc). The processing times are the same.

5. I've converted the in_zone_data to a raster layer via arcpy.management.MakeRasterLayer. The processing times are essentially the same.

Any explanation? Anyone experience something similar? This snippet is part of a larger program that iterates through tons of different areas so I really need to be able to run this via a script and not manually.

No, I cannot provide anyone with the input data but certainly all help appreciated.

Thanks!

SarmisthaChatterjee · ‎03-04-2022

Thank you for your response.

Recommendations:

Based on your response and your script, I have the following recommendations:

Before executing Focal Statistics, specify
1. arcpy.env.cellSize to your zone raster (10m)
2. arcpy.env.snapRaster to your value raster
Before executing Zonal Statistics or Zonal Statistics as Table, you can save the output from Focal Statistics on disc, to check how much time it takes to create this raster dataset. You can also check the cell size to make sure that it is 10m (OPTIONAL).
You can specify the output from Focal Statistics as value raster to Zonal Statistics or Zonal Statistics as Table. At this point, the output from Focal Statistics, if not saved earlier will be saved at 10m cell size, which should take substantially less time.

Comments:

I have also put together some comments below to explain my recommendations.

1. Cell size: It looks like you have not explicitly specified the environment cell size. The Zonal Statistics or Zonal Statistics as Table tool is executing with default cell size, which is Maximum of inputs. So, your Zonal Statistics output cell size is 10m. The Zonal Statistics as Table output is also calculated based on 10m cell size.

2. Focal Statistics: Since there is no analysis environment cell size specified, Focal Statistics is calculated with 0.16m cell size, resulting in a large raster. You are not immediately saving this large raster, and passing the output from Focal Statistics to Zonal Statistics or Zonal Statistics as Table. Since you are not saving the Focal Statistics output, it feels like Focal Statistics is executing quickly. In reality, Focal Statistics is returning a raster function, which stores the analytical model only, but not a raster dataset on disc.

If you are not interested in the Focal Statistics output as the final result, you can use this raster function output to another tool/raster function. However, depending on the nature of the subsequent operation, the raster function will be extended with additional analytical operation (function chain) or first a dataset will be created, and then used as input in the analytical tool.

Usually, the local and focal operations create function chain, where a raster dataset can be created by saving the final output at the end of the analysis, which makes the execution faster.

If a raster function or a function chain is used as input to zonal or global tools/raster functions, a dataset will first be created internally, before executing that in the tool. In this case, the local/focal operation is not chained together with the zonal/global operation.

3. Zonal Statistics as Table: You are specifying the output of Focal Statistics as input to Zonal Statistics or Zonal Statistics as Table. Due to the nature of the operation (zonal), the tool requires a raster dataset as value. So, it triggers the save internally, which is creating a large raster with 0.16m cell size. This raster is then internally resampled to a 10m value raster for performing the analysis with a 10m zone raster. All this extra work in Zonal Statistics as Table is probably taking the additional time, which can be avoided by specifying analysis cell size and snap raster before performing Focal Statistics.

It is not very clear yet why the tool is executing faster in Pro compared to your standalone python script. I am hoping by specifying the cell size and snap raster explicitly, you will see a performance improvement. We can then take a closer look at your workflow in Pro to understand the source of the execution time difference.

Please let me know if this works with your dataset, and if you have additional questions.

Thanks,

Sarmistha

View solution in original post

DanPatterson · ‎03-02-2022

a listing of the script section would help

... sort of retired...

MichaelLedwith · ‎03-02-2022

Well, the AG Pro "script" is:

arcpy.ia.ZonalStatisticsAsTable("U2_LPIS_m10_10rast1", "BLOCKID", "Focal Statistics_Extract Bands", r"D:\Projekt\SJV_Blockkontroll2022_XXXXXXX_MILE\Arbete\Produktion\U2\ZonalStatistics\U2_zonalstats.gdb\U2_LPIS_m10_10rast1_textur", "DATA", "MEAN", "CURRENT_SLICE", 90, "AUTO_DETECT")

That probably doesn't help you much.

The Python script (the last one of many, many that I've tried) is:

from arcpy import CheckInExtension, CheckOutExtension
from arcpy.ia import ExtractBand, FocalStatistics, NbrRectangle, ZonalStatisticsAsTable
from datetime import datetime as dt
from os import path

print("Starting at {0}".format(dt.now()))

CheckOutExtension("ImageAnalyst")
proj_mapp = path.join(r"D:\Projekt", "SJV_Blockkontroll2022_XXXXXXX_MILE")
arb_dir = path.join(proj_mapp, "Arbete")
prod_dir = path.join(arb_dir, "Produktion")
mos_dir = path.join(prod_dir, "U2", "Mosaik")
mos_gdb = path.join(mos_dir, "U2_mosaic.gdb")
in_md = path.join(mos_gdb, "NIR_red_green_mosaic")
red_band = ExtractBand(in_md, [1])
textur = FocalStatistics(red_band, NbrRectangle(3, 3, "CELL"), "RANGE")
m10 = path.join(arb_dir, "Indata", "U2", "LPIS", "U2_LPIS.gdb", "U2_LPIS_m10_del1")
print("Starting Zonal Stats at {0}".format(dt.now()))
ZonalStatisticsAsTable(m10, "BLOCKID", textur, path.join(prod_dir, "U2", "ZonalStatistics", "U2_zonalstats.gdb", "U2_LPIS_m10_10rast1_textur_python"), "DATA", "MEAN", "CURRENT_SLICE", 90, "AUTO_DETECT")
CheckInExtension("ImageAnalyst")

print("Finished at {0}".format(dt.now()))

Like I wrote, the script works without any errors and the process time up to running Zonal Statistics as Table can be counted in microseconds.

The reason for the excessive amount of path variables has to do with the main script, of which this little snippet is a part.

My problem is that it takes about 24 times as long to run it as a Python script than from within ArcGIS Pro.

Anonymous User · ‎03-02-2022

It's probably the 64-bit IDLE. Try running it in the 32 bit or another IDE, pointing to the python.exe located in the esri conda env.

C:\Users\<you>\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\python.exe

MichaelLedwith · ‎03-02-2022

I'll try this, if simply out of curiosity, but I wonder if you could provide a reason as to why reducing the amount of available RAM from 32 GB (64-bit processor) to 4 GB (32-bit processor) would make such a huge difference in time of process.

Update: Please inform me as to how I can run arcpy commands for python 3.7.11 in a 32-bit IDLE shell. The Desktop 32-bit IDLE shell doesn't work because I don't have Image Analyst available for Desktop (only Pro). The python.exe file (mentioned above) opens a 64-bit shell.

DanPatterson · ‎03-02-2022

Python 3.7.11 [MSC v.1927 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 7.21.0 -- An enhanced Interactive Python.

As a side note, python in the Pro environment (no clone) is above and it is 64-bit. ArcMap I suspect was different, but that was for python 2.7

... sort of retired...

Anonymous User · ‎03-03-2022

Looks like you got some good advice in other directions so my post may not be worth anything. The point of the post was to try other IDE's pointed to the env. Try running it in the 32 bit or another IDE. Sorry for my mistake using '32 bit'. I was thinking of ArcMap's 2.7 arcpy but since you don't have the license, have Pro, and the 2.7 IDLE is still 64 I don't think it's worth pursuing.

A few times I've had gp tasks take more time in 3.x than they once did in 2.7, and I also experienced time variance between IDE's executing the same scripts. Not significant like 24 hours difference, but some instances were double digit minutes. Running from the shell... you may try executing the script from the command line. We have a dissolve process that will only fully dissolve the dataset and not leave any tiles if it is ran as a sub process from the command line in both 2.7 and 3.7. But the other advice given seems more plausible.

DanPatterson · ‎03-02-2022

By default, this tool will take advantage of multicore processors. The maximum number of cores that can be used is four.

To use fewer cores, use the parallelProcessingFactor environment setting.

You might want to check to see whether this applies if your python is external to the arcgis pro environment.

... sort of retired...

MichaelLedwith · ‎03-02-2022

I suppose that I haven't explained the situation clearly.

I run ZonalStatisticsAsTable from within ArcGIS Pro using the tool. It takes about 1.5 hours.

I run ZonalStatisticsAsTable from IDLE (ArcGIS Pro, i.e. python 3.7.11 - 64-bit) using arcpy. It takes about 36 hours.

The computer, data sources, environmental settings are exactly the same for both. The only obvious difference is that the inputs are opened in the Table of Contents within ArcGIS Pro (presumably as raster layers).

In neither cases are the computer's resources being utilized excessively.

DanPatterson · ‎03-02-2022

And as I was suggesting, perhaps the parallel processing factor, which is supposed to be using 4 cores by default, is somehow getting ignored when run from a standalone script. Set it in your script to be sure, next time

... sort of retired...