How to improve 'calculate statistics' performance?

12-10-2019 05:57 PM
Occasional Contributor II

I have a variety of satellite products from multiple sensors, which have been added to multiple source mosaic datasets (MDs) in an enterprise geodatabase. All of these source mosaics are then added to a single derived mosaic. My understanding is that at this point you should calculate statistics on the derived mosaic dataset.

If I run Calculate Statistics on the derived MD with an AOI covering multiple images, the tool runs quickly. CPU does not particularly spike, but I can see that disk I/O for reading the data (which is not local) spikes dramatically. This aligns with my understanding that the tool fundamentally queries a large number of pixel values. The output is an image that appears crystal clear per its stretch properties.

However, if I then run the tool with similar parameters but no AOI, it hangs and doesn't do much. CPU consumes a single core while disk I/O flatlines entirely. After waiting several hours there is no real update, just the same behaviour.

Why is this occurring? How can I effectively calculate statistics on hundreds to thousands of satellite products in a derived mosaic dataset? The images are not contiguous, so perhaps it is querying NULL values outside the footprints, which is causing the underlying issue?

FYI, I am using ArcGIS Pro.

1 Reply
Esri Regular Contributor

I assume you have a source mosaic dataset that contains your satellite imagery, possibly with footprints updated to remove nodata areas. To do color correction it is necessary to create statistics. These statistics need not be 100% accurate, just good enough estimates to enable the color correction.

Use the "Build Pyramids And Statistics" geoprocessing tool and specify the mosaic dataset as input. Add a query if you want to limit processing to a specific subset. You can turn off 'Include Sub-directories' and 'Build Pyramids', and exclude 'Include Source Datasets'. The key is to set a suitable X and Y skip factor: for large images I would recommend a value of 4, which makes the process about 16x faster. You could go larger, but depending on the source you might start seeing less good results. If your data has a nodata value (e.g. 0), set it in the Ignore Values parameter. You should find that this runs quite fast.
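To see why a skip factor of 4 gives roughly a 16x speedup: sampling every k-th pixel along both axes reads about 1/k² of the pixels. A minimal sketch in plain Python (no ArcGIS needed; the 20000 x 20000 scene size is a hypothetical stand-in for a large satellite product):

```python
def sampled_pixel_count(rows, cols, x_skip, y_skip):
    """Pixels actually read when statistics sample every
    x_skip-th column and every y_skip-th row."""
    # range(0, n, step) picks indices 0, step, 2*step, ...
    sampled_rows = len(range(0, rows, y_skip))
    sampled_cols = len(range(0, cols, x_skip))
    return sampled_rows * sampled_cols

# Hypothetical 20000 x 20000 scene.
full = sampled_pixel_count(20000, 20000, 1, 1)     # every pixel: 400,000,000
skipped = sampled_pixel_count(20000, 20000, 4, 4)  # skip factor 4: 25,000,000
print(full // skipped)  # -> 16, i.e. about 16x fewer pixel reads
```

The same arithmetic is why going to larger skip factors keeps getting faster but samples ever fewer pixels, so the estimated statistics become less reliable.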
