Mosaic dataset operator versus cell statistics - What is wrong?

AlbertoAloe · 12-06-2017

Hi guys.

I have the following problem (probably due to my lack of confidence when using the mosaic dataset):

I have a set of 366 rasters (one for each day in 2008) with the following specs:

1000 columns, 950 rows
1 band
Cell size is 5 km
TIF format
32 bit floating point
Coordinate system is ETRS_1989_LAEA

The question is:

Why am I getting different results when calculating the mean with Spatial Analyst Cell Statistics versus a mosaic dataset loaded with the same set of rasters? The difference can be almost 2%.

Cell Statistics seems to be the correct one because I get the same values in other ways. The rasters come from a big NetCDF file, and I verified the mean of the same slices using numpy (as well as other tools such as Sample and the multidimensional ones).

I'm using ArcGIS 10.5.1 on a Windows 8.1 workstation with 16 GB of RAM, and I've been very careful to set up the mosaic dataset with the correct properties (the pixel depth of the mosaic is 32 bit and the maximum size of requests is properly set). I also created a mosaic dataset based on the original NetCDF (9131 temporal steps), and I get the same results as with the TIF-based mosaic when making a definition query for the year 2008 and setting the operator to mean. Values for the rasters are in the range 0 - 10000, more or less.

Thanks a lot to whoever is able to give me a hint

Alberto

DanPatterson_Retired · 12-07-2017

ahhh! Ok that ruled out one thought I had

AlbertoAloe · 12-07-2017

Dan,

I think that if you run r_m.mean() on the temporal axis (axis = 0) and convert the result back to a raster, you should get Yr2008TestRasterMean. We know that this works. The problem is the mosaic dataset, which is supposed to properly calculate the mean of a pile of rasters as an aggregation on the temporal dimension (or whatever we want to call it).
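
For reference, the axis-0 mean described above can be sketched in numpy as follows. This is only an illustration: the small synthetic stack, the variable names other than r_m, and the NoData value of -9999 are assumptions, and the real stack would be read from the 366 TIFFs.

```python
import numpy as np

# Hypothetical stand-in for the 366 daily rasters: (time, rows, cols).
# The real grids are 950 rows x 1000 columns; a small block keeps this quick.
rng = np.random.default_rng(0)
stack = rng.uniform(0, 10000, size=(366, 4, 5)).astype(np.float32)

# Assumed NoData marker; mask it so it does not pollute the mean.
NODATA = -9999.0
stack[10, 0, 0] = NODATA
r_m = np.ma.masked_equal(stack, NODATA)

# Mean over the temporal axis: one value per cell, masked cells skipped.
mean_2008 = r_m.mean(axis=0)
print(mean_2008.shape)
```

Masking before averaging matters if the rasters carry NoData cells, since an unmasked NoData value would be averaged in like any other number.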

Thanks again for your help

Alberto

DanPatterson_Retired · 12-07-2017

Just finished... I got the same results as you did converting the mean along axis 0.

I will investigate 'where' the differences occur..

DanPatterson_Retired · 12-07-2017

Differences... ok... the mosaic mean behaves differently. The values get pretty close when you move away from the edges, but they are 'exact' in their behaviour of calculating the mean. I also can't see where the 'float32' dtype would matter. Someone should try the mosaic dataset in Pro IF 'float64' is the default...

# ---- top 5x5 block of values

r_mean.data[:5, :5]  # ---- tiffs to array, using numpy, mean on axis= 0
 
array([[ 10.94,   1.05,   1.2 ,   1.14,   0.94],
       [ 16.11,   2.31,   3.41,   2.19,   0.6 ],
       [  1.06,   4.59,   1.22,   1.99,   2.45],
       [  1.17,   5.82,   0.86,   0.56,   2.86],
       [  6.88,   0.93,   2.95,   0.42,   0.45]])

r1m.data[:5, :5]     # ---- Yr2008TestRasterMean.tif to array, masked 
 
array([[ 10.94,   1.05,   1.2 ,   1.14,   0.94],
       [ 16.11,   2.31,   3.41,   2.19,   0.6 ],
       [  1.06,   4.59,   1.22,   1.99,   2.45],
       [  1.17,   5.82,   0.86,   0.56,   2.86],
       [  6.88,   0.93,   2.95,   0.42,   0.45]], dtype=float32)

r0m.data[:5, :5]     # ---- Yr2008TestRasterMosMean.tif the mosaic

array([[ 11.09,   1.05,   1.2 ,   1.13,   0.93],
       [ 16.26,   2.31,   3.4 ,   2.17,   0.61],
       [  1.06,   4.57,   1.2 ,   1.97,   2.44],
       [  1.16,   5.78,   0.87,   0.56,   2.85],
       [  6.84,   0.93,   2.95,   0.42,   0.44]], dtype=float32)
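
To see 'where' and by how much the blocks above differ, here is a minimal sketch that uses only the 5x5 values printed in this post. The variable names are illustrative, and since the printouts are rounded to two decimals the comparison cannot resolve anything finer than 0.01.

```python
import numpy as np

# 5x5 blocks copied from the printouts above (rounded to two decimals).
numpy_mean = np.array([[10.94, 1.05, 1.20, 1.14, 0.94],
                       [16.11, 2.31, 3.41, 2.19, 0.60],
                       [ 1.06, 4.59, 1.22, 1.99, 2.45],
                       [ 1.17, 5.82, 0.86, 0.56, 2.86],
                       [ 6.88, 0.93, 2.95, 0.42, 0.45]])
mosaic_mean = np.array([[11.09, 1.05, 1.20, 1.13, 0.93],
                        [16.26, 2.31, 3.40, 2.17, 0.61],
                        [ 1.06, 4.57, 1.20, 1.97, 2.44],
                        [ 1.16, 5.78, 0.87, 0.56, 2.85],
                        [ 6.84, 0.93, 2.95, 0.42, 0.44]])

diff = mosaic_mean - numpy_mean
rel_pct = 100.0 * np.abs(diff) / numpy_mean

max_abs = np.abs(diff).max()
n_equal = np.count_nonzero(diff == 0)
row, col = np.unravel_index(np.argmax(rel_pct), rel_pct.shape)

print(f"largest absolute difference: {max_abs:.2f}")
print(f"cells that agree at two decimals: {n_equal} of {diff.size}")
print(f"largest relative difference: {rel_pct[row, col]:.2f}% at ({row}, {col})")
```

Running the same comparison on the full arrays (r_mean.data against r0m.data) would give a difference map that could be written back out as a raster for inspection.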

DanPatterson_Retired · 12-08-2017

Cody Benkelman ... thoughts? Or who to flag on this issue?

This one doesn't apply, since the extents are identical: http://support.esri.com/en/bugs/nimbus/QlVHLTAwMDEwMTQ2MA==

There are too many bugs listed for mosaic datasets, but nothing within the last year (based on a search of http://support.esri.com/en/).

AlbertoAloe · 12-08-2017

Dan,

thanks for taking care of this..

I'm still out of office (the 8th of December is bank holiday in Italy).

On Monday I can make other tests...

Of the many tests I already did, one was with ArcGIS Pro 2, which gave (if I remember correctly) the same issue.

I agree that float32 cannot be the issue. You can try using dtype float64 when averaging over axis = 0 and you would get irrelevant differences. Also, ArcGIS Pro is a 64-bit application, but I think that if the rasters (and the mosaic dataset) are 32 bit, nothing should change except the ability to address more than 2 GB of memory (and that is not relevant here given the very small size of the test data).
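
A quick way to check the float32 argument, sketched with synthetic data rather than the real rasters: average the same float32 stack once with float32 accumulation and once with float64 accumulation, then compare. The stack shape and the random values (drawn from the 0 - 10000 range mentioned earlier) are assumptions for illustration.

```python
import numpy as np

# Synthetic float32 stack: 366 time steps over a small 20x20 window.
rng = np.random.default_rng(42)
stack32 = rng.uniform(0, 10000, size=(366, 20, 20)).astype(np.float32)

mean32 = stack32.mean(axis=0)                    # accumulates in float32
mean64 = stack32.mean(axis=0, dtype=np.float64)  # accumulates in float64

rel_pct = 100.0 * np.abs(mean32 - mean64) / mean64
print(f"largest relative difference: {rel_pct.max():.6f}%")
```

With only 366 values per cell, the accumulation error in float32 stays orders of magnitude below the almost 2% discrepancy discussed in this thread.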

I'll keep you posted

Thanks again

Alberto

AlbertoAloe · 12-12-2017

Dan,

I still cannot figure out what is going on....

I checked the mosaic dataset properties to find the reason for the observed difference between the mean calculated with the mosaic operator and the one from Cell Statistics on the source rasters.

I'm attaching a Python script with the geoprocessing chain I ran in ArcGIS 10.5.1 for creating and configuring the mosaic. It can be summarized as follows:

Create a mosaic dataset inside a file geodatabase
Load the mosaic pointing to the 366 rasters I sent you before (previous attachment)
Set the mosaic properties through gp function (mosaic method to none, operator to mean and other stuff.....)

Then I feed the mosaic dataset to "Copy Raster" (alternatively, I can make a mosaic layer keeping the default properties), which should output a raster that SHOULD BE IDENTICAL to the one created with Cell Statistics. As we have seen, this is not the case.

A note on ArcGIS Pro: when running Copy Raster against the mosaic dataset, I get exactly the same raster as in ArcGIS 10.5.1.

Another detail: I'm attaching the output from Copy Raster, Yr2008TestRasterMosMeanV2.tif (based on these settings), which is slightly different from the one I sent you before (Yr2008TestRasterMosMean.tif). I found out that the reason for this is the mosaic method:

When set to None you would get Yr2008TestRasterMosMeanV2.tif. In this case the sorting of the rasters is based on the ObjectID
When set to "By Attribute" with StdTime field and base value '2008/01/01' you would get a raster that is identical to Yr2008TestRasterMosMean.tif

Given that:

For the sake of calculating our mean the sorting is irrelevant
I purposely scrambled ObjectID and StdTime in a way that sorting by ObjectID is different from sorting by StdTime

....there must be something wrong in the mosaic dataset
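
The point about sorting can be illustrated with a small numpy sketch on synthetic data (the stack and the two orderings are stand-ins, not the real rasters): a true mean over the stack is the same for any ordering of the slices, up to floating-point rounding, whereas an order-dependent pick such as "take the first slice" is not.

```python
import numpy as np

# Synthetic stack: 366 slices over a small 4x5 window.
rng = np.random.default_rng(1)
stack = rng.uniform(0, 10000, size=(366, 4, 5))

# Two orderings of the same slices, standing in for
# sorting by ObjectID versus sorting by StdTime.
order_a = np.arange(366)
order_b = order_a[::-1]

mean_a = stack[order_a].mean(axis=0)
mean_b = stack[order_b].mean(axis=0)

# An order-dependent result for contrast: the first slice in each ordering.
first_a = stack[order_a][0]
first_b = stack[order_b][0]

print("means agree:", np.allclose(mean_a, mean_b))
print("first slices agree:", np.array_equal(first_a, first_b))
```

So if the output changes with the mosaic method, the value being returned is not behaving like a plain mean over all 366 slices.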

Alberto

P.S. I gotta solve this otherwise I cannot sleep !!

DanPatterson_Retired · 12-12-2017

I would report this, since we both seem to replicate the same results from the standalone images using Cell Statistics and numpy. I don't know what is going on in the mosaic dataset, but it has to be some 'setting' or another.

AlbertoAloe · 12-12-2017

Thanks Dan.

Have a nice day

Alberto

DanPatterson_Retired · 01-21-2018

Any news on this issue, Alberto?