Mosaic dataset operator versus cell statistics - What is wrong ??

AlbertoAloe · ‎12-06-2017

Hi guys.

I have the following problem (probably due to my lack of confidence when using the mosaic dataset):

I have a set of 366 rasters (one for each day in 2008) with the following specs:

1000 columns, 950 rows
1 band
Cell size is 5 km
TIF format
32 bit floating point
Coordinate system is ETRS_1989_LAEA

The question is :

Why am I getting different results when calculating mean with spatial analyst cell statistics versus a mosaic dataset loaded with the same set of rasters ? There may be a difference of almost 2%

Cell statistics seems to be the correct one because I'm getting the same values in other ways. Rasters are coming from a big netcdf and I verified the avg values of the mean of the same slices using numpy (as well as other tools like sample and multi dimension ones..)

I'm using ArcGIS 10.5.1 on Windows 8.1 workstation with 16 GB of RAM and I've been very carefull to set up the mosaic dataset with the correct properties(pixel depth of the mosaic is 32 bit and maximum size of requests are properly set). I also created a mosaic dataset based on the original netcdf (9131 temporal steps) getting the same results of the tif based mosaic when making a query definition for 2008 year and settimg operator to mean. Values for the rasters are in range 0 - 10000 more or less

Thanks a lot to whoever is able to give me a hint

Alberto

AlbertoAloe · ‎05-03-2018

Dan,

after contacting ESRI support they recognized it as a bug for both ArcGIS and ArcGIS Pro.

BUG-000113559 : Mosaic operator ‘Mean’ produces the incorrect results in ArcMap
BUG-000113661 : Mosaic operator ‘Mean’ produces the incorrect results in ArcGIS Pro

It should be fixed in following releases of the software

Ciao

Alberto

View solution in original post

DanPatterson_Retired · ‎12-06-2017

how do the various approaches handle nodata areas and masks? Cell statistics allows you to explicitly set that behaviour when you run it and in the Environments section (bottom of the help), it enables you to specify how it handles a mask. You will have to verify you methods in light of these behaviours.

AlbertoAloe · ‎12-07-2017

Dan,

thanks for your reply.

Actually I took into account that when running cell statistics. Also when getting the mean from the mosaic I've been also carefull to mosaic properties that can prevent the use of all the rasters.

Basically the mosaic is a cube of 366 overlapping rasters. I just want the mosaic to return the mean of the 366 rasters......it works but when I make a percent variation out of cell stats versus mosaic layer I have an average of 3% and a maximum of 10%.

I'm attaching a simple test scenario with the 366 rasters (File Year2008TestRaster.zip) as well as what I get from running mean cell statistics (file Yr2008TestRasterMean.zip) and what I get from the mosaic (pointing to the same set of 366 rasters) with the operator mean (File Yr2008TestRasterMosMean.zip).

If you (or somebody else..) have few minutes to tell me how to get the same values from the mosaic.......

There must be something I misunderstood about mosaic datasets

Thanks a lot

Alberto

DanPatterson_Retired · ‎12-07-2017

Ok... lets have a look

in_ras0 = r"C:\Temp\.......\Yr2008TestRasterMosMean.tif"  # ---- mosaic mean
in_ras1 = r"C:\Temp\.......\Yr2008TestRasterMean.tif"     # ---- plain gnarly
r0 = arcpy.RasterToNumPyArray(in_ras0)  # ---- convert both to rasters
r1 = arcpy.RasterToNumPyArray(in_ras1)  # ---- nodata wasn't specified so I had 
r0m = np.ma.masked_where(r0<0.0, r0)    # to track it down and produce a 'mask'
r1m = np.ma.masked_where(r1<0.0, r1)    # for both

r0m.shape, r1m.shape      # ---- both the same shape
((60, 120), (60, 120))

r0m.size, r1m.size        # ---- total number of cells
(7200, 7200)

r0m.count(), r1m.count()  # ----- total number with values
(6274, 6274)

r0m.mean()                # ----- overall mean for r0m
29.678883088938477

r1m.mean()                # ----- ditto for r1m
30.755663253108064

r0m.max(), r1m.max()      # ----- hmmmmm
(1615.9902, 1670.0612)

r0m.min(), r1m.min()      # ------ hmmmmm
(0.00056408811, 0.00051229505)
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

so the shapes, size and count of the two arrays/rasters are the same interms of the values given.

Which suggests that if the results were the result of processing then something is 'off' in the stack or the processing is different in the two methods.

The float32 dtype doesn't break anything.

I will let you ponder and suggest anything else for me to look at

AlbertoAloe · ‎12-07-2017

Dan,

Year2008TestRasterMean.tif (output from cell statistics) is OK because it's exactly what I get using other methods (Aggregation of the numpy array from the original netcdf, using sample in sa...).

The problem is in the mosaic dataset. It just references the same rasters....I cannot get it !!

Basically I set the mosaic as follows:

Mosaic dataset in a file geodatabase with ETRS_89_LAEA projection
"Add rasters", pointing to the folder where I have the 366 rasters I gave you , without building overviews
I then set the mosaic operator to Mean
I set "MaximumNumber of Rasters per Mosaic" = 400 to include them all. This property if not set correctly would have an impact on the mosaic operator limiting the number of rasters used

After I make a mosaic layer (dragging and dropping in ArcMap or using the gp tool) double checking that the default mosaic operator still is 'mean'. I run copy raster....and I get what you saw.

If you had time to try something I would ask you to make a mosaic dataset out of the 366 rasters and see if you can get the mosaic mean that matches the one from cell stats (or one that matches mine).

Thanks again

Alberto

P.S. If somebody else would be able to confirm my output from the mosaic dataset I would start to think that there is something weird in the mosaic algorithm

DanPatterson_Retired · ‎12-07-2017

So my numpy r1m (the masked array version of r2008TestRasterMean.tif was ok?

And you can confirm that they all share.

AlbertoAloe · ‎12-07-2017

Dan,

I'm out of office... I can only see at your figures but it completely makes sense for me. Your r0m.max() and r01m.max() are exactly the values at the location with max discharge (po river outlet)

In my case I did a per cell percent variation calculating a raster like this:

perc var = abs(Yr2008TestRasterMosMean - Yr2008TestRasterMean) / Yr2008TestRasterMean

Statistics on percvar are a mean of 3% and a max of 10%. You see at cell with max it should be 1670 but I get 1615. Why ?

Why Yr2008TestRasterMosMean is different from Yr2008TestRasterMean (which I know is correct ) ? What have I done wrong when obtaining the mean mosaic layer ?

I do not need the mosaic dataset to do these things but it's a convenient way to expose netcdf and multidimensional stuff to people that do not know how to use numpy, netcdf4 and so on...Besides, ESRI is advertizing it this way.

Thanks Dan

Alberto

DanPatterson_Retired · ‎12-07-2017

Maybe someone else will weigh in..

my only assumption was that 366 rasters needed to be included and that they shared all cell commonly.

The only thing else to do would be to examine each raster for nodata consistency.... in numpy I trust,

DanPatterson_Retired · ‎12-07-2017

I made a (366, y, x) array from your individual tiffs.

First question 366? just to be sure that each is a day... I have no intention of validating the inputs

Here is what I got

folder = r"C:\Temp\ ...... "
arcpy.env.workspace = folder             # ---- set the workspace
rasters = arcpy.ListRasters("*", 'tif')  # ---- read all the tiff's... hope they belong
a_s = []

for raster in rasters:
    a_s.append(arcpy.RasterToNumPyArray(raster))  # ---- convert to arrays and store

r = np.asarray(a_s)
r_m = np.ma.masked_where((r<0.0) | (r>10000), r)  # ---- masked using high/low guess

r_m.min()   #  0.0
r_m.max()   #  6128.4219
r_m.mean()  #  30.755659143207026

# ---- there appeared to be some that didn't have a nodata set or it was in error
# hence my dual masking‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Advise....

AlbertoAloe · ‎12-07-2017

Dan,

2008 was a leap year...366 are days.

Alberto