Hi guys.
I have the following problem (probably due to my lack of confidence when using the mosaic dataset):
I have a set of 366 rasters (one for each day in 2008) with the following specs:
The question is :
Why am I getting different results when calculating mean with spatial analyst cell statistics versus a mosaic dataset loaded with the same set of rasters ? There may be a difference of almost 2%
Cell statistics seems to be the correct one because I'm getting the same values in other ways. Rasters are coming from a big netcdf and I verified the avg values of the mean of the same slices using numpy (as well as other tools like sample and multi dimension ones..)
I'm using ArcGIS 10.5.1 on Windows 8.1 workstation with 16 GB of RAM and I've been very carefull to set up the mosaic dataset with the correct properties(pixel depth of the mosaic is 32 bit and maximum size of requests are properly set). I also created a mosaic dataset based on the original netcdf (9131 temporal steps) getting the same results of the tif based mosaic when making a query definition for 2008 year and settimg operator to mean. Values for the rasters are in range 0 - 10000 more or less
Thanks a lot to whoever is able to give me a hint
Alberto
Solved! Go to Solution.
Dan,
after contacting ESRI support they recognized it as a bug for both ArcGIS and ArcGIS Pro.
BUG-000113559 : Mosaic operator ‘Mean’ produces the incorrect results in ArcMap
BUG-000113661 : Mosaic operator ‘Mean’ produces the incorrect results in ArcGIS Pro
It should be fixed in following releases of the software
Ciao
Alberto
how do the various approaches handle nodata areas and masks? Cell statistics allows you to explicitly set that behaviour when you run it and in the Environments section (bottom of the help), it enables you to specify how it handles a mask. You will have to verify you methods in light of these behaviours.
Dan,
thanks for your reply.
Actually I took into account that when running cell statistics. Also when getting the mean from the mosaic I've been also carefull to mosaic properties that can prevent the use of all the rasters.
Basically the mosaic is a cube of 366 overlapping rasters. I just want the mosaic to return the mean of the 366 rasters......it works but when I make a percent variation out of cell stats versus mosaic layer I have an average of 3% and a maximum of 10%.
I'm attaching a simple test scenario with the 366 rasters (File Year2008TestRaster.zip) as well as what I get from running mean cell statistics (file Yr2008TestRasterMean.zip) and what I get from the mosaic (pointing to the same set of 366 rasters) with the operator mean (File Yr2008TestRasterMosMean.zip).
If you (or somebody else..) have few minutes to tell me how to get the same values from the mosaic.......
There must be something I misunderstood about mosaic datasets
Thanks a lot
Alberto
Ok... lets have a look
in_ras0 = r"C:\Temp\.......\Yr2008TestRasterMosMean.tif" # ---- mosaic mean
in_ras1 = r"C:\Temp\.......\Yr2008TestRasterMean.tif" # ---- plain gnarly
r0 = arcpy.RasterToNumPyArray(in_ras0) # ---- convert both to rasters
r1 = arcpy.RasterToNumPyArray(in_ras1) # ---- nodata wasn't specified so I had
r0m = np.ma.masked_where(r0<0.0, r0) # to track it down and produce a 'mask'
r1m = np.ma.masked_where(r1<0.0, r1) # for both
r0m.shape, r1m.shape # ---- both the same shape
((60, 120), (60, 120))
r0m.size, r1m.size # ---- total number of cells
(7200, 7200)
r0m.count(), r1m.count() # ----- total number with values
(6274, 6274)
r0m.mean() # ----- overall mean for r0m
29.678883088938477
r1m.mean() # ----- ditto for r1m
30.755663253108064
r0m.max(), r1m.max() # ----- hmmmmm
(1615.9902, 1670.0612)
r0m.min(), r1m.min() # ------ hmmmmm
(0.00056408811, 0.00051229505)
so the shapes, size and count of the two arrays/rasters are the same interms of the values given.
Which suggests that if the results were the result of processing then something is 'off' in the stack or the processing is different in the two methods.
The float32 dtype doesn't break anything.
I will let you ponder and suggest anything else for me to look at
Dan,
Year2008TestRasterMean.tif (output from cell statistics) is OK because it's exactly what I get using other methods (Aggregation of the numpy array from the original netcdf, using sample in sa...).
The problem is in the mosaic dataset. It just references the same rasters....I cannot get it !!
Basically I set the mosaic as follows:
After I make a mosaic layer (dragging and dropping in ArcMap or using the gp tool) double checking that the default mosaic operator still is 'mean'. I run copy raster....and I get what you saw.
If you had time to try something I would ask you to make a mosaic dataset out of the 366 rasters and see if you can get the mosaic mean that matches the one from cell stats (or one that matches mine).
Thanks again
Alberto
P.S. If somebody else would be able to confirm my output from the mosaic dataset I would start to think that there is something weird in the mosaic algorithm
So my numpy r1m (the masked array version of r2008TestRasterMean.tif was ok?
And you can confirm that they all share.
Dan,
I'm out of office... I can only see at your figures but it completely makes sense for me. Your r0m.max() and r01m.max() are exactly the values at the location with max discharge (po river outlet)
In my case I did a per cell percent variation calculating a raster like this:
perc var = abs(Yr2008TestRasterMosMean - Yr2008TestRasterMean) / Yr2008TestRasterMean
Statistics on percvar are a mean of 3% and a max of 10%. You see at cell with max it should be 1670 but I get 1615. Why ?
Why Yr2008TestRasterMosMean is different from Yr2008TestRasterMean (which I know is correct ) ? What have I done wrong when obtaining the mean mosaic layer ?
I do not need the mosaic dataset to do these things but it's a convenient way to expose netcdf and multidimensional stuff to people that do not know how to use numpy, netcdf4 and so on...Besides, ESRI is advertizing it this way.
Thanks Dan
Alberto
Maybe someone else will weigh in..
my only assumption was that 366 rasters needed to be included and that they shared all cell commonly.
The only thing else to do would be to examine each raster for nodata consistency.... in numpy I trust,
I made a (366, y, x) array from your individual tiffs.
First question 366? just to be sure that each is a day... I have no intention of validating the inputs
Here is what I got
folder = r"C:\Temp\ ...... "
arcpy.env.workspace = folder # ---- set the workspace
rasters = arcpy.ListRasters("*", 'tif') # ---- read all the tiff's... hope they belong
a_s = []
for raster in rasters:
a_s.append(arcpy.RasterToNumPyArray(raster)) # ---- convert to arrays and store
r = np.asarray(a_s)
r_m = np.ma.masked_where((r<0.0) | (r>10000), r) # ---- masked using high/low guess
r_m.min() # 0.0
r_m.max() # 6128.4219
r_m.mean() # 30.755659143207026
# ---- there appeared to be some that didn't have a nodata set or it was in error
# hence my dual masking
Advise....
Dan,
2008 was a leap year...366 are days.
Alberto