Hey everyone, I've been having some issues using the Cell Statistics tool in a Python script recently. The issue occurs in a standalone script. I am trying to calculate the standard deviation across 30 TIFFs representing annual cumulative rainfall for the globe.
While most of the values in the resulting raster (also a TIFF) are correct, there are large pockets of data with extremely high standard deviation values (around 2000 to 3000 when they should be around 150).
When I run the same analysis in ArcGIS using the Cell Statistics geoprocessing tool, the output is correct and I do not see these irregular values.
Does anyone have any idea why this would be happening? At first I assumed something was wrong with my script, but how could only some of the values be incorrect? There are also no NoData values in these problem areas that could be causing the miscalculation, although a NoData value is set in all of the input rasters.
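My script boils down to something like the following (the paths are placeholders, but the Cell Statistics call is the same pattern I am using):

import arcpy
from arcpy.sa import CellStatistics

arcpy.CheckOutExtension("Spatial")

# Folder containing the 30 annual rainfall TIFFs (placeholder path)
arcpy.env.workspace = r"C:\data\rainfall"
rasters = arcpy.ListRasters("*", "TIF")

# "STD" = standard deviation; "DATA" = ignore NoData at each cell location
out_std = CellStatistics(rasters, "STD", "DATA")
out_std.save(r"C:\data\rainfall_std.tif")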
I am providing statistics on the two rasters in case it is helpful to anyone.
Script output:
min = 0
max = 6697.9404296875
mean = 75.83343676729601
std. dev. = 333.72000491046

ArcMap output:
min = 0
max = 1507.4833984375
mean = 35.804817248754
std. dev. = 84.35107381816201
I converted all of my 16-bit integer TIFFs to Esri GRIDs with integer values. The analysis ran correctly and gave me results identical to using 32-bit TIFFs as input.
Are there NoData cells? What are their values, and which option did you use to account for them in the Cell Statistics options?
See Cell Statistics—Help | ArcGIS for Desktop. Even if you are coding, you will need to ensure that the cells align, are the same size, and have the same NoData value in order to compute properly.
Dan, I'm sorry. I think you replied to a previous post that I had deleted; I deleted it to change my answer after looking at the data again.
There are NoData values in the input rasters; they are being reported as -3.40282346639e+038 in the Layer Properties window. I am using the "DATA" option in my script.
I do believe that the rasters align. I checked, and they all have the same extent.
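Here is the quick check I used, in case it matters (placeholder path; extent, cell size, NoData value, and pixel type are standard arcpy.Raster properties):

import arcpy

# Print extent, cell size, NoData value, and pixel type for every input
arcpy.env.workspace = r"C:\data\rainfall"
for name in arcpy.ListRasters("*", "TIF"):
    r = arcpy.Raster(name)
    print(name, r.extent, r.meanCellWidth, r.noDataValue, r.pixelType)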
If you have manually executed the tool, you can copy the Python snippet from the Results window. This gives you code that performs the exact same operation from Python (assuming you don't have any strange settings in your geoprocessing environment). If you execute that Python command, does it yield the same output as the manually executed tool?
If so, could you post the code you used to generate the result with Python?
I manually executed the tool and then copied the code snippet from the Results window. I ran that Python code both in the Python window within ArcMap and as a standalone script in PyScripter. Both runs produced the same raster that was created when I manually executed the tool, and all of the values in those rasters appeared to be correct. Do you want to see the code provided by the manual execution of the tool, or the Python code that I wrote that is giving me incorrect results?
I figured out what the issue was... definitely a mistake on my end.
The input rasters were 32-bit, but not floating point. I reran the analysis using a copy of the data that was floating point and everything worked fine. Sorry for the confusion.
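In case it helps anyone else, the floating-point copies can be made with the Spatial Analyst Float tool; something like this sketch (folder paths are placeholders, and the output folder must already exist):

import os
import arcpy
from arcpy.sa import Float

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\rainfall"   # placeholder input folder
out_dir = r"C:\data\rainfall_float"         # placeholder output folder

# Cast each integer raster to floating point before running Cell Statistics
for name in arcpy.ListRasters("*", "TIF"):
    Float(arcpy.Raster(name)).save(os.path.join(out_dir, name))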
I wondered what was going on... I hope you saw my answer about the raster type and NoData... I thought I was losing my mind... well... more than I have already.
Fortunately, your reply that got deleted was sent to my email, so I was able to read it. That's why I checked the input data again to make sure it was floating point. I had forgotten that you could have 32-bit integer rasters.
The particular analysis I am performing uses 30 rasters representing annual cumulative rainfall for the globe. I was asked to calculate the coefficient of variation (possibly a simplified version) by dividing the standard deviation of the dataset by its mean: (std / mean) * 100.
Is there any way that I could use integer values as input for such a calculation? It would be nice to save some disk space by storing the input at a lower bit depth. The values only range from 0 mm to around 12,000 mm.
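For reference, the whole calculation boils down to this map algebra (again, paths are placeholders):

import arcpy
from arcpy.sa import CellStatistics

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\rainfall"
rasters = arcpy.ListRasters("*", "TIF")

# Coefficient of variation per cell: (std / mean) * 100
std_ras = CellStatistics(rasters, "STD", "DATA")
mean_ras = CellStatistics(rasters, "MEAN", "DATA")
cv = (std_ras / mean_ras) * 100.0
cv.save(r"C:\data\rainfall_cv.tif")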
Integers are fine, since they are within the realm of interval/ratio data. There is no need for a decimal point to perform "method of moments" calculations; your data are simply scaled to mm instead of meters or cm, and the spacing between increments on the number scale is equal and has real, physical meaning. The only data you can't do the maths on are nominal or ordinal (i.e., classes or ranks).
I use NumPy arrays to save space and speed up raster processing. You can save large arrays in binary format and reload them much like other raster data. There is a recent post (linkless at the moment) dealing with raster processing for climate data which covers some of the basics. If you are familiar with Python and NumPy, there is a small group dealing with arrays, data, and Python: NumPy Repository and other esoterica in data processing.
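A minimal sketch of that pattern, assuming arcpy plus NumPy (the file names are placeholders):

import arcpy
import numpy as np

# Pull a raster into a NumPy array, mapping NoData to a sentinel value
arr = arcpy.RasterToNumPyArray(r"C:\data\rainfall\rain_1990.tif",
                               nodata_to_value=-9999)
masked = np.ma.masked_equal(arr, -9999)

# Save and reload in NumPy's compact binary format
np.save(r"C:\data\rain_1990.npy", arr)
arr = np.load(r"C:\data\rain_1990.npy")

# Integer inputs are fine here... NumPy promotes to float for the statistics
print(masked.mean(), masked.std())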
Thanks for the link to the NumPy repository. I'll definitely take a look at that.
It seems like the integer input is what was causing the extremely high standard deviation values (2500-ish instead of 150-ish), though? Or am I misunderstanding the issue? Sorry for all the questions; I'm just trying to understand what happened so I don't run into a similar problem again.