Different Results from ArcPro and ArcGIS 10.6

HughGraham · ‎05-03-2019

So I'm working on building a custom python toolbox to aid the interpretation of a dataset we've created.

The tool is simple: it should extract key summary stats from the raster for one or more areas defined by a polygon.

I've built the tool using Arc Pro, and in this environment everything works as expected. However, our clients are still using ArcGIS, so the tool must be cross-compatible. When moving the tool across to ArcGIS 10.6, everything goes terribly wrong...

My first approach was to convert the raster to a numpy array and extract data from there - all very quick and easy. However, in ArcGIS numpy.nanmean would return a different value from Arc Pro, and no matter what I did, all calculations on subsets returned zero. Additionally, calculating numpy.std would crash with a memory error (I realise this is due to the use of Python 2, so less of an issue).

So with numpy seemingly out I tried to use arcpy tools along these lines with raster properties and zonal histogram:

stdVal = arcpy.GetRasterProperties_management(raster, property_type="STD").getOutput(0)

ZonalHistogram(shp, "OBJECTID", raster, histtab)

Once again, GetRasterProperties returns different values in ArcGIS than in ArcPro.

Also the standard deviation property returns zero and all values calculated with zonal histogram are also zero.

So, are there any reasons for this? Does my ArcGIS build have a broken version of numpy? I will try to run the tool on another machine and report back, but I currently have no access to one. Any advice would be much appreciated.

Cheers,

Hugh

DanPatterson_Retired · ‎05-03-2019


import numpy as np
np.version.version

what is it?

The raster cell size and extent I assume were identical?

Can you report 'yourarray'.shape in both environments (to check that the array shapes match)?

nanmean and its family have been around for quite a while. Can you give an example of the difference? And can you confirm that your array is indeed floating point, with the arc* raster's nodata set to nan and not something like 0? It would be a good idea to query the array for the original arc* nodata value, to ensure that those cells were indeed converted to nan during the conversion.

Even

a = np.array([[1, 2, 3.], [4, np.nan, np.nan], [7, 8, 9]]).reshape(3,3)

a.shape
(3, 3)

a.size
9

np.sum(np.isnan(a))
2

np.nanmean(a)
4.857142857142857
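The checks suggested above (shape, dtype, and whether nodata cells really became nan) can be sketched as follows. This is only an illustration under stated assumptions: the array is a hypothetical stand-in for a converted raster, `describe` is a made-up helper name, and 15 is used as an example of a nodata sentinel that was not converted.

```python
import numpy as np

# Hypothetical stand-in for an array produced by a raster-to-array conversion.
# NODATA = 15 is an assumed sentinel that was left in the data.
NODATA = 15
arr = np.array([[0, 1, 2], [3, NODATA, NODATA], [4, 5, 5]], dtype=np.uint8)


def describe(a, nodata):
    """Collect the facts worth comparing between the two environments."""
    is_float = np.issubdtype(a.dtype, np.floating)
    return {
        "shape": a.shape,
        "dtype": str(a.dtype),
        # Only floating-point arrays can contain nan at all.
        "nan_count": int(np.isnan(a).sum()) if is_float else 0,
        "nodata_count": int((a == nodata).sum()),
    }


report = describe(arr, NODATA)
print(report)

# nanmean cannot skip a sentinel it does not know about, so the result is skewed.
naive_mean = float(np.nanmean(arr))
# Mean of the valid cells only, for comparison.
valid_mean = float(arr[arr != NODATA].mean())
print(naive_mean, valid_mean)
```

If the two environments report a different dtype, nan count, or nodata count for the same raster, the difference most likely comes from the conversion step rather than from numpy itself.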

HughGraham · ‎05-03-2019

Thanks Dan, really appreciate the ideas. The numpy version is 1.9.3. It looks as though something is going funny with the raster to numpy array function... I believe you may be right with your suggestion about the handling of nan values. I've got some more probing to do though, as numpy does appear to function correctly in both versions of Python. I'll report back when I get a little further with this.

Cheers,

Hugh

DanPatterson_Retired · ‎05-03-2019

Hugh,

I could find no closed issues on numpy's GitHub that found an error in any of the nan functions, going back as far as I could.

GitHub - numpy/numpy: The fundamental package for scientific computing with Python.

You should really stick to Pro (a client nudge perhaps)… A lot has happened in numpy and Pro uses 1.15.? but is easily upgraded using conda like I have done.

Currently I am using np.version.version # '1.16.3' and scipy has had several version upgrades with improvements and additions.

Sadly, I suspect ArcMap 10.7 will be stuck with python 2.7 and the versions of numpy it supports (2.7 support ends this December). Pro is using 3.6.*

(I upgraded a bit, to sys.version # =>: '3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 18:50:55) [MSC v.1915 64 bit (AMD64)]'.) 3.7 is stable and 3.8 is a couple of candidate releases along.

I suspect you will have to wait out your clients until they are ready to move on... Warn them about the world of Unicode as well; you will have no end of fun.

HughGraham · ‎05-03-2019

Thanks for looking into it! I totally agree - really impressed with ArcPro and particularly the Anaconda environment - makes things much easier. Unfortunately I am unlikely to be able to persuade the clients to switch over any time soon. I'll update when I find the issue!

Anonymous User · ‎05-07-2019

Hi Hugh,

I'd like to take a closer look at this and help resolve any potential bugs with the RasterToNumPyArray function. If I'm reading correctly, you are having an issue with the RasterToNumPyArray function in ArcGIS 10.6 which is not happening in Pro (which version of Pro?). Once you create the numpy array using the above function and run numpy.nanmean on this array, the results are different in Pro and ArcMap 10.6. Is this correct?

Could you please share the input raster data and what optional parameters you are specifying while running this function? If you are leaving the optional parameters empty, please specify that too.

RasterToNumPyArray (in_raster, {lower_left_corner}, {ncols}, {nrows}, {nodata_to_value})

Best Regards,
Neeraj

HughGraham · ‎05-09-2019

Hi Neeraj,

Thanks for the response. I think I have narrowed down the problem. I don't think the issue lies entirely in RasterToNumPyArray.

A large part of the issue comes from me comparing results created from Python 3 using the Arc Pro environment (run outside arc) with results from Python 2 run inside Arc 10 (various versions).

It seems that when Arc Pro's Python is run outside the application, some arguments for various functions are ignored.

A little more info...

Prior to calling RasterToNumPyArray, I extract the raster by mask with:

ebm_ras = arcpy.sa.ExtractByMask(bhi_ras, shp)

Arc 10 handles this well and creates a raster where cells outside the "extract shape" are masked and set to NA.

Arc Pro, on the other hand, converts all cells outside the "extract shape" to zero. I got around this by using Map Algebra instead of Extract by Mask.

This is one cause for the difference in np.nanmean.

But wait there's more...

When I run RasterToNumPyArray with:

ebm_np = arcpy.RasterToNumPyArray(in_raster = ebm_ras)

Arc 10 converts the NoData values to values of 15... The real raster data consists of cell values from 0-5.

To remedy this, the following can be used:

ebm_npa = arcpy.RasterToNumPyArray(ebm_ras, nodata_to_value=-100)

and then convert -100 values to np.nan or ignore -100 values in calculation.

Again, in Arc Pro Python (outside of ArcPro), RasterToNumPyArray converts all NA values to zero regardless of the nodata_to_value value... Again, this could be solved by using Map Algebra, specifying a value (e.g. -100) for NA values, then converting to a numpy array and running: array[array == -100] = np.nan
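A minimal sketch of that last step, assuming the raster is integer: an integer array cannot store np.nan, so the array has to be cast to float before the sentinel is replaced. The array below is a hypothetical stand-in for the output of RasterToNumPyArray (arcpy is not used here), and -100 is the placeholder value from above.

```python
import numpy as np

NODATA = -100  # placeholder value chosen for NA cells, as in the post

# Hypothetical stand-in for the array returned for an integer raster.
ebm_npa = np.array([[0, 1, NODATA], [5, NODATA, 3]], dtype=np.int32)

# Integer dtypes have no nan, so cast to float before assigning np.nan.
ebm_float = ebm_npa.astype(np.float64)
ebm_float[ebm_float == NODATA] = np.nan

# The nan-aware functions now skip the former NODATA cells.
mean_val = float(np.nanmean(ebm_float))
std_val = float(np.nanstd(ebm_float))
nan_cells = int(np.isnan(ebm_float).sum())
print(mean_val, std_val, nan_cells)
```

The cast makes a float64 copy, which costs more memory than the integer original; for very large rasters that trade-off may matter.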

In summary the problem appears not to lie with ArcMap but actually in ArcPro when using Python outside of the GUI.

Cheers,

Hugh

DanPatterson_Retired · ‎05-09-2019

Hugh... integer rasters?

Your last approach is what I use, since integer dtypes have no equivalent of np.nan (which is float64). If your raster is indeed integer, I suspect the behaviour you see is there to avoid upcasting the raster from integer to float, or from unsigned integer to signed integer. Numpy upcasts some of the lower integer and float data types.

np.iinfo(np.int8)
iinfo(min=-128, max=127, dtype=int8)
np.iinfo(np.uint8)
iinfo(min=0, max=255, dtype=uint8)
np.finfo(np.float16)
finfo(resolution=0.001, min=-6.55040e+04, max=6.55040e+04, dtype=float16)
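As an alternative to converting to float, the sentinel cells can be ignored with a numpy masked array, which keeps the integer dtype. This is only a sketch under assumptions: the array is a made-up stand-in for an integer raster and -100 is an assumed nodata placeholder. It also shows why a negative placeholder cannot be stored in an unsigned 8-bit raster without a change of dtype.

```python
import numpy as np

NODATA = -100  # assumed nodata placeholder

# Hypothetical integer raster data with two nodata cells.
arr = np.array([[0, 1, NODATA], [5, NODATA, 3]], dtype=np.int16)

# Mask the sentinel instead of replacing it; the data stays int16.
masked = np.ma.masked_equal(arr, NODATA)

mean_val = float(masked.mean())  # computed over unmasked cells only
valid_count = int(masked.count())  # number of unmasked cells
print(masked.dtype, mean_val, valid_count)

# Check whether the placeholder fits in a given integer dtype.
info8 = np.iinfo(np.uint8)
fits_uint8 = info8.min <= NODATA <= info8.max
info16 = np.iinfo(np.int16)
fits_int16 = info16.min <= NODATA <= info16.max
print(fits_uint8, fits_int16)
```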

HughGraham · ‎05-09-2019

Hi Dan,

Yes, these are integer rasters, so that explanation makes perfect sense - an interesting "Gotcha"! Good to know!

Cheers