reclassify using Numpy, out of memory

18833
37
Jump to solution
05-19-2016 03:41 PM
RebeccaStrauch__GISP
MVP Emeritus

EDIT: although I marked Dan's answer re: making the array an integer array, which was the last piece I needed, I created a summary at the end https://community.esri.com/message/610325#comment-610325  to put all the pieces that worked together.  I suggest starting there if running into this problem.

Bottom line: trying to get reclass elevation raster (in m), that include “water” and NoData, and Denali foothills, to 3-4 class (based on feet---keeping the converted layer, not  in memory), then convert to polygons.

For the study are I used for development (and one other), the raster Reclassify and arcpy.RasterToPolygon_conversion this worked relatively fast. (max few minutes)

I’m now running it on an area about 2x larger (~25m cells, 6657 col 13035 row), with a wider range of input values.  When using my script/tool or testing in the python window, I run out of memory after about 4 hours.

So I’m diving into Numpy and https://community.esri.com/people/Dan_Patterson ‘s blog https://community.esri.com/blogs/dan_patterson/2016/03/12/reclassify-raster-data-simply

After a clean reboot, I’m trying my test script in the python window, but I’m still getting an out of error message.  I’m not sure if I have something in error in my script, or it is just too large for the 32bit-python to handle.  So, I’m open to suggestions…..maybe processing half at a time??  I have tried but a full raster with no nodata, and one that has nodata outside the side area boundary.

Test script I am using (parts borrowed from Dan's demo scripts)

import time
import os
import arcpy
from arcpy import env
from arcpy.sa import *
import numpy as np
from textwrap import dedent

arcpy.CheckOutExtension("spatial")

def arr_reclass(a, bins=[], new_bins=[], mask=False, mask_val=None):
  """a    - integer or floating point array to be reclassed using
     bins - sequential list/array of the lower limits of each class
        include one value higher to cover the upper range.
     mask - whether the raster contains nodata values or values to
        be masked with mask_val
     array dimensions will be squeezed 
  """ 
  a_rc = np.zeros_like(a)
  if (len(bins) < 2): # or (len(new_bins <2)):
    print("Bins = {} new = {} won't work".format(bins,new_bins))
    return a
  if len(new_bins) < 2:
    new_bins = np.arange(1,len(bins)+2)   
  new_classes = zip(bins[:-1],bins[1:],new_bins)
  for rc in new_classes:
    q1 = (a >= rc[0])
    q2 = (a < rc[1])
    z = np.where(q1 & q2, rc[2],0)
    a_rc = a_rc + z
  return a_rc

theWorkspace = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb'
arcpy.env.workspace = theWorkspace
arcpy.env.overwriteOutput = True
wsPath, wsGDB = os.path.split(theWorkspace)
print("Project gdb (vector, i.e. for points, lines, polygons): {0}".format(theWorkspace))

inStudy = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb\Study16b'
stdyPath, stdyName = os.path.split(inStudy)
print("Study boundary: {0}".format(stdyName))
spatialref = arcpy.Describe(inStudy).spatialReference

rasterIn = r'C:\___bear2016\_Skwentna2017\Skwentna_raster.gdb\elevExtent'
#rasterIn = r'C:\___bear2016\_Skwentna2017\Skwentna_raster.gdb\elevClip'
inRas = arcpy.Raster(rasterIn)

lowerLeft = arcpy.Point(inRas.extent.XMin, inRas.extent.YMin)
cellSize = inRas.meanCellWidth

rasterGDB, rasterFile = os.path.split(rasterIn)
elevUnits = "meters"
elevCutoffFt = "6000"
flatElevBreak = "2000"
elevBreaks = flatElevBreak.split("; ")
elevBreaks.sort(key=float)  
flatElevBreak = ";".join(elevBreaks)
flatElevBreak
'2000'
hydroBuff = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb\hydroBuff250'
finalTransZones = arcpy.os.path.join(theWorkspace, "TransAreas")
outRiparian = arcpy.os.path.join(theWorkspace, "TransRiparian")
pElevClass = arcpy.os.path.join(theWorkspace, "elevReclassPoly")  
outDEMtmp = arcpy.os.path.join(theWorkspace, "elevReclass")  

if elevUnits == "meters":   #  3.280839895013123
  toElevFeet = 3.28084
else:      # note, IF had to go from feet to meters, factor would be 0.3048
  toElevFeet = 1
elevFeet = (inRas * toElevFeet)
saveElevFeet = arcpy.os.path.join(rasterGDB, "elevFt")
elevFeet.save(saveElevFeet)
elevMin = elevFeet.minimum  - 1
elevMax = elevFeet.maximum  + 1
noDataReclass = 99999

elevSplit = flatElevBreak.split(";")
lstCnt = 1
#reclassArg = []
bins = []
new_bins = []
for reclassValue in elevSplit:
  if lstCnt == 1:
    #reclassArg.append([int(elevMin), int(reclassValue), int(reclassValue)])
    bins.append(int(elevMin))
    bins.append(int(-1))
    bins.append(int(reclassValue))
    new_bins.append(int(-1))
    new_bins.append(int(reclassValue))
  else:
    #reclassArg.append([int(prevValue), int(reclassValue), int(reclassValue)])
    bins.append(int(reclassValue))    
    new_bins.append(int(0))    
    new_bins.append(int(reclassValue))    
  lstCnt += 1
  prevValue = reclassValue
#reclassArg.append([int(prevValue), int(elevCutoffFt), int(elevCutoffFt)])
#reclassArg.append([int(elevCutoffFt), int(elevMax), 9])

bins.append(int(elevCutoffFt))
bins.append(int(elevMax))
new_bins.append(int(elevCutoffFt))
new_bins.append(9)  
new_classes = zip(bins[:-1],bins[1:],new_bins)

print("   ...Old Ranges:\n   {0}".format(bins))
print("   ...New values:\n   {0}".format(new_bins))
print("   ...New classes:\n   {0}".format(new_classes))


arr = arcpy.RasterToNumPyArray(inRas, nodata_to_value=9999999)
mask = False
mask_val = None
print(arr)
arr_rc = arr_reclass(arr, bins=bins, new_bins=new_bins)

The error I am getting is on the last line, however, looking at the resources, there is plenty of RAM available, so I'm assuming it is just the Arcmap/Python limit.

Output from script (in ArcMap python window)

Project gdb (vector, i.e. for points, lines, polygons): C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb

Study boundary: Study16b

...Old Ranges:

   [-107504, -1, 2000, 6000, 20036]

   ...New values:

   [-1, 2000, 6000, 9]

   ...New classes:

   [(-107504, -1, -1), (-1, 2000, 2000), (2000, 6000, 6000), (6000, 20036, 9)]

[[  205.16717529   205.25009155   205.29598999 ...,  1037.9387207

   1037.86242676  1037.83959961]

[  205.22860718   205.3127594    205.37818909 ...,  1037.93701172

   1037.84533691  1037.76965332]

[  205.29447937   205.38237      205.46986389 ...,  1037.91918945

   1037.75964355  1037.76477051]

...,

[ 1481.97961426  1492.3326416   1504.91308594 ...,   510.62741089

    504.97958374   501.19992065]

[ 1469.56079102  1486.19445801  1505.37487793 ...,   506.6071167

    501.30090332   497.77194214]

[ 1453.34509277  1471.94152832  1493.07250977 ...,   501.66564941

    497.09884644   494.20864868]]

Runtime error

Traceback (most recent call last):

  File "<string>", line 115, in <module>

  File "<string>", line 30, in arr_reclass

MemoryError

>>>

If I need to split it due to size, some guidance for that would be good.  The one nice thing about this numpy process, I get the error immediately instead of 4-hours in....

Thanks for any help!

ps - Dan....is this question seen by more than the 13 members/followers?  Nice "place" but want to make sure others have a chance to find/reply

tagging a few more areas  Pythonpython snippetsImagery and Remote Sensing

37 Replies
RebeccaStrauch__GISP
MVP Emeritus

The splitting of the array has solved the memory issues I believe.  FYI - for others that may see this, to make it a little more generic/robust for different size raster, I'm splitting it into 3 raster, based on  len(arr)

...
arr = np.load(r"C:\___bear2016\_Skwentna2017\outras.npy")
# count number of elements so it can be split
arrCnt = len(arr)
arrSplit0 = int(arrCnt/3)
arrSplit1 = (int(arrCnt/3) * 2)
# now reclass in slices and stack back  
arr_rc0 = arr_reclass(arr[:arrSplit0,:], bins=bins, new_bins=new_bins)  
arr_rc1 = arr_reclass(arr[arrSplit0:arrSplit1,:], bins=bins, new_bins=new_bins)  
arr_rc2 = arr_reclass(arr[arrSplit1:,:], bins=bins, new_bins=new_bins)  
arr_stack = np.vstack((arr_rc0,arr_rc1, arr_rc2))  
arr = arcpy.NumPyArrayToRaster(arr_stack) #, value_to_nodata=noDataReclass)

I am having an issue yet trying to save it back to the raster, but Dan, unless you or someone can see quickly what the issue is, I'll keep searching/testing an post another thread if necessary.  I'm not sure if it is a numpy issue or the arcpy issues on

...closing...will continure editing...

EDIT: last response was getting to slow, so closed and editing to complete response....

error message I'm getting

Runtime error

Traceback (most recent call last):

  File "<string>", line 127, in <module>

  File "c:\program files (x86)\arcgis\desktop10.3\arcpy\arcpy\__init__.py", line 2292, in NumPyArrayToRaster

    return _NumPyArrayToRaster(*args, **kwargs)

TypeError: Cannot create raster for numpy array.

I have tried a few varieties...but now I may have to look at the no data.  I had set a nodata value, which should have handled it....but, need to look again..

DanPatterson_Retired
MVP Emeritus

quickly offhand, don't use arr as the variable name for output since it 'exists' try arr_out ... I will  check when on an arc machine

BACK

I just assembled a test using 3 arrays (floating point) and stacked them

Dan demos in python 3.4
Date...
>>> a_rc
array([[ 7.,  6.,  8., ...,  8.,  7.,  7.],
       [ 7.,  7.,  7., ...,  8.,  7.,  8.],
       [ 7.,  7.,  8., ...,  8.,  7.,  7.],
       ...,
       [ 8.,  8.,  7., ...,  8.,  4.,  6.],
       [ 8.,  7.,  7., ...,  7.,  7.,  8.],
       [ 2.,  1.,  8., ...,  8.,  3.,  8.]])
>>> t = np.vstack((a_rc,a_rc,a_rc))
>>> t
array([[ 7.,  6.,  8., ...,  8.,  7.,  7.],
       [ 7.,  7.,  7., ...,  8.,  7.,  8.],
       [ 7.,  7.,  8., ...,  8.,  7.,  7.],
       ...,
       [ 8.,  8.,  7., ...,  8.,  4.,  6.],
       [ 8.,  7.,  7., ...,  7.,  7.,  8.],
       [ 2.,  1.,  8., ...,  8.,  3.,  8.]])
>>>
>>> import arcpy
>>> tt = arcpy.NumPyArrayToRaster(t)
>>> tt.save("f:/test/tt")
>>> t.ndim
2
>>> t.shape
(21000, 14000)
>>>

The results

array_vstack.png

Do note the size of the input arrays and the output raster.  I am just praying that the inability to save was due to the naming convention and that you don't have to save each raster, then

Mosaic To New Raster—Help | ArcGIS for Desktop .  If it is a memory issue you will have to label the LL corner of each array, save it so that they can be merged inside of arcmap into a final array.

Rebecca... time to move to python 3.4 (at least)

RebeccaStrauch__GISP
MVP Emeritus

Dan, I don't think the "arr" name itself the issue (since that made no difference), but I'm using a different array names just in case.

What I am doing now is redefining each array as  "= []" as soon as I don't need it anymore, which seems to free up the space.  From a number of other discussions I've read thru, doing a "del array_variable" only deletes the variable and does not free up the memory.  (just an fyi for anyone reading  this).  So that help.

The next issues (not discovered or resolved in this order) were that I had to specify the lowerleft and the cell-size that I saved at the very beginning of my script from the original raster....I also had to use DefineProjection since that isn't an option in the NumPyArrayToRaster  (snippet of these lines below)

np.save(r"C:\___bear2016\_Skwentna2017\outras.npy", arr)   #save in binary format
arr = []
arrLoad = np.load(r"C:\___bear2016\_Skwentna2017\outras.npy")
# count number of elements so it can be split
arrCnt = len(arrLoad)
arrSplit0 = int(arrCnt/3)
arrSplit1 = (int(arrCnt/3) * 2)
# now reclass in slices and stack back  
arr_rc0 = arr_reclass(arrLoad[:arrSplit0,:], bins=bins, new_bins=new_bins)  
arr_rc1 = arr_reclass(arrLoad[arrSplit0:arrSplit1,:], bins=bins, new_bins=new_bins)  
arr_rc2 = arr_reclass(arrLoad[arrSplit1:,:], bins=bins, new_bins=new_bins)  
arr_stack = np.vstack((arr_rc0,arr_rc1, arr_rc2))  
print(arr_stack.ndim)
print(arr_stack.shape)
arr_rc0 = []
arr_rc1 = []
arr_rc2 = []
print(lowerLeft)
newRas = arcpy.NumPyArrayToRaster(arr_stack, lowerLeft, cellSize)  
newRas.save(outDEMtmp)  
newRas = []
arcpy.DefineProjection_management(outDEMtmp, spatialref)
arcpy.RasterToPolygon_conversion(outDEMtmp, pElevClass, "SIMPLIFY", "Value")

So, now the issue is that the RasterToPolygon does not see the outDEMtmp as a valid raster,

even though it appears to be fine when displaying and/or looking at the properties.

the error in the python window (which I tried first)

>>> arcpy.RasterToPolygon_conversion(outDEMtmp, pElevClass, "SIMPLIFY", "Value")

Runtime error  Traceback (most recent call last):   File "<string>", line 1, in <module>   File "c:\program files (x86)\arcgis\desktop10.3\arcpy\arcpy\conversion.py", line 337, in RasterToPolygon     raise e ExecuteError: ERROR 000864: Input raster: The input is not within the defined domain. ERROR 000367: Invalid GP data type 

>>>

Neither of the error seems to be relevant. A restart didn't help....so on to researching this.     ...maybe I'll copy to a new raster, if that works.  Any chance the NumPyToRaster doesn't work quite right??

0 Kudos
DanPatterson_Retired
MVP Emeritus

del delete the reference and cleans it up... if it is only being used by python and not any other software maybe like arcmap or a geodatabase being used in arcmap if the source or destination is one...

No...I ran tests today and got a valid rasters and huge ones as you can see.  The issue isn't with numpy, NumPyToRaster produces a valid raster.  The only thing you did different is that you defined a coordinate system... don't, just convert it to a polygon, save your work, then see if you can define the projection.  If you are doing this in a geodatabase... move it outside of a geodatabase...make an esri grid in a folder, make a shapefile in a folder, I want to rule out this whole geodatabase thing and its clutches and clucking on everything that is done.  You can always copy stuff back into one, I just want to work with native operating stuff first.

Now... as for the results. Are you planning to mix the results with other data? if so, then you will need the projected featureclass/shapefile, it is not needed for the raster.

keep me posted.

0 Kudos
DanPatterson_Retired
MVP Emeritus

ps .... setting an array to an empty list won't free up the memory

0 Kudos
RebeccaStrauch__GISP
MVP Emeritus

Not sure why then...but doing that did solve the memory issues....and yes I did isolate this as a solution (been testing all afternoon).

0 Kudos
DanPatterson_Retired
MVP Emeritus

The only reason that I warn, is that python and numpy memory mangagement and garbage collection are related but not identical... plain old del works.  Once the memory is allocated...and I emphasis...it can be deleted (ie its allocation freed) once there are no references to it... which is why I work outside Arc* as long as possible if memory, locks and other fluff are a possibility.

0 Kudos
RebeccaStrauch__GISP
MVP Emeritus

Yes, I will need this to line up with my other data as this is one layer that going into a union, and a bunch of other processing.  I defined the projection and did not project it since the lowerleft and cellsize should have actually put it in the correct place.  It does line up, but did not have a projection defined before I did it.

I've found a few discussions and KB on the "raster not within domain" and I'm working on trying to get some of those to work. I even exported the raster to an .img with no luck.

Some of the threads i'm looking at:

There are others that I am still looking at.  My output values are integer values 0, 9, 2000, 6000 so I'm not sure how to apply the raster calculation suggestion.

Dan, are you still testing with Pro and python 3.4?  Again, I'm trying to stick to Desktop and standard install....government budgets and all.  Although I may have all the bells and whistles available to me, those I give this to may not.

0 Kudos
RebeccaStrauch__GISP
MVP Emeritus

RasterToPolygon gives the same error, even if I do not define the projection first.

Dan, I know this all works with your generated random arrays....have you tried doing it with a raster as the source (projected, not geographic)?  I'm wondering it my RasterToNumPyArray is the issue....maybe I need to do something different on that end.

0 Kudos
DanPatterson_Retired
MVP Emeritus

The data type of the array must be integer, I think you still have floating point values, so here is a sample, converting one of my random integer .npy files into an integer, an image, then a polygon

>>> az
array([[  0.,  0.,  6., ...,  0.,  0.,  4.],
      [  0.,  0.,  0., ...,  0.,  2.,  0.],
      [  0.,  4.,  0., ...,  10.,  0.,  0.],
      ...,
      [  2.,  0.,  0., ...,  0.,  0.,  2.],
      [  0.,  0.,  0., ...,  0.,  8.,  0.],
      [ 10.,  4.,  0., ...,  0.,  0.,  6.]])
>>>
>>> eee_zee = az.astype(int)
>>> eee_zee
array([[ 0,  0,  6, ...,  0,  0,  4],
      [ 0,  0,  0, ...,  0,  2,  0],
      [ 0,  4,  0, ..., 10,  0,  0],
      ...,
      [ 2,  0,  0, ...,  0,  0,  2],
      [ 0,  0,  0, ...,  0,  8,  0],
      [10,  4,  0, ...,  0,  0,  6]])
>>> out = arcpy.NumPyArrayToRaster(eee_zee)
>>> out.save("f:/temp/ez.tif")
>>>
>>> p = arcpy.RasterToPolygon_conversion("f:/temp/ez.tif","f:/temp/ez_.shp","NO_SIMPLIFY")
>>>

Now, you have to be aware that it will make many unique areas IF you have many areas. That could be limitation

I just used the no_simplify and sent it out to a shapefile which loaded into arcmap

The keep line is eee_zee = az.astype(int)  ... that sent the floating point array to an integer view.  Then I saved it to a raster (tif) then to a shapefile

Next step... I want to see an image of what you are working with

PS  in Global Reach, it has had 2163 view but only 12 people...

so these people...

  • have no life other than to view this thread multiple times,
  • there a whole load of unsigned in people reading this thread or
  • there is something wrong with global reach views

  Timothy Hales​ can you provide some insight