AnsweredAssumed Answered

reclassify using Numpy, out of memory

Question asked by rastrauch Champion on May 19, 2016
Latest reply on May 24, 2016 by Dan_Patterson

EDIT: although I marked Dan's answer re: making the array an integer array, which was the last piece I needed, I created a summary at the end https://community.esri.com/message/610325#comment-610325  to put all the pieces that worked together.  I suggest starting there if running into this problem.

 

Bottom line: trying to get reclass elevation raster (in m), that include “water” and NoData, and Denali foothills, to 3-4 class (based on feet---keeping the converted layer, not  in memory), then convert to polygons.

 

For the study are I used for development (and one other), the raster Reclassify and arcpy.RasterToPolygon_conversion this worked relatively fast. (max few minutes)

 

I’m now running it on an area about 2x larger (~25m cells, 6657 col 13035 row), with a wider range of input values.  When using my script/tool or testing in the python window, I run out of memory after about 4 hours.

 

So I’m diving into Numpy and https://community.esri.com/people/Dan_Patterson ‘s blog https://community.esri.com/blogs/dan_patterson/2016/03/12/reclassify-raster-data-simply

 

After a clean reboot, I’m trying my test script in the python window, but I’m still getting an out of error message.  I’m not sure if I have something in error in my script, or it is just too large for the 32bit-python to handle.  So, I’m open to suggestions…..maybe processing half at a time??  I have tried but a full raster with no nodata, and one that has nodata outside the side area boundary.

 

Test script I am using (parts borrowed from Dan's demo scripts)

import time
import os
import arcpy
from arcpy import env
from arcpy.sa import *
import numpy as np
from textwrap import dedent

arcpy.CheckOutExtension("spatial")

def arr_reclass(a, bins=[], new_bins=[], mask=False, mask_val=None):
  """a    - integer or floating point array to be reclassed using
     bins - sequential list/array of the lower limits of each class
        include one value higher to cover the upper range.
     mask - whether the raster contains nodata values or values to
        be masked with mask_val
     array dimensions will be squeezed 
  """ 
  a_rc = np.zeros_like(a)
  if (len(bins) < 2): # or (len(new_bins <2)):
    print("Bins = {} new = {} won't work".format(bins,new_bins))
    return a
  if len(new_bins) < 2:
    new_bins = np.arange(1,len(bins)+2)   
  new_classes = zip(bins[:-1],bins[1:],new_bins)
  for rc in new_classes:
    q1 = (a >= rc[0])
    q2 = (a < rc[1])
    z = np.where(q1 & q2, rc[2],0)
    a_rc = a_rc + z
  return a_rc

theWorkspace = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb'
arcpy.env.workspace = theWorkspace
arcpy.env.overwriteOutput = True
wsPath, wsGDB = os.path.split(theWorkspace)
print("Project gdb (vector, i.e. for points, lines, polygons): {0}".format(theWorkspace))

inStudy = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb\Study16b'
stdyPath, stdyName = os.path.split(inStudy)
print("Study boundary: {0}".format(stdyName))
spatialref = arcpy.Describe(inStudy).spatialReference

rasterIn = r'C:\___bear2016\_Skwentna2017\Skwentna_raster.gdb\elevExtent'
#rasterIn = r'C:\___bear2016\_Skwentna2017\Skwentna_raster.gdb\elevClip'
inRas = arcpy.Raster(rasterIn)

lowerLeft = arcpy.Point(inRas.extent.XMin, inRas.extent.YMin)
cellSize = inRas.meanCellWidth

rasterGDB, rasterFile = os.path.split(rasterIn)
elevUnits = "meters"
elevCutoffFt = "6000"
flatElevBreak = "2000"
elevBreaks = flatElevBreak.split("; ")
elevBreaks.sort(key=float)  
flatElevBreak = ";".join(elevBreaks)
flatElevBreak
'2000'
hydroBuff = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb\hydroBuff250'
finalTransZones = arcpy.os.path.join(theWorkspace, "TransAreas")
outRiparian = arcpy.os.path.join(theWorkspace, "TransRiparian")
pElevClass = arcpy.os.path.join(theWorkspace, "elevReclassPoly")  
outDEMtmp = arcpy.os.path.join(theWorkspace, "elevReclass")  

if elevUnits == "meters":   #  3.280839895013123
  toElevFeet = 3.28084
else:      # note, IF had to go from feet to meters, factor would be 0.3048
  toElevFeet = 1
elevFeet = (inRas * toElevFeet)
saveElevFeet = arcpy.os.path.join(rasterGDB, "elevFt")
elevFeet.save(saveElevFeet)
elevMin = elevFeet.minimum  - 1
elevMax = elevFeet.maximum  + 1
noDataReclass = 99999

elevSplit = flatElevBreak.split(";")
lstCnt = 1
#reclassArg = []
bins = []
new_bins = []
for reclassValue in elevSplit:
  if lstCnt == 1:
    #reclassArg.append([int(elevMin), int(reclassValue), int(reclassValue)])
    bins.append(int(elevMin))
    bins.append(int(-1))
    bins.append(int(reclassValue))
    new_bins.append(int(-1))
    new_bins.append(int(reclassValue))
  else:
    #reclassArg.append([int(prevValue), int(reclassValue), int(reclassValue)])
    bins.append(int(reclassValue))    
    new_bins.append(int(0))    
    new_bins.append(int(reclassValue))    
  lstCnt += 1
  prevValue = reclassValue
#reclassArg.append([int(prevValue), int(elevCutoffFt), int(elevCutoffFt)])
#reclassArg.append([int(elevCutoffFt), int(elevMax), 9])

bins.append(int(elevCutoffFt))
bins.append(int(elevMax))
new_bins.append(int(elevCutoffFt))
new_bins.append(9)  
new_classes = zip(bins[:-1],bins[1:],new_bins)

print("   ...Old Ranges:\n   {0}".format(bins))
print("   ...New values:\n   {0}".format(new_bins))
print("   ...New classes:\n   {0}".format(new_classes))


arr = arcpy.RasterToNumPyArray(inRas, nodata_to_value=9999999)
mask = False
mask_val = None
print(arr)
arr_rc = arr_reclass(arr, bins=bins, new_bins=new_bins)

 

The error I am getting is on the last line, however, looking at the resources, there is plenty of RAM available, so I'm assuming it is just the Arcmap/Python limit.

 

Output from script (in ArcMap python window)

Project gdb (vector, i.e. for points, lines, polygons): C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb

Study boundary: Study16b

...Old Ranges:

   [-107504, -1, 2000, 6000, 20036]

   ...New values:

   [-1, 2000, 6000, 9]

   ...New classes:

   [(-107504, -1, -1), (-1, 2000, 2000), (2000, 6000, 6000), (6000, 20036, 9)]

[[  205.16717529   205.25009155   205.29598999 ...,  1037.9387207

   1037.86242676  1037.83959961]

[  205.22860718   205.3127594    205.37818909 ...,  1037.93701172

   1037.84533691  1037.76965332]

[  205.29447937   205.38237      205.46986389 ...,  1037.91918945

   1037.75964355  1037.76477051]

...,

[ 1481.97961426  1492.3326416   1504.91308594 ...,   510.62741089

    504.97958374   501.19992065]

[ 1469.56079102  1486.19445801  1505.37487793 ...,   506.6071167

    501.30090332   497.77194214]

[ 1453.34509277  1471.94152832  1493.07250977 ...,   501.66564941

    497.09884644   494.20864868]]

Runtime error

Traceback (most recent call last):

  File "<string>", line 115, in <module>

  File "<string>", line 30, in arr_reclass

MemoryError

>>>

 

If I need to split it due to size, some guidance for that would be good.  The one nice thing about this numpy process, I get the error immediately instead of 4-hours in....

Thanks for any help!

 

ps - Dan....is this question seen by more than the 13 members/followers?  Nice "place" but want to make sure others have a chance to find/reply

 

tagging a few more areas  Python python snippets Imagery and Remote Sensing

Outcomes