EDIT: although I marked Dan's answer (re: making the array an integer array) as the solution, since it was the last piece I needed, I created a summary at the end (https://community.esri.com/message/610325#comment-610325) that puts together all the pieces that worked. I suggest starting there if you run into this problem.
Bottom line: I'm trying to reclassify an elevation raster (in m) that includes "water", NoData, and the Denali foothills into 3-4 classes (based on feet, keeping the converted layer on disk, not in memory), then convert the result to polygons.
For the study area I used during development (and one other), the raster Reclassify and arcpy.RasterToPolygon_conversion worked relatively fast (a few minutes at most).
I'm now running it on an area about 2x larger (~25 m cells, 6657 columns x 13035 rows), with a wider range of input values. When using my script/tool, or testing in the Python window, I run out of memory after about 4 hours.
So I'm diving into NumPy and https://community.esri.com/people/Dan_Patterson's blog: https://community.esri.com/blogs/dan_patterson/2016/03/12/reclassify-raster-data-simply
After a clean reboot, I tried my test script in the Python window, but I'm still getting an out-of-memory error message. I'm not sure whether I have an error in my script or the data is just too large for 32-bit Python to handle. So, I'm open to suggestions... maybe processing half at a time? I have tried both a full raster with no NoData, and one that has NoData outside the study area boundary.
Test script I am using (parts borrowed from Dan's demo scripts)
import time
import os
import arcpy
from arcpy import env
from arcpy.sa import *
import numpy as np
from textwrap import dedent

arcpy.CheckOutExtension("spatial")

def arr_reclass(a, bins=[], new_bins=[], mask=False, mask_val=None):
    """a    - integer or floating point array to be reclassed using bins
    bins    - sequential list/array of the lower limits of each class;
              include one value higher to cover the upper range
    mask    - whether the raster contains nodata values or values
              to be masked with mask_val
    array dimensions will be squeezed
    """
    a_rc = np.zeros_like(a)
    if (len(bins) < 2):  # or (len(new_bins) < 2):
        print("Bins = {} new = {} won't work".format(bins, new_bins))
        return a
    if len(new_bins) < 2:
        new_bins = np.arange(1, len(bins) + 2)
    new_classes = zip(bins[:-1], bins[1:], new_bins)
    for rc in new_classes:
        q1 = (a >= rc[0])
        q2 = (a < rc[1])
        z = np.where(q1 & q2, rc[2], 0)
        a_rc = a_rc + z
    return a_rc

theWorkspace = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb'
arcpy.env.workspace = theWorkspace
arcpy.env.overwriteOutput = True
wsPath, wsGDB = os.path.split(theWorkspace)
print("Project gdb (vector, i.e. for points, lines, polygons): {0}".format(theWorkspace))

inStudy = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb\Study16b'
stdyPath, stdyName = os.path.split(inStudy)
print("Study boundary: {0}".format(stdyName))
spatialref = arcpy.Describe(inStudy).spatialReference

rasterIn = r'C:\___bear2016\_Skwentna2017\Skwentna_raster.gdb\elevExtent'
#rasterIn = r'C:\___bear2016\_Skwentna2017\Skwentna_raster.gdb\elevClip'
inRas = arcpy.Raster(rasterIn)
lowerLeft = arcpy.Point(inRas.extent.XMin, inRas.extent.YMin)
cellSize = inRas.meanCellWidth
rasterGDB, rasterFile = os.path.split(rasterIn)

elevUnits = "meters"
elevCutoffFt = "6000"
flatElevBreak = "2000"
elevBreaks = flatElevBreak.split("; ")
elevBreaks.sort(key=float)
flatElevBreak = ";".join(elevBreaks)

hydroBuff = r'C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb\hydroBuff250'
finalTransZones = arcpy.os.path.join(theWorkspace, "TransAreas")
outRiparian = arcpy.os.path.join(theWorkspace, "TransRiparian")
pElevClass = arcpy.os.path.join(theWorkspace, "elevReclassPoly")
outDEMtmp = arcpy.os.path.join(theWorkspace, "elevReclass")

if elevUnits == "meters":   # 3.280839895013123
    toElevFeet = 3.28084
else:   # note, IF I had to go from feet to meters, the factor would be 0.3048
    toElevFeet = 1
elevFeet = (inRas * toElevFeet)
saveElevFeet = arcpy.os.path.join(rasterGDB, "elevFt")
elevFeet.save(saveElevFeet)
elevMin = elevFeet.minimum - 1
elevMax = elevFeet.maximum + 1
noDataReclass = 99999

elevSplit = flatElevBreak.split(";")
lstCnt = 1
#reclassArg = []
bins = []
new_bins = []
for reclassValue in elevSplit:
    if lstCnt == 1:
        #reclassArg.append([int(elevMin), int(reclassValue), int(reclassValue)])
        bins.append(int(elevMin))
        bins.append(int(-1))
        bins.append(int(reclassValue))
        new_bins.append(int(-1))
        new_bins.append(int(reclassValue))
    else:
        #reclassArg.append([int(prevValue), int(reclassValue), int(reclassValue)])
        bins.append(int(reclassValue))
        new_bins.append(int(0))
        new_bins.append(int(reclassValue))
    lstCnt += 1
    prevValue = reclassValue
#reclassArg.append([int(prevValue), int(elevCutoffFt), int(elevCutoffFt)])
#reclassArg.append([int(elevCutoffFt), int(elevMax), 9])
bins.append(int(elevCutoffFt))
bins.append(int(elevMax))
new_bins.append(int(elevCutoffFt))
new_bins.append(9)
new_classes = zip(bins[:-1], bins[1:], new_bins)
print("  ...Old Ranges:\n  {0}".format(bins))
print("  ...New values:\n  {0}".format(new_bins))
print("  ...New classes:\n  {0}".format(new_classes))

arr = arcpy.RasterToNumPyArray(inRas, nodata_to_value=9999999)
mask = False
mask_val = None
print(arr)
arr_rc = arr_reclass(arr, bins=bins, new_bins=new_bins)
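As an aside (not from the thread, and `reclass_lowmem` is a made-up name): each pass through the loop in arr_reclass allocates full-size float64 temporaries. A lower-memory sketch of the same binning could map every cell in one pass with np.searchsorted and return a compact integer dtype, assuming all values fall inside [bins[0], bins[-1]):

```python
import numpy as np

def reclass_lowmem(a, bins, new_bins):
    """Sketch: one-pass reclass via searchsorted instead of per-class masks.
    Assumes every value satisfies bins[0] <= value < bins[-1]."""
    # index of the class each cell falls in: bins[i] <= value < bins[i+1]
    idx = np.searchsorted(bins, a, side='right') - 1
    idx = np.clip(idx, 0, len(new_bins) - 1)
    # look-up table in a small integer dtype (int16 holds 2000 and 6000,
    # at 2 bytes/cell instead of float64's 8)
    lut = np.asarray(new_bins, dtype=np.int16)
    return lut[idx]
```

Used like `arr_rc = reclass_lowmem(arr, bins, new_bins)`, this also gives the integer output array mentioned in the EDIT at the top.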
The error I am getting is on the last line; however, looking at the system resources there is plenty of RAM available, so I'm assuming it is just the ArcMap/Python limit.
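A quick back-of-the-envelope check (not from the thread, just arithmetic on the reported dimensions) supports the 32-bit-limit reading:

```python
import numpy as np

rows, cols = 13035, 6657                       # dimensions from the post
cells = rows * cols                            # ~86.8 million cells
copy_mb = cells * np.dtype(np.float64).itemsize / 2**20
print(round(copy_mb))                          # one float64 copy: ~662 MB
# Each loop pass in arr_reclass holds a, a_rc, z and the result of
# a_rc + z (all float64), plus two boolean masks -- easily past the
# ~2 GB that a 32-bit ArcMap Python process can address.
```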
Output from script (in ArcMap python window)
Project gdb (vector, i.e. for points, lines, polygons): C:\___bear2016\_Skwentna2017\Skwentna_prep.gdb
Study boundary: Study16b
...Old Ranges:
[-107504, -1, 2000, 6000, 20036]
...New values:
[-1, 2000, 6000, 9]
...New classes:
[(-107504, -1, -1), (-1, 2000, 2000), (2000, 6000, 6000), (6000, 20036, 9)]
[[ 205.16717529 205.25009155 205.29598999 ..., 1037.9387207
1037.86242676 1037.83959961]
[ 205.22860718 205.3127594 205.37818909 ..., 1037.93701172
1037.84533691 1037.76965332]
[ 205.29447937 205.38237 205.46986389 ..., 1037.91918945
1037.75964355 1037.76477051]
...,
[ 1481.97961426 1492.3326416 1504.91308594 ..., 510.62741089
504.97958374 501.19992065]
[ 1469.56079102 1486.19445801 1505.37487793 ..., 506.6071167
501.30090332 497.77194214]
[ 1453.34509277 1471.94152832 1493.07250977 ..., 501.66564941
497.09884644 494.20864868]]
Runtime error
Traceback (most recent call last):
File "<string>", line 115, in <module>
File "<string>", line 30, in arr_reclass
MemoryError
>>>
If I need to split it due to size, some guidance on that would be appreciated. One nice thing about this NumPy process: I get the error immediately instead of 4 hours in...
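One way the split could look (a sketch, not from the thread; `reclass_in_chunks` and the passed-in function are my own framing) is to run the reclass a band of rows at a time, so the temporaries stay small, then stack the bands back:

```python
import numpy as np

def reclass_in_chunks(a, reclass_fn, n_chunks=4):
    # np.array_split tolerates a row count that doesn't divide evenly.
    # Each band is reclassed on its own, so any temporaries created by
    # the reclass function are a fraction of the full-array size.
    pieces = [reclass_fn(band) for band in np.array_split(a, n_chunks, axis=0)]
    return np.vstack(pieces)
```

With the thread's function this would be called as `reclass_in_chunks(arr, lambda b: arr_reclass(b, bins=bins, new_bins=new_bins))`.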
Thanks for any help!
ps - Dan... is this question seen by more than the 13 members/followers? Nice "place", but I want to make sure others have a chance to find/reply.
I tend to read rasters directly to arrays in memory as chunks, both because of the 2 GB memmap limit and because it avoids doubling up the file-system IO (reading the raster in, saving it back out to .npy, reading it back in). But I'm usually dealing with large stacks of time-series data. Horses for courses.
Luke, are you doing data prep and processing outside of Arc*? Or are you 'tiling' (or equivalent) when you mention chunks? In this supposed world of big data, there seems to be little obvious interest in the data-analysis side of the GIS field; most complaints are about the 1000-record limits for AGOL.
Just did some digging: Python prior to 2.5 had the 2 GB limit (even on 64-bit systems). NumPy does not; there were some pretty ridiculously large files being read and used.
see hpaulj's comment http://stackoverflow.com/questions/37365064/memory-mapping-slows-down-over-time-alternatives
Plus there are alternatives
h5py, PyTables and Dask (I haven't checked these all out yet; the first is best for simple single-content arrays, the second for tabular arrays, and Dask is from Continuum): Array — dask 0.9.0 documentation
Dask is pretty useful; I've played around with it, but need to get up to speed as it's being used as the distributed-processing engine in v2 of the Aust. Geoscience Data Cube API.
I'll look at memmap again; I didn't realise the limit was no longer there. For most of what I do, reading to an array, then writing to a memmap file, then reading it in again just increases the amount of IO, but for larger time series it would be useful, especially for windowed operations (aka focal, in ArcGIS terminology), so there's no need to handle tiling edge effects.
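For reference, the memmap pattern being discussed can be sketched roughly like this (file names here are placeholders): save the array once, open it with mmap_mode so slices are read from disk only when touched, and write results into a second memmap:

```python
import os
import tempfile
import numpy as np

tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "src.npy")

a = np.arange(20.0).reshape(4, 5)
np.save(src_path, a)

src = np.load(src_path, mmap_mode='r')        # nothing read into RAM yet
out = np.lib.format.open_memmap(os.path.join(tmp, "out.npy"),
                                mode='w+', dtype=a.dtype, shape=a.shape)
for i in range(0, 4, 2):                      # two-row windows
    out[i:i + 2] = src[i:i + 2] * 2           # only these rows are resident
out.flush()
```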
Thanks for the references, Luke, I will have a long look. As for focal operations, I have only used stride tricks to get the windows I need. At least you can emulate most of the Spatial Analyst functionality, albeit with limits for most people. I will look at the GDC site in more detail. I don't work with time series much, except for a recent foray which you may have seen: 6,000+ rasters, and the follow-up, a bit of a patch job but interesting.
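For anyone curious what the stride-tricks approach looks like, here is a minimal sketch of a 3x3 focal mean. (Modern NumPy, 1.20+, exposes `sliding_window_view` as a safe wrapper over the classic `as_strided`; the thread predates it, so this is an updated illustration, not the posters' code.)

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 3x3 focal mean without tiling: the window view is just strides over
# the original buffer, so no data is copied.
a = np.arange(25.0).reshape(5, 5)
win = sliding_window_view(a, (3, 3))      # shape (3, 3, 3, 3)
focal_mean = win.mean(axis=(-2, -1))      # shape (3, 3), edge rows/cols trimmed
```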
Luke/Dan - looked briefly at your latest comments, but will look at them in depth in the morning. Thanks again for all the comments and suggestions. I think I am close and will hopefully be able to fold this process back into the full script tomorrow.
Luke, just an FYI: the reason I didn't use that approach is that determining the lower-left corner for several chunks may have caused issues... the straight array split is much cleaner.
It may be split time...
Now... you can create the array, save it to disk, load 'pieces', reclassify each, save them back to disk, repeat, then reassemble.
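That save/load-pieces/reassemble workflow might look like this (a sketch with made-up file names, and a stand-in transform where the reclass call would go):

```python
import os
import tempfile
import numpy as np

tmp = tempfile.mkdtemp()
a = np.arange(12.0).reshape(6, 2)
np.save(os.path.join(tmp, "full.npy"), a)

# load 'pieces' through a memory map, process each (adding 100 here is a
# stand-in for arr_reclass), and save each result to its own file
mm = np.load(os.path.join(tmp, "full.npy"), mmap_mode='r')
for k, sl in enumerate([slice(0, 3), slice(3, 6)]):
    piece = np.asarray(mm[sl]) + 100
    np.save(os.path.join(tmp, "piece_{}.npy".format(k)), piece)

# reassemble the processed pieces in order
rebuilt = np.vstack([np.load(os.path.join(tmp, "piece_{}.npy".format(k)))
                     for k in range(2)])
```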
I have no clue why you are running out of memory. The arrays aren't that big, and the processing within numpy barely registers time-wise or memory-wise.
I would suggest you monitor your memory usage. How much RAM do you have, anyway?
Anyway, the below is a verbose description with results so you can see the process conceptually.
Later...
Replace that section with the code below, keeping the reclass function....
if __name__ == "__main__":
    """run demo"""
    r = 8
    c = 10
    # uncomment the 2 lines below to create the array only once, then recomment
    # for subsequent testing
    #z = np.abs(100 * np.random.random_sample((r, c)) - 100)  # rand values in 0-100
    #np.save("z_8x10.npy", z)
    z = np.load("z_8x10.npy")
    bins = [0, 10, 20, 30, 40, 60, 70, 80, 90, 100]
    new_bins = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    z_rc = arr_reclass(z, bins=bins, new_bins=new_bins)
    z_hist = np.histogram(z_rc, bins=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    result = np.array(list(zip(z_hist[1], z_hist[0])))
    print("input array\n{}\nreclassed array\n{}".format(z, z_rc))
    print("histogram....\n(class, count)\n{}".format(result))
    # now reclass in slices and stack back
    z_rc0 = arr_reclass(z[:4, :], bins=bins, new_bins=new_bins)
    z_rc1 = arr_reclass(z[4:, :], bins=bins, new_bins=new_bins)
    z_stack = np.vstack((z_rc0, z_rc1))
    frmt = """
    Now split the array into some bits, do the reclass and reformulate
    If you want to use memmap (see np.load), it just reads the bits from disk
    If you want, you can reuse the names to clear memory...this was not done
    in this case.  You can save intermediate results to disk as well.
    ....
    first split and reclass...
    {}
    second...
    {}
    all equal?  np.allclose(z_stack, z_rc) = {}
    """
    print(dedent(frmt).format(z_rc0, z_rc1, np.allclose(z_stack, z_rc)))
The results
input array
[[ 57.04  17.38  74.4   18.04  25.    99.25  18.09   9.79  45.35  67.94]
 [ 28.73  74.36  32.13  60.14   0.79  14.74  60.81  20.31  53.26  89.08]
 [ 26.97  70.29  96.59  43.5   97.16  96.33  18.    82.99  35.37  40.31]
 [  0.12  52.13  26.93  19.59  48.93  20.31   4.3   24.43  29.33  20.46]
 [ 85.84  12.67  94.1   43.57  76.68  22.15  93.13  66.64  61.31  49.3 ]
 [ 55.35  25.44  23.02  62.98  12.44  83.66  42.53  92.32  48.41  24.  ]
 [ 70.27  45.64   3.    42.58  73.27  66.55   1.68  67.29  48.39   2.47]
 [ 96.38   4.19  78.12  82.1   13.83  73.13  52.77  28.02  57.7    5.44]]
reclassed array
[[ 5.  2.  7.  2.  3.  9.  2.  1.  5.  6.]
 [ 3.  7.  4.  6.  1.  2.  6.  3.  5.  8.]
 [ 3.  7.  9.  5.  9.  9.  2.  8.  4.  5.]
 [ 1.  5.  3.  2.  5.  3.  1.  3.  3.  3.]
 [ 8.  2.  9.  5.  7.  3.  9.  6.  6.  5.]
 [ 5.  3.  3.  6.  2.  8.  5.  9.  5.  3.]
 [ 7.  5.  1.  5.  7.  6.  1.  6.  5.  1.]
 [ 9.  1.  7.  8.  2.  7.  5.  3.  5.  1.]]
histogram....
(class, count)
[[ 1  9]
 [ 2  9]
 [ 3 14]
 [ 4  2]
 [ 5 17]
 [ 6  8]
 [ 7  8]
 [ 8  5]
 [ 9  8]]
Now split the array into some bits, do the reclass and reformulate
If you want to use memmap (see np.load), it just reads the bits from disk
If you want, you can reuse the names to clear memory...this was not done
in this case.  You can save intermediate results to disk as well.
....
first split and reclass...
[[ 5.  2.  7.  2.  3.  9.  2.  1.  5.  6.]
 [ 3.  7.  4.  6.  1.  2.  6.  3.  5.  8.]
 [ 3.  7.  9.  5.  9.  9.  2.  8.  4.  5.]
 [ 1.  5.  3.  2.  5.  3.  1.  3.  3.  3.]]
second...
[[ 8.  2.  9.  5.  7.  3.  9.  6.  6.  5.]
 [ 5.  3.  3.  6.  2.  8.  5.  9.  5.  3.]
 [ 7.  5.  1.  5.  7.  6.  1.  6.  5.  1.]
 [ 9.  1.  7.  8.  2.  7.  5.  3.  5.  1.]]
all equal?  np.allclose(z_stack, z_rc) = True
Dan, I'll take a look at this tomorrow. I did test your other script (i.e. creating the random array) on my home Win10 machine. Again, 10.3.1 gave the memory error. I then ran it in Pro (using the file written before the other crashed) and it worked, so that does point to the version of Python and/or numpy being the issue. However, I do need to make this work in ArcMap, not Pro, so the split option seems like the better solution.
Luke, I think the speed of numpy or similar is the way for me to go, especially if I can quickly split the array and remap. With the arcpy remap running 4+ hours before it ran into memory issues, at least the numpy version let me know in a matter of seconds (< 1 min). I had looked at chunking it up with the options available in RasterToNumPyArray, and that would probably work if I figure out the logic to find the lower-left corners of each split... and that might still be an option. I was playing around with the SplitRaster command, but my first attempt was a disaster since I messed up the attributes. Anyway, I appreciate the effort, and in the long run I'll probably use bits of all the suggestions.
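The lower-left logic for row-band chunks is just offsets up from YMin. A sketch with made-up extent values (the arcpy read itself is only shown in a comment, since each (x, y, rows) would feed `arcpy.RasterToNumPyArray(inRas, arcpy.Point(x, y), ncols, band_rows)`):

```python
import numpy as np

# Assumed extent/cell values for illustration only; rows/cols match the post.
cell = 25.0
xmin, ymin = 100000.0, 200000.0
nrows, ncols = 13035, 6657
n_chunks = 4

# Row bands count down from the top of the raster; a band ending at row
# `bottom` has its lower-left corner (nrows - bottom) cells above YMin.
edges = np.linspace(0, nrows, n_chunks + 1).astype(int)
corners = [(xmin, ymin + (nrows - bottom) * cell, bottom - top)
           for top, bottom in zip(edges[:-1], edges[1:])]
# corners is a list of (x, y, band_rows) tuples, one per chunk
```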
Yes, it is the Python version... split works too.