stack 2D array using numpy dstack memory error

5615
3
Jump to solution
09-22-2015 07:32 PM
Leo_KrisPalao
New Contributor II

My intention is to stack my 2D array created from my MODIS rasters using numpy dstack.  After that, I hope to implement savitzky-golay filtering algorithm from scipy signal processing so I think it is better to stack my 2D array to access the time series values of a particular pixel. Right now, I am just testing for 10 MODIS rasters (but later on will use my whole dataset -- approx 800 rasters (15 years of data)) only to test how should I create my script. So my initial workflow is to:

1) Access my raster files

2) For each file I need to convert it to a 2D array using RasterToNumPyArray

3) Append all those created 2D arrays to a list, and

4) Implement numpy.dstack() to stack my 2D arrays and hopefully I can begin some analysis.

I am getting an error in the 4th step (it says memory error) -- converting my list of 2D arrays to a stack (please see script 1 below) but if I revise my script and load in dstack using the index of each element in my 2D array it runs okay. Could you guide how to approach my problem. I am just building my script so for now its not so much. please see below. Thanks.

Script 1

import arcpy
from arcpy.sa import *
import os, sys
import numpy as np
from scipy.signal import savgol_filter

arcpy.CheckOutExtension("Spatial")

rasters = []

ws = 'G:/Test/Raster'
for folder, subs, files in os.walk(ws):
  for filename in files:
  aSrc = arcpy.RasterToNumPyArray(os.path.join(folder,filename))
  rasters.append(aSrc)

stackRast = np.dstack(rasters)

Script 2

import arcpy
from arcpy.sa import *
import os, sys
import numpy as np
from scipy.signal import savgol_filter

arcpy.CheckOutExtension("Spatial")

rasters = []

ws = 'G:/Test/Raster'
for folder, subs, files in os.walk(ws):
  for filename in files:
  aSrc = arcpy.RasterToNumPyArray(os.path.join(folder,filename))
  rasters.append(aSrc)

stack = np.dstack((rasters[0], rasters[1], rasters[2], rasters[3], rasters[4], rasters[5],rasters[6], rasters[7], rasters[8], rasters[9]))
0 Kudos
1 Solution

Accepted Solutions
Luke_Pinner
MVP Regular Contributor

You haven't defined the nrows or ncols variables (number of rows, number of columns).

The RasterToNumPyArray function accepts lower_left_corner, ncols, nrows parameters which you can use to extract a subset of each raster.

View solution in original post

3 Replies
Luke_Pinner
MVP Regular Contributor

How much RAM have you got?


Make sure you're using 64bit python, 32bit can't make use of much of your RAM.

Prebuild the stack then populate it rather than using a list so less memory is required (due to the way numpy makes copies of arrays) i.e

stack = numpy.zeroes((nrasters, nrows, ncols), dtype = numpy.some_appropriate_dtype)

i=0

for folder, subs, files in os.walk(ws): 

    for filename in files: 

        stack[i, nrows, ncols] = arcpy.RasterToNumPyArray(os.path.join(folder,filename))

Even with 64bit and loads of RAM, you'll probably still have issues with memory, you may need to process the timeseries in chunks, i.e the full time series for a portion of the  image.

0 Kudos
Leo_KrisPalao
New Contributor II

Hi Luke,

Currently I am using 12GB RAM but my Python is only 32bit. The reason why is that I have a problem before installing 64-bit GDAL for python.

I tried your suggestion but I get errors in nrows saying not defined. Did I get the syntax/expression right?

import arcpy

from arcpy.sa import *

import os, sys

import numpy as np

arcpy.CheckOutExtension("Spatial")

stack = np.zeros((2,4800,4800),dtype=np.int)

i=0

ws = 'G:/Test/Raster'

for folder, subs, files in os.walk(ws):

  for fname in files:

  stack[i,nrows,ncols] = arcpy.RasterToNumPyArray(os.path.join(folder,fname))

By the way Luke, dividing the time series by tile is a good way. I have seen some of the script in R they do such technique for signal processing. Can you guide me how to approach this tiling method in Python?

Thanks,

-Leo

0 Kudos
Luke_Pinner
MVP Regular Contributor

You haven't defined the nrows or ncols variables (number of rows, number of columns).

The RasterToNumPyArray function accepts lower_left_corner, ncols, nrows parameters which you can use to extract a subset of each raster.