Heavy-Duty Raster Processing (python, numpy, arrays?)

01-12-2016 09:19 AM
ZoeZaloudek
Occasional Contributor

I am starting with six rasters.  One is basically a DEM (topographic elevation).  The other five rasters represent water surface elevation for different recurrence intervals (10%, 4%, 2%, 1%, and 0.2%).  The raster I want to create will represent the percent-annual-chance of flooding for each cell.

For example, I might have a topo elevation of 601 ft for one cell.  The 10%, 4%, 2%, 1%, and 0.2%, water surface elevation rasters have values of 590, 595, 600, 605, and 615 ft, respectively.  In order to find the percent-annual-chance of flooding for this cell, I need to graph a line using the known values.  On the X-axis I put the natural log of the percents, and I put the elevation values on the Y-axis.  I use numpy’s polyfit function to find the m and b of that line, and with a little algebra find that the percent-annual-chance of flooding at this cell is 1.9%.
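To make that concrete, here is a stripped-down version of the calculation for a single cell (the numbers are just the example values above; the exact percent you get depends on how the fit is set up):

import numpy as np

# annual-chance percents and the example water surface elevations (ft) from above
percents = np.array([10.0, 4.0, 2.0, 1.0, 0.2])
wsel = np.array([590.0, 595.0, 600.0, 605.0, 615.0])
topo = 601.0  # ground elevation (ft) at this cell

# fit a line with ln(percent) on the X axis and elevation on the Y axis:
#   elevation = m * ln(percent) + b
m, b = np.polyfit(np.log(percents), wsel, 1)

# invert the line to get the percent-annual-chance at the ground elevation
pct_chance = np.exp((topo - b) / m)
print(round(pct_chance, 2))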

I have written a few python scripts that can do this, but they only work for small datasets. 

One works by turning the topo raster into points, extracting the WSEL values from the five rasters to those points, and then running the calculation with an UpdateCursor.  This works great when my rasters are small and low resolution, but it takes far too long (not feasible) to run for an area the size of a county.

Another script turns the rasters into numpy arrays and iterates through each row/column combination to perform the calculation.  This runs much faster than using points, but it requires that all rasters be clipped to the exact same extent.  Also, with a county-size extent and a small cell size (as small as 3 feet), my computer's memory cannot handle turning all of the rasters into arrays.  It will try, but eventually I get a memory error.

Has anyone else done raster processing like this?  It would be great to be able to just use Raster Calculator / Map Algebra, but I haven’t been able to figure out how to use the numpy polyfit function with it.

I'm also attaching an image from my guidance document to help explain the calculation.

7 Replies
DanPatterson_Retired
MVP Emeritus

Could you cite the number of rows and columns in the raster?

ZoeZaloudek
Occasional Contributor

One that I'm working with, at its original cell size of 3.75 ft, has 13,747 rows and 25,245 columns.

DanPatterson_Retired
MVP Emeritus

First question would be... do you really need that resolution? Aggregating the cell size by a factor of 2 will go a long way.  I would begin with a little experiment:

  • filter/smooth with a running average; what do you get when you subtract that from the original?
  • aggregate by a factor of 2 (i.e., double the cell size) using the average, then subtract
  • pick a small representative sample area, clip it, run your process, and see what you get

Sometimes the data simplification has little to no effect, meaning it would be safe to do before trying to combine multiple big rasters into one process, or before processing one huge area in a single chunk when multiple chunks would let the task complete and/or speed up the process in the long run.  A rough sketch of the first two bullets is below.
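Something along these lines with Spatial Analyst would cover the smoothing and aggregation checks (the paths here are just placeholders):

import arcpy
from arcpy.sa import Aggregate, FocalStatistics, NbrRectangle, Raster

arcpy.CheckOutExtension("Spatial")
arcpy.env.overwriteOutput = True

dem = Raster(r"C:\data\topo_3ft")          # placeholder path
arcpy.env.cellSize = dem.meanCellWidth     # compare results at the original resolution

# smooth with a 3x3 running average and subtract from the original
smoothed = FocalStatistics(dem, NbrRectangle(3, 3, "CELL"), "MEAN")
(dem - smoothed).save(r"C:\data\smooth_diff")

# aggregate by a factor of 2 (double the cell size) using the mean, then subtract
aggregated = Aggregate(dem, 2, "MEAN")
(dem - aggregated).save(r"C:\data\agg_diff")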

ZoeZaloudek
Occasional Contributor

Technically, no, the cell size is allowed to be as large as 10 meters.  I was hoping not to have to resample the rasters from their original cell size.  I have occasionally run into issues while batch resampling, most notably some blocks of the output rasters being set to NoData values.

I will at least try the aggregation experiment you mention.  It might make me feel better about doing it, if it comes down to that. 

It seems more and more like I am also going to have to split these rasters into smaller pieces for processing.  I'm looking into adding the Split Raster tool to the script.

XanderBakker
Esri Esteemed Contributor (accepted solution)

Look at this help page: RasterToNumPyArray—Help | ArcGIS for Desktop, and scroll down to the second sample. It processes the data in smaller blocks rather than reading everything at once.

ZoeZaloudek
Occasional Contributor

Ah, this has turned out to be very helpful.  Instead of resampling or splitting up the rasters (and saving the pieces to disk), I used this sample to edit my script.  Now it creates arrays of a limited size and only saves each piece of the resulting raster to disk (mosaicking them all at the end).  I have been able to run this on a raster of 2,864 rows by 5,259 columns, and the script finished within 30 minutes.  I set the block size to 1000, so each array chunk has at most 1,000,000 cells.

Thank you for the suggestion!
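In case it helps anyone else, here is roughly what the edited script does now, in sketch form.  The paths, raster names, and NoData sentinel are placeholders, and it assumes all six rasters are floating-point and share the same cell size and alignment:

import numpy as np
import arcpy

arcpy.env.overwriteOutput = True

# placeholder paths for the six input rasters
topo_ras = arcpy.Raster(r"C:\data\topo")
wsel_paths = [r"C:\data\wsel_10pct", r"C:\data\wsel_4pct", r"C:\data\wsel_2pct",
              r"C:\data\wsel_1pct", r"C:\data\wsel_0_2pct"]
wsel_ras = [arcpy.Raster(p) for p in wsel_paths]

arcpy.env.outputCoordinateSystem = topo_ras.spatialReference  # keep the pieces georeferenced

log_pcts = np.log(np.array([10.0, 4.0, 2.0, 1.0, 0.2]))
NODATA = -9999.0
blocksize = 1000
cw, ch = topo_ras.meanCellWidth, topo_ras.meanCellHeight
pieces = []

for yoff in range(0, topo_ras.height, blocksize):
    for xoff in range(0, topo_ras.width, blocksize):
        nrows = min(blocksize, topo_ras.height - yoff)
        ncols = min(blocksize, topo_ras.width - xoff)
        # lower-left corner of this block in map coordinates (rows count down from the top)
        ll = arcpy.Point(topo_ras.extent.XMin + xoff * cw,
                         topo_ras.extent.YMax - (yoff + nrows) * ch)

        topo = arcpy.RasterToNumPyArray(topo_ras, ll, ncols, nrows, nodata_to_value=NODATA)
        wsels = [arcpy.RasterToNumPyArray(r, ll, ncols, nrows, nodata_to_value=NODATA)
                 for r in wsel_ras]

        out = np.full(topo.shape, NODATA)
        for (i, j), z in np.ndenumerate(topo):
            ys = np.array([w[i, j] for w in wsels])
            if z == NODATA or NODATA in ys:
                continue
            m, b = np.polyfit(log_pcts, ys, 1)   # elevation = m * ln(percent) + b
            out[i, j] = np.exp((z - b) / m)      # percent-annual-chance at this cell

        # save each finished piece to disk and keep track of it for the final mosaic
        piece = arcpy.NumPyArrayToRaster(out, ll, cw, ch, value_to_nodata=NODATA)
        piece_path = r"C:\data\scratch\pct_{0}_{1}.tif".format(xoff, yoff)
        piece.save(piece_path)
        pieces.append(piece_path)

# mosaic all the pieces into one output raster at the end
arcpy.MosaicToNewRaster_management(pieces, r"C:\data", "pct_annual_chance.tif",
                                   pixel_type="32_BIT_FLOAT", number_of_bands=1)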

XanderBakker
Esri Esteemed Contributor

You're welcome. I'm glad it worked!
