Problems with large rasters: $$rowmap, numpy, Memory Error, and numeric precision

deleted-user-De598MZ4-kwO · ‎11-30-2011

I've been practicing recently with numpy for certain tasks. It works nicely (using indices) for returning the row and column numbers for small and medium-sized rasters, up to a few thousand cells on a side. When I work with large rasters (10000-by-10000 or larger), I get a Python Memory Error. It seems that numpy is not an option for doing computations with data for large rasters.

(I'm running ArcGis 10 experimentally on a Windows Server 2008 machine, 64-bit, with supposedly 48 GB of RAM. Yes, I understand that by my definition a large raster could take up several GB when held in memory.)

Is there a practical procedure out there for returning row and column numbers for large rasters? I've seen suggestions for using fishnet or flowaccumulation, but am guessing that the steps involved would make for verrry slow processing.

I'm also running into precision problems when working with large numbers in Map Algebra. Integer arithmetic works nicely up to 2^31 (about 2 billion), but I have to go to floating-point for larger numbers. Computations involving large numbers, such as sums of squares, begin to lose accuracy and produce unacceptable artifacts in the output.

I can get around the precision problems in Map Algebra by converting my raster objects to numpy arrays and defining them as float64. However, as the rasters get larger, then I run into the memory errors again. For example, if I have a 5000-by-5000 raster, I can't keep more than about three variables (as numpy arrays) in play without getting memory errors.

Running calculations on data in numpy arrays is very fast, I suppose because everything is held in memory. The limits to that memory, however, render "the numpy route" unusable for working with larger rasters.

I like the new Map Algebra, but what was ESRI thinking when they decided to give us this set of "features"? Please give us back $$rowmap and $$colmap. Please let us define rasters to have higher levels of precison. Please give us a supplemental programming option that lets us handle large rasters (arrays).

Currently my only option is to write programs that won't work for large rasters. Because I do a lot of work with LiDAR-based elevation data, that's not much of an option. Does anyone have workarounds for these issues?

Regards, Tim.L

KyleShannon · ‎11-30-2011

Tim,
You'll have to forgive my ignorance with rasters in arcpy, I am new to arc stuff, coming from mostly open source gis. I have a few questions/comments.

Questions:
* What is the scope of your task? Are you just doing calculations on the raster data, or are you using some other functionality?

* Can you 'tile' your raster and still achieve your goal? Meaning if you read in 512 rows x 512 columns starting from the origin, iterating over the 'blocks' or 'tiles'?

Comments:
* You can't read any raster over 2 gb in arcpy, it is a 32 bit process.

* If you don't need to do it in arc, try using gdal python bindings, see gdal.org

* I would have to look at the precision issues separately. Not sure about that.

deleted-user-De598MZ4-kwO · ‎11-30-2011

I'm doing a lot of calculations based on elevation values. The current problem involves alternative methods for calculating and manipulating slope and aspect and derivative products, but that's not the point now.

I could tile the datasets into smaller pieces, and maybe will have to do that if no other method presents itself, but that's an inelegant and unsatisfactory solution. Because of edge effects, the tiles would need to overlap, and then the overlaps would need to be removed when joining the processed tiles back together. It could be automated, and perhaps someone else has already written a script to do that, but I'm looking for a better and more fundmental answer.

I'm trying to use the ESRI geoprocessor and related available tools, without the user needing to install anything extra. All of our GIS work is done in the ESRI environment. One good reason for staying with SA is the ease in making calculations based on movng (running) windows, such as focal stats and kernels.

I don't see any way around the precision problem for large rasters, with the current version of ArcGis. Map Algebra can handle large rasters (albeit slowly), and Python/numpy can handle high-precision numbers, but I don't have a way to handle both at the same time.

Luke_Pinner · ‎12-01-2011

I've been practicing recently with numpy for certain tasks. It works nicely (using indices) for returning the row and column numbers for small and medium-sized rasters, up to a few thousand cells on a side. When I work with large rasters (10000-by-10000 or larger), I get a Python Memory Error. It seems that numpy is not an option for doing computations with data for large rasters.

Check out pytables or numpy.memmap

Is there a practical procedure out there for returning row and column numbers for large rasters? I've seen suggestions for using fishnet or flowaccumulation, but am guessing that the steps involved would make for verrry slow processing.

import arcgisscripting
gp = arcgisscripting.create(9.3) #This works in ArcGIS 10 for backwards compatibility.
#Make sure extent and cellsize are set in the arcgisscripting environment...
expr='$$rowmap'
output=r'C;\Temp\rowmap'
result=gp.SingleOutputMapAlgebra(expr,output)

curtvprice · ‎12-02-2011

See my post for a method to get to $$ROWMAP using the unsupported back door to MOMA in 10.0. (Hi Tim!!)

[thread]45087[/thread]

As for the 32-bit precision issue - you are kind of stuck there as the tools were all written to work on grid floats (32-bit). Putting a raster into a high-precision raster stored in the file geodatabase may help somewhat (because of the new direct read/write at 10.0) - but probably only for some tools. Can your numpy arrays get bigger than 4GB? I would think they'd be limited by the fact that you need to use 32-bit python with ArcGIS.

To do what you want effectively you may have to resort to what ESRI does with large data sets (vector and raster) -- tile your data (you could merge the results back together with Mosaic pretty easily).

curtvprice · ‎02-24-2012

Because of edge effects, the tiles would need to overlap, and then the overlaps would need to be removed when joining the processed tiles back together. It could be automated, and perhaps someone else has already written a script to do that

Should be noted that the Mosaic tool (and mosaic data sets on the fly) can handle the stitching back together with averaging overlapping areas. Splitting apart, well, we still need a script for that.