I've been practicing recently with numpy for certain tasks. It works nicely (using indices) for returning the row and column numbers for small and medium-sized rasters, up to a few thousand cells on a side. When I work with large rasters (10000-by-10000 or larger), I get a Python Memory Error. It seems that numpy is not an option for doing computations with data for large rasters.
(I'm running ArcGis 10 experimentally on a Windows Server 2008 machine, 64-bit, with supposedly 48 GB of RAM. Yes, I understand that by my definition a large raster could take up several GB when held in memory.)
Is there a practical procedure out there for returning row and column numbers for large rasters? I've seen suggestions for using fishnet or flowaccumulation, but am guessing that the steps involved would make for verrry slow processing.
I'm also running into precision problems when working with large numbers in Map Algebra. Integer arithmetic works nicely up to 2^31 (about 2 billion), but I have to go to floating-point for larger numbers. Computations involving large numbers, such as sums of squares, begin to lose accuracy and produce unacceptable artifacts in the output.
I can get around the precision problems in Map Algebra by converting my raster objects to numpy arrays and defining them as float64. However, as the rasters get larger, then I run into the memory errors again. For example, if I have a 5000-by-5000 raster, I can't keep more than about three variables (as numpy arrays) in play without getting memory errors.
Running calculations on data in numpy arrays is very fast, I suppose because everything is held in memory. The limits to that memory, however, render "the numpy route" unusable for working with larger rasters.
I like the new Map Algebra, but what was ESRI thinking when they decided to give us this set of "features"? Please give us back $$rowmap and $$colmap. Please let us define rasters to have higher levels of precison. Please give us a supplemental programming option that lets us handle large rasters (arrays).
Currently my only option is to write programs that won't work for large rasters. Because I do a lot of work with LiDAR-based elevation data, that's not much of an option. Does anyone have workarounds for these issues?
Regards, Tim.L