sa.Sample Performance

JamesCrandall · ‎03-03-2014

I am invoking the arcpy.sa.Sample() method on 540 individual rasters in a FGDB, using a point feature class in the same workspace that contains 50 points. S-L-O-W is the word. Are there alternatives with better performance? (Preliminary testing with ExtractMultiValuesToPoints doesn't show any improvement.)

The 540 rasters I am processing:

236 cols
552 rows
2400, 2400 cellsize

The Point FC has 50 points.

It takes about 3 hours to generate an output result and I need to drastically reduce this. Non-ESRI solutions are acceptable!

Thanks,
James

JamesCrandall · ‎03-03-2014

Well, I started a test run with the ExtractMultiValuesToPoints method --- going on 5hrs now and my trigger finger is itching to kill the process as it is worse than sa.Sample at this point.

Any input is appreciated!
j

curtvprice · ‎03-04-2014

This raster is pretty small.

If you are at 10.1 or later, have you considered copying it to the in_memory workspace and seeing if that runs faster? I would expect it to, though you may have to try the different sampling tools to see which one works best.

JamesCrandall · ‎03-04-2014

Hi Curt,

I'd have to copy the whole set of 540 rasters into in_memory. Not averse to that, and I will give it a go!

ShaunWalbridge · ‎03-06-2014

James,

Assuming 32-bit values, you're still only looking at about 268 MiB (236*552*540*4 bytes) of data, which should fit into memory without a problem. Doing that for both your rasters and points would be a good place to start, as Curtis mentions, since your sampling has to iterate over all points and rasters, which isn't particularly fast. Another option that would take more work is to stack the rasters in a multidimensional array, such as with NumPy or NetCDF. Then you can pull out the values of all 540 rasters by sampling a single vector (all rasters at one point), which should greatly improve performance.
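A minimal sketch of the stacking idea, using random stand-in data in place of the real rasters. The array shape and the count of 50 points come from the thread; the variable names and the random cell indices are assumptions for illustration only. In practice each slice would presumably be filled from arcpy.RasterToNumPyArray().

```python
import numpy as np

# Dimensions quoted in the thread: 540 rasters, 552 rows, 236 columns.
n_rasters, n_rows, n_cols = 540, 552, 236
rng = np.random.default_rng(0)

# Preallocate one 3-D array with shape (raster, row, col).
stack = np.empty((n_rasters, n_rows, n_cols), dtype=np.float32)
for i in range(n_rasters):
    # Stand-in data; with arcpy this would be the array read from raster i.
    stack[i] = rng.random((n_rows, n_cols), dtype=np.float32)

# Hypothetical cell indices for the 50 sample points.
rows = rng.integers(0, n_rows, size=50)
cols = rng.integers(0, n_cols, size=50)

# One indexing call returns every raster's value at every point.
# Result shape is (540, 50): one row per raster, one column per point.
samples = stack[:, rows, cols]

print(stack.nbytes / 2**20)  # size of the stack in MiB
print(samples.shape)
```

Filling a preallocated array avoids holding both a list of 540 arrays and the stacked copy in memory at the same time.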

cheers,
Shaun

JamesCrandall · ‎03-10-2014

I'm all ears.

Could you provide just a tad more guidance on this? What would the approach look like?

1. Loop all 540 and convert to a list of numpy arrays


import arcpy

# concrasters: the list of 540 raster names/paths in the FGDB
rasterarray = []
for conraster in concrasters:
    # RasterToNumPyArray returns a 2-D (rows, cols) array for a single-band raster
    rasterarray.append(arcpy.RasterToNumPyArray(conraster))

2. Sample with my input polyline. I am not exactly sure of the options here. Suggestions?

Thanks!

curtvprice · ‎03-12-2014

This sounds like a great idea. In my experience NumPy arrays are stored very efficiently, so if you have zeros/nodata etc. you will probably be just fine memory-wise.

Re: "Sample with my input polyline."

You'd have to convert your line to points and then convert the points to numpy row-col coordinates using basic geometric arithmetic against your extent and cell size.
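The arithmetic could look roughly like the sketch below, assuming a north-up raster whose array row 0 is the top edge. The function name xy_to_rowcol and the extent values are hypothetical; only the 2400 cell size and the 552 x 236 grid come from the thread. With arcpy, the left and top edges would presumably come from the raster's extent (XMin, YMax).

```python
import numpy as np

def xy_to_rowcol(x, y, x_min, y_max, cell_size):
    """Map map coordinates to (row, col) indices of a north-up raster array.

    Row 0 is the top of the raster, so rows are counted down from y_max.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = np.floor((x - x_min) / cell_size).astype(int)
    rows = np.floor((y_max - y) / cell_size).astype(int)
    return rows, cols

# Made-up extent corner; cell size and grid shape are from the thread.
x_min, y_max, cell_size = 500000.0, 1500000.0, 2400.0
n_rows, n_cols = 552, 236

# Made-up point coordinates standing in for the points along the line.
xs = [500000.0, 501200.0, 504800.0]
ys = [1499999.0, 1497000.0, 1492799.0]
rows, cols = xy_to_rowcol(xs, ys, x_min, y_max, cell_size)

# Flag points that fall outside the grid before using them as indices.
inside = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
print(rows, cols, inside)
```

Checking the indices against the grid shape matters because negative indices in NumPy silently wrap around to the other side of the array instead of raising an error.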

JamesCrandall · ‎03-17-2014

Thanks for your input, Curtis.

So if I have converted my rasters to a list of numpy arrays and my polyline as a numpy row/col array too, what do you envision as the next step?