import os
import tempfile
from multiprocessing import Pool

import arcpy
from arcpy import sa


def multi_watershed(pnts, branchID, flowdir, flowacc, scratchWks):
    # If called in a parallel process, each worker needs to write to a separate directory
    direc = tempfile.mkdtemp(dir=scratchWks)
    arcpy.env.scratchWorkspace = direc
    polylist = []
    for i, p in enumerate(pnts):
        # Convert the shapely point to an arcpy point
        pnt = arcpy.PointGeometry(arcpy.Point(p.x, p.y, ID=i))
        pourpt = sa.SnapPourPoint(pnt, flowacc, 1000)
        ws = sa.Watershed(flowdir, pourpt)
        # Generate a filename for the output polygon
        out = os.path.join(direc, "pol_%i" % i)
        arcpy.RasterToPolygon_conversion(ws, out)
        # Append the output file to the list to be returned
        polylist.append(out)
    res = (branchID, polylist)
    return res


def watershed_pll(data, flowdir, flowacc, tempfolder, proc=4):
    """Calculate the watershed for each station point using parallel processing."""
    pool = Pool(processes=proc)
    jobs = []
    for key, val in data.iteritems():
        # Note: the original passed an undefined name "temp" here; it should be the
        # tempfolder argument
        jobs.append(pool.apply_async(multi_watershed, (val, key, flowdir, flowacc, tempfolder)))
    pool.close()
    pool.join()
    return jobs
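Since `watershed_pll` returns the raw `AsyncResult` objects, the caller still has to call `.get()` on each one to obtain the `(branchID, polylist)` tuples (and to surface any exception raised in a worker). A minimal sketch of that collection pattern, using a stand-in worker (`fake_watershed`) because arcpy is not importable here:

```python
from multiprocessing import Pool

def fake_watershed(val, key):
    # Stand-in for multi_watershed: returns (branchID, result) like the real worker
    return (key, [v * 2 for v in val])

if __name__ == "__main__":
    data = {"b1": [1, 2], "b2": [3, 4]}
    with Pool(processes=2) as pool:
        jobs = [pool.apply_async(fake_watershed, (val, key))
                for key, val in data.items()]
        # .get() blocks until the job finishes and re-raises worker exceptions
        results = dict(j.get() for j in jobs)
    # results maps branchID -> worker output, e.g. {'b1': [2, 4], 'b2': [6, 8]}
```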
Hi James and Cody
I'm currently busy with my Master's thesis, in which I'm developing a geostatistical Monte Carlo model (conditional sequential Gaussian simulation) to measure uncertainty in stream networks derived from a DEM using the Arc Hydro D8 algorithm. My simulation requires 100 runs of Arc Hydro over my study area. I'm looking to multiprocess the Flow Direction and Flow Accumulation steps, but I'm battling to figure out how to cut up the DEM, as these processes require the entire DEM to generate the flow direction and flow accumulation. Any ideas or advice would be appreciated.
Regards
Peter,
Could you clarify the 100 simulations bit?
Are they 100 different versions of the Flow Direction/Accumulation, or is there only one base Flow Direction/Accumulation that your geostatistical model uses?
Taking a guess: if you're creating something that computes flow direction/accumulation, it would seem imperative to have the whole raster available to determine the flow correctly.
My other guess: if you could determine where the watersheds are in the area, you could potentially split the DEM into multiple smaller sections along those boundaries.
Hard to say without an understanding of the data though, so hopefully I can help more in a bit.
Cheers.
Hi Cody
I'll try to explain the workflow of the Monte Carlo simulation.
The Monte Carlo simulation takes a raw hydrological DEM/DTM as input and, using statistics, generates a new DEM/DTM that represents error by adjusting the elevation values. This is repeated a certain number of times; the more repetitions, the better the resulting probability distribution of the error.

Each DEM/DTM produced by the Monte Carlo simulation is then processed using Arc Hydro Terrain Preprocessing to generate a stream network from the DEM. Arc Hydro uses the Spatial Analyst Hydrology tools (i.e. Flow Direction, Flow Accumulation, Stream Definition). My simulation requires that I generate 1000 realisations, each producing a new version of the stream network. The 1000 versions of the stream network are then compared to quantify where the uncertainty/error lies within the DEM: where the stream network stays the same across versions, and where it differs due to the elevation changes introduced by the simulation.
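The perturbation step described above can be sketched as follows. This is a simplified illustration that draws spatially uncorrelated Gaussian noise per cell; the conditional sequential Gaussian simulation in the actual thesis would instead draw spatially correlated error fields. The function name and parameters are my own for illustration:

```python
import numpy as np

def perturb_dem(dem, error_std, n_realisations, seed=0):
    """Yield Monte Carlo realisations of a DEM with additive Gaussian error.

    NOTE: draws independent noise per cell for simplicity; a conditional
    sequential Gaussian simulation would honour the spatial autocorrelation
    of the elevation error instead.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_realisations):
        yield dem + rng.normal(0.0, error_std, size=dem.shape)

# Tiny toy DEM; each realisation would then be fed to the flow-direction step
dem = np.array([[10.0, 12.0],
                [11.0, 13.0]])
realisations = list(perturb_dem(dem, error_std=0.5, n_realisations=100))
```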
The Flow Direction and Flow Accumulation processes are exceptionally computationally expensive, and the current tools make use of neither multiprocessing nor the additional memory available under a 64-bit architecture. I'm looking for a way to split the DEM/DTM, process the pieces using multiprocessing, and stitch them back together at the end. As you mentioned, if there were a way to identify the locations of the catchments beforehand, one could split the DEM/DTM accordingly.
Regards
My answer at the moment would be to look elsewhere than arcpy/arcgis for numerical modelling.
Perhaps ArcObjects would allow better control, but there are certainly faster algorithms out there that consume less memory than Arc Hydro's offerings.
D8 algorithms are not hard to program yourself, and you do not need to look at the whole raster at once. D8 algorithms only look at 9 cells at a time: the current cell and its 8 neighbours.
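To make the "9 cells at a time" point concrete, here is a minimal sketch of the D8 rule for a single cell, using the ESRI power-of-two direction codes (1=E, 2=SE, 4=S, 8=SW, 16=W, 32=NW, 64=N, 128=NE). A real implementation would also handle nodata, flats, and pit filling:

```python
import numpy as np

# Neighbour offsets as (row, col) and their ESRI D8 codes
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
CODES = [1, 2, 4, 8, 16, 32, 64, 128]

def d8_direction(window):
    """D8 flow direction for the centre cell of a 3x3 elevation window.

    Steepest descent over the 8 neighbours; diagonal neighbours are
    sqrt(2) cells away. Returns 0 if no neighbour is lower (pit/flat).
    """
    centre = window[1, 1]
    best_code, best_slope = 0, 0.0
    for (dr, dc), code in zip(OFFSETS, CODES):
        dist = np.hypot(dr, dc)                       # 1 or sqrt(2)
        slope = (centre - window[1 + dr, 1 + dc]) / dist
        if slope > best_slope:
            best_code, best_slope = code, slope
    return best_code

window = np.array([[5.0, 5.0, 5.0],
                   [5.0, 4.0, 3.0],   # east neighbour is lowest
                   [5.0, 5.0, 5.0]])
# d8_direction(window) -> 1 (flow to the east)
```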
The difficulty is in how to manage edge cases.
I did a quick example a while ago for work showing that, with Python/Cython and GDAL, you can process large rasters (in any format supported by GDAL) far more quickly than Arc Hydro does, by streaming 3 rows at a time and shipping these out to parallel processes.
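The 3-row streaming idea can be sketched like this. For testability the rows come from an in-memory array; in the real version each row would be read with GDAL's windowed `band.ReadAsArray(...)` calls instead, and each yielded window could be handed to a worker process. The padding choice (infinite elevation at the borders) is just one way to stop border cells draining off the grid:

```python
import numpy as np

def stream_rows(raster, pad_value=np.inf):
    """Yield (row_index, 3-row window) pairs over a 2-D elevation array.

    Each window holds the row above, the row itself, and the row below,
    so a D8 pass only ever needs 3 rows in memory at once. Edges are
    padded with pad_value so border cells never appear to drain outward.
    """
    padded = np.pad(raster, 1, constant_values=pad_value)
    for r in range(raster.shape[0]):
        yield r, padded[r:r + 3, :]

dem = np.arange(12, dtype=float).reshape(3, 4)
windows = list(stream_rows(dem))
# Each window has shape (3, ncols + 2); window r is centred on raster row r
```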