I'm busy rewriting my hydrological model python scripts into functions that are modular and found that my original python script runs twice as fast as my updated python script. The only reason I can find is that the original scripts outputs are saved in memory before being passed onto the next function. My new python script is writing and reading each output from disk. I might be incorrect in my understanding. If anyone can give me guidance on improving my new python script in both pythonic structure and performance.
Original Python Script
''' Created on May 20, 2015 @author: PeterW ''' # import system modules and site packages import os import time import arcpy import ArcHydroTools # check out Spatial Analyst Extension arcpy.CheckOutExtension("Spatial") # set environment settings arcpy.env.overwriteOutput = True # set input and output arguments raw = r"E:\Python\Temp\Model04\DEM\raw" rasWs = r"E:\Python\Temp\Model04\Layers04" outWs = r"E:\Python\Temp\Model04\Model04.gdb" # Processing time def hms_string(sec_elapsed): h = int(sec_elapsed / (60 * 60)) m = int(sec_elapsed % (60 * 60) / 60) s = sec_elapsed % 60 return "{}h:{:>02}m:{:>05.2f}s".format(h, m, s) start_time1 = time.time() # archydro variables fill_sinks = os.path.join(rasWs, "fil") flow_dir = os.path.join(rasWs, "fdr") flow_acc = os.path.join(rasWs, "fac") streams = os.path.join(rasWs, "str") stream_seg = os.path.join(rasWs, "strlnk") catchment_grid = os.path.join(rasWs, "cat") catchment_poly = os.path.join(outWs, "Layers","Catchment") drainage_line = os.path.join(outWs, "Layers", "DrainageLine") adj_catch = os.path.join(outWs, "Layers", "AdjointCatchment") try: # calculate the fill sinks arcpy.AddMessage("Processing Fill Sinks") ArcHydroTools.FillSinks(raw, fill_sinks) # calculate the flow direction arcpy.AddMessage("Processing Flow Direction") ArcHydroTools.FlowDirection(fill_sinks, flow_dir) # calculate the flow accumulation arcpy.AddMessage("Processing Flow Accumulation") ArcHydroTools.FlowAccumulation(flow_dir, flow_acc) # calculate the maximum flow accumulation arcpy.AddMessage("Processing Flow Accumulation Maximum") maxcellsResult = arcpy.GetRasterProperties_management(flow_acc, "MAXIMUM") maxcells = maxcellsResult.getOutput(0) print maxcells # calculate the stream threshold number of cells arcpy.AddMessage("Processing Stream Threshold") stream_threshold_numcells = (int(maxcells)*0.125/100) print stream_threshold_numcells # calculate the stream definition arcpy.AddMessage("Processing Stream Definition") ArcHydroTools.StreamDefinition(flow_acc, stream_threshold_numcells, streams) # calculate the stream segmentation arcpy.AddMessage("Processing Stream Segmentation") ArcHydroTools.StreamSegmentation(streams, flow_dir, stream_seg) # calculate the catchment grid delineation arcpy.AddMessage("Processing Catchment Grid Delineation") ArcHydroTools.CatchmentGridDelineation(flow_dir, stream_seg, catchment_grid) # calculate the catchment polygons from the catchment grid arcpy.AddMessage("Processing Catchment Polygons") ArcHydroTools.CatchmentPolyProcessing(catchment_grid, catchment_poly) # calculate the drainage lines from the stream segmentation grid arcpy.AddMessage("Processing DrainageLines") ArcHydroTools.DrainageLineProcessing(stream_seg, flow_dir, drainage_line) # calculate the adjoint catchment polygons arcpy.AddMessage("Processing Ajdoint Catchments") ArcHydroTools.AdjointCatchment(drainage_line, catchment_poly, adj_catch) arcpy.AddMessage("Completed Processing archydro Main Model") except: print(arcpy.GetMessages(2)) pass # Determine the time take to process hydrological characteristics end_time1 = time.time() print ("It took {} to process hydrological characteristics".format(hms_string(end_time1 - start_time1))) arcpy.CheckInExtension("Spatial")
New Python Script
''' Created on Feb 24, 2016 @author: PeterW ''' # import system modules and site packages import time from pathlib import Path import arcpy import ArcHydroTools # check out extension arcpy.CheckOutExtension("Spatial") # set environment settings arcpy.env.overwriteOutput = True # set input and output arguments dem = r"E:\Python\Temp\Model04\DEM\raw" raster_workspace = r"E:\Python\Temp\Model04\Layers04" fgdb = Path(r"E:\Python\Temp\Model04\Model04.gdb") # Processing time def hms_string(sec_elapsed): h = int(sec_elapsed / (60 * 60)) m = int(sec_elapsed % (60 * 60) / 60) s = sec_elapsed % 60 return "{}h:{:>02}m:{:>05.2f}s".format(h, m, s) start_time1 = time.time() # generate fill sinks grid def fill_sinks(dem, raster_workspace): fill_sinks = "{0}\\{1}".format(raster_workspace, "fil") arcpy.AddMessage("Processing Fill Sinks") ArcHydroTools.FillSinks(dem, fill_sinks) fill_sinks(dem, raster_workspace) # generate flow direction grid def flow_direction(raster_workspace): fill_sinks = "{0}\\{1}".format(raster_workspace, "fil") flow_direction = "{0}\\{1}".format(raster_workspace, "fdr") arcpy.AddMessage("Processing Flow Direction") ArcHydroTools.FlowDirection(fill_sinks, flow_direction) flow_direction(raster_workspace) # generate flow accumulation grid def flow_accumulation(raster_workspace): flow_direction = "{0}\\{1}".format(raster_workspace, "fdr") flow_accumulation = "{0}\\{1}".format(raster_workspace, "fac") arcpy.AddMessage("Processing Flow Accumulation") ArcHydroTools.FlowAccumulation(flow_direction, flow_accumulation) flow_accumulation(raster_workspace) # calculate stream threshold based on 0.5% of maximum flow accumulation def stream_threshold(raster_workspace): flow_accumulation = "{0}\\{1}".format(raster_workspace, "fac") arcpy.AddMessage("Processing Flow Stream Threshold") maxcellsResult = arcpy.GetRasterProperties_management(flow_accumulation, "MAXIMUM") maxcells = maxcellsResult.getOutput(0) arcpy.AddMessage("{} Maximum Cells".format(maxcells)) stream_threshold_numcells = (int(maxcells)*0.5/100) arcpy.AddMessage("{} Stream Threshold".format(stream_threshold_numcells)) return stream_threshold_numcells numcells = stream_threshold(raster_workspace) # generate the stream definition grid def stream_definition(raster_workspace): flow_accumulation = "{0}\\{1}".format(raster_workspace, "fac") stream = "{0}\\{1}".format(raster_workspace, "str") arcpy.AddMessage("Processing Stream Definition") ArcHydroTools.StreamDefinition(flow_accumulation, numcells, stream) stream_definition(raster_workspace) # generate the stream segmentation grid def stream_segmentation(raster_workspace): stream = "{0}\\{1}".format(raster_workspace, "str") flow_direction = "{0}\\{1}".format(raster_workspace, "fac") stream_link = "{0}\\{1}".format(raster_workspace, "strlnk") arcpy.AddMessage("Processing Stream Segmentation") ArcHydroTools.StreamSegmentation(stream, flow_direction, stream_link) stream_segmentation(raster_workspace) # calculate the catchment grid delineation def catchment_grid(raster_workspace): flow_direction = "{0}\\{1}".format(raster_workspace, "fdr") stream_link = "{0}\\{1}".format(raster_workspace, "strlnk") catchment_grid = "{0}\\{1}".format(raster_workspace, "cat") arcpy.AddMessage("Processing Catchment Grid Delineation") ArcHydroTools.CatchmentGridDelineation(flow_direction, stream_link, catchment_grid) catchment_grid(raster_workspace) # calculate the catchment polygons from the catchment grid def catchment_polygon(raster_workspace, fgdb): catchment_grid = "{0}\\{1}".format(raster_workspace, "cat") catchment_polygon = "{0}\\{1}".format(Path(fgdb, "Layers"), "Catchment") arcpy.AddMessage("Processing Catchment Polygon") ArcHydroTools.CatchmentPolyProcessing(catchment_grid, catchment_polygon) catchment_polygon(raster_workspace, fgdb) # calculate the drainage lines from the stream segmentation grid def drainage_line(raster_workspace, fgdb): stream_link = "{0}\\{1}".format(raster_workspace, "strlnk") flow_direction = "{0}\\{1}".format(raster_workspace, "fdr") drainage_line = "{0}\\{1}".format(Path(fgdb, "Layers"), "DrainageLine") arcpy.AddMessage("Processing DrainageLine") ArcHydroTools.DrainageLineProcessing(stream_link, flow_direction, drainage_line) drainage_line(raster_workspace, fgdb) # calculate the adjoint catchment polygons def adjoint_catchment(fgdb): drainage_line = "{0}\\{1}".format(Path(fgdb, "Layers"), "DrainageLine") catchment_polygon = "{0}\\{1}".format(Path(fgdb, "Layers"), "Catchment") adjoint_catchment = "{0}\\{1}".format(Path(fgdb, "Layers"), "AdjointCatchment") arcpy.AddMessage("Processing Adjoint Catchment") ArcHydroTools.AdjointCatchment(drainage_line, catchment_polygon, adjoint_catchment) adjoint_catchment(fgdb) arcpy.CheckInExtension("Spatial") # Determine the time take to process hydrological characteristics end_time1 = time.time() print ("It took {} to process hydrological characteristics".format(hms_string(end_time1 - start_time1)))
Solved! Go to Solution.
These won't improve performance*, but will make it more readable, less code to manage, and less code to mess up.
*It may insignificantly improve by nanoseconds by not having to re-create the same variable over and over again!
try timing the individual components to see where your bottleneck is. This decorator needs your functions to return something... even if it is nothing. Your functions don't return anything
from functools import wraps #------- decorator ----------------------- def delta_time(func): """simple timing decorator function""" @wraps(func) def wrapper(*args, **kwargs): t0 = time.time() # start time result = func(*args, **kwargs) # ... run the function ... t1 = time.time() # end time print("Results for... {}".format(func.__name__)) print(" time taken ...{:12.9f} sec.".format(t1-t0)) #print("\n {}".format(result)) # uncomment to print within wrapper return result # return the result of the function return wrapper
useage... just put the @delta_time line above functions that you want to time
@delta_time def alpha(input=0): """useless function 1 """ return input
Hi Dan
Thanks for the following, I'll have a look at the following this evening and get back to you if I get stuck, if that fine with you.
Totally unrelated to your problem, but I noticed you are using string processing to build file paths. There is a built in python module that handles this for you, and works across operating systems etc if this is ever a concern.
In simple terms, it joins parameters, adding a slash in between.
fill_sinks = "{0}\\{1}".format(raster_workspace, "fil")
Becomes:
fill_sinks = os.path.join(raster_workspace, "fil")
You can also do as many parameters as you like to build longer paths.
fill_sinks = os.path.join(raster_workspace, "fil", "output", "test.shp")
Returns: 'E:\Python\Temp\Model04\Layers04\fil\output\test.shp'
Hi Luke
If you look at my original Python Script I used os.path.join and decided to move to PathLib Path. The unfortunate thing is that all ArcPy scripts require paths to be string and not objects hence using strings to create the paths from the Path objects.
Peter, I suspect most are still using python 2.7.x. For those that are interested, pathlib became available in python 3.4 more documentation is given here
11.1. pathlib — Object-oriented filesystem paths — Python 3.5.1 documentation
Yeah.... I'm still on 2.7.x. What's the problem with os.path.join at 3.4? Not having 3.7 installed, but just looking through the documentation and playing around with a 3.x shell online, it seems that os.path.join returns a string object as it always has. It looks like the new pathlib is a way to store paths as objects instead of as strings so instead of having to do os.path.xyz(), you now may be able to do path.xyz.
Where I'm confused is how this affects the script. At the end of the day, Peter Wilson is just formatting the path object back to a string. I don't see why os.path.join wouldn't have worked here (not that it's better than using the pathlib). Also it looks path has its own ways to concatenate paths.
I went ahead and installed pathlib into 2.7. It looks like you can do this:
# constructor p = Path('a', 'b', 'c') print(p) # append paths to end with the "/" p = p / 'd' print(p) print(str(p))
and it will print
WindowsPath('a/b/c) WindowsPath('a/b/c/d') a\b\c
Either way
os.path.join str(Path) # and "{}".format(Path)
should all get you to the same end goal: A string representing a path.
Jason
It is an alternate... that is all, the initial question was why Peter was doing it... the answer is because he is using python 3.4 and for his environment, he must prefer it.
I haven't completed my tests to see how it handles escape characters should one forget to use raw notation (there is a current thread on paths and path constructs and Darren posted a poll). One can also use glob as an alternate to os.
I think pathlib may be an attempt to further unify file handling under multiple platforms and environments. Further testing might reveal advantages that I haven't come across yet.
ADDENDUM
Filenames and file paths in Python
for addition quasi-related materials
I misread his comment. I thought that he had moved from os.path.join to Path because he was having issues with os.path.join() and arcpy. I now see that he moved *in favor* of Path but that arcpy isn't expecting a Path object but a string.
From a python perspective the Path object is an improvement because you are now dealing with a smarter object than a dumb string. The string doesn't know it is a path so you have to call some function to get the file name or directory or to append a subdirectory to the string. From looking at the methods, it looks like you can even create a directory or check if a path exists or not, all from the object itself without having to call another function.
As for escape characters, it looks like it doesn't handle them unless you handle them: same as a string or os.path.join(). At the end of the day, I think you're still better off letting python add your slashes and just pass in your directory names in a list and let python make your path for you, whether that be os.path.join(a,b,c,d,e,...) or Path(a,b,c) / d / e / ...
# WRONG -- Don't use \t or \a p = Path("dog\turtle\ant") p >> WindowsPath('dog\turtle\x07nt') print(p) >> dog urtlent # RIGHT p = Path("dog", "turtle", "ant") p >> WindowsPath('dog/turtle/ant') print(p) >> dog\turtle\ant # If you need to append, you are using the / operator, not the literal character '/' p / 'cat' / 'mouse >> WindowsPath('dog/turtle/ant/cat/mouse')
Jason...
Good, this will be useful to know for people working in ArcGIS Pro which is using python 3.4.x. ArcMap 10.4 is sticking with the python 2.7.x version.
So as people ... (transition from/stick with )... (ArcMap/ArcGIS Pro) things with become ..(clearer/more confusing).