Python Functions?

PeterWilson · ‎07-06-2015

I've been writing Python scripts for a while now, and when a colleague of mine viewed my code, he cringed that it wasn't structured into Python functions. I'd like some advice from the ESRI community on how I could structure my Python scripts into functions\modules that are easier to reuse and call from new scripts. I've attached one of my Python scripts that uses ArcPy and ArcHydro functions.

'''
Created on May 20, 2015
@author: PeterW
'''
# import system modules and site packages
import os
import arcpy
import ArcHydroTools
# check out Spatial Analyst Extension
arcpy.CheckOutExtension("Spatial")
# set environment settings
arcpy.env.overwriteOutput = True
# set input and output arguments
raw = r"F:\Projects\2015\G111443\ArcHydro\Methodology_Models\Section03\Sect3A\DEM04\raw"
rasWs = r"F:\Projects\2015\G111443\ArcHydro\Methodology_Models\Section03\Sect3A\Layers04"
outWs = r"F:\Projects\2015\G111443\ArcHydro\Methodology_Models\Section03\Sect3A\Model04.gdb"

# ArcHydro variables
fill_sinks = os.path.join(rasWs, "fil")
flow_dir = os.path.join(rasWs, "fdr")
flow_acc = os.path.join(rasWs, "fac")
streams = os.path.join(rasWs, "str")
stream_seg = os.path.join(rasWs, "strlnk")
catchment_grid = os.path.join(rasWs, "cat")
catchment_poly = os.path.join(outWs, "Layers","Catchment")
drainage_line = os.path.join(outWs, "Layers", "DrainageLine")
adj_catch = os.path.join(outWs, "Layers", "AdjointCatchment")

try:
    # calculate the fill sinks
    arcpy.AddMessage("Processing Fill Sinks")
    ArcHydroTools.FillSinks(raw, fill_sinks)
    
    # calculate the flow direction
    arcpy.AddMessage("Processing Flow Direction")
    ArcHydroTools.FlowDirection(fill_sinks, flow_dir)
        
    # calculate the flow accumulation
    arcpy.AddMessage("Processing Flow Accumulation")
    ArcHydroTools.FlowAccumulation(flow_dir, flow_acc)
    
    # calculate the maximum flow accumulation
    arcpy.AddMessage("Processing Flow Accumulation Maximum")
    maxcellsResult = arcpy.GetRasterProperties_management(flow_acc, "MAXIMUM")
    maxcells = maxcellsResult.getOutput(0)
    print(maxcells)
    
    # calculate the stream threshold number of cells
    arcpy.AddMessage("Processing Stream Threshold")
    stream_threshold_numcells = (int(maxcells)*0.25/100)
    print(stream_threshold_numcells)
    
    # calculate the stream definition
    arcpy.AddMessage("Processing Stream Definition")
    ArcHydroTools.StreamDefinition(flow_acc, stream_threshold_numcells, streams)
    
    # calculate the stream segmentation
    arcpy.AddMessage("Processing Stream Segmentation")
    ArcHydroTools.StreamSegmentation(streams, flow_dir, stream_seg)
    
    # calculate the catchment grid delineation
    arcpy.AddMessage("Processing Catchment Grid Delineation")
    ArcHydroTools.CatchmentGridDelineation(flow_dir, stream_seg, catchment_grid)
    
    # calculate the catchment polygons from the catchment grid
    arcpy.AddMessage("Processing Catchment Polygons")
    ArcHydroTools.CatchmentPolyProcessing(catchment_grid, catchment_poly)
    
    # calculate the drainage lines from the stream segmentation grid
    arcpy.AddMessage("Processing DrainageLines")
    ArcHydroTools.DrainageLineProcessing(stream_seg, flow_dir, drainage_line)
    
    # calculate the adjoint catchment polygons
    arcpy.AddMessage("Processing Adjoint Catchments")
    ArcHydroTools.AdjointCatchment(drainage_line, catchment_poly, adj_catch)
    
    arcpy.AddMessage("Completed Processing ArcHydro Main Model")
except:
    print(arcpy.GetMessages(2))
    pass
arcpy.CheckInExtension("Spatial")

Any advice and assistance will be appreciated.

Regards

Peter Wilson

RebeccaStrauch__GISP · ‎07-06-2015

I'm not one to answer this in full, but I have a few files that I read in each time (thanks to Freddie Gibson and/or Jeff Barrette) that take care of a few things for me.

a util file that takes care of the arcpy.AddMessage vs. the debugging print statement; appends a datetime to the message (since that isn't always done), and creates a log file. (mine is called ADFGutils.py)
a "decorator" file that takes care of the try...except part for me. (gpdecorators.py)

These can be placed in the (example) c:\Python27\ArcGIS10.3\Lib folder, or in another location relative to your calling script.

the ADFGutils.py (...named for my dept, but you can name whatever you want)

call in your program:

from ADFGutils import *

Use it like print or arcpy.AddMessage (it may duplicate the message in some debuggers)

myMsgs()

import time
import arcpy
import os
from time import localtime

def timeStamp():
    """
    returns time stamp.
    """
    return time.strftime(' --  %B %d - %H:%M:%S')

def myMsgs(message):
    arcpy.AddMessage(message + ' %s' %(timeStamp()))
    print(message + ' %s' %(timeStamp()))

    global messageCount
    logFolder = r"C:\ESRITEST"
    if not arcpy.Exists(logFolder):
        arcpy.CreateFolder_management(os.sep.join(logFolder.split(os.sep)[:-1]), logFolder.split(os.sep)[-1])
    mdy = curDate()
    logName = "logfile_" + "_".join(mdy.split("/")) + ".log"
    # Open in append mode; the with block closes the file after every call.
    with open(os.path.join(logFolder, logName), "a") as logFile:
        if message.lower() == "blank line":
            logFile.write("\n\n")
            print("\n\n")
        elif message.lower() == "close logfile":
            logFile.write("\n\n*****  finished  *****\n\n")
        else:
            messageCount += 1
            # Zero-pad the running message number to five digits.
            logFile.write(str(messageCount).zfill(5) + ".   ")
            logFile.write(message)
            logFile.write("\n")

def curDate():
    rawTime = localtime()
    yr = str(rawTime[0]) # Collect the year from the rawTime variable
    mo = str(rawTime[1]) # Collect the month from the rawTime variable
    dy = str(rawTime[2]) # Collect the day from the rawTime variable
    return "/".join([mo, dy, yr])

messageCount = 0
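
As an alternative sketch, the standard library's logging module can cover much of the same ground as myMsgs: timestamped messages written to both the console and an appended log file. This swaps in logging rather than reproducing ADFGutils.py. The make_logger function, the "gp_script" logger name, and the temporary log folder are assumptions made for illustration, and inside a script tool you would presumably still forward messages to arcpy.AddMessage.

```python
import logging
import os
import sys
import tempfile


def make_logger(log_folder, name="gp_script"):
    """Return a logger that writes timestamped messages to stdout and a log file."""
    if not os.path.isdir(log_folder):
        os.makedirs(log_folder)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        formatter = logging.Formatter("%(message)s  --  %(asctime)s",
                                      "%B %d - %H:%M:%S")
        log_path = os.path.join(log_folder, name + ".log")
        # FileHandler opens the file in append mode by default.
        for handler in (logging.StreamHandler(sys.stdout),
                        logging.FileHandler(log_path)):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


# Hypothetical usage: a temporary folder stands in for a real log location.
log_folder = tempfile.mkdtemp()
log = make_logger(log_folder)
log.info("Processing Fill Sinks")
```

Calling make_logger again with the same name returns the same logger, so several modules can share one log file without opening it repeatedly.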

The gpdecorators.py

To call in your program

from gpdecorators import *

To use:

# catch_errors decorator must precede a function using the @ notation.
@catch_errors
def main():
    """
    Main function to create the new master feature dataset.
    """
    # Script arguments...
    '''.......your program'''
    myMsgs('!!! Success !!! ')
# End main function

if __name__ == '__main__':
    main()

"""
A decorator to wrap error handling.
"""
import sys as _sys
import traceback as _traceback
import arcpy

def catch_errors(func):
    """
    Decorator function to support error handling
    """
    def decorator(*args, **kwargs):
        """
        Decorator function
        """
        try:
            f = func(*args, **kwargs)
            return f
        except Exception:
            # sys.exc_type and sys.exc_value are deprecated; use sys.exc_info().
            exc_type, exc_value, tb = _sys.exc_info()
            tbInfo = _traceback.format_tb(tb)[-1]
            arcpy.AddError('PYTHON ERRORS:\n%s\n%s: %s\n' %
                             (tbInfo, exc_type, exc_value))
            print('PYTHON ERRORS:\n%s\n%s: %s\n' %
                             (tbInfo, exc_type, exc_value))
            gp_errors = arcpy.GetMessages(2)
            if gp_errors:
                arcpy.AddError('GP ERRORS:\n%s\n' % gp_errors)
                print('GP ERRORS:\n%s\n' % gp_errors)
    # End decorator function
    return decorator
# End catch_errors function

if __name__ == '__main__':
    pass

That may not be all your colleague was pointing out, but I know those help me. These most likely could be in one file, btw, but I've never combined them. fwiw.
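
A variation on the decorator above, as a minimal sketch: functools.wraps keeps the wrapped function's name and docstring, and the exception is re-raised after it is reported, so a failure still stops the script. It writes to stderr instead of calling arcpy.AddError so that it runs without ArcGIS. That substitution and the example main function are assumptions for illustration, not part of gpdecorators.py.

```python
import functools
import sys
import traceback


def catch_errors(func):
    """Report any exception raised by func, then re-raise it."""
    @functools.wraps(func)  # preserve func.__name__ and func.__doc__
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # In a script tool, arcpy.AddError could be called here as well.
            sys.stderr.write("PYTHON ERRORS:\n%s\n" % traceback.format_exc())
            raise
    return wrapper


@catch_errors
def main():
    """Hypothetical entry point standing in for the real processing."""
    return "done"


result = main()
```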

Luke_Pinner · ‎07-06-2015

For me, the main reasons to refactor code into functions (and classes, modules, packages...) are readability and reusability.

On readability - if I have a long script with a number of steps, each with a number of lines of code, I will usually bust each step out into a function so it's easier to follow the overall logic of the script.

On reusability - if I find myself writing the same code over and over again to do the same thing with different inputs, I will turn that code into a function.

I don't see anything in your code that requires moving into a function. It's a very straightforward script, is very readable and there's nothing in it that is reusable.

A cautionary note... it's easy to get too caught up in refactoring and go too far. I was recently reviewing another programmer's code where everything was a function, and functions called functions which called more functions ad infinitum, so understanding what the script was doing meant jumping all over the file. It was basically unreadable.

PeterWilson · ‎07-07-2015

Hi Luke

Thanks for your advice, it's truly appreciated. Is there any way of better handling my Arc Hydro variables? The inputs\outputs are either rasters or feature classes\tables, and they are read from and written to two workspaces.

The rasters are being saved or read from a single folder and the feature classes\tables are being saved or read from a File Geodatabase.

Regards

Peter Wilson

Luke_Pinner · ‎07-08-2015

Peter, your variables look fine to me. The only suggestion I might make is to pass your inputs in to the script using arguments and access them with sys.argv or arcpy.GetParameterAsText. And I'd only recommend that if you're going to use the code as a command-line or ArcToolbox script tool and you may want to use different inputs/workspaces when running it.
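
A minimal sketch of the argument-passing idea, assuming three positional arguments in the order raw DEM, raster workspace, output geodatabase. The parse_args and build_paths names are hypothetical, and whether a dictionary of paths is an improvement over plain variables is a matter of taste. In a script tool, arcpy.GetParameterAsText(0), (1) and (2) would take the place of the argv entries.

```python
import os


def parse_args(argv):
    """Return (raw_dem, raster_workspace, output_gdb) from an argv-style list."""
    if len(argv) != 4:
        raise SystemExit("usage: %s RAW_DEM RASTER_WORKSPACE OUTPUT_GDB" % argv[0])
    return argv[1], argv[2], argv[3]


def build_paths(ras_ws, out_ws):
    """Derive the Arc Hydro dataset paths from the two workspaces."""
    # Rasters live in the raster workspace folder.
    paths = {name: os.path.join(ras_ws, name)
             for name in ("fil", "fdr", "fac", "str", "strlnk", "cat")}
    # Feature classes live in the Layers feature dataset of the geodatabase.
    for name in ("Catchment", "DrainageLine", "AdjointCatchment"):
        paths[name] = os.path.join(out_ws, "Layers", name)
    return paths


# Hypothetical command line; a real script would pass sys.argv instead.
raw, ras_ws, out_ws = parse_args(["archydro.py", "raw", "Layers04", "Model04.gdb"])
paths = build_paths(ras_ws, out_ws)
print(paths["fdr"])
```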

I write code that ranges from simple step-by-step scripts to large libraries that are spread across multiple files in Python packages. Here's roughly how I decide whether to modularise/refactor code:

For straightforward scripts, I will hardcode variables such as you have done.
If I want to reuse the script with different inputs, I will pass the inputs as parameters/arguments and access them with sys.argv or arcpy.GetParameterAsText
If it's a long and complex script, I might turn some of the steps into functions to enhance readability or if I want to reuse a bit of code within the script
If parts of the code could be reused in other scripts, I will turn that bit of code into a function or class and put it in a module that can be imported. I'll group similar bits of code in individual modules, e.g. a module for statistics-related code, one for certain types of filesystem IO, one for geometry operations, one for interacting with ArcObjects, etc.
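
To illustrate the last point, here is a sketch of one small helper that could live in an importable module. The module name hydro_utils.py and the function name stream_threshold are hypothetical. The calculation is the stream-threshold step from Peter's script, taken as a percentage of the maximum flow accumulation.

```python
# Contents of a hypothetical module, hydro_utils.py


def stream_threshold(max_cells, percent=0.25):
    """Return the stream-definition threshold, in cells.

    max_cells is the maximum flow accumulation (a number or numeric string);
    percent is the share of that maximum used as the threshold.
    """
    return float(max_cells) * percent / 100.0


# Another script could then reuse it with:
#     from hydro_utils import stream_threshold
threshold = stream_threshold(20000)
print(threshold)
```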

ChrisSmith7 · ‎07-07-2015

I would like to agree with your cautionary note. Also, regarding OOP, I always keep this in mind:

The problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.
-Joe Armstrong, developer of Erlang programming language.

I like OOP, I do, but sometimes I've seen cases where this has definitely been true.

JoshuaBixby · ‎07-09-2015

I don't have any comments regarding the formatting/structure of the code. I think others have already provided good feedback on that topic.

In taking a quick look over your code, I do have a couple questions regarding content, if you are open to discussing content as well as structure.

What is the purpose of the pass statement on Line #80 or what do you think it is doing?
Seeing the use of ArcPy messaging, I assume you are running this as a script tool. Am I right?
- I ask because the current structure of the code will basically hide errors from the user, i.e., the Results window will always show the script as completing even if it generated an error and didn't complete. As The Zen of Python states: "Errors should never pass silently."
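
A minimal sketch of an alternative that does not hide the failure: report the error, always run the cleanup step, and let the exception propagate. The run_model name and the stand-in callables are assumptions so that the example runs without ArcGIS. In the real script, process would be the Arc Hydro steps, cleanup would call arcpy.CheckInExtension("Spatial"), and report could be arcpy.AddError.

```python
import sys
import traceback


def run_model(process, cleanup, report=sys.stderr.write):
    """Run process(); report any error, always run cleanup(), then re-raise."""
    try:
        process()
    except Exception:
        report(traceback.format_exc())
        raise  # let the failure reach the caller instead of hiding it
    finally:
        cleanup()  # runs whether or not process() succeeded


# Stand-ins for illustration only.
events = []


def failing_step():
    raise RuntimeError("Fill Sinks failed")


try:
    run_model(failing_step, lambda: events.append("cleanup"), events.append)
except RuntimeError:
    events.append("caller saw the error")
```

The finally clause replaces the check-in call that currently sits after the try block, and the bare raise keeps the error visible to whatever is running the script.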