Script running slow, how to speed up?

LauraBlackburn · ‎06-03-2015

I have a script that is creates a buffer around points, converts the round buffers to squares, clips a road layers with the square, then calculates the length for the clipped roads and exports data to a csv file. I have run this script before and it took roughly 3 days to complete. This time when I ran the code it's projected to take a month to mine the data. I have no idea why it might be slowing down and could really use some help figuring this out. Does anyone know how I could speed up this process?

# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# 2_TrapStatsAndFloatingRoadDensity.py
# Altered on: 2015-6-2 10:00 am
# Created by: L.M.Blackburn
# ArcGIS 10.1
# Notes: iterates thru orders shapefiles and creates a buffer
#        around each point. The round buffer is converted to a feature envelope
#        which makes a square polygon around the point. This square is used to
#        clip the roads layer. Then I calculate length for the clipped roads by adding
#        a field and calculating geometry. Run statistics to get the sum length
#        of roads in each square. export sum length, block name, centroid of
#        square out to a text file
#        (2nd script for 4th part of analysis)
# Needed data: Order sublayer from VRP
#              List of fields to drop
#              Roads network to get length data               
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy, os, sys, traceback, csv

# Set the necessary product code
import arceditor
import arcinfo

# Set workspace environment - this is where the solved route layers are located
arcpy.env.workspace = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\'
arcpy.env.overwriteOutput = True

# --------------------Set local variables ------------------
# ---workspace for order layers
ordersLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\'
# ---workspace for buffer layers
bufferLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Buffers\\'
# ---set default buffer distance
bufDefault = "6000 Meters"
# ---workspace for envelope layers
envelopeLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Envelopes\\'
# ---workspace for route layers
routeWkspc = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Routes\\'
# ---set workspace path for clipped roads
outDataPath = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\ClipRoads\\'
# ---set road layer - for use in clip & length
Road = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\RoadsForSnap\\SDC Edge Source.shp'
# ---fields to delete from Orders layers
dropFields = ["Descriptio", "ServiceTim", "TimeWind_1", "TimeWind_2", "TimeWind_3", "TimeWindow", "MaxViolati", "MaxViola_1", \
              "PickupQuan", "DeliveryQu", "Revenue", "SpecialtyN", "Assignment", "RouteName", "ViolatedCo", "CumulTrave", \
              "CumulDista", "CumulTime", "ArriveCurb", "DepartCurb", "DepartTime", "WaitTime", "ViolationT", "CumulWaitT", \
              "CumulViola", "ArriveTime", "SourceID", "SourceOID", "PosAlong", "SideOfEdge", "CurbApproa", "Status"]
              
try:
    print 'Opening CSV file...'
    # Step 1 - Create table for output data
    # Needed fields: From script [Block, Scale, Sequence, FromPrevTr, FromPrevDi, totTime, totDist, x, y, totalRdLength]
    CSVFile = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\AllTrapsData_June2.csv'
    f = open (CSVFile, "wb")
    w = csv.writer(f, delimiter=',', lineterminator='\n')
    fieldNames = ['Block', 'Scale', 'BufferType', 'Sequence', 'FromPrevTr', 'FromPrevDi', 'x', 'y', 'bufferDist','totTime', \
                  'totDist', 'totalRdLength', '\n']
    w.writerow(fieldNames)

    # Step 2 - Loop through files in orders folder
    # Create variable to hold orders feature class files
    '''
    orderList = arcpy.ListFiles("*.shp")

    # Loop through layers & break name into segments to be used when naming orders lyrs
    print 'Creating Buffers...' 
    for shp in orderList:
        lyrs = arcpy.mapping.Layer(ordersLyr + shp)
        splitName =shp.split('_')
        block = splitName[0]
        scale = splitName[1].rstrip('.shp')

        # define buffer distance 
        if scale == "1K":
            bufDist = "1000 Meters"
        elif scale == "2K":
            bufDist = "2000 Meters"
        elif scale == "3K":
            bufDist = "3000 Meters"
        elif scale == "4K":
            bufDist = "4000 Meters"
        elif scale == "6K":
            bufDist = "6000 Meters"
        elif scale == "500":
            bufDist = "500 Meters"
        else:
            bufDist = bufDefault
   
        print 'Order: ' + shp + ', Buffer: ' + bufDist
        # Jump into orderLayers to delete unneeded fields & add XY
        arcpy.DeleteField_management(shp, dropFields)
        arcpy.AddXY_management(shp)
                    
        # Step 4 - Run buffer on order layers     
        arcpy.Buffer_analysis(shp, bufferLyr + 'buf' + '_' + block + '_' + scale, bufDist, 'FULL', 'ROUND', 'NONE', '#')
        arcpy.Buffer_analysis(shp, bufferLyr + 'buf2' + '_' + block + '_' + scale, bufDefault, 'FULL', 'ROUND', 'NONE', '#')
    
    bufferList = []
    for dpath, dnames, fnames in arcpy.da.Walk(bufferLyr, datatype = 'FeatureClass', type = 'Polygon'):
        for files in fnames:
            bufferList.append(os.path.join(files))
    
    # Step 5 - Run feature envelope to polygon on buffer layers
    print 'Converting round buffers to squares...' 
    for bufShp in bufferList:
        bufSplitName = bufShp.split('_')
        blockID = bufSplitName[1]
        scaleID = bufSplitName[2].rstrip('.shp')
        bufID = bufSplitName[0]
        print 'Buffer: ' + bufShp + ' Block: ' + blockID + ' Scale: ' + scaleID + ' BufferType: ' + bufID

        arcpy.FeatureEnvelopeToPolygon_management(bufferLyr + bufShp, envelopeLyr + bufID + '_' +  blockID + '_' +  scaleID, 'SINGLEPART')
    '''
    # Step 6 - Calculate totalTime and totalDistance using insert cursor
    # loop through each record in the table - calculate values and...
    # use selected features to clip roads layer & calculate geometry
    # add values to the CSV table
    print 'Populating CSV file...'
    envelopeList = []
    for dirpath, dirnames, filenames in arcpy.da.Walk(envelopeLyr, datatype = 'FeatureClass', type = 'Polygon'):
        for filename in filenames:
            envelopeList.append(os.path.join(filename))
    for eLayer in envelopeList:
        eLayerName = eLayer.split('_')
        blockEID = eLayerName[1]
        scaleEID = eLayerName[2].rstrip('.shp')
        bufEID = eLayerName[0]
        print 'eLayer: ' + eLayer + ' Block: ' + blockEID + ' Scale: ' + scaleEID + ' BufferType: ' + bufEID
        eLyrPath = envelopeLyr + eLayer
        
        #Make a layer from the feature class - needed for selecting records to be used in the clipping
        arcpy.MakeFeatureLayer_management(eLyrPath, "clipLayer")
        
        # use search cursor to grab data row by row to populate CSV file
        with arcpy.da.SearchCursor(eLyrPath, ("Sequence", "FromPrevTr", "FromPrevDi", "POINT_X", "POINT_Y",
                                                          "BUFF_DIST")) as cursor:
            for row in cursor:
                dataList = []
                dataList.append(str(blockEID))
                dataList.append(str(scaleEID))
                dataList.append(str(bufEID))
                dataList.append(str(row[0]))
                dataList.append(str(row[1]))                
                dataList.append(str(row[2]))
                dataList.append(str(row[3]))
                dataList.append(str(row[4]))
                dataList.append(str(row[5]))
                n = int(row[0])
                sN = "=" + str(n)
                n2 = n +1
                sel = "=" + str(n2)
                prevTime = float(row[1])
                prevDist = float(row[2])
                # print sN
                postTime = ""
                # use a nested search to get values for total time and total distance
                # there are some cases where the below expression will be false in that
                # event we will need to cacluclate the distance to the depot or just leave
                # the values for prevTime & prevDist as place holders, otherwise I end up with
                # road length being written in the wrong field
                expression = arcpy.AddFieldDelimiters(eLyrPath, "Sequence")+ sel
                for row2 in arcpy.da.SearchCursor(eLyrPath, ("FromPrevTr", "FromPrevDi"), expression):
                    postTime = row2[0]                    
                    totTime = prevTime + float(postTime)
                    # print totTime
                    dataList.append(str(totTime))
                    postDist = row2[1]
                    # print postDist
                    totDist = prevDist + float(postDist)
                    # print totDist
                    dataList.append(str(totDist))

                # if above for loop yeilds no results, calculate postTime and postDist differently
                # grab DepotVisits layer - if VisitType = "End" then grab values for "FromPrevTravelTime" & "FromPrevDistance"
                if postTime == "":  
                    
                    #select corresponding route layer file and grab sublayer: DepotVisits
                    print "extracting time & distance to Depot"
                    routeLyr = arcpy.mapping.Layer(routeWkspc + "TrapRoute_" + blockEID + "_" + scaleEID + ".lyr")
                    if routeLyr.isGroupLayer:
                        for sublyrs in routeLyr:
                            # print sublyrs.name
                            if sublyrs.name == 'Depot Visits':
                                arcpy.MakeFeatureLayer_management(sublyrs, "depotVisits")
                                qLast = "= 2"
                                expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+ qLast
                                for row3 in arcpy.da.SearchCursor("depotVisits", ("FromPrevTravelTime", "FromPrevDistance"), expression2):
                                    postTime2 = row3[0]
                                    totTime2 = prevTime + float(postTime2)
                                    # print ("Depot Visit Time: {0}, {1}, {2}".format(prevTime, postTime2, totTime2))
                                    dataList.append(str(totTime2))
                                    postDist2 = row3[1]
                                    totDist2 = prevDist + float(postDist2)
                                    # print ("Depot Visit Distance: {0}, {1}, {2}".format(prevDist, postDist2, totDist2))
                                    dataList.append(str(totDist2))
               

                # select single record to use when clipping - this selection must be on a layer not fc
                # always create where clause using AddFieldDelimiters
                seqField = "Sequence"
                where_clause = arcpy.AddFieldDelimiters("clipLayer", seqField)+ sN
                arcpy.SelectLayerByAttribute_management("clipLayer", "NEW_SELECTION", where_clause)
                
                # Clip roads w/ selected polygon - for this I will need a search cursor
                clipRoads = outDataPath + 'rdsClip_' + blockEID + '_' + scaleEID + '_' + bufEID + '_' + str(n)
                arcpy.Clip_analysis(Road, "clipLayer", clipRoads)
                clipRoadsShp = clipRoads + ".shp"        

                # Use geometry/length to get the total length of clipped roads(in meters)
                g = arcpy.Geometry()
                geometryList = arcpy.CopyFeatures_management(clipRoadsShp, g)
                length = 0
                for geometry in geometryList:
                    length +=geometry.length

                # append length (meters) at end of line to csv dataList
                dataList.append(str(length))
                # dataList.append('\r\n')
                # print length 
                
                # Write dataList to csv file ---- may need to dedent this one more time ---
                w.writerow(dataList)
                # print dataList
        
    print 'Script complete'

    f.close()
except:
    if arcpy.Exists(CSVFile):
        w.writerow(dataList)
        print 'Data written in CSV file'
    f.close()
    print 'Program failed.'
    print 'Check python errors.'
    
    
    tb = sys.exc_info()[2]
    tbinfo = traceback.format_tb(tb)[0]
    pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + "\nError Info:\n     " +        str(sys.exc_type) + ": " + str(sys.exc_value) + "\n"
    msgs = "ARCPY ERRORS:\n" + arcpy.GetMessages(2) + "\n"
    
    arcpy.AddError(msgs)
    arcpy.AddError(pymsg)

    print msgs
    print pymsg
    
    arcpy.AddMessage(arcpy.GetMessages(1))
    print arcpy.GetMessages(1)
    # print ("Points to select: {0}, Radius: {1}".format(centroidPath, radius))

LauraBlackburn · ‎06-04-2015

I am getting a bit hung up on implementing the dictionaries, mainly because I am grabbing values from a feature class and putting them into a CSV file. So, I don't think I can use an update cursor, instead I am using w.writerow with a list of records, right? Please correct me if I am wrong.

Also, I am not just joining data from one table to another. I am grabbing data one row at a time so that I can perform some data manipulations. Basically, I have a point (n) with information about the time and distance from the previous point (stops along a route). My code is adding the values for time and distance for point (n) to the time and distance for point (n+1). So, I am querying the same feature class twice to get data for point(n) and adding that to point(n+1).

Richard, any help or insight you can provide is greatly appreciated. I am a self-taught coder and I feel that I know just enough to be dangerous. If there are any books or courses that you found helpful, I would love to hear about it!

RichardFairhurst · ‎06-04-2015

You are correct that the UpdateCursor would be replaced by the w.writerow code, but that has nothing to do with the look up process. Anyway, dictionaries need to build the list object that you later feed to the w.writerow portion of your code. Additionally using a key that looks up the next key in the dicitionary works perfectly well and embedding cursors that point to the same data source gives even worse performance that embedded cursors accessing separate datasources.

Here is a stab at converting the code. Without knowing your data I assumed the Sequence field is a string field and that Sequence is a Unique key without duplicates in the data source. If Sequence is not a Unique key and can refer to multiple rows you will need to let me know so I can implement it as a list of lists value. I assumed a one to many relationship for the depot visits.

Also, I am unfamiliar with what expression2 creates and how it is used in the cursor expression:

expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+ qLast

for row3 in arcpy.da.SearchCursor("depotVisits", ("FromPrevTravelTime", "FromPrevDistance"), expression2):

wouldn't that convert to this invalid invalid cursor set up (ignoring whatever delimiters apply)?

for row3 in arcpy.da.SearchCursor("depotVisits", ("FromPrevTravelTime", "FromPrevDistance"), "depotVisits", "VisitType"=2):

So I am ignoring the depotVisits and assuming it works with VisitType=2 until you explain it to me.

The code is untested, so expect some debugging effort to make sure everything is indented correctly. I free pasted and then modified a fair amount of the code from multiple sources so I could have left some mismatched syntax in the code. Let me know of any errors that are not obvious omissions or misspellings. The code below should replace lines 137 through 198 in your original code posting. The relative indentation of the code should make line 1 below indented at the same level as line 137 in your original code.

        sourceFC = None
        #select corresponding route layer file and grab sublayer: DepotVisits 
        print "extracting time & distance to Depot" 
        routeLyr = arcpy.mapping.Layer(routeWkspc + "TrapRoute_" + blockEID + "_" + scaleEID + ".lyr") 
        if routeLyr.isGroupLayer: 
            for sublyrs in routeLyr: 
                # print sublyrs.name 
                if sublyrs.name == 'Depot Visits': 
                    arcpy.MakeFeatureLayer_management(sublyrs, "depotVisits")
                    sourceFC = "depotVisits" 

        depotVisitsDict = {}
        if sourceFC == "depotVisits":

            sourceFieldsList = ["VisitType", "FromPrevTravelTime", "FromPrevDistance"]

            # Set up a One to Many Dictionary  
            with arcpy.da.SearchCursor(sourceFC, sourceFieldsList) as depots:
                 for depot in depots:
                    relatekey = depot[0]
                    if relatekey in depotVisitsDict:
                        depotVisitsDict[relatekey].append(depot[1:])
                    else:
                        depotVisitsDict[relatekey] = [depot[1:]]

        sourceFC = eLyrPath  

        sourceFieldsList = ["Sequence", "FromPrevTr", "FromPrevDi", "POINT_X", "POINT_Y", "BUFF_DIST"]  
  
        # Use list comprehension to build a dictionary from a da SearchCursor  
        sequenceDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}

        for key in sequenceDict: 
            dataList = []  
            dataList.append(str(blockEID))  
            dataList.append(str(scaleEID))  
            dataList.append(str(bufEID))  
            dataList.append(str(key))  
            dataList.append(str(list(sequenceDict[key])[0]))                  
            dataList.append(str(list(sequenceDict[key])[1]))  
            dataList.append(str(list(sequenceDict[key])[2]))  
            dataList.append(str(list(sequenceDict[key])[3]))  
            dataList.append(str(list(sequenceDict[key])[4]))  
            n = int(key)  
            sN = "=" + str(n)  
            n2 = n +1  
            sel = "=" + str(n2)  
            prevTime = float(list(sequenceDict[key])[0])  
            prevDist = float(list(sequenceDict[key])[1])  
            # print sN  
            postTime = ""  
            # use a nested search to get values for total time and total distance  
            # there are some cases where the below expression will be false in that  
            # event we will need to cacluclate the distance to the depot or just leave  
            # the values for prevTime & prevDist as place holders, otherwise I end up with  
            # road length being written in the wrong field  
            if str(n2) in sequenceDict:
                values = list(sequenceDict[str(n2)])
                for row2 in values: 
                    postTime = row2[0]                      
                    totTime = prevTime + float(postTime)  
                    # print totTime  
                    dataList.append(str(totTime))  
                    postDist = row2[1]  
                    # print postDist  
                    totDist = prevDist + float(postDist)  
                    # print totDist  
                    dataList.append(str(totDist))  
  
                # if above for loop yeilds no results, calculate postTime and postDist differently  
                # grab DepotVisits layer - if VisitType = "End" then grab values for "FromPrevTravelTime" & "FromPrevDistance"  
                if postTime == "":    
                  
                    #select corresponding route layer file and grab sublayer: DepotVisits  
                    print "extracting time & distance to Depot"  
                    if 2 in depotVisitsDict:
                         for row3 in depotVisitsDict[2]:
                            postTime2 = row3[0]  
                            totTime2 = prevTime + float(postTime2)  
                            # print ("Depot Visit Time: {0}, {1}, {2}".format(prevTime, postTime2, totTime2))  
                            dataList.append(str(totTime2))  
                            postDist2 = row3[1]  
                            totDist2 = prevDist + float(postDist2)  
                            # print ("Depot Visit Distance: {0}, {1}, {2}".format(prevDist, postDist2, totDist2))  
                            dataList.append(str(totDist2))

BlakeTerhune · ‎06-08-2015

I hate to be a nitpicker, Richard, especially when you have all the answers, but you forgot a colon after your if statement on line 13 and are missing an extra right parenthesis at the end of lines 48 and 49. Great work though!

LauraBlackburn · ‎06-08-2015

Hi Richard,

Thank you for walking me through the use of the dictionary. Dictionaries of this type are completely new to me, so I have been referring to your blog "Turbo Charging Data Manipulation..." to try and understand what the code is doing.

Here's a little background on what I am doing. I used Network analyst to create routes for me. Now I am querying those routes to get data on the time and distance traveled from one point to the next. This works great until we get to the very last point or location in the route, identified by the final sequence #. For the final point in the route, I still need to know how long (time & dist) it takes to get to the depot (ie. the depot visit where visit type =2). This is a one to one join - there are only two records in the depot visit table and I am only interested in the one where visit type =2. Hopefully this helps to clarify what I am doing with expression2.

So, I don't think I need a one to many dictionary for the depot visits. There are only two records here and I really only need data from the second record. Is it worth making a dictionary for one record?

I have added your code Richard, and started debugging it, but I continue to get an error at line 24 in your code above.

I seem to keep getting an error at

else:

depotVisitsDict[relatekey].append(depot[1:])

Error Info:

<class 'Queue.Empty'>:

RichardFairhurst · ‎06-08-2015

I have fixed the items pointed out by Blake. I also corrected the code in line 22 and 24 to reverse them. I had them backwards (or else I had forgotten to put the word "not" in line 21).

Since the depot visits table only has two records then you could replace lines 13 through 24 with (using the same indentation as line 13):

qLast = "= 2"

expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+ qLast

dopotVisitsDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList, expression2)}

Then you would also have to replace line 77 with (using the same indentation as line 77):

row3 = list(depotVisitsDict[2])

LauraBlackburn · ‎06-08-2015

# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# 2_FastTrapStatsAndFloatingRoadDensity.py
# Altered on: 2015-6-8 10:00 am
# Created by: L.M.Blackburn
# ArcGIS 10.1
# Notes: Buffers and envelopes were created in first run-through of this code.
#        Now, I calculate length for the clipped roads by adding
#        a field and calculating geometry. Run statistics to get the sum length
#        of roads in each square. export sum length, block name, centroid of
#        square out to a text file.  I'm making edits on this version to run
#        the code on Citrix and to speed up the entire process using dictionary
#        keys instead of nested cursors.
#        (2nd script for 4th part of analysis)
# Needed data: Order sublayer from VRP
#              List of fields to drop
#              Roads network to get length data               
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy, os, sys, traceback, csv

from time import strftime

# Set the necessary product code
import arceditor
import arcinfo

# Set workspace environment - this is where the solved route layers are located
arcpy.env.workspace = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\'
arcpy.env.overwriteOutput = True

# --------------------Set local variables ------------------
# ---workspace for order layers
ordersLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\'
# ---workspace for buffer layers
bufferLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Buffers\\'
# ---set default buffer distance
bufDefault = "6000 Meters"
# ---workspace for envelope layers
envelopeLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Envelope\\'
# ---workspace for route layers
routeWkspc = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Routes\\'
# ---set workspace path for clipped roads
outDataPath = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\ClipRoad\\'
# ---set road layer - for use in clip & length
Road = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\RoadsForSnap\\SDC Edge Source.shp'
# ---fields to delete from Orders layers
dropFields = ["Descriptio", "ServiceTim", "TimeWind_1", "TimeWind_2", "TimeWind_3", "TimeWindow", "MaxViolati", "MaxViola_1", \
              "PickupQuan", "DeliveryQu", "Revenue", "SpecialtyN", "Assignment", "RouteName", "ViolatedCo", "CumulTrave", \
              "CumulDista", "CumulTime", "ArriveCurb", "DepartCurb", "DepartTime", "WaitTime", "ViolationT", "CumulWaitT", \
              "CumulViola", "ArriveTime", "SourceID", "SourceOID", "PosAlong", "SideOfEdge", "CurbApproa", "Status"]
# ---fields for CSV file
fieldNames = ['Block', 'Scale', 'BufferType', 'Sequence', 'FromPrevTr', 'FromPrevDi', 'x', 'y', 'bufferDist','totTime', \
              'totDist', 'totalRdLength', '\n']
# ---List of fields from envelope layer from which to extract data
sourceFieldsList = ["Sequence", "FromPrevTr", "FromPrevDi", "POINT_X", "POINT_Y", "BUFF_DIST"]
              
try:
    print 'Script started at: ' + strftime("%Y-%m-%d %H:%M:%S")
    print 'Opening CSV file...'
    # Step 1 - Create table for output data
    # Needed fields: From script [Block, Scale, Sequence, FromPrevTr, FromPrevDi, totTime, totDist, x, y, totalRdLength]
    with open('F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\AllTrapsData_June8_test.csv', "wb") as f:
        w = csv.writer(f, delimiter=',', lineterminator='\n')
        w.writerow(fieldNames)

        # Calculate totalTime and totalDistance using insert cursor
        # loop through each record in the table - calculate values and...
        # use selected features to clip roads layer & calculate geometry
        # add values to the CSV table
        print 'Populating CSV file...'
        
        envelopeList = []
        for dirpath, dirnames, filenames in arcpy.da.Walk(envelopeLyr, datatype = 'FeatureClass', type = 'Polygon'):
            for filename in filenames:
                envelopeList.append(os.path.join(filename))
        for eLayer in envelopeList:
            eLayerName = eLayer.split('_')
            blockEID = eLayerName[1]
            scaleEID = eLayerName[2].rstrip('.shp')
            bufEID = eLayerName[0]
            #print 'eLayer: ' + eLayer + ' Block: ' + blockEID + ' Scale: ' + scaleEID + ' BufferType: ' + bufEID
            eLyrPath = envelopeLyr + eLayer
            
            #Make a layer from the feature class - needed for selecting records to be used in the clipping
            arcpy.MakeFeatureLayer_management(eLyrPath, "clipLayer")

            #-----------------------pasted code below----------------------
            sourceFC = None  
            #select corresponding route layer file and grab sublayer: DepotVisits   
            print 'extracting time & distance to points. eLayer: ' + eLayer + ' Block: ' + blockEID + ' Scale: ' + scaleEID + ' BufferType: ' + bufEID   
            routeLyr = arcpy.mapping.Layer(routeWkspc + "TrapRoute_" + blockEID + "_" + scaleEID + ".lyr")   
            if routeLyr.isGroupLayer:   
                for sublyrs in routeLyr:   
                    # print sublyrs.name   
                    if sublyrs.name == 'Depot Visits':   
                        arcpy.MakeFeatureLayer_management(sublyrs, "depotVisits")  
                        sourceFC = "depotVisits"   
            qLast = "= 2"
            expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+qLast
            depotFieldsList = ["VisitType", "FromPrevTravelTime", "FromPrevDistance"]
            depotVisitsDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, depotFieldsList, expression2)}
      
            sourceFC = eLyrPath      
                        
            # Use list comprehension to build a dictionary from a da SearchCursor    
            sequenceDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}  
      
            for key in sequenceDict:   
                dataList = []    
                dataList.append(str(blockEID))    
                dataList.append(str(scaleEID))    
                dataList.append(str(bufEID))    
                dataList.append(str(key))    
                dataList.append(str(list(sequenceDict[key])[0]))                    
                dataList.append(str(list(sequenceDict[key])[1]))    
                dataList.append(str(list(sequenceDict[key])[2]))    
                dataList.append(str(list(sequenceDict[key])[3]))    
                dataList.append(str(list(sequenceDict[key])[4]))    
                n = int(key)    
                sN = "=" + str(n)    
                n2 = n +1    
                sel = "=" + str(n2)    
                prevTime = float(list(sequenceDict[key])[1])    
                prevDist = float(list(sequenceDict[key])[2])    
                # print sN    
                postTime = ""    

                # use a nested search to get values for total time and total distance
                # this loop is not running - I am not getting these values written to
                # the CSV table
                if str(n2) in sequenceDict:  
                    values = list(sequenceDict[str(n2)])  
                    for row2 in values:   
                        postTime = row2[1]                        
                        totTime = prevTime + float(postTime)    
                        print totTime    
                        dataList.append(str(totTime))    
                        postDist = row2[2]    
                        # print postDist    
                        totDist = prevDist + float(postDist)    
                        # print totDist    
                        dataList.append(str(totDist))    
        
                    # if above for loop yeilds no results, calculate postTime and postDist differently    
                    # grab DepotVisits layer - if VisitType = "End" then grab values for "FromPrevTravelTime" & "FromPrevDistance"    
                    if postTime == "":      
                        
                        #select corresponding route layer file and grab sublayer: DepotVisits    
                        print "extracting time & distance to Depot"    
                        if 2 in depotVisitsDict:  
                            row3 = list(depotVisitsDict[2])  
                            postTime2 = row3[1]    
                            totTime2 = prevTime + float(postTime2)    
                            print ("Depot Visit Time: {0}, {1}, {2}".format(prevTime, postTime2, totTime2))    
                            dataList.append(str(totTime2))    
                            postDist2 = row3[2]    
                            totDist2 = prevDist + float(postDist2)    
                            # print ("Depot Visit Distance: {0}, {1}, {2}".format(prevDist, postDist2, totDist2))    
                            dataList.append(str(totDist2))    
            
            # ----------------------pasted code above----------------------

                print "Sequence: " + sN + ", Block: " + blockEID + ", Scale: " + scaleEID
                # select single record to use when clipping - this selection must be on a layer not fc
                # always create where clause using AddFieldDelimiters
                seqField = "Sequence"
                where_clause = arcpy.AddFieldDelimiters("clipLayer", seqField)+ sN
                arcpy.SelectLayerByAttribute_management("clipLayer", "NEW_SELECTION", where_clause)
                
                # Clip roads w/ selected polygon - for this I will need a search cursor
                clipRoads = outDataPath + 'rdsClip_' + blockEID + '_' + scaleEID + '_' + bufEID + '_' + str(n)
                arcpy.Clip_analysis(Road, "clipLayer", clipRoads)
                clipRoadsShp = clipRoads + ".shp"        

                # Use geometry/length to get the total length of clipped roads(in meters)
                print "calculating road lengths...."
                g = arcpy.Geometry()
                geometryList = arcpy.CopyFeatures_management(clipRoadsShp, g)
                length = 0
                for geometry in geometryList:
                    length +=geometry.length

                # append length (meters) at end of line to csv dataList
                dataList.append(str(length))
                # dataList.append('\r\n')
                print length 
                
                # Write dataList to csv file ---- may need to dedent this one more time ---
                w.writerow(dataList)
                # print dataList
        
    print 'Script completed at: ' + strftime("%Y-%m-%d %H:%M:%S")

    f.close()
except:
    f.close()
    print 'Program failed at: ' + strftime("%Y-%m-%d %H:%M:%S")
    print 'Check python errors.'
    
    
    tb = sys.exc_info()[2]
    tbinfo = traceback.format_tb(tb)[0]
    pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + "\nError Info:\n     " +        str(sys.exc_type) + ": " + str(sys.exc_value) + "\n"
    msgs = "ARCPY ERRORS:\n" + arcpy.GetMessages(2) + "\n"
    
    arcpy.AddError(msgs)
    arcpy.AddError(pymsg)

    print msgs
    print pymsg
    
    arcpy.AddMessage(arcpy.GetMessages(1))
    print arcpy.GetMessages(1)
    # print ("Points to select: {0}, Radius: {1}".format(centroidPath, radius))

The majority of the code seems to be working. However, I am not getting any values returned from the if statement on line 133. Any ideas on what might be happening here? What I am doing in that loop is finding the next point in the route (sequence +1) and grabbing the values for "FromPrevTravelTime" and "FromPrevDistance" from the same layer or the sequenceDict . Should this if statement be changed to a for statement?

I am also wondering if I should order the sequence dictionary? Would this save on processing time?

RichardFairhurst · ‎06-08-2015

Are you sure that the sequence field is a string field? I just made that assumption, but it probably is wrong. If sequence is a numeric field then the key is wrong at lines 133 and 134. The code would have to be:

if n2 in sequenceDict:

values = list(sequenceDict[n2])

Also, if you want to process the dictionary sorted, you don't sort the dictionary, since it can't be sorted. You sort the list of dictionary keys instead. I don't recommend using an OrderedDictionary, since it isn't necessary for this process. So line 110 would be (assuming the sequence key is numeric and not string):

for key in sorted(sequenceDict.keys()):

This won't have any noticeable effect the speed of the code, but it will output the data in the order that you probably want.

LauraBlackburn · ‎06-09-2015

Thanks again Richard!

The sequence field is a short integer. So, by changing it from a string to a number we are now going into that loop, however, I get a queue.empty error at line 136 postTime=row2[1]. Could you describe what we are doing in line 134? There should only be one record where n2 is in the sequenceDict. What I think line 134 does, is put all of the records for n2 into a list. Then postTime would be the second field in the sequenceDict where the first field = n2.

Also, the speed on this script still seems to be slow. I am pretty sure this is due to the clipping of roads and calculating the road length. Is this something that could be done in-memory to help with the speed? Would the syntax be something like below to replace lines 173-175:

clipRoadsShp = arcpy.Clip_analysis(Road, "clipLayer", "in_memory")

LauraBlackburn · ‎06-09-2015

okay, I have all of the dictionaries working and my CSV file is being populated correctly. I altered the road clipping to use "in_memory" processing too. But, I still seem to be dealing with slow processing speeds. It seems like it will take many days to complete.

RichardFairhurst · ‎06-09-2015

I tried to see if it would work with the few changes I made, but by changing from a one to many dictionary to a one to one dictionary I need to make more changes. This code should access the FromPrevTr and FromPrevDi fields.

                if n2 in sequenceDict:    
                    postTime = sequenceDict[n2][0]                         
                    totTime = prevTime + float(postTime)      
                    print totTime      
                    dataList.append(str(totTime))      
                    postDist = sequenceDict[n2][1]      
                    # print postDist      
                    totDist = prevDist + float(postDist)      
                    # print totDist      
                    dataList.append(str(totDist))

What is your clip layer? Before I completely rebuild this section I need to better understand what is going on. I would load cliplayer into a dictionary and include its sequence and Shape@ fields once at the beginning of the script for fast access if it has many records. The geometry for the n2 sequence record can be fed directly to the clip process based on the first example in this help post (ignore the array, it can come from the shape@ field)

For using the in_memory workspace you should get a better understanding (which I also need). Start by looking at the help here. You use in_memory as a path and you still need a slash and a feature class name after it for output. It is more like a geodatabase (which you should convert to anyway, since shapefile performance sucks). So it would be more like changing line 173 to:

clipRoads = r"in_memory\" + 'rdsClip_' + blockEID + '_' + scaleEID + '_' + bufEID + '_' + str(n)

You would not add .shp to the end of the feature class name, since it is not outputting a shapefile. If you use the in_memory workspace and do not actually need the output of the clip permanently, you should get rid of the individualized feature class names and should delete the feature class with arcpy.Delete_management() as soon as you are done with it to keep the memory clean.

Why do you use CopyFeatures to a geometry object and not a cursor to read through the geometry lengths? Duplicating the feature again is a waste of time as far as I can see.

Anyway, clipping one shape at a time will be a slow process, but it can be sped up. You should print a time stamp and time difference before and after the actual clip to find out how much time is being eaten by each clip operation. Only by batch processing many objects in one clip are you likely to speed that up significantly, since the set up time for each clip is probably what is killing your time. You should still notice a speed improvement in the section of code we have revised. In fact you should start adding print statements to show time spent in each key section if you are trying to find the most time consuming operations in your code.

You also need to evaluate what is the maximum time that this process can take for it to actually be useful to you. I have had to abandon several approaches I spent considerable time developing when the time spent doing them did not pay off sufficiently. I usually can rethink it eventually to hit my performance tolerance and get a close enough result that I can live with. If your process cannot perform acceptably because of an operation you cannot control (the clip), you may want to consider alternative processes that may be less precise, but that may produce a more worthwhile output in light of the performance improvements they offer.