I have a script that is creates a buffer around points, converts the round buffers to squares, clips a road layers with the square, then calculates the length for the clipped roads and exports data to a csv file. I have run this script before and it took roughly 3 days to complete. This time when I ran the code it's projected to take a month to mine the data. I have no idea why it might be slowing down and could really use some help figuring this out. Does anyone know how I could speed up this process?
# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# 2_TrapStatsAndFloatingRoadDensity.py
# Altered on: 2015-6-2 10:00 am
# Created by: L.M.Blackburn
# ArcGIS 10.1
# Notes: iterates thru orders shapefiles and creates a buffer
#        around each point. The round buffer is converted to a feature envelope
#        which makes a square polygon around the point. This square is used to
#        clip the roads layer. Then I calculate length for the clipped roads by adding
#        a field and calculating geometry. Run statistics to get the sum length
#        of roads in each square. export sum length, block name, centroid of
#        square out to a text file
#        (2nd script for 4th part of analysis)
# Needed data: Order sublayer from VRP
#              List of fields to drop
#              Roads network to get length data               
# ---------------------------------------------------------------------------
# Import arcpy module
import arcpy, os, sys, traceback, csv
# Set the necessary product code
import arceditor
import arcinfo
# Set workspace environment - this is where the solved route layers are located
arcpy.env.workspace = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\'
arcpy.env.overwriteOutput = True
# --------------------Set local variables ------------------
# ---workspace for order layers
ordersLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\'
# ---workspace for buffer layers
bufferLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Buffers\\'
# ---set default buffer distance
bufDefault = "6000 Meters"
# ---workspace for envelope layers
envelopeLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Envelopes\\'
# ---workspace for route layers
routeWkspc = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Routes\\'
# ---set workspace path for clipped roads
outDataPath = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\ClipRoads\\'
# ---set road layer - for use in clip & length
Road = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\RoadsForSnap\\SDC Edge Source.shp'
# ---fields to delete from Orders layers
dropFields = ["Descriptio", "ServiceTim", "TimeWind_1", "TimeWind_2", "TimeWind_3", "TimeWindow", "MaxViolati", "MaxViola_1", \
              "PickupQuan", "DeliveryQu", "Revenue", "SpecialtyN", "Assignment", "RouteName", "ViolatedCo", "CumulTrave", \
              "CumulDista", "CumulTime", "ArriveCurb", "DepartCurb", "DepartTime", "WaitTime", "ViolationT", "CumulWaitT", \
              "CumulViola", "ArriveTime", "SourceID", "SourceOID", "PosAlong", "SideOfEdge", "CurbApproa", "Status"]
              
try:
    print 'Opening CSV file...'
    # Step 1 - Create table for output data
    # Needed fields: From script [Block, Scale, Sequence, FromPrevTr, FromPrevDi, totTime, totDist, x, y, totalRdLength]
    CSVFile = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\AllTrapsData_June2.csv'
    f = open (CSVFile, "wb")
    w = csv.writer(f, delimiter=',', lineterminator='\n')
    fieldNames = ['Block', 'Scale', 'BufferType', 'Sequence', 'FromPrevTr', 'FromPrevDi', 'x', 'y', 'bufferDist','totTime', \
                  'totDist', 'totalRdLength', '\n']
    w.writerow(fieldNames)
    # Step 2 - Loop through files in orders folder
    # Create variable to hold orders feature class files
    '''
    orderList = arcpy.ListFiles("*.shp")
    # Loop through layers & break name into segments to be used when naming orders lyrs
    print 'Creating Buffers...' 
    for shp in orderList:
        lyrs = arcpy.mapping.Layer(ordersLyr + shp)
        splitName =shp.split('_')
        block = splitName[0]
        scale = splitName[1].rstrip('.shp')
        # define buffer distance 
        if scale == "1K":
            bufDist = "1000 Meters"
        elif scale == "2K":
            bufDist = "2000 Meters"
        elif scale == "3K":
            bufDist = "3000 Meters"
        elif scale == "4K":
            bufDist = "4000 Meters"
        elif scale == "6K":
            bufDist = "6000 Meters"
        elif scale == "500":
            bufDist = "500 Meters"
        else:
            bufDist = bufDefault
   
        print 'Order: ' + shp + ', Buffer: ' + bufDist
        # Jump into orderLayers to delete unneeded fields & add XY
        arcpy.DeleteField_management(shp, dropFields)
        arcpy.AddXY_management(shp)
                    
        # Step 4 - Run buffer on order layers     
        arcpy.Buffer_analysis(shp, bufferLyr + 'buf' + '_' + block + '_' + scale, bufDist, 'FULL', 'ROUND', 'NONE', '#')
        arcpy.Buffer_analysis(shp, bufferLyr + 'buf2' + '_' + block + '_' + scale, bufDefault, 'FULL', 'ROUND', 'NONE', '#')
    
    bufferList = []
    for dpath, dnames, fnames in arcpy.da.Walk(bufferLyr, datatype = 'FeatureClass', type = 'Polygon'):
        for files in fnames:
            bufferList.append(os.path.join(files))
    
    # Step 5 - Run feature envelope to polygon on buffer layers
    print 'Converting round buffers to squares...' 
    for bufShp in bufferList:
        bufSplitName = bufShp.split('_')
        blockID = bufSplitName[1]
        scaleID = bufSplitName[2].rstrip('.shp')
        bufID = bufSplitName[0]
        print 'Buffer: ' + bufShp + ' Block: ' + blockID + ' Scale: ' + scaleID + ' BufferType: ' + bufID
        arcpy.FeatureEnvelopeToPolygon_management(bufferLyr + bufShp, envelopeLyr + bufID + '_' +  blockID + '_' +  scaleID, 'SINGLEPART')
    '''
    # Step 6 - Calculate totalTime and totalDistance using insert cursor
    # loop through each record in the table - calculate values and...
    # use selected features to clip roads layer & calculate geometry
    # add values to the CSV table
    print 'Populating CSV file...'
    envelopeList = []
    for dirpath, dirnames, filenames in arcpy.da.Walk(envelopeLyr, datatype = 'FeatureClass', type = 'Polygon'):
        for filename in filenames:
            envelopeList.append(os.path.join(filename))
    for eLayer in envelopeList:
        eLayerName = eLayer.split('_')
        blockEID = eLayerName[1]
        scaleEID = eLayerName[2].rstrip('.shp')
        bufEID = eLayerName[0]
        print 'eLayer: ' + eLayer + ' Block: ' + blockEID + ' Scale: ' + scaleEID + ' BufferType: ' + bufEID
        eLyrPath = envelopeLyr + eLayer
        
        #Make a layer from the feature class - needed for selecting records to be used in the clipping
        arcpy.MakeFeatureLayer_management(eLyrPath, "clipLayer")
        
        # use search cursor to grab data row by row to populate CSV file
        with arcpy.da.SearchCursor(eLyrPath, ("Sequence", "FromPrevTr", "FromPrevDi", "POINT_X", "POINT_Y",
                                                          "BUFF_DIST")) as cursor:
            for row in cursor:
                dataList = []
                dataList.append(str(blockEID))
                dataList.append(str(scaleEID))
                dataList.append(str(bufEID))
                dataList.append(str(row[0]))
                dataList.append(str(row[1]))                
                dataList.append(str(row[2]))
                dataList.append(str(row[3]))
                dataList.append(str(row[4]))
                dataList.append(str(row[5]))
                n = int(row[0])
                sN = "=" + str(n)
                n2 = n +1
                sel = "=" + str(n2)
                prevTime = float(row[1])
                prevDist = float(row[2])
                # print sN
                postTime = ""
                # use a nested search to get values for total time and total distance
                # there are some cases where the below expression will be false in that
                # event we will need to cacluclate the distance to the depot or just leave
                # the values for prevTime & prevDist as place holders, otherwise I end up with
                # road length being written in the wrong field
                expression = arcpy.AddFieldDelimiters(eLyrPath, "Sequence")+ sel
                for row2 in arcpy.da.SearchCursor(eLyrPath, ("FromPrevTr", "FromPrevDi"), expression):
                    postTime = row2[0]                    
                    totTime = prevTime + float(postTime)
                    # print totTime
                    dataList.append(str(totTime))
                    postDist = row2[1]
                    # print postDist
                    totDist = prevDist + float(postDist)
                    # print totDist
                    dataList.append(str(totDist))
                # if above for loop yeilds no results, calculate postTime and postDist differently
                # grab DepotVisits layer - if VisitType = "End" then grab values for "FromPrevTravelTime" & "FromPrevDistance"
                if postTime == "":  
                    
                    #select corresponding route layer file and grab sublayer: DepotVisits
                    print "extracting time & distance to Depot"
                    routeLyr = arcpy.mapping.Layer(routeWkspc + "TrapRoute_" + blockEID + "_" + scaleEID + ".lyr")
                    if routeLyr.isGroupLayer:
                        for sublyrs in routeLyr:
                            # print sublyrs.name
                            if sublyrs.name == 'Depot Visits':
                                arcpy.MakeFeatureLayer_management(sublyrs, "depotVisits")
                                qLast = "= 2"
                                expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+ qLast
                                for row3 in arcpy.da.SearchCursor("depotVisits", ("FromPrevTravelTime", "FromPrevDistance"), expression2):
                                    postTime2 = row3[0]
                                    totTime2 = prevTime + float(postTime2)
                                    # print ("Depot Visit Time: {0}, {1}, {2}".format(prevTime, postTime2, totTime2))
                                    dataList.append(str(totTime2))
                                    postDist2 = row3[1]
                                    totDist2 = prevDist + float(postDist2)
                                    # print ("Depot Visit Distance: {0}, {1}, {2}".format(prevDist, postDist2, totDist2))
                                    dataList.append(str(totDist2))
               
                # select single record to use when clipping - this selection must be on a layer not fc
                # always create where clause using AddFieldDelimiters
                seqField = "Sequence"
                where_clause = arcpy.AddFieldDelimiters("clipLayer", seqField)+ sN
                arcpy.SelectLayerByAttribute_management("clipLayer", "NEW_SELECTION", where_clause)
                
                # Clip roads w/ selected polygon - for this I will need a search cursor
                clipRoads = outDataPath + 'rdsClip_' + blockEID + '_' + scaleEID + '_' + bufEID + '_' + str(n)
                arcpy.Clip_analysis(Road, "clipLayer", clipRoads)
                clipRoadsShp = clipRoads + ".shp"        
                # Use geometry/length to get the total length of clipped roads(in meters)
                g = arcpy.Geometry()
                geometryList = arcpy.CopyFeatures_management(clipRoadsShp, g)
                length = 0
                for geometry in geometryList:
                    length +=geometry.length
                # append length (meters) at end of line to csv dataList
                dataList.append(str(length))
                # dataList.append('\r\n')
                # print length 
                
                # Write dataList to csv file ---- may need to dedent this one more time ---
                w.writerow(dataList)
                # print dataList
        
    print 'Script complete'
    f.close()
except:
    if arcpy.Exists(CSVFile):
        w.writerow(dataList)
        print 'Data written in CSV file'
    f.close()
    print 'Program failed.'
    print 'Check python errors.'
    
    
    tb = sys.exc_info()[2]
    tbinfo = traceback.format_tb(tb)[0]
    pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + "\nError Info:\n     " +        str(sys.exc_type) + ": " + str(sys.exc_value) + "\n"
    msgs = "ARCPY ERRORS:\n" + arcpy.GetMessages(2) + "\n"
    
    arcpy.AddError(msgs)
    arcpy.AddError(pymsg)
    print msgs
    print pymsg
    
    arcpy.AddMessage(arcpy.GetMessages(1))
    print arcpy.GetMessages(1)
    # print ("Points to select: {0}, Radius: {1}".format(centroidPath, radius))
					
				
			
			
				
			
			
				Solved! Go to Solution.
Laura:
The code looks good to me. The dictionary could sum the length results while it is being created without using Summary Statistics if you wanted, but unless that step takes a significant amount of time revising the code would not make much difference. In any case, I am sure that the bulk of the time is still taken up by the Intersect tool, but that step is probably as fast as it is going to be at this stage.
I would be interested to know the time taken by the dictionary portion of the code relative to the time taken by the Intersect operation to determine if there may be other bottlenecks that could be optimized. Reducing the time spent in each iteration by 1/6th would still result is a savings of about 2 hours if you are correct about the overall time for the code.
Anyway, 12 hours is probably still taking too long to run on a nightly or weekly basis unless you have no other batch jobs you need to run. But the code provides a very acceptable overnight batch process if it only to be run once on an infrequent or monthly basis. I assume that is all that this code is intended for.
Yes, this code is only intended to be run a few times. But it sure beats doing all these steps manually. The summary statistics step took less than a second and I think that's about how long it takes to write the dictionaries. It's amazing to me how quickly it writes to my CSV file. 500 records are written to the file in only 1-2 seconds. So, thanks so much for letting me know about dictionaries. They are amazingly fast! I've already processed more data in a few minutes than I would have in 2 days before. Again, thanks for taking the time to help! It's greatly appreciated!
Laura:
I am concerned that you made a change in your last version of your code that will produce incorrect lengths due to the Intersect tool and Summary you are now doing. Before you were summing the lengths of the actual geometry created by the Clip tool. That would correctly update any segments split by your envelope boundaries. Now you appear to be summing a static double field called length_m that must have been populated before the Intersect tool was run, since I do not see any code that updates that field. Unfortunately no static numeric field of lengths will be accurate after the Intersect tool is run unless it is first updated with the Field Calculator (which would be slow so you don't want to do that).
No static numeric fields containing the length of the segment that existed prior to the Intersect will be automatically updated after the Intersect is completed. A static length field will only be correct after the intersect if the complete segment was not divided by the boundary of any envelope. All segments divided at an envelope boundary will still report the full original length of the original segment even though only a portion of the segment is actually contained in the envelope that divided the segment. If a segment weaved in and out of an envelope the many divided segments within the envelope would accumulate the original undivided length of the original segment many times in the summary value of lengths for that envelope.
The only accurate length values that the Summary Statistics tool could use after running the Intersect tool without running the field calculator would have to come from a file geodatabase feature class using the LENGTH field created and maintained by the geodatabase. That fields will report the updated length for any partial road segment(s) within the envelope created by the Intersect tool. Alternatively, a cursor reading the Intersect output directly into a dictionary could use the SHAPE@LENGTH field to do the same summary using the code I already provided.
Both of those updated length fields will be reported in the original units of measure for your feature class, so if your units are not actually in meters you would have to multiply the summary value by a conversion factor to get meters. A cursor does not allow you to specific a unit conversion factor directly to the reported geometry length values like the Field Calculator does. There is a metersperunit property you can access from the spatial reference of your features provided you are using a Projected Coordinate System (I assume you are). That factor could be used at some point in your code to do the conversion to meters. The best time to do that would be during the creation of the dictionary holding the lengths,
Thanks for bringing this to my attention. I just changed the code to reflect the length field created w/ the geodatabase.
Thanks too for all of your help! I couldn't have done this without you!
Laura
I am glad that was caught, even if it means your last run of the tool was wasted. It would be a pity to have made all of the improvements we made to the speed of your code only to introduce false data into the final result and analysis. Al least another run of the current tool will be able to be complete for Monday. I also appreciate the recognition and am glad I could help.
If you have Spatial Analyst, you may want to try:
