I have a script that is creates a buffer around points, converts the round buffers to squares, clips a road layers with the square, then calculates the length for the clipped roads and exports data to a csv file. I have run this script before and it took roughly 3 days to complete. This time when I ran the code it's projected to take a month to mine the data. I have no idea why it might be slowing down and could really use some help figuring this out. Does anyone know how I could speed up this process?
# -*- coding: utf-8 -*- # --------------------------------------------------------------------------- # 2_TrapStatsAndFloatingRoadDensity.py # Altered on: 2015-6-2 10:00 am # Created by: L.M.Blackburn # ArcGIS 10.1 # Notes: iterates thru orders shapefiles and creates a buffer # around each point. The round buffer is converted to a feature envelope # which makes a square polygon around the point. This square is used to # clip the roads layer. Then I calculate length for the clipped roads by adding # a field and calculating geometry. Run statistics to get the sum length # of roads in each square. export sum length, block name, centroid of # square out to a text file # (2nd script for 4th part of analysis) # Needed data: Order sublayer from VRP # List of fields to drop # Roads network to get length data # --------------------------------------------------------------------------- # Import arcpy module import arcpy, os, sys, traceback, csv # Set the necessary product code import arceditor import arcinfo # Set workspace environment - this is where the solved route layers are located arcpy.env.workspace = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\' arcpy.env.overwriteOutput = True # --------------------Set local variables ------------------ # ---workspace for order layers ordersLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\' # ---workspace for buffer layers bufferLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Buffers\\' # ---set default buffer distance bufDefault = "6000 Meters" # ---workspace for envelope layers envelopeLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Envelopes\\' # ---workspace for route layers routeWkspc = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Routes\\' # ---set workspace path for clipped roads outDataPath = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\ClipRoads\\' # ---set road layer - for use in clip & length Road = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\RoadsForSnap\\SDC Edge Source.shp' # ---fields to delete from Orders layers dropFields = ["Descriptio", "ServiceTim", "TimeWind_1", "TimeWind_2", "TimeWind_3", "TimeWindow", "MaxViolati", "MaxViola_1", \ "PickupQuan", "DeliveryQu", "Revenue", "SpecialtyN", "Assignment", "RouteName", "ViolatedCo", "CumulTrave", \ "CumulDista", "CumulTime", "ArriveCurb", "DepartCurb", "DepartTime", "WaitTime", "ViolationT", "CumulWaitT", \ "CumulViola", "ArriveTime", "SourceID", "SourceOID", "PosAlong", "SideOfEdge", "CurbApproa", "Status"] try: print 'Opening CSV file...' # Step 1 - Create table for output data # Needed fields: From script [Block, Scale, Sequence, FromPrevTr, FromPrevDi, totTime, totDist, x, y, totalRdLength] CSVFile = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\AllTrapsData_June2.csv' f = open (CSVFile, "wb") w = csv.writer(f, delimiter=',', lineterminator='\n') fieldNames = ['Block', 'Scale', 'BufferType', 'Sequence', 'FromPrevTr', 'FromPrevDi', 'x', 'y', 'bufferDist','totTime', \ 'totDist', 'totalRdLength', '\n'] w.writerow(fieldNames) # Step 2 - Loop through files in orders folder # Create variable to hold orders feature class files ''' orderList = arcpy.ListFiles("*.shp") # Loop through layers & break name into segments to be used when naming orders lyrs print 'Creating Buffers...' for shp in orderList: lyrs = arcpy.mapping.Layer(ordersLyr + shp) splitName =shp.split('_') block = splitName[0] scale = splitName[1].rstrip('.shp') # define buffer distance if scale == "1K": bufDist = "1000 Meters" elif scale == "2K": bufDist = "2000 Meters" elif scale == "3K": bufDist = "3000 Meters" elif scale == "4K": bufDist = "4000 Meters" elif scale == "6K": bufDist = "6000 Meters" elif scale == "500": bufDist = "500 Meters" else: bufDist = bufDefault print 'Order: ' + shp + ', Buffer: ' + bufDist # Jump into orderLayers to delete unneeded fields & add XY arcpy.DeleteField_management(shp, dropFields) arcpy.AddXY_management(shp) # Step 4 - Run buffer on order layers arcpy.Buffer_analysis(shp, bufferLyr + 'buf' + '_' + block + '_' + scale, bufDist, 'FULL', 'ROUND', 'NONE', '#') arcpy.Buffer_analysis(shp, bufferLyr + 'buf2' + '_' + block + '_' + scale, bufDefault, 'FULL', 'ROUND', 'NONE', '#') bufferList = [] for dpath, dnames, fnames in arcpy.da.Walk(bufferLyr, datatype = 'FeatureClass', type = 'Polygon'): for files in fnames: bufferList.append(os.path.join(files)) # Step 5 - Run feature envelope to polygon on buffer layers print 'Converting round buffers to squares...' for bufShp in bufferList: bufSplitName = bufShp.split('_') blockID = bufSplitName[1] scaleID = bufSplitName[2].rstrip('.shp') bufID = bufSplitName[0] print 'Buffer: ' + bufShp + ' Block: ' + blockID + ' Scale: ' + scaleID + ' BufferType: ' + bufID arcpy.FeatureEnvelopeToPolygon_management(bufferLyr + bufShp, envelopeLyr + bufID + '_' + blockID + '_' + scaleID, 'SINGLEPART') ''' # Step 6 - Calculate totalTime and totalDistance using insert cursor # loop through each record in the table - calculate values and... # use selected features to clip roads layer & calculate geometry # add values to the CSV table print 'Populating CSV file...' envelopeList = [] for dirpath, dirnames, filenames in arcpy.da.Walk(envelopeLyr, datatype = 'FeatureClass', type = 'Polygon'): for filename in filenames: envelopeList.append(os.path.join(filename)) for eLayer in envelopeList: eLayerName = eLayer.split('_') blockEID = eLayerName[1] scaleEID = eLayerName[2].rstrip('.shp') bufEID = eLayerName[0] print 'eLayer: ' + eLayer + ' Block: ' + blockEID + ' Scale: ' + scaleEID + ' BufferType: ' + bufEID eLyrPath = envelopeLyr + eLayer #Make a layer from the feature class - needed for selecting records to be used in the clipping arcpy.MakeFeatureLayer_management(eLyrPath, "clipLayer") # use search cursor to grab data row by row to populate CSV file with arcpy.da.SearchCursor(eLyrPath, ("Sequence", "FromPrevTr", "FromPrevDi", "POINT_X", "POINT_Y", "BUFF_DIST")) as cursor: for row in cursor: dataList = [] dataList.append(str(blockEID)) dataList.append(str(scaleEID)) dataList.append(str(bufEID)) dataList.append(str(row[0])) dataList.append(str(row[1])) dataList.append(str(row[2])) dataList.append(str(row[3])) dataList.append(str(row[4])) dataList.append(str(row[5])) n = int(row[0]) sN = "=" + str(n) n2 = n +1 sel = "=" + str(n2) prevTime = float(row[1]) prevDist = float(row[2]) # print sN postTime = "" # use a nested search to get values for total time and total distance # there are some cases where the below expression will be false in that # event we will need to cacluclate the distance to the depot or just leave # the values for prevTime & prevDist as place holders, otherwise I end up with # road length being written in the wrong field expression = arcpy.AddFieldDelimiters(eLyrPath, "Sequence")+ sel for row2 in arcpy.da.SearchCursor(eLyrPath, ("FromPrevTr", "FromPrevDi"), expression): postTime = row2[0] totTime = prevTime + float(postTime) # print totTime dataList.append(str(totTime)) postDist = row2[1] # print postDist totDist = prevDist + float(postDist) # print totDist dataList.append(str(totDist)) # if above for loop yeilds no results, calculate postTime and postDist differently # grab DepotVisits layer - if VisitType = "End" then grab values for "FromPrevTravelTime" & "FromPrevDistance" if postTime == "": #select corresponding route layer file and grab sublayer: DepotVisits print "extracting time & distance to Depot" routeLyr = arcpy.mapping.Layer(routeWkspc + "TrapRoute_" + blockEID + "_" + scaleEID + ".lyr") if routeLyr.isGroupLayer: for sublyrs in routeLyr: # print sublyrs.name if sublyrs.name == 'Depot Visits': arcpy.MakeFeatureLayer_management(sublyrs, "depotVisits") qLast = "= 2" expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+ qLast for row3 in arcpy.da.SearchCursor("depotVisits", ("FromPrevTravelTime", "FromPrevDistance"), expression2): postTime2 = row3[0] totTime2 = prevTime + float(postTime2) # print ("Depot Visit Time: {0}, {1}, {2}".format(prevTime, postTime2, totTime2)) dataList.append(str(totTime2)) postDist2 = row3[1] totDist2 = prevDist + float(postDist2) # print ("Depot Visit Distance: {0}, {1}, {2}".format(prevDist, postDist2, totDist2)) dataList.append(str(totDist2)) # select single record to use when clipping - this selection must be on a layer not fc # always create where clause using AddFieldDelimiters seqField = "Sequence" where_clause = arcpy.AddFieldDelimiters("clipLayer", seqField)+ sN arcpy.SelectLayerByAttribute_management("clipLayer", "NEW_SELECTION", where_clause) # Clip roads w/ selected polygon - for this I will need a search cursor clipRoads = outDataPath + 'rdsClip_' + blockEID + '_' + scaleEID + '_' + bufEID + '_' + str(n) arcpy.Clip_analysis(Road, "clipLayer", clipRoads) clipRoadsShp = clipRoads + ".shp" # Use geometry/length to get the total length of clipped roads(in meters) g = arcpy.Geometry() geometryList = arcpy.CopyFeatures_management(clipRoadsShp, g) length = 0 for geometry in geometryList: length +=geometry.length # append length (meters) at end of line to csv dataList dataList.append(str(length)) # dataList.append('\r\n') # print length # Write dataList to csv file ---- may need to dedent this one more time --- w.writerow(dataList) # print dataList print 'Script complete' f.close() except: if arcpy.Exists(CSVFile): w.writerow(dataList) print 'Data written in CSV file' f.close() print 'Program failed.' print 'Check python errors.' tb = sys.exc_info()[2] tbinfo = traceback.format_tb(tb)[0] pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + "\nError Info:\n " + str(sys.exc_type) + ": " + str(sys.exc_value) + "\n" msgs = "ARCPY ERRORS:\n" + arcpy.GetMessages(2) + "\n" arcpy.AddError(msgs) arcpy.AddError(pymsg) print msgs print pymsg arcpy.AddMessage(arcpy.GetMessages(1)) print arcpy.GetMessages(1) # print ("Points to select: {0}, Radius: {1}".format(centroidPath, radius))
Solved! Go to Solution.
I am getting a bit hung up on implementing the dictionaries, mainly because I am grabbing values from a feature class and putting them into a CSV file. So, I don't think I can use an update cursor, instead I am using w.writerow with a list of records, right? Please correct me if I am wrong.
Also, I am not just joining data from one table to another. I am grabbing data one row at a time so that I can perform some data manipulations. Basically, I have a point (n) with information about the time and distance from the previous point (stops along a route). My code is adding the values for time and distance for point (n) to the time and distance for point (n+1). So, I am querying the same feature class twice to get data for point(n) and adding that to point(n+1).
Richard, any help or insight you can provide is greatly appreciated. I am a self-taught coder and I feel that I know just enough to be dangerous. If there are any books or courses that you found helpful, I would love to hear about it!
You are correct that the UpdateCursor would be replaced by the w.writerow code, but that has nothing to do with the look up process. Anyway, dictionaries need to build the list object that you later feed to the w.writerow portion of your code. Additionally using a key that looks up the next key in the dicitionary works perfectly well and embedding cursors that point to the same data source gives even worse performance that embedded cursors accessing separate datasources.
Here is a stab at converting the code. Without knowing your data I assumed the Sequence field is a string field and that Sequence is a Unique key without duplicates in the data source. If Sequence is not a Unique key and can refer to multiple rows you will need to let me know so I can implement it as a list of lists value. I assumed a one to many relationship for the depot visits.
Also, I am unfamiliar with what expression2 creates and how it is used in the cursor expression:
expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+ qLast
for row3 in arcpy.da.SearchCursor("depotVisits", ("FromPrevTravelTime", "FromPrevDistance"), expression2):
wouldn't that convert to this invalid invalid cursor set up (ignoring whatever delimiters apply)?
for row3 in arcpy.da.SearchCursor("depotVisits", ("FromPrevTravelTime", "FromPrevDistance"), "depotVisits", "VisitType"=2):
So I am ignoring the depotVisits and assuming it works with VisitType=2 until you explain it to me.
The code is untested, so expect some debugging effort to make sure everything is indented correctly. I free pasted and then modified a fair amount of the code from multiple sources so I could have left some mismatched syntax in the code. Let me know of any errors that are not obvious omissions or misspellings. The code below should replace lines 137 through 198 in your original code posting. The relative indentation of the code should make line 1 below indented at the same level as line 137 in your original code.
sourceFC = None #select corresponding route layer file and grab sublayer: DepotVisits print "extracting time & distance to Depot" routeLyr = arcpy.mapping.Layer(routeWkspc + "TrapRoute_" + blockEID + "_" + scaleEID + ".lyr") if routeLyr.isGroupLayer: for sublyrs in routeLyr: # print sublyrs.name if sublyrs.name == 'Depot Visits': arcpy.MakeFeatureLayer_management(sublyrs, "depotVisits") sourceFC = "depotVisits" depotVisitsDict = {} if sourceFC == "depotVisits": sourceFieldsList = ["VisitType", "FromPrevTravelTime", "FromPrevDistance"] # Set up a One to Many Dictionary with arcpy.da.SearchCursor(sourceFC, sourceFieldsList) as depots: for depot in depots: relatekey = depot[0] if relatekey in depotVisitsDict: depotVisitsDict[relatekey].append(depot[1:]) else: depotVisitsDict[relatekey] = [depot[1:]] sourceFC = eLyrPath sourceFieldsList = ["Sequence", "FromPrevTr", "FromPrevDi", "POINT_X", "POINT_Y", "BUFF_DIST"] # Use list comprehension to build a dictionary from a da SearchCursor sequenceDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)} for key in sequenceDict: dataList = [] dataList.append(str(blockEID)) dataList.append(str(scaleEID)) dataList.append(str(bufEID)) dataList.append(str(key)) dataList.append(str(list(sequenceDict[key])[0])) dataList.append(str(list(sequenceDict[key])[1])) dataList.append(str(list(sequenceDict[key])[2])) dataList.append(str(list(sequenceDict[key])[3])) dataList.append(str(list(sequenceDict[key])[4])) n = int(key) sN = "=" + str(n) n2 = n +1 sel = "=" + str(n2) prevTime = float(list(sequenceDict[key])[0]) prevDist = float(list(sequenceDict[key])[1]) # print sN postTime = "" # use a nested search to get values for total time and total distance # there are some cases where the below expression will be false in that # event we will need to cacluclate the distance to the depot or just leave # the values for prevTime & prevDist as place holders, otherwise I end up with # road length being written in the wrong field if str(n2) in sequenceDict: values = list(sequenceDict[str(n2)]) for row2 in values: postTime = row2[0] totTime = prevTime + float(postTime) # print totTime dataList.append(str(totTime)) postDist = row2[1] # print postDist totDist = prevDist + float(postDist) # print totDist dataList.append(str(totDist)) # if above for loop yeilds no results, calculate postTime and postDist differently # grab DepotVisits layer - if VisitType = "End" then grab values for "FromPrevTravelTime" & "FromPrevDistance" if postTime == "": #select corresponding route layer file and grab sublayer: DepotVisits print "extracting time & distance to Depot" if 2 in depotVisitsDict: for row3 in depotVisitsDict[2]: postTime2 = row3[0] totTime2 = prevTime + float(postTime2) # print ("Depot Visit Time: {0}, {1}, {2}".format(prevTime, postTime2, totTime2)) dataList.append(str(totTime2)) postDist2 = row3[1] totDist2 = prevDist + float(postDist2) # print ("Depot Visit Distance: {0}, {1}, {2}".format(prevDist, postDist2, totDist2)) dataList.append(str(totDist2))
I hate to be a nitpicker, Richard, especially when you have all the answers, but you forgot a colon after your if statement on line 13 and are missing an extra right parenthesis at the end of lines 48 and 49. Great work though!
Hi Richard,
Thank you for walking me through the use of the dictionary. Dictionaries of this type are completely new to me, so I have been referring to your blog "Turbo Charging Data Manipulation..." to try and understand what the code is doing.
Here's a little background on what I am doing. I used Network analyst to create routes for me. Now I am querying those routes to get data on the time and distance traveled from one point to the next. This works great until we get to the very last point or location in the route, identified by the final sequence #. For the final point in the route, I still need to know how long (time & dist) it takes to get to the depot (ie. the depot visit where visit type =2). This is a one to one join - there are only two records in the depot visit table and I am only interested in the one where visit type =2. Hopefully this helps to clarify what I am doing with expression2.
So, I don't think I need a one to many dictionary for the depot visits. There are only two records here and I really only need data from the second record. Is it worth making a dictionary for one record?
I have added your code Richard, and started debugging it, but I continue to get an error at line 24 in your code above.
I seem to keep getting an error at
else:
depotVisitsDict[relatekey].append(depot[1:])
Error Info:
<class 'Queue.Empty'>:
I have fixed the items pointed out by Blake. I also corrected the code in line 22 and 24 to reverse them. I had them backwards (or else I had forgotten to put the word "not" in line 21).
Since the depot visits table only has two records then you could replace lines 13 through 24 with (using the same indentation as line 13):
qLast = "= 2"
expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+ qLast
dopotVisitsDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList, expression2)}
Then you would also have to replace line 77 with (using the same indentation as line 77):
row3 = list(depotVisitsDict[2])
# -*- coding: utf-8 -*- # --------------------------------------------------------------------------- # 2_FastTrapStatsAndFloatingRoadDensity.py # Altered on: 2015-6-8 10:00 am # Created by: L.M.Blackburn # ArcGIS 10.1 # Notes: Buffers and envelopes were created in first run-through of this code. # Now, I calculate length for the clipped roads by adding # a field and calculating geometry. Run statistics to get the sum length # of roads in each square. export sum length, block name, centroid of # square out to a text file. I'm making edits on this version to run # the code on Citrix and to speed up the entire process using dictionary # keys instead of nested cursors. # (2nd script for 4th part of analysis) # Needed data: Order sublayer from VRP # List of fields to drop # Roads network to get length data # --------------------------------------------------------------------------- # Import arcpy module import arcpy, os, sys, traceback, csv from time import strftime # Set the necessary product code import arceditor import arcinfo # Set workspace environment - this is where the solved route layers are located arcpy.env.workspace = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\' arcpy.env.overwriteOutput = True # --------------------Set local variables ------------------ # ---workspace for order layers ordersLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Orders\\' # ---workspace for buffer layers bufferLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Buffers\\' # ---set default buffer distance bufDefault = "6000 Meters" # ---workspace for envelope layers envelopeLyr = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Envelope\\' # ---workspace for route layers routeWkspc = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\Routes\\' # ---set workspace path for clipped roads outDataPath = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\ClipRoad\\' # ---set road layer - for use in clip & length Road = 'F:\\Workspace\\Sandy\\GM_costAnalysis\\RoadsForSnap\\SDC Edge Source.shp' # ---fields to delete from Orders layers dropFields = ["Descriptio", "ServiceTim", "TimeWind_1", "TimeWind_2", "TimeWind_3", "TimeWindow", "MaxViolati", "MaxViola_1", \ "PickupQuan", "DeliveryQu", "Revenue", "SpecialtyN", "Assignment", "RouteName", "ViolatedCo", "CumulTrave", \ "CumulDista", "CumulTime", "ArriveCurb", "DepartCurb", "DepartTime", "WaitTime", "ViolationT", "CumulWaitT", \ "CumulViola", "ArriveTime", "SourceID", "SourceOID", "PosAlong", "SideOfEdge", "CurbApproa", "Status"] # ---fields for CSV file fieldNames = ['Block', 'Scale', 'BufferType', 'Sequence', 'FromPrevTr', 'FromPrevDi', 'x', 'y', 'bufferDist','totTime', \ 'totDist', 'totalRdLength', '\n'] # ---List of fields from envelope layer from which to extract data sourceFieldsList = ["Sequence", "FromPrevTr", "FromPrevDi", "POINT_X", "POINT_Y", "BUFF_DIST"] try: print 'Script started at: ' + strftime("%Y-%m-%d %H:%M:%S") print 'Opening CSV file...' # Step 1 - Create table for output data # Needed fields: From script [Block, Scale, Sequence, FromPrevTr, FromPrevDi, totTime, totDist, x, y, totalRdLength] with open('F:\\Workspace\\Sandy\\GM_costAnalysis\\analysisMay15cont\\AllTrapsData_June8_test.csv', "wb") as f: w = csv.writer(f, delimiter=',', lineterminator='\n') w.writerow(fieldNames) # Calculate totalTime and totalDistance using insert cursor # loop through each record in the table - calculate values and... # use selected features to clip roads layer & calculate geometry # add values to the CSV table print 'Populating CSV file...' envelopeList = [] for dirpath, dirnames, filenames in arcpy.da.Walk(envelopeLyr, datatype = 'FeatureClass', type = 'Polygon'): for filename in filenames: envelopeList.append(os.path.join(filename)) for eLayer in envelopeList: eLayerName = eLayer.split('_') blockEID = eLayerName[1] scaleEID = eLayerName[2].rstrip('.shp') bufEID = eLayerName[0] #print 'eLayer: ' + eLayer + ' Block: ' + blockEID + ' Scale: ' + scaleEID + ' BufferType: ' + bufEID eLyrPath = envelopeLyr + eLayer #Make a layer from the feature class - needed for selecting records to be used in the clipping arcpy.MakeFeatureLayer_management(eLyrPath, "clipLayer") #-----------------------pasted code below---------------------- sourceFC = None #select corresponding route layer file and grab sublayer: DepotVisits print 'extracting time & distance to points. eLayer: ' + eLayer + ' Block: ' + blockEID + ' Scale: ' + scaleEID + ' BufferType: ' + bufEID routeLyr = arcpy.mapping.Layer(routeWkspc + "TrapRoute_" + blockEID + "_" + scaleEID + ".lyr") if routeLyr.isGroupLayer: for sublyrs in routeLyr: # print sublyrs.name if sublyrs.name == 'Depot Visits': arcpy.MakeFeatureLayer_management(sublyrs, "depotVisits") sourceFC = "depotVisits" qLast = "= 2" expression2 = arcpy.AddFieldDelimiters("depotVisits", "VisitType")+qLast depotFieldsList = ["VisitType", "FromPrevTravelTime", "FromPrevDistance"] depotVisitsDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, depotFieldsList, expression2)} sourceFC = eLyrPath # Use list comprehension to build a dictionary from a da SearchCursor sequenceDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)} for key in sequenceDict: dataList = [] dataList.append(str(blockEID)) dataList.append(str(scaleEID)) dataList.append(str(bufEID)) dataList.append(str(key)) dataList.append(str(list(sequenceDict[key])[0])) dataList.append(str(list(sequenceDict[key])[1])) dataList.append(str(list(sequenceDict[key])[2])) dataList.append(str(list(sequenceDict[key])[3])) dataList.append(str(list(sequenceDict[key])[4])) n = int(key) sN = "=" + str(n) n2 = n +1 sel = "=" + str(n2) prevTime = float(list(sequenceDict[key])[1]) prevDist = float(list(sequenceDict[key])[2]) # print sN postTime = "" # use a nested search to get values for total time and total distance # this loop is not running - I am not getting these values written to # the CSV table if str(n2) in sequenceDict: values = list(sequenceDict[str(n2)]) for row2 in values: postTime = row2[1] totTime = prevTime + float(postTime) print totTime dataList.append(str(totTime)) postDist = row2[2] # print postDist totDist = prevDist + float(postDist) # print totDist dataList.append(str(totDist)) # if above for loop yeilds no results, calculate postTime and postDist differently # grab DepotVisits layer - if VisitType = "End" then grab values for "FromPrevTravelTime" & "FromPrevDistance" if postTime == "": #select corresponding route layer file and grab sublayer: DepotVisits print "extracting time & distance to Depot" if 2 in depotVisitsDict: row3 = list(depotVisitsDict[2]) postTime2 = row3[1] totTime2 = prevTime + float(postTime2) print ("Depot Visit Time: {0}, {1}, {2}".format(prevTime, postTime2, totTime2)) dataList.append(str(totTime2)) postDist2 = row3[2] totDist2 = prevDist + float(postDist2) # print ("Depot Visit Distance: {0}, {1}, {2}".format(prevDist, postDist2, totDist2)) dataList.append(str(totDist2)) # ----------------------pasted code above---------------------- print "Sequence: " + sN + ", Block: " + blockEID + ", Scale: " + scaleEID # select single record to use when clipping - this selection must be on a layer not fc # always create where clause using AddFieldDelimiters seqField = "Sequence" where_clause = arcpy.AddFieldDelimiters("clipLayer", seqField)+ sN arcpy.SelectLayerByAttribute_management("clipLayer", "NEW_SELECTION", where_clause) # Clip roads w/ selected polygon - for this I will need a search cursor clipRoads = outDataPath + 'rdsClip_' + blockEID + '_' + scaleEID + '_' + bufEID + '_' + str(n) arcpy.Clip_analysis(Road, "clipLayer", clipRoads) clipRoadsShp = clipRoads + ".shp" # Use geometry/length to get the total length of clipped roads(in meters) print "calculating road lengths...." g = arcpy.Geometry() geometryList = arcpy.CopyFeatures_management(clipRoadsShp, g) length = 0 for geometry in geometryList: length +=geometry.length # append length (meters) at end of line to csv dataList dataList.append(str(length)) # dataList.append('\r\n') print length # Write dataList to csv file ---- may need to dedent this one more time --- w.writerow(dataList) # print dataList print 'Script completed at: ' + strftime("%Y-%m-%d %H:%M:%S") f.close() except: f.close() print 'Program failed at: ' + strftime("%Y-%m-%d %H:%M:%S") print 'Check python errors.' tb = sys.exc_info()[2] tbinfo = traceback.format_tb(tb)[0] pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + "\nError Info:\n " + str(sys.exc_type) + ": " + str(sys.exc_value) + "\n" msgs = "ARCPY ERRORS:\n" + arcpy.GetMessages(2) + "\n" arcpy.AddError(msgs) arcpy.AddError(pymsg) print msgs print pymsg arcpy.AddMessage(arcpy.GetMessages(1)) print arcpy.GetMessages(1) # print ("Points to select: {0}, Radius: {1}".format(centroidPath, radius))
The majority of the code seems to be working. However, I am not getting any values returned from the if statement on line 133. Any ideas on what might be happening here? What I am doing in that loop is finding the next point in the route (sequence +1) and grabbing the values for "FromPrevTravelTime" and "FromPrevDistance" from the same layer or the sequenceDict . Should this if statement be changed to a for statement?
I am also wondering if I should order the sequence dictionary? Would this save on processing time?
Are you sure that the sequence field is a string field? I just made that assumption, but it probably is wrong. If sequence is a numeric field then the key is wrong at lines 133 and 134. The code would have to be:
if n2 in sequenceDict:
values = list(sequenceDict[n2])
Also, if you want to process the dictionary sorted, you don't sort the dictionary, since it can't be sorted. You sort the list of dictionary keys instead. I don't recommend using an OrderedDictionary, since it isn't necessary for this process. So line 110 would be (assuming the sequence key is numeric and not string):
for key in sorted(sequenceDict.keys()):
This won't have any noticeable effect the speed of the code, but it will output the data in the order that you probably want.
Thanks again Richard!
The sequence field is a short integer. So, by changing it from a string to a number we are now going into that loop, however, I get a queue.empty error at line 136 postTime=row2[1]. Could you describe what we are doing in line 134? There should only be one record where n2 is in the sequenceDict. What I think line 134 does, is put all of the records for n2 into a list. Then postTime would be the second field in the sequenceDict where the first field = n2.
Also, the speed on this script still seems to be slow. I am pretty sure this is due to the clipping of roads and calculating the road length. Is this something that could be done in-memory to help with the speed? Would the syntax be something like below to replace lines 173-175:
clipRoadsShp = arcpy.Clip_analysis(Road, "clipLayer", "in_memory")
okay, I have all of the dictionaries working and my CSV file is being populated correctly. I altered the road clipping to use "in_memory" processing too. But, I still seem to be dealing with slow processing speeds. It seems like it will take many days to complete.
I tried to see if it would work with the few changes I made, but by changing from a one to many dictionary to a one to one dictionary I need to make more changes. This code should access the FromPrevTr and FromPrevDi fields.
if n2 in sequenceDict: postTime = sequenceDict[n2][0] totTime = prevTime + float(postTime) print totTime dataList.append(str(totTime)) postDist = sequenceDict[n2][1] # print postDist totDist = prevDist + float(postDist) # print totDist dataList.append(str(totDist))
What is your clip layer? Before I completely rebuild this section I need to better understand what is going on. I would load cliplayer into a dictionary and include its sequence and Shape@ fields once at the beginning of the script for fast access if it has many records. The geometry for the n2 sequence record can be fed directly to the clip process based on the first example in this help post (ignore the array, it can come from the shape@ field)
For using the in_memory workspace you should get a better understanding (which I also need). Start by looking at the help here. You use in_memory as a path and you still need a slash and a feature class name after it for output. It is more like a geodatabase (which you should convert to anyway, since shapefile performance sucks). So it would be more like changing line 173 to:
clipRoads = r"in_memory\" + 'rdsClip_' + blockEID + '_' + scaleEID + '_' + bufEID + '_' + str(n)
You would not add .shp to the end of the feature class name, since it is not outputting a shapefile. If you use the in_memory workspace and do not actually need the output of the clip permanently, you should get rid of the individualized feature class names and should delete the feature class with arcpy.Delete_management() as soon as you are done with it to keep the memory clean.
Why do you use CopyFeatures to a geometry object and not a cursor to read through the geometry lengths? Duplicating the feature again is a waste of time as far as I can see.
Anyway, clipping one shape at a time will be a slow process, but it can be sped up. You should print a time stamp and time difference before and after the actual clip to find out how much time is being eaten by each clip operation. Only by batch processing many objects in one clip are you likely to speed that up significantly, since the set up time for each clip is probably what is killing your time. You should still notice a speed improvement in the section of code we have revised. In fact you should start adding print statements to show time spent in each key section if you are trying to find the most time consuming operations in your code.
You also need to evaluate what is the maximum time that this process can take for it to actually be useful to you. I have had to abandon several approaches I spent considerable time developing when the time spent doing them did not pay off sufficiently. I usually can rethink it eventually to hit my performance tolerance and get a close enough result that I can live with. If your process cannot perform acceptably because of an operation you cannot control (the clip), you may want to consider alternative processes that may be less precise, but that may produce a more worthwhile output in light of the performance improvements they offer.