Looking for ways to optimize painfully slow code:

JeremyRead · 06-07-2013

We are trying to accomplish the following:

1. Water Mains have a Pressure Zone assigned to them depending on what zone they are located in.

2. The junction/point features (valves, fittings, etc.) have attributes called PRESSUREZONE1 and PRESSUREZONE2.

3. The Pressure Zone values of the mains connected to a junction determine its PRESSUREZONE1/2 values. If all the connected mains have the same Pressure Zone value, PRESSUREZONE1 is that value and PRESSUREZONE2 is Null. If the mains have two or more values, PRESSUREZONE1 is one value and PRESSUREZONE2 is one of the others. (There are no specific criteria for what to do with 3 or more different Pressure Zones, and there is no field to hold the extra values anyway.)
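The rule in point 3 can be written as a small pure-Python function, which makes it easy to test apart from any cursor or geoprocessing code. This is a sketch, and `assign_pressure_zones` is a hypothetical name. It makes two assumptions the post does not state: Null zones on mains are ignored, and the distinct zones are sorted so a junction always gets the same PRESSUREZONE1/PRESSUREZONE2 order (a plain `set()` gives an arbitrary order).

```python
from typing import Iterable, Optional, Tuple


def assign_pressure_zones(
    main_zones: Iterable[Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (PRESSUREZONE1, PRESSUREZONE2) for one junction.

    main_zones holds the PRESSUREZONE of every main connected to the junction.
    """
    # Drop Nulls (assumption) and duplicates; sort so the order is repeatable.
    distinct = sorted({z for z in main_zones if z is not None})
    if not distinct:
        return None, None  # no connected mains with a zone: leave both Null
    if len(distinct) == 1:
        return distinct[0], None  # all mains agree, so PRESSUREZONE2 is Null
    # Two or more zones: keep two; a third zone has no field to go in.
    return distinct[0], distinct[1]
```

For example, mains with zones `["B", "A", "B"]` give `("A", "B")`, and `["A", "A"]` gives `("A", None)`.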

I have started a script to assign the PRESSUREZONE1/2 values to the junctions programmatically, and so far I am testing it with just two feature classes: ZoneValves and Mains. The script works, but it is very slow, taking about 1 second per feature. This is a big problem because some of the junction feature classes have tens of thousands of records.

Here is what I have done so far:

```python
# Import arcpy module
import arcpy

# Local variables:
Main = "Database Connections\\gissde.sde\\CITY.Water\\CITY.Main"
ZoneValve = "Database Connections\\gissde.sde\\CITY.Water\\CITY.ZoneValve"

arcpy.MakeFeatureLayer_management(Main, "Main_Lyr")
arcpy.MakeFeatureLayer_management(ZoneValve, "ZoneValve_Lyr")

rows = arcpy.UpdateCursor("ZoneValve_Lyr")
for row in rows:
    # Select this valve, then the mains that intersect it (two geoprocessing calls per valve)
    arcpy.SelectLayerByAttribute_management("ZoneValve_Lyr", "NEW_SELECTION", "AW_ID = " + str(row.AW_ID))
    arcpy.SelectLayerByLocation_management("Main_Lyr", "INTERSECT", "ZoneValve_Lyr", "", "NEW_SELECTION")

    # Read the pressure zones of the selected mains (one new search cursor per valve)
    mainPZList = []
    mainRows = arcpy.SearchCursor("Main_Lyr")
    for mainRow in mainRows:
        mainPZList.append(mainRow.PRESSUREZONE)
    del mainRows  # release the search cursor

    mainPZList = list(set(mainPZList))  # Remove duplicate values from list

    if len(mainPZList) >= 1:
        row.PRESSUREZONE1 = mainPZList[0]
        if len(mainPZList) > 1:
            row.PRESSUREZONE2 = mainPZList[1]
        else:
            row.setNull("PRESSUREZONE2")  # all connected mains share one zone
        rows.updateRow(row)  # write each valve once

del rows  # release the update cursor
```

Basically, I make feature layers from the feature classes and open an Update Cursor on the junction/point features. For each feature, I do a spatial selection of the mains that intersect it, add those mains' pressure zones to a list, use `set` to remove duplicate values, and then populate the junction's PRESSUREZONE1/2 values depending on whether one or more Pressure Zone values were found among the connected mains.

I see some obvious problems, such as creating a Search Cursor for each junction (tens of thousands of search cursors are created when this runs), but I'm not sure of a faster, more efficient way to do this.

If you guys have any suggestions I would appreciate them very much!

Anonymous User · 06-07-2013

This will always be painfully slow because you are performing several operations (two selections plus a new search cursor) for every row of your update cursor. Also, nesting cursors is generally a bad idea. I think there has to be a better way to do this. I am not sure I completely follow what you are doing here, but I think I have an idea.

I would suggest instead of this process, do a spatial join of your ZoneValves to the Mains and store this in the in_memory workspace.
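To see why a single join beats per-row selections, here is a pure-Python stand-in (no arcpy) that pairs valves with mains in one pass over each dataset by hashing vertex coordinates. This is a swapped-in technique, not what the Spatial Join tool does internally: it assumes every valve sits exactly on a vertex of each main it connects to (as in a snapped network), and `ndigits` is an assumed rounding tolerance, so points straddling a rounding boundary would be missed. The function names and input shapes are hypothetical. The output has one row per valve/main pair, similar in shape to what arcpy's Spatial Join tool produces with `join_operation="JOIN_ONE_TO_MANY"`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
# Hypothetical input shape: main_id -> (PRESSUREZONE, list of vertices)
Mains = Dict[int, Tuple[str, Sequence[Point]]]


def build_vertex_index(mains: Mains, ndigits: int = 3) -> Dict[Point, Set[int]]:
    """Map each rounded vertex coordinate to the IDs of the mains passing through it."""
    index: Dict[Point, Set[int]] = defaultdict(set)
    for main_id, (_zone, vertices) in mains.items():
        for x, y in vertices:
            index[(round(x, ndigits), round(y, ndigits))].add(main_id)
    return index


def join_valves_to_mains(
    valves: Dict[int, Point], mains: Mains, ndigits: int = 3
) -> List[Tuple[int, int, str]]:
    """Return one (valve_id, main_id, main_zone) row per main touching each valve."""
    index = build_vertex_index(mains, ndigits)  # one pass over all mains
    rows: List[Tuple[int, int, str]] = []
    for valve_id, (x, y) in valves.items():  # one pass over all valves
        key = (round(x, ndigits), round(y, ndigits))
        for main_id in sorted(index.get(key, ())):
            rows.append((valve_id, main_id, mains[main_id][0]))
    return rows
```

Each valve costs a single dictionary lookup instead of two selections and a new cursor.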

Next, I would iterate through this in_memory table and build a dictionary using the 'JOIN_FID' and the 'COUNT'.
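A minimal sketch of the dictionary step. It assumes a one-to-many join in which each output row carries the valve's ID and the connected main's PRESSUREZONE. Here the rows are a plain list of tuples with hypothetical values; in arcpy 10.1+ they could come from `arcpy.da.SearchCursor(join_output, ["TARGET_FID", "PRESSUREZONE"])`, where the output name and field names are assumptions about the join result. Keying on the valve ID and collecting a set of zones is a variation on the JOIN_FID/COUNT idea, chosen because the zone values themselves are what PRESSUREZONE1/2 need. `zones_by_valve` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def zones_by_valve(
    join_rows: Iterable[Tuple[int, Optional[str]]],
) -> Dict[int, Set[str]]:
    """Collapse one-to-many join rows (valve_id, main_zone) into valve_id -> distinct zones."""
    lookup: Dict[int, Set[str]] = defaultdict(set)
    for valve_id, zone in join_rows:
        zones = lookup[valve_id]  # creates an empty set the first time a valve is seen
        if zone is not None:  # skip mains with a Null zone
            zones.add(zone)
    return dict(lookup)


# Rows as a one-to-many join might return them (hypothetical values)
rows = [(100, "A"), (100, "B"), (100, "A"), (101, "A"), (102, None)]
lookup = zones_by_valve(rows)
```

The whole join table is read once, no matter how many valves there are.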

You could then open your update cursor and use the dictionary to update the table. I think this process would be much faster.
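A sketch of that final pass, assuming the dictionary maps each valve ID to its set of distinct zones. The rows are plain `[valve_id, PRESSUREZONE1, PRESSUREZONE2]` lists standing in for cursor rows. In arcpy, the same loop body would run inside an `arcpy.da.UpdateCursor` and call `cursor.updateRow(row)` for the rows that changed. Valves missing from the dictionary are left alone, as in the original script, and sorting the zones is an assumed tie-break. `apply_zones` is a hypothetical name.

```python
from typing import Dict, List, Set


def apply_zones(rows: List[list], lookup: Dict[int, Set[str]]) -> int:
    """Fill PRESSUREZONE1/2 in [valve_id, pz1, pz2] rows from a prebuilt lookup.

    Returns how many rows changed. With arcpy, the loop would run over an
    arcpy.da.UpdateCursor and call cursor.updateRow(row) where `changed` is bumped.
    """
    changed = 0
    for row in rows:
        if row[0] not in lookup:
            continue  # no connected mains found: leave the row as it is
        zones = sorted(lookup[row[0]])
        new_values = [
            zones[0] if zones else None,
            zones[1] if len(zones) > 1 else None,
        ]
        if row[1:3] != new_values:
            row[1:3] = new_values  # only touch rows whose values actually differ
            changed += 1
    return changed
```

Skipping unchanged rows avoids needless writes when the script is rerun on data that is already mostly correct.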

Is there any chance you could post a small sample of your data?

JeremyRead · 06-08-2013

Hello Caleb,

This is an excellent idea! Unfortunately, I cannot post our data on a public forum, as that would violate our security rules. However, I understand the process you described, so I will give it a try. I knew there had to be a better way than using nested cursors 🙂

Thank you very much for the suggestion!