Code stops after 39,898 updates.

Anonymous User · ‎05-27-2014

Original User: armein01

Hi again,

I am trying to develop an application which dissolves adjacent polygons and then assigns a new unitID for the dissolved polys.

uRows = arcpy.SearchCursor("wtlndUnits1")
    wtlndUnitID = 1
        
    for uRow in uRows:
        arcpy.SelectLayerByLocation_management("wtlnd4Anlys","WITHIN",uRow.shape, "", "NEW_SELECTION")
        numPolys = int(arcpy.GetCount_management("wtlnd4Anlys").getOutput(0))
                
        if numPolys > 1:

            #Get total area using the geometry
            g = arcpy.Geometry()
            totalArea = 0
            gList = arcpy.CopyFeatures_management("wtlnd4Anlys",g)
            
            for geometry in gList:
                totalArea += geometry.area
                
            arcpy.CalculateField_management("wtlnd4Anlys", "UnitAcres", totalArea * 0.00024711, "Python")
            arcpy.CalculateField_management("wtlnd4Anlys", "WtlndUnitID", wtlndUnitID, "PYTHON")
            
            wtlndUnitID += 1
           
    del uRow, uRows

So, only polygons that have been dissolved are updated with a UnitID. This works, but I started the process on Friday, and when I returned this morning it was still running. Shortly after I came in, it stopped.

The acres field was not updated either, but no error was thrown by the code.

Here is a shot of the polygons that were updated.

[Attachment 34103: screenshot of the updated polygons]

So, do I need to clean up somewhere so the code can complete? Why is it taking so long? And lastly, why is it not populating the total acres?

This is the first time I've tried to use geometry; I can switch to a search cursor for the total area.

Thanks
Alicia

KimOllivier · ‎06-03-2014

You don't have to wait over the weekend to know that you have not approached the problem the best way. I have a rule of thumb called the "Cup of Coffee Rule".

If a single process has not finished by the time I have finished my coffee, then I interrupt the process and find a better way.

It must be something seriously wrong since computers are now so fast. Maybe a missing index, working across a network instead of local data, exceeding the size that can fit in memory. Just using a tool with inappropriate tolerances or settings can turn a simple process into something impossible.

In this case I can see the problem at once: you are running geoprocessing tools inside a cursor! They are not designed to be used that way; you are supposed to frame your problem so that a tool runs over the whole dataset in one pass, never in a cursor. It is just too slow to restart a process and clean up on every row. Since the tools are not designed that way, it is likely that garbage collection is not working properly either, so you will run out of memory long before you get any useful results. I have never run a tool inside a cursor; there is always a way of avoiding it.

I know it is easy to visualise a problem by considering a feature and then applying the same thing a million times, but this is not the best way usually. Think of the way that SQL queries work. You specify what is to be done with a pair of tables and you let the database engine decide how to do it, you don't even have the luxury of a cursor in the SQL language.

It would be nice if GIS tools had a similar command to SQL called Explain which shows how things are going to be solved. Instead you have to think it through yourself and recognise the patterns that will not work. You need to multiply the numbers of operations together by the expected time to see if it will finish before you retire.
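
That back-of-envelope multiplication can be sketched in a few lines of Python. The per-call timing below is a made-up placeholder, not a measurement; time one iteration of your own loop and substitute the real figure. The count of five tool calls per row matches the loop in the original post (one select, one count, one copy, two field calcs).

```python
def estimated_hours(iterations, tool_calls_per_iteration, seconds_per_call):
    """Rough wall-clock estimate for running tools inside a loop."""
    total_seconds = iterations * tool_calls_per_iteration * seconds_per_call
    return total_seconds / 3600.0

# Hypothetical figures: 40,000 rows, 5 tool calls per row, 1.5 s per call
hours = estimated_hours(40000, 5, 1.5)
print("Estimated run time: %.1f hours" % hours)
```

If the estimate comes out in days rather than minutes, that is the signal to restructure the problem before running it.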

Any 40,000 updates should finish in a few seconds or minutes, to give you a feel for what can be achieved. I know this has not reworked the problem and supplied you with a neat alternative. While I am reviewing your code, I suggest that you don't use CalculateField in a script. It is better left for ModelBuilder. Since it is just wrapping a cursor around an expression, it would be better to just use an expression in the cursor. It is much easier to understand, can be debugged more easily and you can trap unexpected data and recover with if statements. It would also be much faster.
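
As a rough illustration of that last point, here is a minimal, untested sketch of replacing the two CalculateField calls with one update cursor. The field names and the conversion factor come from the original post; the names to_acres and update_selected, and the choice to skip invalid totals, are assumptions made for illustration. The arcpy.da cursors require ArcGIS 10.1 or later.

```python
SQ_METERS_TO_ACRES = 0.00024711  # factor from the original post; assumes area in square meters

def to_acres(area_sq_m):
    """Convert an area to acres, returning None for missing or negative input."""
    if area_sq_m is None or area_sq_m < 0:
        return None  # trap unexpected data instead of failing mid-run
    return area_sq_m * SQ_METERS_TO_ACRES

def update_selected(layer, total_area, unit_id):
    """Write acres and unit ID to every selected row of `layer` in one pass."""
    import arcpy  # imported here so to_acres can be used without ArcGIS
    acres = to_acres(total_area)
    if acres is None:
        return 0  # nothing written
    count = 0
    with arcpy.da.UpdateCursor(layer, ["UnitAcres", "WtlndUnitID"]) as rows:
        for row in rows:
            row[0] = acres
            row[1] = unit_id
            rows.updateRow(row)
            count += 1
    return count
```

Because the conversion lives in ordinary Python, it can be checked and debugged on its own before any data is touched.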

If you are using shapefiles, consider using file geodatabases. Then the area is automatic, you don't have to calculate it.

Anonymous User · ‎06-03-2014

Original User: recurvata

Notwithstanding Kim's advice above, you could improve performance a bit by changing this:

uRows = arcpy.SearchCursor("wtlndUnits1")

to this:

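with arcpy.da.SearchCursor("wtlndUnits1", ["SHAPE@"]) as uRows:

This eliminates the need to manually delete uRows and uRow. Note that the with statement only works with the arcpy.da cursors (10.1+), not the older arcpy.SearchCursor, and each row is then a tuple, so uRow.shape becomes uRow[0].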

ChrisSnyder · ‎06-03-2014

There are many things you could do to improve the code and make it faster/more efficient. However, your desired end result would probably be best accomplished by just using the SpatialJoin tool (to tag the wetlands with the analysis unit they fall within) and then the Frequency tool (to sum the area of the wetlands by wetland analysis unit).
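
A minimal, untested sketch of that whole-dataset approach follows. The function names and output dataset arguments are hypothetical. It assumes the unit layer carries an ID field called WETLAND_UNIT_ID and that the joined output has a Shape_Area field (as geodatabase polygon feature classes do). The Frequency tool requires an Advanced license.

```python
def sum_wetland_area_by_unit(wetlands, units, join_out, freq_out,
                             unit_field="WETLAND_UNIT_ID",
                             area_field="Shape_Area"):
    """Tag wetlands with their unit, then sum wetland area per unit."""
    import arcpy  # imported here so the helper below works without ArcGIS
    # One pass over the whole dataset: each wetland gets its unit's attributes
    arcpy.SpatialJoin_analysis(wetlands, units, join_out,
                               "JOIN_ONE_TO_ONE", "KEEP_ALL",
                               match_option="WITHIN")
    # One row per unit; FREQUENCY holds the polygon count, area_field the sum
    arcpy.Frequency_analysis(join_out, freq_out, [unit_field], [area_field])
    return freq_out

def acres_by_unit(freq_rows, factor=0.00024711):
    """Map unit ID -> total acres for units holding more than one wetland.

    freq_rows: iterable of (unit_id, polygon_count, summed_area) tuples,
    for example read from the Frequency output with arcpy.da.SearchCursor.
    The factor assumes the summed area is in square meters.
    """
    return {unit_id: area * factor
            for unit_id, count, area in freq_rows
            if unit_id is not None and count > 1}
```

The resulting dictionary can then drive a single update cursor over the wetlands, so no geoprocessing tool runs inside a loop.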

Anonymous User · ‎06-09-2014

Original User: armein01

I would really like to learn best practices. Could you clue me in on any specific improvements? I don't expect the code to be rewritten, just some advice.
Thanks

ChrisSnyder · ‎06-09-2014

If you have v10.1+, I would advise you to use the "data access" cursors, as they are much faster than the old cursors. Also, your indentation (line 3 in your code) appears to be off - it shouldn't be indented there. Does your "wtlndUnits1" layer have a field indicating the wetland unit, or do you just want to tag them (basically) using the OBJECTID order as you are doing? You shouldn't have to copy the features to get the sum of area. A single pass with an update cursor should be faster than 2 field calcs. Here's a (UNTESTED) re-write assuming you have a "WETLAND_UNIT_ID" field that you are trying to tag your wetlands with. This would be the fastest way I can think of doing this if you want to take a cursor-based Python approach. However, I suspect it might be faster (like I said above) to use a spatial join, summarize the joined areas, and then a field calc.

import arcpy

# Read each unit's geometry and ID (the WETLAND_UNIT_ID field is assumed to exist)
with arcpy.da.SearchCursor("wtlndUnits1", ["SHAPE@", "WETLAND_UNIT_ID"]) as searchRows:
   for shapeObj, wtlndUnitID in searchRows:
      # Select the wetlands that fall within this unit
      arcpy.SelectLayerByLocation_management("wtlnd4Anlys", "WITHIN", shapeObj, "", "NEW_SELECTION")
      # Cursors on a layer honor the selection, so only the selected areas are read
      with arcpy.da.SearchCursor("wtlnd4Anlys", ["SHAPE@AREA"]) as areaRows:
         areaList = [r[0] for r in areaRows]
      totalArea = sum(areaList)
      selectedCount = len(areaList)
      if selectedCount > 1:
         # One pass writes both fields; the factor assumes area in square meters
         with arcpy.da.UpdateCursor("wtlnd4Anlys", ["UnitAcres", "WtlndUnitID"]) as updateRows:
            for updateRow in updateRows:
               updateRow[0] = totalArea * 0.00024711
               updateRow[1] = wtlndUnitID
               updateRows.updateRow(updateRow)