Select to view content in your preferred language

Python Script to Identify Last of each group

2143
14
Jump to solution
06-03-2013 09:02 AM
KarlCox
Emerging Contributor
I have a shapefile of points which is GPS data from a number of collared animals.   I am trying to write a script to add a column and flag the last entry for each animal so that it can be easily queried and displayed on the map.   It seems like the script is working properly according to the debug results, but it does not save the results to the table.    I have done quite a bit of scripting with VBA but this is my first attempt at python.    See Script Below:

Thanks!


import arcpy
import os
import types
import arcgisscripting
from arcpy import env
#get Parameters from toolbox

#-For Use with Script Toolbox
#inPts = arcpy.getparameterastext(0)
#Animalfield=arcpy.getparameterastext(1)
#SortField=arcpy.getparameterastext(2)

#-For Troubleshooting
inPts = "C:\ArcGIS\Temp\Tester.shp"
AnimalField = "AnimalNum"
SortField = "MSTTime"


#Add Field

print "Beginning:"
arcpy.AddField_management(inPts,"LAST_PNT","TEXT",)

ThesortString = AnimalField + " A; " + SortField + " A"

# Apply Sort
#firstTime=True
gp=arcgisscripting.create()
rows = gp.UpdateCursor(inPts,"","","",ThesortString)
print "Finished Sorting"
row = rows.Next()
now = row.GetValue(AnimalField)
if row:
    row.setValue("LAST_PNT","FALSE")
    rows.UpdateRow(row)
    row=rows.Next()

while row :
    previous=now
    now = row.GetValue(AnimalField)
   
    if now == previous:
       
        row.setValue("LAST_PNT","FALSE")
        print row.getvalue("LAST_PNT")
    else :
       
        row.setValue("LAST_PNT","TRUE")
        print row.getvalue("LAST_PNT")
        rows.UpdateRow(row)
    row = rows.Next()

del row, rows

print "Completed!"
Tags (2)
0 Kudos
1 Solution

Accepted Solutions
RichardFairhurst
MVP Alum
Here is the latest version....It is getting the job done but is still slow.   I am intrigued by rzufelt's suggestion of using the searchcursor, but if I understand correctly the searchcursor is read only so I am not sure how I can then work with the LAST_PNT field?

I removed any print functions and it is still not much faster.   I will try a combining the 2 sort fields into one to see if that speeds it up.


lastValue = arcpy.SearchCursor(infc, "", "", "", OID + " D").next().getValue(myField) #Get 1st row in cursor - gets last value based on OID  firstValue = arcpy.SearchCursor(infc, "", "", "", OID + " A").next().getValue(myField) #Get last row in cursor - gets first value based on OID



Working Script
import arcpy import os import types from arcpy import env #get Parameters from toolbox  #-For Use with Script Toolbox #inPts = arcpy.getparameterastext(0) #Animalfield=arcpy.getparameterastext(1) #SortField=arcpy.getparameterastext(2)  #-For Troubleshooting inPts = "C:\ArcGIS\Temp\Tester.shp" AnimalField = "AnimalNum" SortField = "MSTTime"   #Add Field print "Beginning:" arcpy.DeleteField_management(inPts,"LAST_PNT") arcpy.AddField_management(inPts,"LAST_PNT","SHORT",) ThesortString = AnimalField + " A; " + SortField + " D"  # Apply Sort rows = arcpy.UpdateCursor(inPts,"","","",ThesortString) print "Finished Sorting"  i=0 lastRow = "Starter"  for row in rows:     #print i     #i=i+1     if row.getValue(AnimalField) == lastRow:         #row.setValue("LAST_PNT",0)         lastRow=row.getValue(AnimalField)         rows.updateRow(row)              else:         row.setValue("LAST_PNT",1)         lastRow=row.getValue(AnimalField)         rows.updateRow(row)  del row del rows  print "Completed!"


If speed is your concern, are your sort fields indexed?  Python may not take full advantage of them, but the Summary Statistics with a Join and Select where < MAX_MSTTime then field calculator then invert selection then field calculator would finish this is under 2 minutes for 20,000 records.  If you don't want a permanent output do the Summary Statistics to an in memory table.

You are reinventing the wheel from my perspective, since the above workflow has been optimized (at least as of 10.1).  It will beat the performance of any sorted Python cursor easily, since those cursor are not optimized at all for using indexes in my experience.

View solution in original post

0 Kudos
14 Replies
LindseyWood
Deactivated User
Something like this should work  😮

f __name__ == '__main__':
    
    
    inPts = r"testLast.shp"
    try:
        count = arcpy.GetCount_management(inPts).getOutput(0)
        lastRow = int(count) - 1
        print lastRow
        arcpy.AddField_management(inPts, "last", "SHORT","","", 3)
        rows = arcpy.UpdateCursor(inPts)
        for row in rows:
            print row.getValue("FID")
            if row.getValue("FID") == lastRow:
                print row.getValue("FID")
                print "Found it"
                row.last = lastRow ##Put what you like here I like 1's or 0's but used last row num for testing
                rows.updateRow(row)
                print "done"
        del rows
    except:
        print arcpy.GetMessages()
        print traceback.print_exc()
0 Kudos
KarlCox
Emerging Contributor
Thanks,   your script didn't quite work for me but it got me to the point where I could get it working...The script below works for me...but there is now a performance issue.   This seems to take much longer than it should to process.   It probably takes about 3-4 minutes to process only 1000 records.   I have similar functions in Excel that can do this in a few seconds.  Given that my actual dataset has about 20,000+ records, I don't know how useful this script will really be.   Does anybody have any ideas on how to speed this up?
import arcpy
import os
import types
from arcpy import env
#get Parameters from toolbox

#-For Use with Script Toolbox
#inPts = arcpy.getparameterastext(0)
#Animalfield=arcpy.getparameterastext(1)
#SortField=arcpy.getparameterastext(2)

#-For Troubleshooting
inPts = "C:\ArcGIS\Temp\Tester.shp"
AnimalField = "AnimalNum"
SortField = "MSTTime"

#Add Field
print "Beginning:"
arcpy.AddField_management(inPts,"LAST_PNT","TEXT",)

ThesortString = AnimalField + " A; " + SortField + " D"

# Apply Sort
    
rows = arcpy.UpdateCursor(inPts,"","","",ThesortString)
print "Finished Sorting"

thecount = arcpy.GetCount_management(inPts).getOutput(0)
i=0
lastRow = "Starter"
for row in rows:
    print i
    i=i+1
    #print str(count)
    #thecount = thecount - 1
    if row.getValue(AnimalField) == lastRow:
        row.setValue("LAST_PNT","FALSE")
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)
        
    else:
        row.setValue("LAST_PNT","TRUE")
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)

del row
del rows
0 Kudos
RhettZufelt
MVP Notable Contributor
Can use a sort field in the Cursors.  Below sorts by OID (so the table order is maintained) then sets the variable equal to the last or first table ojbect, depending on which you use.

Handy for grabbing fist/last or max/min (just change sort field) values in a table.


lastValue = arcpy.SearchCursor(infc, "", "", "", OID + " D").next().getValue(myField) #Get 1st row in cursor - gets last value based on OID

firstValue = arcpy.SearchCursor(infc, "", "", "", OID + " A").next().getValue(myField) #Get last row in cursor - gets first value based on OID



R_
0 Kudos
curtvprice
MVP Alum
Does anybody have any ideas on how to speed this up?


You're doing an updateRow when you don't have to; it would be more efficient to start by initilizing all rows to "FALSE" first and dropping your else clause. This still uses a cursor behind the scenes but it runs through quickly as there is no sort.

arcpy.AddField_management(inPts,"LAST_PNT","TEXT",)
arcpy.CalculateField_management(inPts,"LAST_PNT", "'TRUE'", "PYTHON")


If you're using ArcGIS 10.1, there is a new cursor function available that is 10x-100x faster: arcpy.da.SearchCursor.

A whole different approach would be to locate your points of interest using the Summary Statistics tool instead of using a sorted cursor. You could then get the Animal ID and max time in the results, join to your point shapefile and tag using Calculate Field. If you're using 10.0, I recommend that approach.

By the way, please [thread=48475]use code tags[/thread] so we can see your Python code!
0 Kudos
RhettZufelt
MVP Notable Contributor
Does anybody have any ideas on how to speed this up?
import arcpy
import os
import types
from arcpy import env
#get Parameters from toolbox

#-For Use with Script Toolbox
#inPts = arcpy.getparameterastext(0)
#Animalfield=arcpy.getparameterastext(1)
#SortField=arcpy.getparameterastext(2)

#-For Troubleshooting
inPts = "C:\ArcGIS\Temp\Tester.shp"
AnimalField = "AnimalNum"
SortField = "MSTTime"

#Add Field
print "Beginning:"
arcpy.AddField_management(inPts,"LAST_PNT","TEXT",)

ThesortString = AnimalField + " A; " + SortField + " D"

# Apply Sort
    
rows = arcpy.UpdateCursor(inPts,"","","",ThesortString)
print "Finished Sorting"

thecount = arcpy.GetCount_management(inPts).getOutput(0)
i=0
lastRow = "Starter"
for row in rows:
    print i
    i=i+1
    #print str(count)
    #thecount = thecount - 1
    if row.getValue(AnimalField) == lastRow:
        row.setValue("LAST_PNT","FALSE")
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)
        
    else:
        row.setValue("LAST_PNT","TRUE")
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)

del row
del rows


In my 10.1 scripts, I do use the da.searchcursor, but in this case, I really don't see a noticable improvement in speed.  ALSO, there is a bug/known limitation in the da.UpdatCursor and will not work on non SDE registered database.

Also, this original SearchCursor is much easier to code.

As far as speeding things up, comment out/remove your print statements, especially the ones inside a loop.
doesn't seem like that would be much of a load, but I have scripts that take orders of magnitude longer to run with my debugging print statements than without them.

I comment out all print statements once the script is running right.

Actually, I put something similar to this in my scripts:

debug = "n"

if debug == "y":print "Processing ",infc



That way, I just change the value of debug if I want it to print my statements or not.

Of course, even faster if you do the cursor/sort/grab first/last value and assign to variable in one swoop as in my previous post.

R_
0 Kudos
curtvprice
MVP Alum
Karl, I found the bug in your original script:

    if now == previous:
        row.setValue("LAST_PNT","FALSE")
        print row.getvalue("LAST_PNT")
    else:
        row.setValue("LAST_PNT","TRUE")
        print row.getvalue("LAST_PNT")
    rows.UpdateRow(row)  # note indent fix
    row = rows.Next()
0 Kudos
LindseyWood
Deactivated User
I agree with Curt that is why using 0's and 1's is easier if possible for true in false it takes longer to do text and you would not need field calculator the automatic value if 0 when a numeric field is created.  But field calculator would be much faster if you need the text values. Taking any of the looping print statements out speeds it up as well.

With printing the id out in mine 6,400 records still takes 30s if I remove the print it takes 0.047
so big improvement

Looks like your sort may be slowing it down the most is my guess because your joining two fields so it is processing the field information and sorting at the same time....if you added a field with this definition using "field_calc" for the sort parameters then used that as the sort field it may speed things up as well.


AnimalField = "AnimalNum"
SortField = "MSTTime"

#Add Field
print "Beginning:"
arcpy.AddField_management(inPts,"LAST_PNT","TEXT",)
arcpy.AddField_management(inPts,"sort_field","TEXT",)
arcpy.CalculateFieldManagement(sort_field......AnimalField + " A; " + SortField + " D")... "sudo code"

ThesortString = "sort_field"
0 Kudos
RichardFairhurst
MVP Alum
Thanks,   your script didn't quite work for me but it got me to the point where I could get it working...The script below works for me...but there is now a performance issue.   This seems to take much longer than it should to process.   It probably takes about 3-4 minutes to process only 1000 records.   I have similar functions in Excel that can do this in a few seconds.  Given that my actual dataset has about 20,000+ records, I don't know how useful this script will really be.   Does anybody have any ideas on how to speed this up?
import arcpy
import os
import types
from arcpy import env
#get Parameters from toolbox

#-For Use with Script Toolbox
#inPts = arcpy.getparameterastext(0)
#Animalfield=arcpy.getparameterastext(1)
#SortField=arcpy.getparameterastext(2)

#-For Troubleshooting
inPts = "C:\ArcGIS\Temp\Tester.shp"
AnimalField = "AnimalNum"
SortField = "MSTTime"

#Add Field
print "Beginning:"
arcpy.AddField_management(inPts,"LAST_PNT","TEXT",)

ThesortString = AnimalField + " A; " + SortField + " D"

# Apply Sort
    
rows = arcpy.UpdateCursor(inPts,"","","",ThesortString)
print "Finished Sorting"

thecount = arcpy.GetCount_management(inPts).getOutput(0)
i=0
lastRow = "Starter"
for row in rows:
    print i
    i=i+1
    #print str(count)
    #thecount = thecount - 1
    if row.getValue(AnimalField) == lastRow:
        row.setValue("LAST_PNT","FALSE")
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)
        
    else:
        row.setValue("LAST_PNT","TRUE")
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)

del row
del rows


If it is that slow just run Summary Statistics and get the unique result by AnimalNum for the max MSTTime.  Should take about a minute to run.  You cannot beat it for efficiency.
0 Kudos
curtvprice
MVP Alum
Actually, I put something similar to this in my scripts:

debug = "n"
if debug == "y":print "Processing ",infc



Even more Pythonesque:

debug = False
...
if debug: print "Processing ", infc
0 Kudos