Select to view content in your preferred language

Python Script to Identify Last of each group

2147
14
Jump to solution
06-03-2013 09:02 AM
KarlCox
Emerging Contributor
I have a shapefile of points which is GPS data from a number of collared animals.   I am trying to write a script to add a column and flag the last entry for each animal so that it can be easily queried and displayed on the map.   It seems like the script is working properly according to the debug results, but it does not save the results to the table.    I have done quite a bit of scripting with VBA but this is my first attempt at python.    See Script Below:

Thanks!


import arcpy
import os
import types
import arcgisscripting
from arcpy import env
#get Parameters from toolbox

#-For Use with Script Toolbox
#inPts = arcpy.getparameterastext(0)
#Animalfield=arcpy.getparameterastext(1)
#SortField=arcpy.getparameterastext(2)

#-For Troubleshooting
inPts = "C:\ArcGIS\Temp\Tester.shp"
AnimalField = "AnimalNum"
SortField = "MSTTime"


#Add Field

print "Beginning:"
arcpy.AddField_management(inPts,"LAST_PNT","TEXT",)

ThesortString = AnimalField + " A; " + SortField + " A"

# Apply Sort
#firstTime=True
gp=arcgisscripting.create()
rows = gp.UpdateCursor(inPts,"","","",ThesortString)
print "Finished Sorting"
row = rows.Next()
now = row.GetValue(AnimalField)
if row:
    row.setValue("LAST_PNT","FALSE")
    rows.UpdateRow(row)
    row=rows.Next()

while row :
    previous=now
    now = row.GetValue(AnimalField)
   
    if now == previous:
       
        row.setValue("LAST_PNT","FALSE")
        print row.getvalue("LAST_PNT")
    else :
       
        row.setValue("LAST_PNT","TRUE")
        print row.getvalue("LAST_PNT")
        rows.UpdateRow(row)
    row = rows.Next()

del row, rows

print "Completed!"
Tags (2)
0 Kudos
14 Replies
KarlCox
Emerging Contributor
Here is the latest version....It is getting the job done but is still slow.   I am intrigued by rzufelt's suggestion of using the searchcursor, but if I understand correctly the searchcursor is read only so I am not sure how I can then work with the LAST_PNT field?

I removed any print functions and it is still not much faster.   I will try a combining the 2 sort fields into one to see if that speeds it up.


lastValue = arcpy.SearchCursor(infc, "", "", "", OID + " D").next().getValue(myField) #Get 1st row in cursor - gets last value based on OID

firstValue = arcpy.SearchCursor(infc, "", "", "", OID + " A").next().getValue(myField) #Get last row in cursor - gets first value based on OID



Working Script
import arcpy
import os
import types
from arcpy import env
#get Parameters from toolbox

#-For Use with Script Toolbox
#inPts = arcpy.getparameterastext(0)
#Animalfield=arcpy.getparameterastext(1)
#SortField=arcpy.getparameterastext(2)

#-For Troubleshooting
inPts = "C:\ArcGIS\Temp\Tester.shp"
AnimalField = "AnimalNum"
SortField = "MSTTime"


#Add Field
print "Beginning:"
arcpy.DeleteField_management(inPts,"LAST_PNT")
arcpy.AddField_management(inPts,"LAST_PNT","SHORT",)
ThesortString = AnimalField + " A; " + SortField + " D"

# Apply Sort
rows = arcpy.UpdateCursor(inPts,"","","",ThesortString)
print "Finished Sorting"

i=0
lastRow = "Starter"

for row in rows:
    #print i
    #i=i+1
    if row.getValue(AnimalField) == lastRow:
        #row.setValue("LAST_PNT",0)
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)
        
    else:
        row.setValue("LAST_PNT",1)
        lastRow=row.getValue(AnimalField)
        rows.updateRow(row)

del row
del rows

print "Completed!"
0 Kudos
KarlCox
Emerging Contributor
The combined sort field was slightly faster but that is apparently not the holdup.
0 Kudos
RichardFairhurst
MVP Alum
Here is the latest version....It is getting the job done but is still slow.   I am intrigued by rzufelt's suggestion of using the searchcursor, but if I understand correctly the searchcursor is read only so I am not sure how I can then work with the LAST_PNT field?

I removed any print functions and it is still not much faster.   I will try a combining the 2 sort fields into one to see if that speeds it up.


lastValue = arcpy.SearchCursor(infc, "", "", "", OID + " D").next().getValue(myField) #Get 1st row in cursor - gets last value based on OID  firstValue = arcpy.SearchCursor(infc, "", "", "", OID + " A").next().getValue(myField) #Get last row in cursor - gets first value based on OID



Working Script
import arcpy import os import types from arcpy import env #get Parameters from toolbox  #-For Use with Script Toolbox #inPts = arcpy.getparameterastext(0) #Animalfield=arcpy.getparameterastext(1) #SortField=arcpy.getparameterastext(2)  #-For Troubleshooting inPts = "C:\ArcGIS\Temp\Tester.shp" AnimalField = "AnimalNum" SortField = "MSTTime"   #Add Field print "Beginning:" arcpy.DeleteField_management(inPts,"LAST_PNT") arcpy.AddField_management(inPts,"LAST_PNT","SHORT",) ThesortString = AnimalField + " A; " + SortField + " D"  # Apply Sort rows = arcpy.UpdateCursor(inPts,"","","",ThesortString) print "Finished Sorting"  i=0 lastRow = "Starter"  for row in rows:     #print i     #i=i+1     if row.getValue(AnimalField) == lastRow:         #row.setValue("LAST_PNT",0)         lastRow=row.getValue(AnimalField)         rows.updateRow(row)              else:         row.setValue("LAST_PNT",1)         lastRow=row.getValue(AnimalField)         rows.updateRow(row)  del row del rows  print "Completed!"


If speed is your concern, are your sort fields indexed?  Python may not take full advantage of them, but the Summary Statistics with a Join and Select where < MAX_MSTTime then field calculator then invert selection then field calculator would finish this is under 2 minutes for 20,000 records.  If you don't want a permanent output do the Summary Statistics to an in memory table.

You are reinventing the wheel from my perspective, since the above workflow has been optimized (at least as of 10.1).  It will beat the performance of any sorted Python cursor easily, since those cursor are not optimized at all for using indexes in my experience.
0 Kudos
KarlCox
Emerging Contributor
If speed is your concern, are your sort fields indexed?  Python may not take full advantage of them, but the Summary Statistics with a Join and Select where < MAX_MSTTime then field calculator then invert selection then field calculator would finish this is under 2 minutes for 20,000 records.  If you don't want a permanent output do the Summary Statistics to an in memory table.

You are reinventing the wheel from my perspective, since the above workflow has been optimized (at least as of 10.1).  It will beat the performance of any sorted Python cursor easily, since those cursor are not optimized at all for using indexes in my experience.



This actually worked well too...much appreciated.   Although I would still like to develop the script to add to a toolbox of custom tools...but unless I can get it working much quicker this technique is the way to go.  

Thanks!
0 Kudos
ChrisSnyder
Honored Contributor
Using the cursor sort parameter is "slow"... Assuming you have v10.1, this would be the fastest (untested) way I can think of doing it. Otherwise if you have v10.0, you are best off using the SummaryStats tool... Another thing to consider in v10.1 is that the sort option for the .da cursors ONLY works with Geodatabases (not shapefiles, INFO tables, etc.). That said, since this method doesn't rely on the cursor sort option, shapefiles would work fine (actually a bit faster usually than FGDB).

inPts = "C:\ArcGIS\Temp\Tester.shp"
animalField = "AnimalNum"
sortField = "MSTTime"
arcpy.AddField_managment(inPnts, "LAST_FLAG", "SHORT")
animalDateDict = {}
searchRows = arcpy.da.SearchCursor(inPnts, [OID@, animalField, sortField])
for searchRow in searchRows:
   oidValue, animalValue, sortValue = searchRow
   if animal in animalDateDict:
      animalDateDict[animal].append(sortValue, oidValue)
   else:
      animalDateDict[animal] = [(sortValue, oidValue)]
del searchRow, searchRows
lastOidsList = [max(animalDict[animal])[-1] for animal in animalDict]
oidFieldName = arcpy.Describe(inPnts).oidFieldName
queryExp = oidFieldName + " in (" + ",".join(str(lastOid) for lastOid in lastOidsList) + ")"
updateRows = arcpy.da.UpdateRows(inPnts, ["LAST_FLAG"], queryExp)
for updateRow in updateRows:
    updateRow[0] = 1
    updateRows.updateRow(updateRow)
del updateRow, updateRows
0 Kudos