Subtract two specific values

ClintonCooper1 · ‎10-25-2013

I have a table that has several records that I have given a sequence number to based on the find identical tool that creates a new field which creates a new sequence number if they are identical. Some features are unique while others may have up to 12 identical features. I created a third filed which is a concatenation of the feat_seq field and an age field of which I can sort to get an order of age from oldest to youngest. I would like to subtract the first two values, the oldest from the second oldest while not preforming this subtraction if there is less than 1 identical feature. Additionally, if there are more than 3 identical features, it would only subtract the first two and none of the other ones. Here is a sample of what the data might look like:

id age newid difference in age
2257 72 225772 n/a
2257 62 225762 10
2258 45 225845 0
2259 58 225958 n/a
2259 57 225957 1
2259 27 225927 n/a
2259 24 225924 n/a

I have found some code/thread (http://forums.arcgis.com/threads/28786-Need-Python-Example-sort-rows-subtract-current-from-next-load...) that will subtract each value starting from the first value all through the end. I was wondering if it is possible to specify which rows to subtract and which ones not to subtract using python ?

Clinton

JakeSkinner · ‎10-28-2013

I believe the problem was with the SQL Expression. You will need to specify 'FEAT_SEQ' twice in the query:

FEAT_SEQ = " + str(FEAT_SEQ) + "AND YEAR_OF_BIRTH = " + str(list2[0]) + " OR FEAT_SEQ = " + str(FEAT_SEQ) + "AND YEAR_OF_BIRTH = " + str(list2[1])

Try the following:

try:         #update table with difference value         with arcpy.da.UpdateCursor(table, ["FEAT_SEQ", "YEAR_OF_BIRTH", "DIFFERENCE"], "FEAT_SEQ = " + str(FEAT_SEQ) +  "AND YEAR_OF_BIRTH = " + str(list2[0]) + " OR FEAT_SEQ = " + str(FEAT_SEQ) + "AND YEAR_OF_BIRTH = " + str(list2[1])) as cursor:             for row in cursor:                 row[2] = DIFFERENCE                 cursor.updateRow(row)         del row, cursor except:         pass

View solution in original post

MikeMacRae · ‎10-25-2013

Can you give an example of the logic behind which rows your are choosing and which ones you are not?

ClintonCooper1 · ‎10-25-2013

well I would like to subtract the values where the age is the highest, and then the second highest. per my table,

id age newid difference in age
2257 72 225772 n/a
2257 62 225762 10
2258 45 225845 0
2259 58 225958 n/a
2259 57 225957 1
2259 27 225927 n/a
2259 24 225924 n/a

2257 has two values, 72 and 62. I would like to subtract 62 from 72

for 2259, I would like to only subtract 58-57 and not do any calculations on the other values. Could I do a new field what has is a sequential number based on the sort field of new id, and then specify subtract value 2 from 1? What would that python code look like?

Thanks!

JakeSkinner · ‎10-25-2013

Hi Clinton,

Here is an example on how you can do this:

import arcpy
from arcpy import env
env.workspace = r"C:\temp\python\test.gdb"

table = "data"

# create empyt list to append ID field
list = []

# iterate through table and append ID to list
with arcpy.da.SearchCursor(table, ["id"]) as cursor:
    for row in cursor:
        list.append(row[0])

del row, cursor

#remove duplicate IDs
list = dict.fromkeys(list)
list = list.keys()

#iterate through list
for id in list:
    list2 = []
    with arcpy.da.SearchCursor(table, ["id", "age", "difference"], "id = " + str(id)) as cursor:
        for row in cursor:
            #append age to new list
            list2.append(row[1])
    #sort list
    list2.sort()
    try:
        #calculate the difference
        difference = list2[-1] - list2[-2]
    except:
        pass
    del row, cursor
    try:
        #update table with difference value
        with arcpy.da.UpdateCursor(table, ["id", "age", "difference"], "age = " + str(list2[-2]) + " AND id = " + str(id)) as cursor:
            for row in cursor:
                row[2] = difference
                cursor.updateRow(row)
    except:
        pass

del row, cursor

ClintonCooper1 · ‎10-25-2013

Hi Clinton,

Here is an example on how you can do this:

import arcpy
from arcpy import env
env.workspace = r"C:\temp\python\test.gdb"

table = "data"

# create empyt list to append ID field
list = []

# iterate through table and append ID to list
with arcpy.da.SearchCursor(table, ["id"]) as cursor:
    for row in cursor:
        list.append(row[0])

del row, cursor

#remove duplicate IDs
list = dict.fromkeys(list)
list = list.keys()

#iterate through list
for id in list:
    list2 = []
    with arcpy.da.SearchCursor(table, ["id", "age", "difference"], "id = " + str(id)) as cursor:
        for row in cursor:
            #append age to new list
            list2.append(row[1])
    #sort list
    list2.sort()
    try:
        #calculate the difference
        difference = list2[-1] - list2[-2]
    except:
        pass
    del row, cursor
    try:
        #update table with difference value
        with arcpy.da.UpdateCursor(table, ["id", "age", "difference"], "age = " + str(list2[-2]) + " AND id = " + str(id)) as cursor:
            for row in cursor:
                row[2] = difference
                cursor.updateRow(row)
    except:
        pass

del row, cursor

Thanks Jskinn. It works, but I have two questions....

1. It is subtracting from the wrong direction.
I probably should have stated how the actual age is set. It is set by year, for my example above:

2259 58 225958 n/a
2259 57 225957 1
2259 27 225927 n/a
2259 24 225924 n/a

It would be

2259 1955 22591955 n/a
2259 1956 22591956 1
2259 1986 22591986 n/a
2259 1989 22591989 n/a

so the code is subtracting the bottom 2 instead of the top 2. looking at the code I couldn't quite determine how to change it to make it go the way of my data ( I could just take starting year and subtract the age field, but I am dealing with millions of records, and any processing time I can save it needed).

which leads me into point 2. The process takes quite a bit of time to run. If I just subtracted all ages from the previous age (when there are 2 ids or more) would that save a lot of processing time? I am testing on 70,000 records (about 1/100th of the file I will be working with), and this process it taking 20 minute plus (in an un indexed file database). While I will be indexing the main file, most processes (that usually take 20 to 30 minutes on the large file) are happening within a few seconds.

Thanks again!!

P.S. Also could you explain the where clauses for the cursors a bit:

"id = " + str(id)

and

"age = " + str(list2[-2]) + " AND id = " + str(id)

as to what they are doing. I want to fully understand the script. I appreciate it!!

Clinton

JakeSkinner · ‎10-25-2013

Just so I understand you correctly. In your example:

ID AGE NEWID DIFFERENCE
2259 1955 22591955 n/a
2259 1956 22591956 1
2259 1986 22591986 n/a
2259 1989 22591989 n/a

The 'Difference' value is not being calculated between 1986 and 1989 because the age from 1955 and 1956 is larger when compared to todays date. Is that correct?

ClintonCooper1 · ‎10-25-2013

Just so I understand you correctly. In your example:

ID AGE NEWID DIFFERENCE
2259 1955 22591955 n/a
2259 1956 22591956 1
2259 1986 22591986 n/a
2259 1989 22591989 n/a

The 'Difference' value is not being calculated between 1986 and 1989 because the age from 1955 and 1956 is larger when compared to todays date. Is that correct?

Yep that is right

JakeSkinner · ‎10-25-2013

Here is the updated code that should work:

import arcpy
from arcpy import env
env.workspace = r"C:\temp\python\test.gdb"

table = "data"

# create empyt list to append ID field
list = []

# iterate through table and append ID to list
with arcpy.da.SearchCursor(table, ["id"]) as cursor:
    for row in cursor:
        list.append(row[0])

del row, cursor

#remove duplicate IDs
list = dict.fromkeys(list)
list = list.keys()

#iterate through list
for id in list:
    list2 = []
    with arcpy.da.SearchCursor(table, ["id", "age", "difference"], "id = " + str(id)) as cursor:
        for row in cursor:
            #append age to new list
            list2.append(row[1])
    #sort list
    list2.sort()
    try:
        #calculate the difference
        difference = list2[1] - list2[0]
    except:
        difference = ''
    del row, cursor
    try:
        #update table with difference value
        with arcpy.da.UpdateCursor(table, ["id", "age", "difference"], "age = " + str(list2[0]) + " AND id = " + str(id)) as cursor:
            for row in cursor:
                row[2] = difference
                cursor.updateRow(row)
    except:
        pass

del row, cursor

Here is an explanation of the where clauses.

1. "id = " + str(id)

"id" is the field name and 'str(id)' is the variable for the id value in the list. The line 'for id in list' is setting this variable.

2. "age = " + str(list2[0]) + " AND id = " + str(id))

This query is selecting the age field that is equal to the first value in the second list. In this case, the first value is the largest year from today's data, (i.e. 1955).

If you have the year 1955 in two different 'ids', the AND is ensuring that the update will update the correct row.

ClintonCooper1 · ‎10-25-2013

JSkinn,

Thank you so much!

I still have that one last question about performance. Would just subtracting the second from the first, then the third from the second, so on and so forth make a difference in performance by taking out the first list function from code? I rebuilt the code taking out the first list, and I am running into an error:

Runtime error 
Traceback (most recent call last):
  File "<string>", line 9, in <module>
RuntimeError: An invalid SQL statement was used. [SELECT id, age, OBJECTID FROM WAYNE WHERE id = <built-in function id>]

Using this code:


import arcpy
from arcpy import env
env.workspace = r"C:\temp\python\test.gdb"

table = "table"

list2 = []
with arcpy.da.SearchCursor(table, ["id", "age"], "id = " + str(id)) as cursor:
   for row in cursor:
         #append age to new list
         list2.append(row[1])
#sort list
list2.sort()
try:
    #calculate the difference
    difference = list2[-1] - list2[-2]
except:
    pass
del row, cursor
try:
    #update table with difference value
    with arcpy.da.UpdateCursor(table, ["id", "age", "difference"], "age = " + str(list2[-2]) + " AND id = " + str(id)) as cursor:
          for row in cursor:
             row[2] = difference
             cursor.updateRow(row)
except:
    pass
del row, cursor

P.S. I am using the first set of code as my template.

JakeSkinner · ‎10-25-2013

I don't believe that will improve performance. Is there a reason you are not indexing the sample feature class?