I have a script that will run three times a day (7am, 3pm, and 5pm), but I only want it to process the features that have been added since the last time it ran. The attribute table the script works through is quite large (250+ fields and 1000+ existing features), while the number of new features between runs will probably not exceed 25. I've thought about saving a snapshot of the table between runs, but given its size I don't want to go that route.

Is there a way to use timestamps or unique ID values (since they will continuously increment)? If so, which would be the better solution, and how would I go about implementing it?
We have done something similar with a couple of our tables and processes, and decided to go with a count table that stores the feature counts we use to determine whether a process should run or not.
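For illustration, a minimal sketch of that count-table idea, assuming arcpy and hypothetical paths and field names (Features, RunLog, last_count): record the count from the previous run and only process when the current count is higher.

import arcpy

fc = r"C:\data\gdb.gdb\Features"         # feature class being monitored (assumed path)
count_table = r"C:\data\gdb.gdb\RunLog"  # one-row table holding the last-run count (assumed)

current = int(arcpy.management.GetCount(fc)[0])

# Read the count recorded by the previous run
with arcpy.da.SearchCursor(count_table, ["last_count"]) as cursor:
    last = next(cursor)[0]

if current > last:
    # ... process the new features here ...
    # Then store the new count for the next run to compare against
    with arcpy.da.UpdateCursor(count_table, ["last_count"]) as cursor:
        for row in cursor:
            row[0] = current
            cursor.updateRow(row)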
Code-wise, does the table have editor tracking (created date, last modified date, etc.)? You could take the time the script runs, subtract the interval since the previous run, and select the features created after that time.
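If editor tracking is enabled, something along these lines could work. This is a sketch, not a drop-in solution: the field name created_date and the path are assumptions, and with runs at 7am, 3pm, and 5pm the gap between runs varies (2, 8, or 14 hours), so the window would need to match the schedule or be read from a persisted last-run time.

import arcpy
import datetime

fc = r"C:\data\gdb.gdb\Features"  # assumed path

# Window back to the previous scheduled run; adjust per the 7am/3pm/5pm schedule
since = datetime.datetime.now() - datetime.timedelta(hours=8)
# timestamp syntax shown here is for file geodatabases; other workspaces differ
where = "created_date >= timestamp '{}'".format(since.strftime("%Y-%m-%d %H:%M:%S"))

with arcpy.da.SearchCursor(fc, ["OID@"], where_clause=where) as cursor:
    new_ids = [row[0] for row in cursor]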
@Anonymous User's suggestion of using editor tracking fields is probably the route I would go as well.
However, there may be alternatives. What exactly is this script doing? Could you look at the attribute table and easily distinguish between features that still need updating and those that have already been updated? And is there any risk to data integrity if the script re-ran against a feature that was previously updated?
I think the simplest way to do this would be a boolean field, call it is_updated, which would default to 0. In addition to whatever else your script is doing, it would update the value of this field to 1. On successive runs, this gives you a clear, simple boolean filter ("is_updated = 0") to ensure that you're only running the script against new features.
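A minimal sketch of that filter, assuming arcpy, a hypothetical path, and the is_updated field described above:

import arcpy

fc = r"C:\data\gdb.gdb\Features"  # assumed path

# Only touch features that have not been processed yet
with arcpy.da.UpdateCursor(fc, ["is_updated"], where_clause="is_updated = 0") as cursor:
    for row in cursor:
        # ... whatever else the script does to the feature goes here ...
        row[0] = 1  # flag the feature so later runs skip it
        cursor.updateRow(row)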
It's possible, too, that one of your existing fields already fits this purpose. Perhaps there is a field that is null before the script runs but always has a value once a feature has been updated; in that case you could filter your features with "some_field IS NULL".
I use a search cursor to populate a list of the GlobalIDs in both datasets (since we already have this as a unique ID). Then I iterate through the list, and if a GlobalID is in the working dataset but not in the destination, I append it to a separate list (AddList) and build a where clause that loads only the features in AddList:
expression = "GlobalID IN ('" + "','".join(AddList) + "')"
This then appends only the features that are not already in the destination FC.
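Putting that together, a minimal sketch of the workflow (the paths and layer name are assumptions; the where clause is the one shown above):

import arcpy

working = r"C:\data\gdb.gdb\Working"          # assumed path
destination = r"C:\data\gdb.gdb\Destination"  # assumed path

# Collect the GlobalIDs present in each dataset
with arcpy.da.SearchCursor(working, ["GlobalID"]) as cursor:
    working_ids = {row[0] for row in cursor}
with arcpy.da.SearchCursor(destination, ["GlobalID"]) as cursor:
    dest_ids = {row[0] for row in cursor}

# GlobalIDs in the working dataset but not yet in the destination
AddList = [gid for gid in working_ids if gid not in dest_ids]

if AddList:
    expression = "GlobalID IN ('" + "','".join(AddList) + "')"
    layer = arcpy.management.MakeFeatureLayer(working, "to_add", expression)
    arcpy.management.Append(layer, destination, "NO_TEST")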
R_