I'd like to select randomly based on a case field.
I have a telemetry FC w/ more than 20k animal locations, w/ date/time field (yyyy-mm-dd hh:mm:ss).
I'd like to select points randomly, based on Date/Time field: ONE POINT RANDOMLY PER DAY.
If possible, the selected points must have more than 12 hours difference.
Can anyone help me with a code sample, please?
how many observations per day? one per hour?, per minute? per second? A batch query to sequentially select the records by day (Julian day if needed and I presume these are in sorted tabular form). Assign a random sequence to the day's records, then chose one record at random. so a Select by attributes, to get a list of records, assign a random sequence to those records and select one at random. ... move on to the next day. Which begs the question, you are going to all this trouble of randomizing the time in the day selection, why not go to a completely randomized approach or are you going to deal with separating out the stratified nature/issues as well?
As I sad, there are more than 20k records, coleted in 20 minutes step along almost one year. In order to minimize spatial dependence, the ideia is to select randomly 1 point per day. The amount of observations per day varies because GPS failed some times. We'd like one record per day randomly separately by 12h (minimum) to avoid the last record of a day and the first record of next day. Randomized all points approach will maybe compromise season sample to estimate home range. Sorry for my English, I hope you unterstand me.
Just a rough idea of how you might do it, but it doesn't check separation by 12h.... Edit: I've added some code that should give you the 12h difference.
import time from datetime import datetime from random import randint startDay = int(time.mktime(datetime.strptime('2010-01-01','%Y-%m-%d').timetuple())) stopDay = int(time.mktime(datetime.strptime('2010-01-31','%Y-%m-%d').timetuple())) randomTime = 0 for day in range(startDay, stopDay, 86400): if randomTime > 43200: randomTime = 43200 - (86400 - randomTime) else: randomTime = 0 randomTime = randint(randomTime, 86399) dateString = datetime.fromtimestamp(day+randomTime).strftime('%Y-%m-%d %H:%M:%S') if randomTime < 43200: # morning print "SELECT TOP 1 column FROM table WHERE dateTimeField > '" + dateString + "' ORDER BY dateTimeField ASC" else: #afternoon print "SELECT TOP 1 column FROM table WHERE dateTimeField < '" + dateString + "' ORDER BY dateTimeField DESC"
Another option might be to assume there are 72 records added per day. Then select at random one of those records. You could then use a "SELECT ... IN ( ...)" query. Edit: I've updated this to insure that records are separated by at least 36 (approximately 1/2 day's worth). It may not insure that one record per day is selected, but the average record is 1 day apart and should have a difference of at least 12h.
from random import randint # assume 72 records added per day, aproximately 1 every 20 minutes # 365 days * 72 records would total 26280 records # record ID should be apart by at least 36 idList =  randomNum = 0 for x in range(1, 26280, 72): if randomNum > 36: randomNum = 36 - (72 - randomNum) else: randomNum = 0 randomNum = randint(randomNum, 72) idList.append(randomNum + x) # print randomNum + x print "SELECT * FROM table WHERE ObjectID IN " + str(tuple(idList))
There may be a single-pass solution, but my first thoughts went to a two-pass solution. The approach below relies on creating a new column (sel_field) that will be used to identify selected records using a value of 1. The first pass over the data set builds a dictionary containing selected records, the second pass updates the new field to reflect the selection.
from collections import defaultdict from datetime import datetime from itertools import groupby from random import sample, shuffle fc = # full path to feature class dt_field = # datetime field name sel_field = # selection field name time_gap = # minimum time gap between samples (seconds) selected = defaultdict(int) with arcpy.da.SearchCursor(fc, ["OID@", dt_field], sql_clause=(None, "ORDER BY " + dt_field)) as cur: date_groups = groupby(cur, lambda x: datetime.date(x)) _, record_group = next(date_groups) (oid, dt), = sample(list(record_group), 1) selected[oid] = 1 prev_dt = dt for _, record_group in date_groups: records = [(oid, dt) for oid, dt in record_group if (dt - prev_dt).total_seconds() > time_gap] if records: shuffle(records) oid, dt = next(iter(records)) selected[oid] = 1 prev_dt = dt with arcpy.da.UpdateCursor(fc, ["OID@", sel_field]) as cur: for oid, _ in cur: cur.updateRow([oid, selected[oid]])