Python random.sample does not seem that random

02-07-2012 06:29 PM
DavidBirkigt
New Contributor III
Hi, please see the attached GIF image.

I am trying to make a small tool that selects features at random from a feature class by randomly selecting FIDs. What it does is:
1. Creates a Python list of feature FIDs.
2. Uses random.sample to draw samples from that list; the number drawn is whatever the user requests.
3. Converts the sample to a SQL statement so that a selection can be made and stored in a new feature class.
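
The three steps above can be sketched in plain Python. This is only a minimal sketch, not the tool itself: the function name `build_random_where_clause` and the field name "FID" are assumptions made for illustration, and a plain range stands in for the list of FIDs that a cursor would collect.

```python
import random

def build_random_where_clause(fids, num_sample, field_name="FID"):
    """Draw num_sample FIDs without replacement and build an IN where clause."""
    # Step 1: materialise the FIDs as a list
    fid_list = list(fids)
    # Step 2: sample without replacement, then sort for a readable clause
    picked = sorted(random.sample(fid_list, num_sample))
    # Step 3: convert the sample to a SQL where clause (values must be strings for join)
    where = "{0} IN ({1})".format(field_name, ",".join(str(f) for f in picked))
    return picked, where

# Hypothetical test data: a population of 300 FIDs and a sample of 30
picked, where = build_random_where_clause(range(300), 30)
```

The resulting `where` string could then be passed to whichever selection tool is being used.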

I don't have trouble with my code, but the output does not seem to be that random, as the GIF below shows.

I have tried random.shuffle and other variations but get similar output.
Does anyone know how to get a better result, perhaps with a different method?
Right now I am working on test data: my population is about 300 features with a sample of 30.

Here is the part of my code that lists the FIDs and makes a selection:
import random
import arcpy

fidList = []
rows = arcpy.SearchCursor(InputToSample)
for row in rows:
    fidVal = row.getValue(FieldName)
    fidList.append(fidVal)
del rows  # release the cursor
# make a random selection of numSample FIDs, without replacement
rndList = random.sample(fidList, numSample)




Thanks
David


[ATTACH=CONFIG]11785[/ATTACH]
4 Replies
DavidBirkigt
New Contributor III
Alright,

I think I have found out why my random sampling method does not seem to work: random.sample returns its picks in selection order. That is, if you have a population of 1-10 and draw 3 elements, e.g. 4, 5, 6, that result would be considered distinct from 6, 5, 4 because the elements were selected in a different order. I will post some corrected code when I have a better sampling method.

David
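
One way to test a suspicion like this is to repeat the draw many times and count how often each element is picked. The sketch below assumes the population of 1-10 and the sample of 3 from the example above; the helper name `selection_counts`, the number of trials, and the seed are all illustrative choices. It also shows that the order of the returned list does not change which features an IN-style selection would pick, since membership ignores order.

```python
import random
from collections import Counter

def selection_counts(population, k, trials, seed=None):
    """Count how often each element appears across repeated random.sample draws."""
    rng = random.Random(seed)  # private generator so the check is repeatable
    counts = Counter()
    for _ in range(trials):
        counts.update(rng.sample(population, k))
    return counts

population = list(range(1, 11))
counts = selection_counts(population, k=3, trials=10000, seed=42)
# With uniform sampling, each element is expected about
# trials * k / len(population) = 3000 times.

# The same three FIDs in a different order describe the same selection.
same_features = set([4, 5, 6]) == set([6, 5, 4])
```

If the counts come out roughly equal, the sampling itself is probably not the cause of the pattern, and the explanation may lie elsewhere.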
ThomMackey
New Contributor III
Somewhat off-topic, but I thought I might mention: I wrote a very similar tool (extract a random sample of features) a little while ago, and found that it broke when the sample size was more than roughly 10,000. I was constructing the SQL statement string to pass to the Select tool, something along the lines of:

" OR ".join(["{0} = {1}".format(oid_fname, x) for x in random_fids])


This string of OR conditions was falling over. I had to change it to use the IN operator for it to work, i.e.

"{0} IN ({1})".format(oid_fname, ",".join(str(x) for x in random_fids))


(Note I haven't double-checked that syntax, but hopefully you get the idea).

In other words: chaining many OR conditions made the arcpy.Select() tool fail, while the SQL IN operator worked on more than 50,000 records.

Just to save you some pain 🙂
ChrisSnyder
Regular Contributor III
I could add two sort of related things:

1. If you have parallel processes, each using the random function, you can mix things up (scramble the "states" so that they are all out of sequence which is a good thing) by using the .jumpahead() method. For example:

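import random, time
timeInt = int(str(int(time.time() * 10000))[-5:])  # last 5 digits of the scaled clock
random.jumpahead(timeInt)  # Python 2 only; jumpahead() was removed in Python 3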


2. When using FGDBs, you can send absurdly long SQL strings, such as "OBJECTID in (1,2,3,4,5,.....)". I sent one that was more than a million characters long and it actually worked! Curious what the character limit is... There must be one, right? I know that I have been frustrated by Oracle SDE having a SQL statement limit of ~1400 characters, which is pretty lame.
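
On the first point, random.jumpahead() only exists in Python 2. A rough alternative that also works in Python 3 is to give each parallel process its own random.Random instance with a distinct seed. The sketch below is an assumption-laden illustration, not an established recipe: the helper `make_worker_rng`, the base seed, and the worker ids are all hypothetical names and values.

```python
import random

def make_worker_rng(base_seed, worker_id):
    """Return a separate generator for one worker, seeded from a shared base and its id."""
    # A string seed keeps the per-worker seeds distinct and reproducible
    return random.Random("{0}:{1}".format(base_seed, worker_id))

# Two hypothetical workers drawing 30 of 300 FIDs each
rng_a = make_worker_rng(12345, 0)
rng_b = make_worker_rng(12345, 1)
sample_a = rng_a.sample(range(300), 30)
sample_b = rng_b.sample(range(300), 30)
```

Each worker calls methods on its own generator rather than on the module-level functions, so the processes do not replay the same sequence.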
LoganPugh
Occasional Contributor III
Another thing to note is that the Oracle "IN" operator is limited to 1000 elements by default; thus you would need to break up your IN statements by OR operators and keep each one to 1000 elements.
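
A minimal sketch of that chunking, assuming integer IDs and an unquoted field name; the function name `chunked_in_clause` and the example field "OBJECTID" are illustrative only:

```python
def chunked_in_clause(field_name, ids, chunk_size=1000):
    """Build 'f IN (...) OR f IN (...)' with at most chunk_size values per IN list."""
    ids = list(ids)
    parts = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        values = ",".join(str(i) for i in chunk)  # join() needs strings
        parts.append("{0} IN ({1})".format(field_name, values))
    # An empty id list yields an empty string, which the caller should handle
    return " OR ".join(parts)

# 2,500 hypothetical IDs are split into three IN lists of 1000, 1000 and 500 values
clause = chunked_in_clause("OBJECTID", range(1, 2501))
```

If the clause is combined with other conditions, it would need to be wrapped in parentheses so the OR operators do not change the meaning of the rest of the statement.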