Create Table of Random Numbers

1202
6
05-20-2014 06:56 AM
Corey_C_Denninger
New Contributor
I have a feature class table of OBJECTIDs that I would like to use to create a table of random numbers.  This output table of random numbers will be used to Relate or Join back to the original feature class in order to select those records to QA/QC.  I will be doing this repeatedly and would like this to be ArcGIS script tool.  I need to find the minimum, maximum, and range of the OBJECTIDs and then select a random sample of only 10% (or other % if desired) of that range from which to generate the random numbers.  I know the Python random.sample(range(min,max),range) will work, but I am having trouble with the coding part.  Any help is greatly appreciated.  My Python skills are currently C+.

#The goal of this script is to take a list of OBJECTID values from a feature class
# and find the Minimum, Maximum, and Range of that list. Using that information
#  then create a list of random numbers between the min and the max that is 
#   equal to 10% of the range.  The output should be a table with a column
#    containing the random 10% values which can then be related or joined to the 
#     source feature class to identify a 10% random sample to analyze further.

#Import the arcpy and Random modules
import arcpy, os, random

#Overwrite if files already exists
arcpy.env.overwriteOutput = True

#Setting up the paramters for the ArcGIS Python Script Tool
inputfc = 'L:\\Mapping-GIS\\PLN\\GIS\\Layers\\Geodatabases\\Parcels_YearBuilt\\SWFWMD_2013_q4.gdb\\Pasco'
outputfc = "L:/Mapping-GIS/PLN/GIS/Projects/PSSA/PSSA_Edits.mdb/parcel_stats"

#Setting up the paramters for the OBJECTID field, or similar, to be used to generate the random numbers list from
statsField = "OBJECTID"

#Calculate Minimum, Maximum, Range and 10% of Range of the fc's OBJECTID field  
#...Used to determine the numbers that will be used to calc a random list from which later
#....can be used to relate to the original OBJECTIDs to select those records which to scrutinuze further in the QA/QC process
arcpy.Statistics_analysis(inputfc, outputfc, [[statsField, "MIN"], [statsField, "MAX"], [statsField, "RANGE"]])
arcpy.AddField_management(outputfc, "PERCENT_10", "DOUBLE")
arcpy.CalculateField_management(outputfc, "PERCENT_10", "[RANGE_OBJECTID] * 0.10")

#Testing the functionality of the random number generator
# Just an example of the numbers, but I need these inputs here, to come as parameters from the outputs of the Statistics table above.
x = random.sample(range(100550,355150),25460)
Tags (2)
0 Kudos
6 Replies
ChrisSnyder
Regular Contributor III
OIDs are not always sequential... Something like this (untested code) should work:

samplePct = 0.1
oidFieldName = arcpy.Describe(myFC).oidFieldName
oidList = [r[0] for r in arcpy.da.SearchCursor(myFC, ["OID@"])] #you could also use a feature layer here instead of a FC
sampleOidList = sorted(random.sample(oidList, int(len(oidList) * samplePct)))
sqlExp = oidFieldName + " in (" + ",".join([str(i) for i in  sampleOidList]) + ")"
0 Kudos
Corey_C_Denninger
New Contributor
Chris - Thank you for the reply and effort.  I tried sometime to make it work but am having difficulty.  What I am hoping for is to take my Summary Statistics output table and access this table in the script and use the fields within as inputs into the random.sample code and output the results into a dbf or gdb table.  The JOIN I can perform with just a simple arcpy.JOIN code later, but I really need assistance with the code below. The Summary Stats table contains one row with 6 fields, including these 4 I want to use further in the script:

  1. minimum object id

  2. maximum object id

  3. range

  4. 10% of the range

I know this won't work exactly as written, but I would think conceptually it would be something like the following (added to the code I've already written in original post).  I appreciate assistance on any of the following, thanks:
#Create parameters for the fields that already exist in the Summary Stats table...
minField = "MIN_OBJECTID"
maxField = "MAX_OBJECTID"
pctRange = "PERCENT_10"

#Use those parameters for input into the random sample generator...
random.sample(range(minField,maxField),pctRange)

#Write the result of the random sample list to a .dbf or gdb table
0 Kudos
SteveLynch
Esri Regular Contributor
If you have the Geostatistical Analyst extension you could use the SubsetFeatures tool.

This will randomly split up your data and make a copy of what percentage of the data you requested (or you can request a number of values). It can also create a second data set which is all that is not in the first one.

This tools works against any type of feature class.

-Steve
0 Kudos
Corey_C_Denninger
New Contributor
Wow, I did not know that Steve!  Thanks for the information.  I will try that out.  It would be nice to have this custom script tool as well.  If anyone out there is feeling particularly giving. 🙂
0 Kudos
SteveLynch
Esri Regular Contributor
Corey

This is not a script tool. It is one of the geoprocessing tools that is shipped with ArcGIS. You do need a Geostatistical Analyst license however.

-Steve
0 Kudos
ChrisSnyder
Regular Contributor III
Not sure if the trouble lies in:

1. Reading the min/max OID values from the input table (use a search cursor)
2. Calculating a 10% sample of the OID values from the range (see code below)
3. Writing the sampled OIDs to an output table (use an insert cursor)

#2 might look like:
minOid = 1234
maxOid = 5678
samplePct = 0.1
rangeList = [i for i in range(minOid, maxOid + 1)]
sampleCount = int(len(rangeList) * samplePct + .5)
randomSampleList = random.sample(rangeList, sampleCount)
0 Kudos