Random Selection of Preexisting Points w/ Definition Query

1498
6
Jump to solution
11-28-2018 02:10 PM
RobertCelestin1
New Contributor

Hi everyone,

I want to use definition queries with this code, taken from this ESRI page, in order to randomly select a subset of preexisting points by count:

import arcpy
def SelectRandomByCount (layer, count):
 import random
 layerCount = int (arcpy.GetCount_management (layer).getOutput (0))
 if layerCount < count:
 print "input count is greater than layer count"
 return
 oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
 oidFldName = arcpy.Describe (layer).OIDFieldName
 delimOidFld = arcpy.AddFieldDelimiters (layer, oidFldName)
 randOids = random.sample (oids, count)
 oidsStr = ", ".join (map (str, randOids))
 sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
 arcpy.SelectLayerByAttribute_management (layer, "", sql)
__________________________________________________________________
SelectRandomByPercent ("layer", num)

The code works with a shapefile, but does not work for me with a personal geodatabase.
My desired scenario with using a definition query is as follows:

I have 27,000 points in one layer and 511 polygons in another. The points are draw over the polygons
and are only contained within the boundaries of the polygons.

Phase One:
1. I want to first filter out a subset of points, using a simple definition query (e.g. "ID = #") that
is set via the layer properties
2. I want to randomly select a max number of points from the layer that has the definition query set,
using the script above
3. Once that selection is made, I will assign the selected points a unique name in a field i've created.

Phase Two:
1. I want to set a definition query and repeat Phase One - Step 1. with the addition of excluding
the points assigned a name.

Can anyone help with this? I've attached a representation of the points and polygons. Thank you

Best,
Robert

0 Kudos
1 Solution

Accepted Solutions
PriscillaCole
New Contributor II

Hi Robert, 

I tested your code on a personal geodatabase and I found where it's getting stuck. The arcpy field delimiters function isn't returning the corrector delimiter. Personal geodatabases need square brackets wrapping their fields for SQL queries, but instead it's wrapping with double quotes. I added in a test to detect if the datasource is coming from a .mdb. 

NOTE - I plan to report to ESRI about the field delimiter bug. This arcpy function shouldn't be behaving in this way.

import arcpy, random

def SelectRandomByCount (layer, count):
     layerCount = int (arcpy.GetCount_management (layer).getOutput (0))
     if layerCount < count:
          print "input count is greater than layer count"
     oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
     desc = arcpy.Describe(layer)
     if ".mdb" in desc.path: 
          name = "[" + desc.OIDFieldName + "]"
     else: 
          name = arcpy.AddFieldDelimiters (layer, desc.OIDFieldName)
     randOids = random.sample (oids, count)
     oidsStr = ", ".join (map (str, randOids))
     sql = "{0} IN ({1})".format (name, oidsStr)
     arcpy.SelectLayerByAttribute_management (layer, "", sql)

View solution in original post

6 Replies
JoshuaBixby
MVP Esteemed Contributor

You mention definition query several times, but you are posting Python code.  Since definition queries are written in SQL, are you using the Python code to return a list of OIDs to base the definition query on?  Reading over Phase One, you talk about setting a definition query in Step 2 but then making a selection in Step 3.

Instead of describing the process you think you need to follow, can you summarize what you want the outcome to be?  My impression is that you want to randomly group features in a feature class and assign each group a new label in a new field, is that what you are trying to do?

0 Kudos
RobertCelestin1
New Contributor

Hi, thanks for replying.

Sorry for the confusion. I am not using the python code to return an OID to use in a definition query. The definition query is set before the python code is used, in order to first confine the set of points I am working with. The first image that I attached to my post is an example of that confined set of points. The second image is an example of what results from using the python code on the points.

You are correct in your impression of what i'd like the outcome to be. However, I do not want to select/assign all points to groups at the same time. To properly reiterate my desired outcome: I would like to randomly select some points (no greater than a max number that I define) from a specific set of points (a subset of the 27,000). Then assign that selection a label in a new field. After this, I need to repeat the process of random selection on the same initial specific set of points, but exclude those points that have already been assigned a label.

Whenever I go through the process of random selection, I only wanted to work with a subset of points from the layer of 27,000. This is why I thought to use a definition query. 

0 Kudos
RandyBurton
MVP Alum
The code works with a shapefile, but does not work for me with a personal geodatabase.

Can you explain "does not work" a bit more.  Are you getting an error message; if so, what?  Are you working in the Python window in ArcMap, or as a stand-alone script?

The code appears to work with a file geodatabase.  I was successful in selecting a random group of points in a feature layer.  If I understand what you want to do, it seems that line 14 in your code could be an update cursor that uses the "sql" where clause to update a field.

0 Kudos
PriscillaCole
New Contributor II

Hi Robert, 

I tested your code on a personal geodatabase and I found where it's getting stuck. The arcpy field delimiters function isn't returning the corrector delimiter. Personal geodatabases need square brackets wrapping their fields for SQL queries, but instead it's wrapping with double quotes. I added in a test to detect if the datasource is coming from a .mdb. 

NOTE - I plan to report to ESRI about the field delimiter bug. This arcpy function shouldn't be behaving in this way.

import arcpy, random

def SelectRandomByCount (layer, count):
     layerCount = int (arcpy.GetCount_management (layer).getOutput (0))
     if layerCount < count:
          print "input count is greater than layer count"
     oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
     desc = arcpy.Describe(layer)
     if ".mdb" in desc.path: 
          name = "[" + desc.OIDFieldName + "]"
     else: 
          name = arcpy.AddFieldDelimiters (layer, desc.OIDFieldName)
     randOids = random.sample (oids, count)
     oidsStr = ", ".join (map (str, randOids))
     sql = "{0} IN ({1})".format (name, oidsStr)
     arcpy.SelectLayerByAttribute_management (layer, "", sql)
RobertCelestin1
New Contributor

Hi Priscilla,

Thank you for the modification, it worked quite well including setting a def query before executing. And thanks for contacting ESRI about this.

Best,

Robert

0 Kudos
SteveLynch
Esri Regular Contributor

if you have a Geostatistical Analyst license you could use the SubsetFeatures geoprocessing tool.