I have a parameter in a Python toolbox that allows a user to select a field from a dataset supplied in a previous parameter.
# Feature Class to absorb geometry from
param2 = arcpy.Parameter(
    displayName="Geometry Feature Class",
    name="in_geoFC",
    datatype="GPFeatureLayer",
    parameterType="Required",
    direction="Input")

# Table ID field
param3 = arcpy.Parameter(
    displayName="Table Geometry ID Field",
    name="table_geoIDField",
    datatype="GPString",
    parameterType="Required",
    direction="Input",
    enabled=False)

param3.filter.type = "ValueList"
param3.filter.list = []
The updateParameters method contains logic to populate parameter 3 with the names of all fields in the dataset provided in parameter 2 that have a data type of String:
def updateParameters(self, parameters):
    """Modify the values and properties of parameters before internal
    validation is performed. This method is called whenever a parameter
    has been changed."""
    # if 'in_geoFC' is populated with a value
    if parameters[2].value:
        # if 'in_geoFC' does not have an error set
        if not parameters[2].hasError():
            # Create a list of all of the fields in 'in_geoFC'
            # which have a datatype of 'String'
            fc_geoIDFields = [field.name for field
                              in arcpy.Describe(parameters[2].valueAsText).fields
                              if field.type == 'String']
            # Enable the parameter
            parameters[3].enabled = True
            # Populate the parameter with the list of text fields in the table
            parameters[3].filter.list = fc_geoIDFields
This all works just fine...
There's one more thing I need to do though. Whichever field the user selects for parameter 3, I need to ensure that the values contained in this field are unique - every record must have a unique value.
I know of a pretty easy and elegant way to get the number of unique values in that field:
len(set(r[0] for r in arcpy.da.SearchCursor(parameters[2].valueAsText,
                                             parameters[3].valueAsText)))
What I don't know is the quickest and most elegant way to get the total number of features in that dataset so that I can compare it to the number of unique values in that field and thus, determine if all of the values in that field are unique.
Keep in mind that this would be occurring in the updateMessages function, so it needs to be a fairly quick process.
Any thoughts on how to get a record count super fast?
Have you done any benchmarking? Get Count easily beats other methods for getting the total records in a data set, especially for larger data sets. If you are already running a cursor against the data set, then using Get Count might be unnecessary, but any extra time added to the script would be from an unnecessary call and not a slow function.
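For reference, a minimal sketch of calling Get Count against the feature class parameter (the parameters[2] index is assumed from the original post):

import arcpy

# Get Count returns a Result object; its first output is the record
# count as a string, so cast it to int before comparing.
total_records = int(arcpy.management.GetCount(parameters[2].valueAsText).getOutput(0))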
In terms of balancing simplicity and performance, I am a big fan of Counter, as I stated above. It is trivially slower than the proposed set operations, and it gives you much richer information in case that might have value at some point.
import collections

def unique_check_cnt(iterable):
    counter = collections.Counter(iterable)
    return True if counter.most_common(1)[0][1] == 1 else False

fc =  # path to feature class
fld =  # field name to check for uniqueness

with arcpy.da.SearchCursor(fc, fld) as cur:
    print unique_check_cnt(i for i, in cur)
If the data sets are large and there is a reasonable chance there will be duplicates, then it might pay off to implement a method that can stop as soon as a duplicate is found.
import collections

def unique_check_defdict(iterable):
    d = collections.defaultdict(int)
    for i in iterable:
        if d[i] > 0:
            return False
        d[i] += 1
    else:
        return True

fc =  # path to feature class
fld =  # field name to check for uniqueness

with arcpy.da.SearchCursor(fc, fld) as cur:
    print unique_check_defdict(i for i, in cur)
Personally, I like Counter.
Some may offer numpy and pandas solutions, but consider the initial load time:
%timeit len(set(random.randint(0,10) for i in range(1000000)))
1.5 s ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
import numpy as np
%timeit len(np.unique(np.random.randint(0, 10, size=1000000, dtype='l')))
43.4 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# ---- is a coffee sip worth the difference? ----
numpy or pandas is likely going to be faster. I always forget that pandas is available now. I wouldn't hold the import times against them in a Python toolbox, though: you pay those costs up front at toolbox initialization unless you're importing inside of your tool classes.
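For example, a rough sketch of where those imports might sit in a .pyt (the tool class name here is hypothetical):

# Module level of the .pyt -- these imports are paid for once when
# ArcGIS loads the toolbox, not on every validation call.
import arcpy
import numpy as np

class CheckGeometryID(object):  # hypothetical tool class
    def updateMessages(self, parameters):
        # numpy is already loaded by the time validation runs,
        # so there is no per-call import overhead here
        pass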
_as_narray ... I have used it recently in some of my blogs.
import numpy as np
a = arcpy.da.SearchCursor(in_fc, 'OID@', explode_to_points=True)._as_narray()
# in_fc is a featureclass, explode to points not needed, but I wanted a bigger file
len(a) == len(np.unique(a))
True
Now if you are playing Code Golf, you can put that all in two lines
import numpy as np
nparr = arcpy.da.FeatureClassToNumPyArray(TheInputFeatureClass, ['TheFieldToEvaluate'])
uniqueValueCount = len(np.unique(nparr))
print uniqueValueCount
The timing shows the conversion to a numpy array has a bit of overhead, but it is insignificant for about 25,000 unique points and still within a coffee sip and an import.
%timeit len(np.unique(arcpy.da.FeatureClassToNumPyArray(in_fc, 'OID@')))
107 ms ± 2.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Yep. Great point considering the OP needs this to perform during an onLoad event.
Look at you rockstars! Lightning quick to respond.
I actually came up with this a few minutes later and benchmarked it against 465,051 records. I'd like to get it down to 3 seconds or less:
import time

def calculateRunTime(function, *args):
    startTime = time.time()
    result = function(*args)
    return time.time() - startTime, result

def isUniqueValueField(dataset, field):
    idList = []
    with arcpy.da.SearchCursor(dataset, field) as cursor:
        for row in cursor:
            idList.append(row[0])
    if len(idList) != len(set(idList)):
        return False
    else:
        return True
>>> calculateRunTime(isUniqueValueField, parameters[2].valueAsText, parameters[3].valueAsText)
(8.847000122070312, True)
Considering Dan's valid point about numpy setup time, what about simply wrapping the SearchCursor in a set?
print len(set(arcpy.da.SearchCursor(TheInputFeatureClass, 'TheFieldToEvaluate')))
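Pulling the thread together, here is a rough sketch of how the check might sit in updateMessages; the parameter indexes follow the original post, and the error message wording is just an assumption:

def updateMessages(self, parameters):
    """Flag the chosen ID field if it contains duplicate values."""
    if parameters[2].value and parameters[3].value:
        fc = parameters[2].valueAsText
        fld = parameters[3].valueAsText
        # number of distinct values in the chosen field
        unique_count = len(set(r[0] for r in arcpy.da.SearchCursor(fc, fld)))
        # total record count via Get Count
        total_count = int(arcpy.management.GetCount(fc).getOutput(0))
        if unique_count != total_count:
            parameters[3].setErrorMessage(
                "Every record must have a unique value in this field.")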