|
POST
|
A completely different approach to relying on application logic to select random records would be to rely on the database management system to select them. If someone is using an enterprise DBMS (Oracle, SQL Server, PostgreSQL, etc.), the database will support some way of returning randomly selected records.

    def main():
        import arcpy
        import os

        fc_in = arcpy.GetParameterAsText(0)  # input feature class
        fl_out = arcpy.GetParameterAsText(1)  # output layer file
        cnt = arcpy.GetParameter(2)  # number of features to select
        dbms = 'oracle'  # placeholder: set to 'oracle' or 'sqlserver'

        oid_fld = arcpy.Describe(fc_in).OIDFieldName
        table = os.path.split(fc_in)[1]  # table name without the workspace path

        if dbms == 'oracle':
            # Sort on dbms_random.value, then keep the first cnt rows
            where_clause = (
                "{0} IN (SELECT {0} FROM "
                "(SELECT {0} FROM {1} ORDER BY dbms_random.value) "
                "WHERE rownum <= {2})".format(oid_fld, table, cnt)
            )
        elif dbms == 'sqlserver':
            # NEWID() gives each row a random GUID to sort on
            where_clause = (
                "{0} IN (SELECT TOP {2} {0} FROM {1} ORDER BY NEWID())".format(
                    oid_fld, table, cnt
                )
            )
        else:
            raise ValueError("dbms must be 'oracle' or 'sqlserver'")

        arcpy.MakeFeatureLayer_management(fc_in, "tmpLayer")
        arcpy.SelectLayerByAttribute_management("tmpLayer", "NEW_SELECTION", where_clause)
        arcpy.MakeFeatureLayer_management("tmpLayer", "selection")
        arcpy.SaveToLayerFile_management("selection", fl_out)

    if __name__ == '__main__':
        main()

In many ways, this would not be a very good "general" approach, for a couple of reasons. First, relying on the DBMS makes the code more involved or less portable, because every DBMS seems to have a different approach to selecting random records. Second, passing SQL through ArcGIS tools always seems to have a sketchiness about it. It does work, but I have definitely run into issues as well.

Looking at the code above, one might wonder why the SelectLayerByAttribute call and the second MakeFeatureLayer call exist, i.e., why not just pass the SQL directly to the MakeFeatureLayer tool. When I first ginned up this code, I tried doing just that, but I found an interesting/odd behavior. The SQL to select random records actually became embedded in the definition of the feature layer, so every time ArcMap was refreshed, the records kept changing. ArcMap didn't like that; there would be issues with displaying polygons at times. It would be neat, though, with attribute-only data, to load a dynamically, randomly changing table into ArcMap for testing at times.
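The per-DBMS differences described above can be kept in one place with a small lookup of templates. This is only a sketch: `RANDOM_SUBQUERY`, `random_where_clause`, and the `parcels` table name are hypothetical, and the PostgreSQL entry uses standard PostgreSQL syntax that has not been verified through ArcGIS tools.

```python
# Hypothetical helper: one where-clause template per DBMS.
RANDOM_SUBQUERY = {
    "oracle": (
        "{fld} IN (SELECT {fld} FROM "
        "(SELECT {fld} FROM {tbl} ORDER BY dbms_random.value) "
        "WHERE rownum <= {n})"
    ),
    "sqlserver": "{fld} IN (SELECT TOP {n} {fld} FROM {tbl} ORDER BY NEWID())",
    # Assumption: standard PostgreSQL syntax, not verified against ArcGIS.
    "postgresql": "{fld} IN (SELECT {fld} FROM {tbl} ORDER BY random() LIMIT {n})",
}


def random_where_clause(dbms, oid_fld, table, cnt):
    """Return a where clause selecting cnt random rows for the given DBMS."""
    try:
        template = RANDOM_SUBQUERY[dbms.lower()]
    except KeyError:
        raise ValueError("unsupported DBMS: {0!r}".format(dbms))
    # int() keeps anything but a whole number out of the SQL text
    return template.format(fld=oid_fld, tbl=table, n=int(cnt))


print(random_where_clause("sqlserver", "OBJECTID", "parcels", 10))
```

Adding another DBMS then means adding one dictionary entry rather than another branch in main().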
02-15-2015
01:33 PM
|
1
|
0
|
2482
|
|
POST
|
If we are already importing random, then we can rely on random.sample to do the heavy lifting for us, assuming we have already gone to the effort of building an OID list and want sampling without replacement.

    def main():
        import arcpy
        from random import sample

        fc_in = arcpy.GetParameterAsText(0)  # input feature class
        fl_out = arcpy.GetParameterAsText(1)  # output layer file
        cnt = arcpy.GetParameter(2)  # number of features to select

        fld_oid = arcpy.Describe(fc_in).OIDFieldName
        # Each cursor row is a 1-tuple, hence the "oid," unpacking
        lst_oids = [oid for oid, in arcpy.da.SearchCursor(fc_in, [fld_oid])]
        oids = ", ".join(map(str, sample(lst_oids, cnt)))
        where = "{0} IN ({1})".format(arcpy.AddFieldDelimiters(fc_in, fld_oid), oids)
        arcpy.MakeFeatureLayer_management(fc_in, "selection", where)
        arcpy.SaveToLayerFile_management("selection", fl_out)

    if __name__ == '__main__':
        main()

If building an OID list in memory becomes an issue, one could resort to a reservoir sampling approach.

    def stream_sample(iterator, k):
        from random import randint

        # Fill the reservoir with the first k items
        result = [next(iterator) for _ in range(k)]
        n = k
        for item in iterator:
            n += 1
            # randint is inclusive on both ends, so draw from 0..n-1
            s = randint(0, n - 1)
            if s < k:
                result[s] = item  # the new item is kept with probability k/n
        return result

    def main():
        import arcpy

        fc_in = arcpy.GetParameterAsText(0)  # input feature class
        fl_out = arcpy.GetParameterAsText(1)  # output layer file
        cnt = arcpy.GetParameter(2)  # number of features to select

        fld_oid = arcpy.Describe(fc_in).OIDFieldName
        sample_oids = [oid for oid, in stream_sample(arcpy.da.SearchCursor(fc_in, "OID@"), cnt)]
        oids = ", ".join(map(str, sample_oids))
        where = "{0} IN ({1})".format(arcpy.AddFieldDelimiters(fc_in, fld_oid), oids)
        arcpy.MakeFeatureLayer_management(fc_in, "selection", where)
        arcpy.SaveToLayerFile_management("selection", fl_out)

    if __name__ == '__main__':
        main()

A plus of using reservoir sampling is that the memory footprint can be modest to trivial when working with very large datasets. A minus is that calling random so many times can add noticeable overhead; that said, I can still sample from a million records in a couple of seconds.

The stream_sample function is taken from JesseBuesking on the StackExchange thread: pick N items at random. The code from JesseBuesking is basically just implementing Don Knuth's algorithm for picking random elements from a set whose cardinality is unknown.
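To check that the reservoir approach treats every record alike, the sketch below runs the same idea on a plain Python range instead of a cursor and counts how often each value is picked. The name `reservoir_sample` and the trial count are illustrative choices, not part of the scripts above.

```python
import random
from collections import Counter


def reservoir_sample(iterable, k):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    iterator = iter(iterable)
    reservoir = []
    for _ in range(k):
        try:
            reservoir.append(next(iterator))
        except StopIteration:
            raise ValueError("sample larger than population")
    # enumerate starts at k + 1, so n is the count of items seen so far
    for n, item in enumerate(iterator, k + 1):
        s = random.randrange(n)  # uniform over 0..n-1
        if s < k:
            reservoir[s] = item
    return reservoir


# Each of the 10 values should be picked in roughly 30% of the trials.
counts = Counter()
trials = 20000
for _ in range(trials):
    counts.update(reservoir_sample(range(10), 3))
for value in sorted(counts):
    print(value, round(counts[value] / float(trials), 3))
```

If the draw were taken over 0..n instead of 0..n-1, later items would be kept slightly less often than earlier ones, and the printed frequencies would drift away from 0.3.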
02-15-2015
01:13 PM
|
1
|
0
|
2482
|
|
POST
|
OK, now we are getting down to business. Hopefully Vince Angelo can find some time to chime in; he always has good information to share on these types of questions. I will have to take some time to think them over.
02-12-2015
10:39 AM
|
0
|
0
|
2588
|
|
POST
|
You can see database views in ArcGIS Desktop just by connecting to a database. What functionality are you hoping to gain by registering a database view? Or is this about performance rather than functionality?
02-12-2015
10:06 AM
|
0
|
7
|
3474
|
|
POST
|
Providing a bit more information would be helpful. You mention ArcSDE 10.1; have you applied any service packs or patches? What edition of ArcSDE (Personal, Workgroup, Enterprise)? What version and edition of SQL Server are you using? What version of MS Access are you using? What driver(s) and version(s) have you tried?
02-12-2015
07:21 AM
|
0
|
0
|
1159
|
|
POST
|
There are a couple of things going on here. First, arcpy.GetInstallInfo has no parameters. It will accept an argument and not throw an error, but any argument that is passed doesn't affect what is returned. The arcpy.GetInstallInfo function gets the installation information for the currently loaded ArcPy site package, which in your case is ArcGIS Engine.

If you install ArcGIS Desktop and ArcGIS Engine on the same machine using standard installation instructions, the two will share a single Python interpreter, usually C:\Python27\ArcGIS10.x (10.2 in this case). Another way to look at it is that a single Python interpreter has 2 ArcPy site packages registered/installed. Since the site packages have the same name (arcpy), they can't both be loaded into the interpreter at the same time. When the interpreter encounters an import arcpy statement, it will find and import whichever site package is found first in the module search path, i.e., sys.path.

Before importing ArcPy, you can quickly determine which ArcPy site package will be loaded by running:

    import imp
    imp.find_module('arcpy')

In this case, the result will come back with ...\Engine10.2\... since it comes first in sys.path. Reversing the order of sys.path before importing ArcPy will likely find and import the Desktop site package for your situation.

    import imp
    import sys
    sys.path.reverse()
    imp.find_module('arcpy')

The issue with reversing sys.path out of hand is that it will import the Engine-based site package if ArcGIS Engine was installed first. There are a couple of more thoughtful ways to get around this problem. First, setting the PYTHONPATH Windows environment variable to the Desktop-based site package will ensure that it is loaded first regardless of the sys.path order. However, in this case, it seems you don't already know that location and are trying to figure it out. A second approach would be to remove all of the Engine-related entries from sys.path before import arcpy.

    import sys
    # Iterate over a copy so entries can be removed from the real list
    for p in sys.path[:]:
        if 'Engine' in p:
            sys.path.remove(p)

If the goal is to determine whether ArcGIS Desktop is installed and where, maybe querying a WMI service for installed applications and information is a better approach. Adapted from the Microsoft Script Center List Installed Software Python script:

    import win32com.client

    objWMIService = win32com.client.Dispatch("WbemScripting.SWbemLocator")
    # Raw string so the backslash in the namespace is taken literally
    objSWbemServices = objWMIService.ConnectServer(".", r"root\cimv2")
    colItems = objSWbemServices.ExecQuery("Select * from Win32_Product "
                                          "where Name Like 'ArcGIS % for Desktop'")
    for objItem in colItems:
        print("Name: {0}".format(objItem.Name))
        print("Install Date: {0}".format(objItem.InstallDate))
        print("Install Location: {0}".format(objItem.InstallLocation))
        print("")
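On newer Python versions, where the imp module is deprecated or removed, importlib offers a similar lookup. A minimal sketch, assuming Python 3.4 or later; `module_location` is a hypothetical helper name, and `json` stands in for `arcpy` so the snippet runs on any machine.

```python
import importlib.util


def module_location(name):
    """Return the path a top-level module would load from, or None if not found.

    For a top-level name, find_spec searches sys.path without importing
    the module itself.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print(module_location("json"))
print(module_location("no_such_module_here"))
```

As with imp.find_module, the result depends on the order of sys.path at the time of the call.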
02-12-2015
06:43 AM
|
3
|
0
|
1349
|
|
POST
|
With CalculateField_management, try removing "PYTHON_9.3". You aren't actually passing an expression, just a value.
02-10-2015
03:09 PM
|
0
|
0
|
1849
|
|
POST
|
Review the documentation on ArcPy Data Access cursors. The arcpy.da update cursor doesn't have a setValue method; that was for the older-style update cursor. You are effectively mixing up the syntax for the two types of cursors.
02-10-2015
03:00 PM
|
0
|
1
|
1849
|
|
POST
|
I am not understanding your predicament. Could you post a screenshot or some specific examples of what you would like to see and what you are actually seeing? Regarding your code, you don't have to loop over the dataframes if you aren't going to pass them to ListLayers. Calling ListLayers without a dataframe object will list the layers in all of the dataframes.
02-10-2015
01:35 PM
|
0
|
0
|
1223
|
|
POST
|
You can't pass ListLayers only a dataframe object; it will fail. At a minimum, a map document or layer needs to be passed. That said, the code still has an issue.
02-10-2015
01:28 PM
|
1
|
1
|
1223
|
|
POST
|
Have you tried putting in a sleep call to pause the script between calls to serviceStartStop()? I wonder if the server sometimes gets bogged down and file locks aren't released right away. Have you tried restarting or re-stopping the service that just failed? Does it work the second time, right after it failed the first time?
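The pause-and-retry idea can be sketched as a small wrapper. This is illustrative only: `retry_with_pause` and `flaky` are hypothetical names, the sketch assumes a failure shows up as an exception, and the real serviceStartStop() call would take the place of the stand-in function.

```python
import time


def retry_with_pause(func, attempts=3, pause=5.0):
    """Call func until it succeeds, sleeping pause seconds between failed tries."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of tries, surface the last error
            time.sleep(pause)


# Stand-in for a service call that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("service busy")
    return "started"


print(retry_with_pause(flaky, attempts=5, pause=0.1))
```

If the call only ever succeeds on a later attempt, that points toward locks or server load rather than a problem with the request itself.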
02-09-2015
12:59 PM
|
0
|
0
|
1751
|
|
POST
|
Let's jump over to the other thread to continue the discussion.
02-09-2015
12:48 PM
|
0
|
0
|
1799
|
|
POST
|
Taking your question at face value, i.e., does SelectLayerByAttribute or similar tools have internal checking for SQL injection, I think the answer is pretty clearly no. I just did a quick check using SelectLayerByAttribute, and I was able to drop a table in SQL Server by injecting extra SQL into the where_clause of the tool. Since MS Access doesn't support multiple SQL statements, it didn't work on personal geodatabases. It also didn't work on file geodatabases, which I am guessing is for the same reason. Of course, I could use less invasive SQL injection with all three to return more records than intended.

Although I didn't check all DBMSes and all forms of SQL injection, the fact that I could successfully use some SQL injection with some DBMSes gives a strong indication that the tools themselves are not doing any internal checks for SQL injection. I think Esri would say these tools are simply passing SQL along, and that hardening against SQL injection should be taking place elsewhere. Not only would programming internal checks get complicated quickly, it would likely involve putting big constraints on how SQL is used with those tools. Always a trade-off.

The tools you reference might not be hardened against SQL injection, but that doesn't mean the floodgates are open. There are still multiple layers in the application stack between these tools and the interface of ArcGIS Server that users will be interacting with. One thing Esri introduced, I can't remember when exactly, is standardized queries for ArcGIS Server. In terms of publishing GP tools, there may be extra precautions in place; I don't know.

I am a firm believer in seeing is believing, especially with ArcGIS. Regardless of what the documentation does or doesn't say, I say test it and see for yourself.
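One place such hardening could happen is in the script that builds the where clause. A minimal sketch, assuming the only user input is a list of object IDs: `safe_oid_where` is a hypothetical helper that refuses anything that is not a whole number or a plain field name, so extra SQL never reaches the tool.

```python
import re

# Plain identifiers only: letters, digits, underscore, not starting with a digit
_FIELD_NAME = re.compile(r"\A[A-Za-z_][A-Za-z0-9_]*\Z")


def safe_oid_where(field, oids):
    """Build 'FIELD IN (1, 2, 3)' from values that must all parse as integers."""
    if not _FIELD_NAME.match(field):
        raise ValueError("unexpected field name: {0!r}".format(field))
    # int() raises ValueError for input such as "1; DROP TABLE parcels"
    clean = [str(int(oid)) for oid in oids]
    if not clean:
        raise ValueError("no object IDs supplied")
    return "{0} IN ({1})".format(field, ", ".join(clean))


print(safe_oid_where("OBJECTID", ["4", 8, "15"]))
```

This only covers the narrow case of numeric IDs; free-form where clauses from users would need a different strategy.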
02-09-2015
12:45 PM
|
1
|
0
|
624
|
|
POST
|
In this case, there is no url to close. The url variable in the script is just a string. The script never directly interacts with the file-like object returned by urllib2.urlopen. Instead, the urllib2.urlopen call is nested inside the json.load call, which reads from the returned file-like object. Once json.load returns, everything created for it goes out of scope, including the object returned by urllib2.urlopen.
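If an explicit close is preferred over relying on scope, contextlib.closing works with any object that has a close method. A minimal sketch: io.StringIO stands in for the response so the snippet runs without a network connection, and the JSON payload is made up.

```python
import contextlib
import io
import json

# Stand-in for the file-like object that urlopen would return.
response = io.StringIO(u'{"status": "ok"}')

# closing() calls response.close() when the block exits, even on error.
with contextlib.closing(response) as stream:
    data = json.load(stream)

print(data["status"])
print(response.closed)
```

The same pattern applies when the stand-in is swapped for a real urlopen call, since closing() only needs the object to have a close method.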
02-09-2015
12:09 PM
|
1
|
2
|
1799
|