Slow performance using cursor to read/write data

03-04-2011 01:09 AM
CrispinCooper
Emerging Contributor
Hi

I wonder if anyone here can point me in the right direction?  My script reads data out of a feature class using a cursor (slow), processes it using an external dll (fast) and writes it back to the feature class again (slow).

Is this really the fastest way to get data in/out of arc in python, or is there some way of improving performance?  Or do I need to get an EDN license and use C++ or such like?

Thanks in advance, C.

import arcpy
import ctypes

#get input params
in_polyline_feature_class = arcpy.GetParameterAsText(0)
in_arc_idfield = arcpy.GetParameterAsText(1)
shapefieldname = arcpy.Describe(in_polyline_feature_class).ShapeFieldName

#set up dll
dll = ctypes.windll.LoadLibrary(u'c:\\path\\to\\my\\dll.dll')
dll.get_output.restype = ctypes.c_double
handle = ctypes.c_void_p(dll.init())

# define function for sending feature data to dll
def send_data(arcid,points,elev):
    point_array_x = (ctypes.c_double*len(points))()
    point_array_y = (ctypes.c_double*len(points))()
    for i,(x,y) in enumerate(points):
        point_array_x[i] = x
        point_array_y[i] = y
    dll.send_data(handle,arcid,len(points),point_array_x,point_array_y)

# read feature data with cursor, send to dll
rows = arcpy.SearchCursor(in_polyline_feature_class)
for row in rows:
    # read id and shape
    shape = row.getValue(shapefieldname)
    arcid = row.getValue(in_arc_idfield)
    
    # extract points from shape
    pointlist = []
    for i in range(shape.partCount):
        for point in shape.getPart(i):
            pointlist.append((point.X,point.Y))

    # send data (the elev argument is not used by send_data)
    send_data(arcid,pointlist,None)

# process data
dll.process(handle)

# read back output into table
arcpy.AddField_management(in_polyline_feature_class,'my field name','DOUBLE')
rows = arcpy.UpdateCursor(in_polyline_feature_class)
for row in rows:
    row.setValue('my field name',dll.get_output(handle,row.getValue(in_arc_idfield)))
    rows.updateRow(row)
del row
del rows
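
A ctypes array has to be filled by index or through its constructor; assigning a float to the array's name simply rebinds the name and discards the array. As a minimal sketch of the point-packing step in isolation (the helper name `to_c_double_arrays` is hypothetical and no DLL is involved), the two coordinate arrays can be built directly from the point list:

```python
import ctypes


def to_c_double_arrays(points):
    """Split a list of (x, y) tuples into two ctypes arrays of doubles."""
    n = len(points)
    # The array type is created with c_double * n; passing the values to its
    # constructor initialises every element without an explicit index loop.
    xs = (ctypes.c_double * n)(*(p[0] for p in points))
    ys = (ctypes.c_double * n)(*(p[1] for p in points))
    return xs, ys


xs, ys = to_c_double_arrays([(0.0, 1.0), (2.5, 3.5)])
print(list(xs), list(ys))
```

The resulting arrays can be passed to a ctypes function that expects a pointer to double, along with the length `len(points)`.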
10 Replies
GregCorradini
Emerging Contributor
Hey Crispin,
I'm surprised no one has already answered this. Yes, slowness with Python cursors has always been my experience. Using the .NET SDK should be faster (at least it always is for me using C#.NET).

Depending on what you're doing and where the data is stored (type of database or shapefile) I'd recommend GDAL OGR Python bindings (http://pypi.python.org/pypi/GDAL/). They read shapefiles and postgresql/postgis dbs among other formats waaaayy faster. Or if just accessing the geometries for point arrays, then try Shapely (http://trac.gispython.org/lab/wiki/Shapely). Or if you're into JVM scripting languages like Jython or Groovy try GeoScript (http://geoscript.org/). None of these talk with PGDBs or FGDBs...yet (though FGDB is now open).

But it's a minor tweak (and easier in most cases) to export your data, run the process, and then load the results back into ESRI's proprietary data structures once it is done. At least that's my experience...

long live open source
CrispinCooper
Emerging Contributor
Greg,

thanks for your informative reply.  Unfortunately I need to be able to work with PGDBs/FGDBs. 

I tried exporting my data using arctoolbox FeatureClassToShapefile; that took twice as long for the export as my cursor reading routine.  Is there a quicker way?

Thanks,

Crispin
KimOllivier
Honored Contributor
I see you are using a cursor and looping through each feature to reconstruct a geometry object. At 10.0 you can read a whole featureclass into an array of geometry objects in one step without a cursor. You can also write out an array to a featureclass as well.
That avoids the cursor both ways, but the disadvantage is that the attributes are separated.

You don't say what you are doing with the featureclass in the DLL; I am curious what you have to do that is not already available in the geoprocessing tools.
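
A minimal sketch of the cursor-free read and write described above, assuming ArcGIS 10.0 or later: when an empty `arcpy.Geometry()` object is given as the output of `CopyFeatures_management`, the tool returns a Python list of geometry objects instead of writing a feature class, and a list of geometry objects can in turn be used as the tool's input. The helper names `read_geometries` and `write_geometries` are hypothetical, and the paths in the comments are placeholders.

```python
def read_geometries(feature_class):
    """Return every shape in feature_class as a list of arcpy geometry objects."""
    import arcpy  # imported lazily so this file still loads without ArcGIS

    # An empty Geometry object as the output argument makes the tool hand back
    # a list of geometry objects rather than creating a new feature class.
    return arcpy.CopyFeatures_management(feature_class, arcpy.Geometry())


def write_geometries(geometries, out_feature_class):
    """Write a list of arcpy geometry objects to a new feature class."""
    import arcpy

    arcpy.CopyFeatures_management(geometries, out_feature_class)


# Example use (placeholder paths):
#   shapes = read_geometries("c:/path/to/data.gdb/lines")
#   write_geometries(shapes, "c:/path/to/data.gdb/lines_copy")
```

The geometry objects returned this way carry no attribute values, so an ID field still has to be matched up separately. Whether this is faster than a cursor depends on the data, so it is worth timing both.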
JasonScheirer
Esri Alum
You don't need to use the InsertCursor in Python -- you can do it in ArcObjects in your DLL. Please note the C++ code on this help page for opening a cursor from C++.
CrispinCooper
Emerging Contributor
Thanks again guys...

kimo: do you mean using arcpy.CopyFeatures_management?  I just tried that and it ran slower than using a cursor to read all the geometries.

jscheirer: do I need an EDN license to use arcobjects?
JasonScheirer
Esri Alum
No, you can use ArcObjects at any license level. All you will require is Visual Studio or Visual Studio Express.
CrispinCooper
Emerging Contributor
I tried implementing the C++ code, but it's even slower!  38 seconds to read through a large-ish (17,000 line) feature class instead of 20 for the original python cursor.  It's compiled in release configuration not debug.

I think I did this in native C++ not .NET (I hope this is what visual studio 2008 does if you select 'no CLR support', because the project target cannot be changed from '.NET framework'...) - but, just to be sure I also tried with the wrapper method on this page.  Still the same result; 38 seconds.

The inner c++ loop is shown below, for test purposes it retrieves the OID of each line and writes it to a new field (also reads the geometry but does nothing with it):

 esriGeometryType gt;
 for (ipCursor->NextFeature(&ipRow); ipRow != NULL; ipCursor->NextFeature(&ipRow))
 {
    ipRow->get_Shape(&ipShape);

    long oid;
    ipRow->get_OID(&oid);

    VARIANT value;
    value.vt = VT_R8;
    value.dblVal = (double)oid;

    ipRow->put_Value(outfieldIndex, value);

    ipRow->Store();
 }


Is this as good as it gets, or have I done something else wrong?  Thanks again, C.
DaveVerbyla
Occasional Contributor
kimo;83348 wrote:
I see you are using a cursor and looping through each feature to reconstruct a geometry object. At 10.0 you can read a whole featureclass into an array of geometry objects in one step without a cursor. You can also write out an array to a featureclass as well.

That avoids the cursor both ways, but the disadvantage is that the attributes are separated.

How do you read a feature class into an arcpy.Array() without using cursors? Does this work with a polygon feature class?
MichaelHoward1
New Contributor
kimo wrote:
I see you are using a cursor and looping through each feature to reconstruct a geometry object. At 10.0 you can read a whole featureclass into an array of geometry objects in one step without a cursor. You can also write out an array to a featureclass as well.


I would also like to know how to do this. I've been given the task of speeding up a script which makes heavy use of cursors inside nested 'while' loops. I'm still reading some of the suggestions from this forum, and this sounds like potentially the cleanest one.