AddField slow even in in_memory workspace

GrzegorzDabkowski · 09-25-2013

I have an arcpy script, run outside of ArcMap, that constructs a feature class in the in_memory workspace. That empty feature class is later used as a template for several feature classes created in a file-based geodatabase. The script runs slowly, and profiling clearly shows that most of the time is spent in the AddField function.

Fri Sep 20 17:30:50 2013    profile.stats

         31613 function calls (29722 primitive calls) in 120.420 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      165   98.447    0.597   98.462    0.597 C:\Program Files (x86)\ArcGIS\Desktop10.1\arcpy\arcpy\geoprocessing\_base.py:484(<lambda>)

Function                                                                                              was called by...
                                                                                                          ncalls  tottime  cumtime
C:\Program Files (x86)\ArcGIS\Desktop10.1\arcpy\arcpy\geoprocessing\_base.py:484(<lambda>)            <-      10    0.000    0.011  C:\Program Files (x86)\ArcGIS\Desktop10.1\arcpy\arcpy\arcobjects\mixins.py:210(__init__)
                                                                                                               2   14.767   14.767  C:\Program Files (x86)\ArcGIS\Desktop10.1\arcpy\arcpy\conversion.py:1497(FeatureClassToFeatureClass)
                                                                                                               1    1.188    1.188  C:\Program Files (x86)\ArcGIS\Desktop10.1\arcpy\arcpy\management.py:1615(CreateFeatureclass)
                                                                                                             150   70.096   70.100  C:\Program Files (x86)\ArcGIS\Desktop10.1\arcpy\arcpy\management.py:2916(AddField)
                                                                                                               2   12.396   12.396  C:\Program Files (x86)\ArcGIS\Desktop10.1\arcpy\arcpy\management.py:3637(Delete)

The question basically is: what is arcpy doing, and why does it take so long? The above timing was taken when the script was executed from a mapped remote drive. When executed locally, the lambda in _base.py accounted for only 10 seconds, and the total run time fell from 120 to 18 seconds. Why would that matter, though, if the feature class is created in memory anyway?

Any suggestions will be appreciated.
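
For readers who want to produce a similar report, here is a minimal sketch of how the two sections above ("Ordered by: internal time" and "was called by...") can be generated with the standard cProfile and pstats modules. The functions slow_step and build_schema are hypothetical stand-ins for the script's geoprocessing calls, not the original poster's code. The sketch is written for Python 3; on the Python 2.7 that ships with ArcGIS 10.1, io.StringIO would likely need to be swapped for StringIO.StringIO.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    # Hypothetical stand-in for a geoprocessing call such as AddField.
    time.sleep(0.01)


def build_schema(n_fields=5):
    # Hypothetical stand-in for the loop that adds fields one by one.
    for _ in range(n_fields):
        slow_step()


def profile_report(func):
    """Run func under cProfile and return the report text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("tottime")        # sort by internal time
    stats.print_stats(5)               # show the top five entries
    stats.print_callers("slow_step")   # show who called the slow function
    return buf.getvalue()


report = profile_report(build_schema)
print(report)
```

The callers section is the useful part here: it attributes the time spent in one low-level function to the higher-level calls that triggered it, which is how the listing above separates AddField from Delete and FeatureClassToFeatureClass.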

curtvprice · 09-25-2013

I'm guessing the extra time is being taken by having to do tool validation.

Two thoughts:

1) Even though you're working in memory, the current and scratch workspace may need to be checked by the validation step before the tool runs. If your current workspace is over a slow network connection, I could see that slowing down the validation.

2) Have you tried creating a layer or table view? This can often speed up a tool because a layer is "pre-validated", i.e. the system already knows the table schema, that the input exists, etc. Working with layers is good practice if you are running more than one tool against a dataset, so you don't have to validate each time.

arcpy.MakeFeatureLayer_management("in_memory/testtable","lyr")
arcpy.AddField_management("lyr", "NEWFIELD", "LONG")
# ... do other things with the layer ...
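
Whether the layer actually helps is easy to measure rather than guess. Below is a minimal timing sketch, assuming nothing about arcpy itself: time_repeated and summarize are hypothetical helper names, and fake_tool is a stand-in so the block runs without ArcGIS. The commented lines show how the same helper might be pointed at AddField_management on a path and on a layer; the dataset path and field names there are placeholders.

```python
import time
from timeit import default_timer


def time_repeated(func, repeats):
    """Call func() `repeats` times and return a list of per-call seconds."""
    timings = []
    for _ in range(repeats):
        start = default_timer()
        func()
        timings.append(default_timer() - start)
    return timings


def summarize(label, timings):
    """Format total and mean time for a list of per-call timings."""
    total = sum(timings)
    return "%s: %d calls, %.3f s total, %.4f s per call" % (
        label, len(timings), total, total / len(timings))


def fake_tool():
    # Stand-in for a geoprocessing call, used only to make the sketch runnable.
    time.sleep(0.001)


timings = time_repeated(fake_tool, 5)
print(summarize("stand-in", timings))

# Usage sketch with arcpy (not run here; names are placeholders):
#   import arcpy
#   names = iter("F%d" % i for i in range(100))
#   by_path = time_repeated(
#       lambda: arcpy.AddField_management("in_memory/testtable", next(names), "LONG"), 10)
#   by_layer = time_repeated(
#       lambda: arcpy.AddField_management("lyr", next(names), "LONG"), 10)
```

Each call needs a fresh field name, which is why the usage sketch draws names from an iterator.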

GrzegorzDabkowski · 09-26-2013

Thanks for the tip about using layers. The timings seem to be a tad better after I wrapped the featclass using MakeFeatureLayer. For whatever reason I can't recreate the 90 sec+ timings at all today, though, leading me to believe it might be mostly caused by the local environment.

Regarding the workspaces, I don't have any of them set, which should mean a local TEMP directory is used?
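
One way to take the guesswork out of this is to print the workspace settings at the start of the script. The sketch below reads the workspace and scratchWorkspace attributes, which arcpy.env exposes. The helper names describe_workspaces and pin_to_local, and the FakeEnv class, are hypothetical: FakeEnv exists only so the block runs without ArcGIS, and its None defaults are not a statement about what arcpy returns. In a real script the argument would be arcpy.env.

```python
class FakeEnv(object):
    """Stand-in for arcpy.env so this sketch runs without ArcGIS."""
    workspace = None
    scratchWorkspace = None


def describe_workspaces(env):
    """Return the current and scratch workspace settings of an env object."""
    settings = {}
    for name in ("workspace", "scratchWorkspace"):
        settings[name] = getattr(env, name, None)
    return settings


def pin_to_local(env, local_path):
    """Point both workspaces at a local path (the path is the caller's choice)."""
    env.workspace = local_path
    env.scratchWorkspace = local_path


env = FakeEnv()
print(describe_workspaces(env))
pin_to_local(env, "C:/temp")   # placeholder path
print(describe_workspaces(env))

# With ArcGIS installed, the same calls would take arcpy.env:
#   import arcpy
#   print(describe_workspaces(arcpy.env))
```

If the slowdown follows the workspace setting rather than the location of the data, that would support the validation explanation given earlier in the thread.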

DuncanHornby · 09-26-2013

Curtis,

Your #2 point sounds really interesting. So you are saying that by explicitly creating a FeatureLayer from a featureclass path (e.g. c:\temp\houses.shp) you can get a performance boost if that FeatureLayer is used multiple times within the Python script?

Why is this not mentioned in the Help? If it is, I've never spotted it! It sounds like this should be adopted as best practice?

Duncan

ChrisSnyder · 09-26-2013

I don't agree with Curtis here... I would always recommend adding the field(s) directly to the actual featureclass or table. I have never noticed a performance boost... but then again I have never looked.

This can be a bad idea because:

arcpy.MakeFeatureLayer_management(myFC, "cats", "FIELD1 = 'cat'")
arcpy.MakeFeatureLayer_management(myFC, "dogs", "FIELD1 = 'dog'")
arcpy.AddField_management("cats", "ANIMAL_NAMES", "TEXT", "", "", 50)
# Is BAD, since the "dogs" feature layer will not have a field called ANIMAL_NAMES

arcpy.AddField_management(myFC, "ANIMAL_NAMES", "TEXT", "", "", 50)
arcpy.MakeFeatureLayer_management(myFC, "cats", "FIELD1 = 'cat'")
arcpy.MakeFeatureLayer_management(myFC, "dogs", "FIELD1 = 'dog'")
# Is GOOD, since both feature layers have a field called ANIMAL_NAMES

Perhaps this: http://forums.arcgis.com/threads/66584-repeated-AddField-operations-fails-in-10.1-works-in-10.0 is related to the performance issue you are having?

KevinHibma · 10-04-2013

[I got here following this post: http://forums.arcgis.com/threads/94059-Does-creating-a-FeatureLayer-speed-up-arcpy-processing-Looks-...... ]

Chris,

Your code sample isn't correct. When you're changing schema on a featurelayer, you're changing schema on the source. Thus the featurelayers built off that will have a new field available. A featurelayer is much more like a pointer to data than it is data itself.
However a featurelayer honors selection, like you've shown.

import arcpy

arcpy.MakeFeatureLayer_management(r'D:\temp\gpKML\data.gdb\points',"lyr1","FIELD1 ='aaa'")
arcpy.MakeFeatureLayer_management(r'D:\temp\gpKML\data.gdb\points',"lyr__2","FIELD1 ='bbb'")

#add to lyr1
arcpy.AddField_management("lyr1", "NEWLONG","LONG")

#calc that new field on lyr__2
arcpy.CalculateField_management("lyr__2","NEWLONG", 222, "PYTHON")

print "fields..."
for field in arcpy.ListFields("lyr__2"):
    print field.name

print "   values for lyr1:"
with arcpy.da.SearchCursor("lyr1", ["NEWLONG"]) as cursor:
    for row in cursor:
        print row[0]

print "   values for lyr__2:"
with arcpy.da.SearchCursor("lyr__2", ["NEWLONG"]) as cursor:
    for row in cursor:
        print row[0]

OUTPUT:
fields...
OBJECTID
SHAPE
field1
NEWLONG
values for lyr1:
None
values for lyr__2:
222

I'll just make a general statement about performance... in general, layers (feature layers, table views, etc.) are faster to operate on because "they're open".
A very crude test would be to buffer 100,000 points 50x. If you had this as a layer, the gp operations don't have to go back and re-open the data on every execution, as they would if you were referencing the featureclass itself. Of course I'm talking about the performance of buffer, not any of the costs of creating the layer. But hey, if you're doing the same operation 50x you probably wouldn't be re-constructing the layer 50x, just once, right?
(Remember, this is an in-general statement. There are probably exceptions depending on machines, data, and functions being used.)

ChrisSnyder · 10-04-2013

Hi Kevin,

Apologies... Looks like you guys fixed a bug that I had found a long time ago (v9.1 or so)... and I probably didn't make my point very clear in the code above, but that point is (was?): It used to be that if you created two feature layers ('fl1' and 'fl2') from a single source featureclass, then added a field to 'fl1', the 'fl2' feature layer would not "be aware" of that field having been created. A possible bad outcome of this was that if you then tried to calc the newly added field via 'fl2' (remember you added it to 'fl1', not 'fl2'), it would throw an error, thinking that the field didn't exist. My main point was that adding fields to a feature layer rather than a featureclass might be poor practice, since it creates the possibility of this scenario happening (which I myself had experienced a long while back). It seems that somewhere along the line this issue was fixed (there is now a direct link between the featureclass and all feature layers derived from it), and I failed to notice. So, my <now revised> stance is:

1. Curtis is correct
2. There no longer appear to be any issues adding a field directly to a feature layer when other pre-existing feature layers are based on the same source featureclass.
3. Adding a field to a feature layer seems 'a bit' faster than adding it to the source featureclass.