Poor performance with FileGDB and AddField_management

11-28-2011 08:27 AM
KyleShannon
New Contributor III
I am getting poor performance adding fields to a feature class in a FileGDB.  This is an ArcGIS Server geoprocessing service, and AddField is taking between 1.3 and 1.8 seconds per call.  With 25 fields to add (I don't know what they are ahead of time), it takes way too long to run for a service.  If I use in-memory feature classes or shapefiles, it takes about 1/10 the time.  Is there a way to add multiple fields at once to avoid the overhead (I assume) of opening and closing the FileGDB?  Shapefiles and in-memory feature classes don't support all the functionality I need.  Below is a quick example straight from my arcpy console (the exec calls are for more accurate times):

>>> fc = 'C:\\Users\\kshannon\\Documents\\ArcGIS\\Default.gdb\\output'
>>> shp = 'C:\\Users\\kshannon\\Documents\\ArcGIS\\output.shp'
>>> gdb_code = 'start = time.clock()\narcpy.AddField_management(fc,new_field,"DOUBLE")\nprint("AddField:{0}".format(time.clock()-start))'
>>> shp_code = 'start = time.clock()\narcpy.AddField_management(shp,new_field,"DOUBLE")\nprint("AddField:{0}".format(time.clock()-start))'
>>> new_field = "NEW_FIELD"
>>> exec(shp_code)
AddField:0.683299860168

>>> exec(gdb_code)
AddField:1.4100630674


Neither feature class is loaded in ArcMap.  The time gap grows quite a bit with 25-30 new fields.  Any suggestions?
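For timing a single call without building code strings for exec, a small wrapper may be easier to reuse. This is only a minimal sketch: `timed` is a hypothetical helper (not part of arcpy), and it uses `timeit.default_timer` rather than `time.clock`, which was removed in later Python versions.

```python
from timeit import default_timer


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = default_timer()
    result = func(*args, **kwargs)
    return result, default_timer() - start


# Hypothetical usage with arcpy (requires an ArcGIS install):
#   _, secs = timed(arcpy.AddField_management, fc, "NEW_FIELD", "DOUBLE")
#   print("AddField:{0}".format(secs))
```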
12 Replies
KyleShannon
New Contributor III
I think building the feature class in memory and then copying it to the FGDB may be the best way to go.  I am running an extreme test overnight that adds a crazy number of fields (5000) to each format to see how performance is affected by how many fields are already present (mostly just curious).  It appears that shapefiles have to traverse the list of existing fields before inserting, like a linked list; I saw this with 100 fields.  I am not sure about FGDB.  In-memory insertion appears to be constant time, I think, and the same goes for OGR access.  I will see in the morning.  I know 5000 fields will probably never be inserted, but it's a good way to see trends in the data.  I will post a graph tomorrow.
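The in-memory idea might look roughly like the sketch below. It is not the script used for the tests in this thread. The function name, the `scratch_name` default, and the choice of CopyFeatures_management for the final copy are assumptions for illustration. The arcpy module is passed in as `gp` so the sketch has no hard import.

```python
def add_fields_via_memory(gp, source_fc, out_fc, field_defs,
                          scratch_name="scratch_fc"):
    """Add fields to an in_memory copy of source_fc, then write it to out_fc.

    gp         -- the arcpy module (assumed; passed in by the caller)
    field_defs -- iterable of (field_name, field_type) pairs
    """
    mem_fc = "in_memory/" + scratch_name  # hypothetical scratch name
    gp.CopyFeatures_management(source_fc, mem_fc)
    try:
        # Every AddField call runs against the in_memory workspace.
        for field_name, field_type in field_defs:
            gp.AddField_management(mem_fc, field_name, field_type)
        # One copy into the file geodatabase once the schema is complete.
        gp.CopyFeatures_management(mem_fc, out_fc)
    finally:
        # Release the scratch feature class even if a call fails.
        gp.Delete_management(mem_fc)
    return out_fc
```

It would be called as something like `add_fields_via_memory(arcpy, src, fc, [("NEW_FIELD", "DOUBLE")])`, where `src` is whatever feature class the service starts from.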
KyleShannon
New Contributor III
Both FGDB and Esri's shapefile access slow down as the number of existing fields grows.  It is strange, because when I use OGR to add fields, insertion time is constant.  It's a good example of 'Schlemiel the Painter's algorithm'[0].  I never thought I would see an example of that nowadays.  Attached is a graph of insertion times over the course of 1000 AddField calls.  I also tested copying the in-memory feature class to the GDB, and that seems to be the best way to go in Arc:

Output of test over 1000 Add Field calls:
C:\Users\kshannon\Desktop>c:\Python26\ArcGIS10.0\python.exe add_field_test.py
Creating feature classes...
Feature classes created.
GDB(Total:940.140323837,Max(924):1.72044671624,Min(60):0.582889467023,Average:0.940140323837
SHP(Total:352.133845124,Max(994):1.01701894889,Min(2):0.0293058087336,Average:0.352133845124
MEM(Total:52.2060125065,Max(994):1.01701894889,Min(2):0.0293058087336,Average:0.0522060125065
MEM to GDB:2.62204624813
OGR(Total:37.934927268,Max(994):1.01701894889,Min(1):0.0222360889709,Average:0.037934927268
Deleting feature classes...
Feature classes deleted.


The values in parentheses are the iteration at which the min/max occurred.  Thanks for the help.
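The add_field_test.py script itself is not shown here, so the following is only a guess at how a per-format summary like the one above could be computed from a list of per-call timings. `summarize` is a hypothetical name.

```python
def summarize(times):
    """Return (total, (max_index, max), (min_index, min), average)."""
    total = sum(times)
    indices = range(len(times))
    max_i = max(indices, key=lambda i: times[i])  # iteration of slowest call
    min_i = min(indices, key=lambda i: times[i])  # iteration of fastest call
    average = total / float(len(times))
    return total, (max_i, times[max_i]), (min_i, times[min_i]), average
```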

[0] http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm
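To illustrate the general pattern (not what Esri's code actually does, which I can't see), here is a toy linked list where every append walks from the head. The number of nodes visited grows with the list length, so n appends cost n(n-1)/2 visits in total. All names are made up for the example.

```python
class Node(object):
    def __init__(self, value):
        self.value = value
        self.next = None


def append_walking(head, value):
    """Append by walking from the head; return (head, nodes_visited)."""
    node = Node(value)
    if head is None:
        return node, 0
    visited = 1
    cur = head
    while cur.next is not None:
        cur = cur.next
        visited += 1
    cur.next = node
    return head, visited


def total_visits(n):
    """Total nodes visited while appending n items one at a time."""
    head, total = None, 0
    for i in range(n):
        head, visited = append_walking(head, i)
        total += visited
    return total
```

Keeping a pointer to the tail would make each append constant time, which is the behaviour the OGR numbers suggest.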
ChrisSnyder
Regular Contributor III
Ouch... Can't hide from the numbers.

FYI: A lot of the ESRI geoprocessing tools exhibit this behavior (progressively longer run times) when run in a loop. It used to be FAR worse, and to their credit, ESRI has made great strides in correcting/minimizing the issue.

http://forums.esri.com/Thread.asp?c=93&f=1729&t=177007
http://forums.esri.com/Thread.asp?c=93&f=1729&t=175721
http://forums.arcgis.com/threads/8866-Spatial-Join-CRASHING-at-10

The gp "memory leak" might not be the underlying issue here though...