Does creating a FeatureLayer speed up arcpy processing? Looks like it...

DuncanHornby · 10-04-2013

All,

In a recent thread discussed here it was suggested that creating a FeatureLayer from a FeatureClass can improve the performance of arcpy. It was clear from the subsequent discussions that this was not necessarily accepted by others and I was skeptical as I have never seen any reference to this "top tip".

So with time to kill I set out to test this and I am reporting my findings for others to mull over.

I have ArcGIS 10.2 on a Vista OS and I ran everything in PyScripter. Before each run I reset the Python interpreter window and restored the source dataset so the conditions were the same. I had stopped all other applications and did not touch the keyboard whilst the code ran.

I created a point shapefile with 1000 random points. My test code simply added a field and then calculated a constant value into this new field, repeating these steps 49 times. I recorded the start and end times so I could work out how long each run took. I repeated the test 10 times for each scenario. My null hypothesis was that there is no performance difference.

The two test scenarios were:

1. Access the source dataset by its full path name, which is what you typically see in many examples in the ESRI help

2. Create a FeatureLayer first and use that instead of the full path name

My code for accessing the full path was:

import arcpy
import time


print "START TIME = " + time.asctime()


# Source dataset to update
fc = r"C:\Scratch\TestLayer.shp"


# Create a list of numbers from 1 to 49 (range excludes the stop value 50)
l = range(1,50,1)


for i in l:
    # Create a field name and add it to dataset
    name = "F_" + str(i)
    print "Processing "  + name
    arcpy.AddField_management(fc,name,"LONG")


    # Populate field with a constant 999
    arcpy.CalculateField_management(fc,name,"999","VB")


print "END TIME = " + time.asctime()

My code for accessing a FeatureLayer was:

import arcpy
import time


print "START TIME = " + time.asctime()


# Source dataset to update
fc = r"C:\Scratch\TestLayer.shp"
fl = "TestLayer"
arcpy.MakeFeatureLayer_management(fc,fl)


# Create a list of numbers from 1 to 49 (range excludes the stop value 50)
l = range(1,50,1)


for i in l:
    # Create a field name and add it to dataset
    name = "F_" + str(i)
    print "Processing "  + name
    arcpy.AddField_management(fl,name,"LONG")


    # Populate field with a constant 999
    arcpy.CalculateField_management(fl,name,"999","VB")


print "END TIME = " + time.asctime()
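Both scripts print time.asctime(), which only resolves to whole seconds. As a rough sketch of an alternative, the harness below times a callable with a higher-resolution clock and summarises repeated runs. The names time_runs, mean_and_stdev and placeholder_task are hypothetical and not part of the original test; placeholder_task stands in for the AddField/CalculateField loop.

```python
from timeit import default_timer  # high-resolution wall clock, Python 2.7 and 3


def time_runs(task, repeats=10):
    """Run task() `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = default_timer()
        task()
        timings.append(default_timer() - start)
    return timings


def mean_and_stdev(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance ** 0.5


def placeholder_task():
    # Hypothetical stand-in for the AddField/CalculateField loop.
    sum(range(100000))


runs = time_runs(placeholder_task, repeats=5)
mean, stdev = mean_and_stdev(runs)
print("mean = %.4f s, stdev = %.4f s" % (mean, stdev))
```

Note that this repeats the task inside one Python process. The test described here restarted the interpreter and restored the shapefile before every run, so a real task would still need to reset the dataset between repeats (otherwise the fields already exist on the second pass).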

For the 10 test runs the mean time for running the code for:

full path name was 22 seconds

FeatureLayer was 19.5 seconds

I ran a two-sample t-test in Minitab. The difference is significant, so I reject my null hypothesis.

Two-sample T for fl vs fc

     N    Mean  StDev  SE Mean
fl  10  19.500  0.527     0.17
fc  10  22.000  0.816     0.26

Difference = mu (fl) - mu (fc)
Estimate for difference: -2.500
95% CI for difference: (-3.155, -1.845)
T-Test of difference = 0 (vs not =): T-Value = -8.13 P-Value = 0.000 DF = 15
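For anyone without Minitab, the figures above can be approximately reproduced from the summary statistics alone. This is a minimal sketch that assumes the test was the unequal-variance (Welch) form, which the reported DF = 15 suggests. The function name welch_t is hypothetical, and the critical value 2.131 is the tabulated two-sided 95% t value for 15 degrees of freedom rather than something computed here.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic, degrees of freedom and standard error."""
    v1 = sd1 ** 2 / float(n1)  # squared standard error of sample 1
    v2 = sd2 ** 2 / float(n2)  # squared standard error of sample 2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Summary statistics copied from the Minitab output (fl first, then fc)
t, df, se = welch_t(19.5, 0.527, 10, 22.0, 0.816, 10)

T_CRIT_95_DF15 = 2.131  # assumption: value taken from a t table for 15 DF
diff = 19.5 - 22.0
low = diff - T_CRIT_95_DF15 * se
high = diff + T_CRIT_95_DF15 * se

print("t = %.2f, df = %.1f" % (t, df))
print("95%% CI for difference: (%.3f, %.3f)" % (low, high))
```

Because the inputs are rounded summary values rather than the raw timings, the t statistic can differ from Minitab's in the last digit, and the fractional degrees of freedom only match DF = 15 once rounded down.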

So for this simple scenario there was a significant difference in performance by creating a FeatureLayer first...

Interesting hey?

ChrisSnyder · 10-04-2013

Okay - since I was the naysayer....

Duncan, I ran your code on my own point .shp (1000 random pnts). I ran each script 4 times, restarting Python for each run. Here are the results (in seconds):

1. Feature Layer: [9, 9, 9, 9]
2. Feature Class: [10, 9, 10, 9]

Okay.... 5% faster it seems from this limited test, enough of a (very small) difference to get me curious... so I upped the feature count to 10,000 pnts... Here are the results of those runs (three this time):

1. Feature Layer: [49.08, 46.10, 46.36]
2. Feature Class: [51.06, 46.78, 45.51]

So about a 1% speed boost this time...

I'll stand by my prior statement on the other thread.... but I will admit the minutiae of these results may indicate there is a very small performance gain from adding fields to a feature layer instead of the source featureclass. Because the relative speed boost shrank (5% to 1%) when I added more records, I'm deducing that the performance gain is coming from the AddField tool, and not the CalcField tool.
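The percentages can be checked directly from the run times listed above. The sketch below (the helper names mean and compare are hypothetical) also reports the absolute time saved, since a saving that stays roughly constant while the record count grows is what a fixed per-call cost, such as one in AddField, would be expected to produce.

```python
def mean(values):
    return sum(values) / float(len(values))


def compare(layer_times, class_times):
    """Return (seconds saved, percent saved) of the layer runs vs. the class runs."""
    fl = mean(layer_times)
    fc = mean(class_times)
    saved = fc - fl
    return saved, 100.0 * saved / fc


# Run times in seconds, copied from the results above
tests = [
    ("1,000 points", [9, 9, 9, 9], [10, 9, 10, 9]),
    ("10,000 points", [49.08, 46.10, 46.36], [51.06, 46.78, 45.51]),
]

results = {}
for label, layer_times, class_times in tests:
    saved, percent = compare(layer_times, class_times)
    results[label] = (saved, percent)
    print("%s: %.2f s saved (%.1f%%)" % (label, saved, percent))
```

With only three or four runs per case, and one feature class run at 10,000 points beating every feature layer run, this is weak evidence either way.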

DuncanHornby · 10-04-2013

Chris,

Before you increased the number of points in your test data it's interesting to note that your run times are around 9 to 10 seconds. Mine were around the 20 second mark! I'm guessing this is down to you having a superior PC? The PC that I did the testing on had a 32-bit operating system with an Intel Core 2 CPU (2.4 GHz).

I was wondering after I posted if the performance benefit would be noticeable in a much more complex scenario? For example some cursor going through some dataset farming out selections for subsequent processing steps.

I also wonder if this "micro" boost in performance from making a FeatureLayer extends to TableViews and RasterLayers?

Anyway it's good to have a naysayer casting doubt at every opportunity as I would never have queried such a technique.

Duncan

ChrisSnyder · 10-04-2013

I ran that on my new machine: Xeon 2687 (3.1 GHz w/ turbo @ 3.9 GHz) with 2 solid state drives in RAID0 - also 64 GB of RAM and a 64-bit OS, but really only the disk and processor make a real difference for a test like this, I think. Moore's Law and some other stuff (like SSDs!) seem to be in full effect still - and so these infernal machines seem to continue getting exponentially faster and faster every year.

Like Kevin was saying on the other thread, a feature layer is a pointer (a door if you will) to an on-disk feature class. So I can believe that they might increase performance for some things like adding fields, listing fields, etc. But I'm not convinced about the geometry stuff... So for example, unioning 50 featureclasses together vs. unioning 50 feature layers (of the feature classes). Maybe you would see a noticeable diff if they were really small and the schema stuff (and not the geometry stuff) was the bulk of the processing? I'd have a hard time accepting a noticeable boost for "geometrically large" datasets though. Have to test that out sometime in the near future!
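One way to separate the "schema stuff" from everything else would be to accumulate the time spent in each step under its own label. The context manager below is a sketch of that idea only. The name timed and the two placeholder functions are hypothetical, and the placeholders would be swapped for the real arcpy calls (for example AddField_management and CalculateField_management from the scripts earlier in the thread).

```python
from collections import defaultdict
from contextlib import contextmanager
from timeit import default_timer


@contextmanager
def timed(label, totals):
    """Add the wall-clock time spent inside the with-block to totals[label]."""
    start = default_timer()
    try:
        yield
    finally:
        totals[label] += default_timer() - start


def placeholder_schema_step():
    # Hypothetical stand-in for a schema operation such as adding a field.
    sum(range(1000))


def placeholder_data_step():
    # Hypothetical stand-in for per-record work such as calculating a field.
    sum(range(100000))


totals = defaultdict(float)
for i in range(1, 50):
    with timed("add_field", totals):
        placeholder_schema_step()
    with timed("calculate_field", totals):
        placeholder_data_step()

for label in sorted(totals):
    print("%s: %.4f s in total" % (label, totals[label]))
```

Running the same loop once against the full path and once against the layer would show which of the two steps, if either, accounts for the difference.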

Could it be that many of the geoprocessing tools, behind the scenes, have to convert the featureclasses to layers, and that if they are already a layer, then this small added overhead is not needed... thus the (slightly) faster run times?