Problem with Arcpy for ArcPro - SelectLayerByAttribute_management takes longer and longer if it is called many times

578
10
Jump to solution
02-22-2019 12:54 PM
VincentLantaca1
New Contributor III

I have set up a simple test script to demonstrate the problem I'm running into:

#importing python modules
import arcpy, time

basePath = r"C:\Users\vincentl\AppData\Local\PythonTesting\Geodatabases\ScratchAddressing.gdb"
arcpy.env.workspace = basePath
addresses = basePath + r"\GIS_DBO_Places_Sample"

whereClause = "OBJECTID IN (1)"

for i in range(0, 100):
    start = time.time()
    addSelect = arcpy.SelectLayerByAttribute_management(addresses, "NEW_SELECTION", whereClause)       #selecting current record
    stop = time.time()
    print("Time elapsed: {0}".format(stop - start))
‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Output:

Time elapsed: 2.4421534538269043
Time elapsed: 0.2509007453918457
Time elapsed: 0.26900672912597656
Time elapsed: 0.2650105953216553
Time elapsed: 0.2917516231536865
Time elapsed: 0.2583498954772949
Time elapsed: 0.27279186248779297
Time elapsed: 0.2871057987213135
Time elapsed: 0.31418442726135254
Time elapsed: 0.2747483253479004
Time elapsed: 0.2698488235473633
Time elapsed: 0.2711470127105713
Time elapsed: 0.30519843101501465
Time elapsed: 0.30664968490600586
Time elapsed: 0.3076004981994629
Time elapsed: 0.2947726249694824
Time elapsed: 0.3275151252746582
Time elapsed: 0.3086280822753906
Time elapsed: 0.3205225467681885
Time elapsed: 0.32954835891723633
Time elapsed: 0.32260847091674805
Time elapsed: 0.328507661819458
Time elapsed: 0.3341994285583496
Time elapsed: 0.32455968856811523
Time elapsed: 0.3244626522064209
Time elapsed: 0.3398430347442627
Time elapsed: 0.3336300849914551
Time elapsed: 0.3469429016113281
Time elapsed: 0.3484632968902588
Time elapsed: 0.3517005443572998
Time elapsed: 0.3604257106781006
Time elapsed: 0.36439085006713867
Time elapsed: 0.3885536193847656
Time elapsed: 0.44512104988098145
Time elapsed: 0.395768404006958
Time elapsed: 0.3962588310241699
Time elapsed: 0.4052317142486572
Time elapsed: 0.4457588195800781
Time elapsed: 0.4870429039001465
Time elapsed: 0.42713308334350586
Time elapsed: 0.4171907901763916
Time elapsed: 0.4220712184906006
Time elapsed: 0.4343099594116211
Time elapsed: 0.4841780662536621
Time elapsed: 0.4685649871826172
Time elapsed: 0.4868438243865967
Time elapsed: 0.4662456512451172
Time elapsed: 0.5436382293701172
Time elapsed: 0.5267820358276367
Time elapsed: 0.5091872215270996
Time elapsed: 0.5081048011779785
Time elapsed: 0.5297191143035889
Time elapsed: 0.5239181518554688
Time elapsed: 0.5485963821411133
Time elapsed: 0.5876140594482422
Time elapsed: 0.5736618041992188
Time elapsed: 0.5639777183532715
Time elapsed: 0.5830345153808594
Time elapsed: 0.6187348365783691
Time elapsed: 0.5978899002075195
Time elapsed: 0.6382076740264893
Time elapsed: 0.6436002254486084
Time elapsed: 0.6225919723510742
Time elapsed: 0.6568043231964111
Time elapsed: 0.7085685729980469
Time elapsed: 0.6644773483276367
Time elapsed: 0.6723501682281494
Time elapsed: 0.6834757328033447
Time elapsed: 0.6862823963165283
Time elapsed: 0.7144062519073486
Time elapsed: 0.75490403175354
Time elapsed: 0.7380294799804688
Time elapsed: 0.7777352333068848
Time elapsed: 0.782534122467041
Time elapsed: 0.8311202526092529
Time elapsed: 0.7933468818664551
Time elapsed: 0.8011541366577148
Time elapsed: 0.8183913230895996
Time elapsed: 0.8311331272125244
Time elapsed: 0.8553295135498047
Time elapsed: 0.8610031604766846
Time elapsed: 0.9558396339416504
Time elapsed: 0.9203469753265381
Time elapsed: 0.9103479385375977
Time elapsed: 0.9160957336425781
Time elapsed: 0.928046703338623
Time elapsed: 0.950376033782959
Time elapsed: 0.9951839447021484
Time elapsed: 0.9768331050872803
Time elapsed: 0.9877028465270996
Time elapsed: 1.0079317092895508
Time elapsed: 1.0394885540008545
Time elapsed: 1.0345790386199951
Time elapsed: 1.0764164924621582
Time elapsed: 1.0987389087677002
Time elapsed: 1.108257532119751
Time elapsed: 1.114212989807129
Time elapsed: 1.1997053623199463
Time elapsed: 1.154165267944336
Time elapsed: 1.1690573692321777

If SelectLayerByAttribute_management is called multiple times, each successive call will take longer and longer. Is there anything I am doing wrong?

0 Kudos
1 Solution

Accepted Solutions
curtvprice
MVP Esteemed Contributor

Since you are providing a dataset to the SelectLayerByAttribute tool (not a layer, as the tool needs) the tool is dealing with this by creating a layer on the fly on every run of the tool.  It seems (note the incrementing layer names in Dan's test) filling memory up with layers on every iteration, bogging things down. As the first note in the help on this tool says:

If the input is a feature class or dataset path, this tool will automatically create and return a new layer with the result of the tool applied.

Try this instead. Note my use of the current workspace to simplify path handling.

#importing python modules
import time
import arcpy
from arcpy import env

env.workspace = (
    r"C:\Users\vincentl\AppData\Local\PythonTesting\"
    r"Geodatabases\ScratchAddressing.gdb")
lyr = arcpy.MakeFeatureLayer_management("GIS_DBO_Places_Sample", "lyr")
whereClause = "OBJECTID IN (1)"

for i in range(0, 100):
    start = time.time()
    arcpy.SelectLayerByAttribute_management(lyr, "NEW_SELECTION", whereClause)
    stop = time.time()
    print("Time elapsed: {0}".format(stop - start))‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

View solution in original post

10 Replies
DanPatterson_Retired
MVP Esteemed Contributor

not deleting the selection between calls? It could be a memory thing if the file is large

0 Kudos
VincentLantaca1
New Contributor III

How do I delete the selection between calls?

I have tried adding "del addSelect" at the end of the loop, it does nothing.

Adding "addSelect = arcpy.SelectLayerByAttribute_management(addresses, "CLEAR_SELECTION")" after each loop simply makes it longer, since all it does is add more calls

0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor

It will test the case of making a selection and clearing a selection... that should be the only way to compare the inter-selection times.

That was what I was saying about a memory thing.  Since even though "new selections" were being made, there is no guarantee that 'garbage' was being cleared out and memory usage wasn't getting filled.  You could also try monitoring your memory use profile using windows system tools while you do this 

0 Kudos
VincentLantaca1
New Contributor III

Memory usage does increase over time. Why would accumulated garbage in memory slow down successive calls to SelectLayerByAttribute_management?

0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor

You would have to profile it.  I suspect that you are using ArcMap which already has memory limitations. This can occur in other environments, like in python garbage collection usually doesn't happen until a script is finished.  I would be more interested in the fact that "clearing the selection" doesn't do anything memory-wise but I isn't unexpected.  When your script finishes.. does memory usage drop back to the initial level?

0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor

Well, a lot of disk writing even with adding the 'CLEAR_Selection' after the last time step.  So more is going on behind the scenes that meets the eye.  My cache bloomed.  I ran it with ArcGIS Pro open and closed. and I got results similar to yours

Here is a query I did on a 2K table which results in 492 records being selected

import arcpy
import time
def test():
    """ """
    gdb = r"C:\Path_to_my_gdb\array_tools.gdb"
    start = time.time()
    lyr = gdb + r"\pnts_2000"
    qry = "Text01 = 'C'"
    r = arcpy.management.SelectLayerByAttribute(lyr, "NEW_SELECTION", qry, None)
    stop = time.time()
    arcpy.management.SelectLayerByAttribute(lyr, "CLEAR_SELECTION") #, qry, None)
    return r, (stop - start)
r, t = test()
t   =>   0.238 s
r[0]  =>  <arcpy._mp.Layer at 0x226a6ec46a0>   The result object, layer
r[1]  =>  '492'        The result count (which is correct)
running it 10 times yielded
[test() for i in range(10)]

[(<Result 'pnts_2000_Layer20'>, 0.25133728981018066),
 (<Result 'pnts_2000_Layer26'>, 0.20045042037963867),
 (<Result 'pnts_2000_Layer32'>, 0.24329805374145508),
 (<Result 'pnts_2000_Layer38'>, 0.22243928909301758),
 (<Result 'pnts_2000_Layer44'>, 0.23232150077819824),
 (<Result 'pnts_2000_Layer50'>, 0.2583131790161133),
 (<Result 'pnts_2000_Layer56'>, 0.25226902961730957),
 (<Result 'pnts_2000_Layer62'>, 0.3051586151123047),
 (<Result 'pnts_2000_Layer68'>, 0.282210111618042),
 (<Result 'pnts_2000_Layer74'>, 0.2693297863006592)]
about the same... this did scale quite large when it got over 50 into 100 range.

Couldn't be that slow... so I cranked up my usual method of doing queries, using the same file read from disk.

The actual conversion to a numpy array took 0.0075 seconds with a whole 17.5 microseconds per query for 7 runs of 100,000 loops.

%timeit arcpy.da.TableToNumPyArray(lyr, "Text01")
7.5 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit (a['Text01'] == 'C')
17.5 µs ± 63.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)‍‍‍‍‍

If you are really performing repetitive queries or even different queries, you might want to consider this alternative.

To that end, I determined all the unique cases in that table's field and ran a query on each of them and timed the whole lot.

np.unique(a)

array([('A',), ('Aa',), ('Aaa',), ('B',), ('Bbbbb',), ('C',), ('Ccccc',),
       ('D',), ('Dddd',), ('None',)], dtype=[('Text01', '<U5')])

%timeit (a['Text_01' == i] for i in np.unique(a))
537 µs ± 15.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Scaling quite nicely for 10 queries for the 10 cases, run 7 times for 1000 loops.  

Options abound since NumPy and esri products mesh quite nicely and offer advantages to the standard interface for large data analysis and/or just about any task.

curtvprice
MVP Esteemed Contributor

Since you are providing a dataset to the SelectLayerByAttribute tool (not a layer, as the tool needs) the tool is dealing with this by creating a layer on the fly on every run of the tool.  It seems (note the incrementing layer names in Dan's test) filling memory up with layers on every iteration, bogging things down. As the first note in the help on this tool says:

If the input is a feature class or dataset path, this tool will automatically create and return a new layer with the result of the tool applied.

Try this instead. Note my use of the current workspace to simplify path handling.

#importing python modules
import time
import arcpy
from arcpy import env

env.workspace = (
    r"C:\Users\vincentl\AppData\Local\PythonTesting\"
    r"Geodatabases\ScratchAddressing.gdb")
lyr = arcpy.MakeFeatureLayer_management("GIS_DBO_Places_Sample", "lyr")
whereClause = "OBJECTID IN (1)"

for i in range(0, 100):
    start = time.time()
    arcpy.SelectLayerByAttribute_management(lyr, "NEW_SELECTION", whereClause)
    stop = time.time()
    print("Time elapsed: {0}".format(stop - start))‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
DanPatterson_Retired
MVP Esteemed Contributor

still relative slow.  I actually made a featurelayer on disk to test to see if that improves things

import arcpy
import time

lyr = r"C:\Arc_projects\Polygon_lineTools\pnts_2000_lyr.lyrx"
qry = "Text01 = 'C'"
def test(lyr, qry):
    """ """
    start = time.time()
    r = arcpy.management.SelectLayerByAttribute(lyr, "NEW_SELECTION", qry, None)
    stop = time.time()
#    arcpy.management.SelectLayerByAttribute(lyr, "CLEAR_SELECTION") #, qry, None)
    return (stop - start)

and ran it with and without the CLEAR_SELECTION

Without clearing

np.array([test(lyr, uniq[0]) for u in range(10)])
    array([0.065, 0.062, 0.06 , 0.06 , 0.061, 0.061, 0.066, 0.059, 0.061, 0.059])

markedly better but still 

it really slows down with multiple queries

lyr = r"C:\Arc_projects\Polygon_lineTools\pnts_2000_lyr.lyrx"
uniq = ['A', 'Aa', 'Aaa', 'B', 'Bbbbb', 'C', 'Ccccc', 'D', 'Dddd', 'None']

def test(lyr, uniq):
    """ """
    start = time.time()
    qry = "Text01 = '{}'".format(uniq)
    r = arcpy.management.SelectLayerByAttribute(lyr, "NEW_SELECTION", qry, None)
    stop = time.time()
#    arcpy.management.SelectLayerByAttribute(lyr, "CLEAR_SELECTION") #, qry, None)
    return (stop - start)

np.array([test(lyr, u) for u in uniq])
array([0.073, 0.063, 0.059, 0.059, 0.06 , 0.06 , 0.06 , 0.067, 0.062, 0.059]) = 0.612

compared to   0.000537 s for all 10 using numpy.  still orders of magnitude different, but not as bad 

0 Kudos
curtvprice
MVP Esteemed Contributor

Of course numpy will be many many orders of magnitude faster - you are working with a simple numpy structure instead of this big complex (and powerful) layer object.

But my fix of creating the layer first does make it stop the memory-eating monster and the slowing down with iterations, yes?