<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Slow Select By Attribute processing in Python Questions</title>
    <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524416#M41114</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi M,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'm not certain it'll fix it, but I would build the query as one string and use the Select gp tool to extract all the records at once. That removes the loop and avoids repeatedly modifying the layer's selection, which may speed things up. I believe it also means you don't need to create a feature layer from the shapefile in the first place.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Assuming your text file has one key per line, something like this might work (untested!):&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;

# Set up the geoprocessor
import arcgisscripting
gp = arcgisscripting.create()

# Text file listing the keys to select, one per line
output = "D:\\Subblocks\\SubblocksOutput.txt"

# Set overwrite property
gp.OverWriteOutput = 1

# Open the key file in read-only mode
text_file = open(output, "r")
# Read the file into a list, then close it
data = text_file.readlines()
text_file.close()
# Strip trailing newlines and skip any blank lines
cleandata = [x.strip() for x in data if x.strip()]
# Quick function to put the data into SQL syntax
querify = lambda st: '"KEY" = \'%s\''%st
# Apply that function to all data (transforming each element from "A1" to "KEY" = 'A1')
queryterms = map(querify,cleandata)
# Put " OR " between each of the queries
querystring = " OR ".join(queryterms)
# Use the Select tool once: input features, output feature class, where clause.
# It reads the shapefile directly, so no feature layer is needed.
gp.Select_analysis("D:\\SUBBLOCKS.shp", "D:\\Subblocks\\SELECTED_SUBBLOCKS.shp", querystring)

# Free memory
del gp&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So this reads the text file into a list and then builds one long SQL query string of the form "KEY" = 'A1' OR "KEY" = 'B2' OR "KEY" = 'C3'...&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;It then passes that string as the where clause to the Select tool, which extracts the matching records into a new dataset. Note that the Select tool's arguments are the input features, the output feature class and then the where clause.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Let me know how it goes!&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;-Thom&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Sat, 11 Dec 2021 22:51:54 GMT</pubDate>
    <dc:creator>ThomMackey</dc:creator>
    <dc:date>2021-12-11T22:51:54Z</dc:date>
    <item>
      <title>Slow Select By Attribute processing</title>
      <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524415#M41113</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi there,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I have written some code that reads each line of a text file into a list. It then loops through the list and uses each item in a where clause to add features to the selection on a feature layer (Select Layer By Attribute). Once it has finished looping through the list, it copies all selected features to a shapefile.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;My concern is that it is extremely slow, and I am wondering whether there is anything I can do to speed the process up, or perhaps a different approach that would make the script faster.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Any ideas would be greatly appreciated...&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Here's my code:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;
# Text files
output = "D:\\Subblocks\\SubblocksOutput.txt"

# Set overwrite property
gp.OverWriteOutput = 1

# Make feature layer
gp.MakeFeatureLayer("D:\\SUBBLOCKS.shp", "feature_lyr")

# Open output file in read only mode
text_file = open(output, "r")

# Loop through text file and add to list
lines = text_file.readlines()
print len(lines)

for line in lines:
    print line
    query = "\"KEY\"" + " = " + "\'" + line.rstrip("\n") + "\'"
    gp.SelectLayerByAttribute_management("feature_lyr", "ADD_TO_SELECTION", query)

# Copy selected features to shapefile
gp.CopyFeatures_management("feature_lyr", "D:\\Subblocks\\SELECTED_SUBBLOCKS.shp")

# Close text file
text_file.close()

# Free memory
del gp&lt;/PRE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Aug 2011 04:54:23 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524415#M41113</guid>
      <dc:creator>MPickering</dc:creator>
      <dc:date>2011-08-01T04:54:23Z</dc:date>
    </item>
    <item>
      <title>Re: Slow Select By Attribute processing</title>
      <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524416#M41114</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi M,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'm not certain it'll fix it, but I would build the query as one string and use the Select gp tool to extract all the records at once. That removes the loop and avoids repeatedly modifying the layer's selection, which may speed things up. I believe it also means you don't need to create a feature layer from the shapefile in the first place.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Assuming your text file has one key per line, something like this might work (untested!):&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;

# Set up the geoprocessor
import arcgisscripting
gp = arcgisscripting.create()

# Text file listing the keys to select, one per line
output = "D:\\Subblocks\\SubblocksOutput.txt"

# Set overwrite property
gp.OverWriteOutput = 1

# Open the key file in read-only mode
text_file = open(output, "r")
# Read the file into a list, then close it
data = text_file.readlines()
text_file.close()
# Strip trailing newlines and skip any blank lines
cleandata = [x.strip() for x in data if x.strip()]
# Quick function to put the data into SQL syntax
querify = lambda st: '"KEY" = \'%s\''%st
# Apply that function to all data (transforming each element from "A1" to "KEY" = 'A1')
queryterms = map(querify,cleandata)
# Put " OR " between each of the queries
querystring = " OR ".join(queryterms)
# Use the Select tool once: input features, output feature class, where clause.
# It reads the shapefile directly, so no feature layer is needed.
gp.Select_analysis("D:\\SUBBLOCKS.shp", "D:\\Subblocks\\SELECTED_SUBBLOCKS.shp", querystring)

# Free memory
del gp&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So this reads the text file into a list and then builds one long SQL query string of the form "KEY" = 'A1' OR "KEY" = 'B2' OR "KEY" = 'C3'...&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;It then passes that string as the where clause to the Select tool, which extracts the matching records into a new dataset. Note that the Select tool's arguments are the input features, the output feature class and then the where clause.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Let me know how it goes!&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;-Thom&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 11 Dec 2021 22:51:54 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524416#M41114</guid>
      <dc:creator>ThomMackey</dc:creator>
      <dc:date>2021-12-11T22:51:54Z</dc:date>
    </item>
    <item>
      <title>Re: Slow Select By Attribute processing</title>
      <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524417#M41115</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi Thom,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks so much for your reply! It worked perfectly! &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I followed your example exactly and it processed the 12,000 lines in minutes, which is brilliant. Before, it was still processing after a full day and I would just give up on it.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Another change I made was converting my shapefile to a feature class in a file geodatabase and creating the feature layer (gp.MakeFeatureLayer) from that for the processing, which also dramatically improved the speed.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks again!&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;m&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Aug 2011 23:24:54 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524417#M41115</guid>
      <dc:creator>MPickering</dc:creator>
      <dc:date>2011-08-01T23:24:54Z</dc:date>
    </item>
    <item>
      <title>Re: Slow Select By Attribute processing</title>
      <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524418#M41116</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Glad it worked! Days to minutes is a pretty big improvement! I didn't realise you had that many records. I've had the Select tool fail when using an SQL string with more than ~10,000 conditions in it (built with a similar process), so I'm glad it worked out for you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; If you end up with more in future and it starts to break, I'd probably do something like this:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;
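# in_file is the open text file of keys, as before
all_lines_to_query = [x.strip() for x in in_file.readlines()]

# Use // (integer division) so the slice index is always an int
first_half = all_lines_to_query[:len(all_lines_to_query)//2]

second_half = all_lines_to_query[len(all_lines_to_query)//2:]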
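
# A more general version of the split above (a sketch, not tested against
# ArcGIS; batched_where_clauses and batch_size are hypothetical names, and
# 5000 is an arbitrary default chosen to stay under the ~10,000 conditions
# mentioned above). It cuts the keys into batches of at most batch_size and
# builds one OR expression per batch; each expression could then go to the
# Select tool on its own, with the outputs combined afterwards using Merge.
def batched_where_clauses(keys, field="KEY", batch_size=5000):
    clauses = []
    # range(start, stop, step) yields 0, batch_size, 2 * batch_size, ...
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        terms = ['"%s" = \'%s\'' % (field, key) for key in batch]
        clauses.append(" OR ".join(terms))
    return clauses

# e.g. clauses = batched_where_clauses(all_lines_to_query)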
&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Then run the process on each half and combine the results with the Merge gp tool (Append also works if you'd rather add the second output to the first). It's a bit messy, but it's the only approach I've found that works on large sets.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Anyway, happy to help &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;-Thom&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 11 Dec 2021 22:51:57 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524418#M41116</guid>
      <dc:creator>ThomMackey</dc:creator>
      <dc:date>2021-12-11T22:51:57Z</dc:date>
    </item>
    <item>
      <title>Re: Slow Select By Attribute processing</title>
      <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524419#M41117</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi Thom,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I was a little worried it wouldn't work for that many records too, but when I changed the shapefile to a feature class in a file geodatabase and ran the script, it worked. Before, as a shapefile, it just crashed.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I have read other posts where people had similar issues with the SQL string failing because of too many characters or conditions in the statement. One post mentioned using the 'IN' keyword in the SQL syntax, e.g. "KEY" IN ('COOK1729A', 'COOK1729B', ...), and building the query that way. Apparently they had success with this for a large number of conditions, but I haven't tested it...&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'll definitely keep the batching code you posted in mind. It's a tidy bit of code that I will no doubt use in the future.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks again Thom.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;m&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Aug 2011 00:46:12 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524419#M41117</guid>
      <dc:creator>MPickering</dc:creator>
      <dc:date>2011-08-02T00:46:12Z</dc:date>
    </item>
    <item>
      <title>Re: Slow Select By Attribute processing</title>
      <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524420#M41118</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Good tip! I always forget about SQL's "IN" operator. I just tested it with one of my old scripts, and was able to extract &amp;gt;300,000 unique records in one step - no messing around with merging! &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So doing it that way, your script would go something like:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;
# The field you're querying
queryfield = "KEY"

# Read the key file (text_file, opened as before) into a list
data = text_file.readlines()

# Strip the data of trailing newlines
cleandata = [x.strip() for x in data]

# Put single quotes around each string for SQL-friendliness
queryterms = ["'%s'"%x for x in cleandata]

# Put ", " between each of the terms
comma_sep_terms = ", ".join(queryterms)

# Then make the full expression
querystring = '"%s" IN (%s)'%(queryfield,comma_sep_terms)

# Use the Select tool once (input features, output feature class, where clause);
# paths are from the original script, so swap in your geodatabase feature class if converted
gp.Select_analysis("D:\\SUBBLOCKS.shp", "D:\\Subblocks\\SELECTED_SUBBLOCKS.shp", querystring)
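
# A reusable variant of the steps above (a sketch; build_in_clause is a
# hypothetical helper, not an ArcGIS function). It skips blank lines, drops
# duplicate keys while keeping their original order, and doubles any single
# quote inside a key, which is how SQL escapes a quote in a string literal
# (so a key such as O'NEIL becomes 'O''NEIL'). Assumes at least one
# non-blank key, since "KEY" IN () is not a valid expression.
def build_in_clause(field, keys):
    seen = set()
    terms = []
    for key in keys:
        key = key.strip()
        if not key or key in seen:
            continue
        seen.add(key)
        terms.append("'%s'" % key.replace("'", "''"))
    return '"%s" IN (%s)' % (field, ", ".join(terms))

# e.g. querystring = build_in_clause(queryfield, data)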
&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Much&lt;/STRONG&gt;&lt;SPAN&gt; cleaner! It seems odd, because I've always understood that "IN" is procedurally equivalent to multiple "OR"s (if anything a little slower), but for whatever reason, the GP prefers it.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks heaps for mentioning that &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 11 Dec 2021 22:52:00 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524420#M41118</guid>
      <dc:creator>ThomMackey</dc:creator>
      <dc:date>2021-12-11T22:52:00Z</dc:date>
    </item>
    <item>
      <title>Re: Slow Select By Attribute processing</title>
      <link>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524421#M41119</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Good to know it worked and glad I could return the favour. I think I will use this method too. &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks again Thom, you've saved me a lot of frustration and time.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;m&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Aug 2011 02:41:34 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/slow-select-by-attribute-processing/m-p/524421#M41119</guid>
      <dc:creator>MPickering</dc:creator>
      <dc:date>2011-08-02T02:41:34Z</dc:date>
    </item>
  </channel>
</rss>

