ArcGIS Pro 3.0.1: What is the benefit of indexing a feature class for a particular field?
For example, a layer of around 340,000 records is indexed based on the “parcelnumber” field knowing that this field has around 16,000 unique values. How the benefit of this indexing can be felt?
Solved! Go to Solution.
Indexes speed up everything that uses WHERE statements. I don't know if the search feature does that, it's possible that you won't get a speed boost here.
But there is a noticeable difference in selecting a record (table from a fgdb, can't test in an egdb right now):
from timeit import timeit
timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "IntegerField = 172649"), number=100)
#301.5261236000006
timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "OBJECTID = 172649"), number=100)
#244.7347675000001
A query on the non-indexed field took 3.01 seconds on average. On the indexed field it took 2.44 seconds, that's a 20% speedup.
Attribute indexes in the geodatabase—ArcGIS Pro | Documentation
every little bit helps to locate data even if you have many unique records
Short, non-technical answer: Indexes on a field make queries based on that field faster. ArcGIS takes care of most of the indexes (e.g. on OBJECTID fields and on fields used in relationship classes).
More explanation: Database Indexes Explained - Essential SQL
If indexing is all about enhancing the performance in searching the data stored in a table, then how to prove that this performance is enhanced as indexing is applied? How it can be felt?
For example, in the screenshot below, what is the difference between searching the parcel #73 with and without indexing?
Honestly, I am not even sure if the Find or Find & Replace dialogs rely on indexes. Indexes are used by database engine software to determine optimal execution plans. In the case of the Find tool, it might just be using a cursor to iterate through records and checking field values that match.
Indexes speed up everything that uses WHERE statements. I don't know if the search feature does that, it's possible that you won't get a speed boost here.
But there is a noticeable difference in selecting a record (table from a fgdb, can't test in an egdb right now):
from timeit import timeit
timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "IntegerField = 172649"), number=100)
#301.5261236000006
timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "OBJECTID = 172649"), number=100)
#244.7347675000001
A query on the non-indexed field took 3.01 seconds on average. On the indexed field it took 2.44 seconds, that's a 20% speedup.
I tried the script on my data but ends with the error below:
What could be the issue here?
---------------------
from timeit import timeit
timeit(lambda: arcpy.management.SelectLayerByAttribute("Parcels", "NEW_SELECTION", "ParcelNumber = 72"), number=100)
---------------------------------
Your ParcelNumber is probably a text field. SelectLayerByManagement takes a SQL where clause. In SQL, you have to enclose text values with single quotes:
timeit(lambda: arcpy.management.SelectLayerByAttribute("Parcels", "NEW_SELECTION", "ParcelNumber = '72'"), number=100)
It works fine but the time is not displayed.
How to display the time?
Huh, it just prints out the time for me...
t = timeit(lambda: arcpy.management.SelectLayerByAttribute("Parcels", "NEW_SELECTION", "ParcelNumber = '72'"), number=100)
print(t)