ArcGIS Pro 3.0.1: What is the benefit of indexing a feature class for a particular field?

JamalNUMAN · ‎08-29-2022

For example, a layer of around 340,000 records is indexed based on the “parcelnumber” field knowing that this field has around 16,000 unique values. How the benefit of this indexing can be felt?

----------------------------------------
Jamal Numan
Geomolg Geoportal for Spatial Information
Ramallah, West Bank, Palestine

JohannesLindner · ‎08-29-2022

Indexes speed up everything that uses WHERE statements. I don't know if the search feature does that, it's possible that you won't get a speed boost here.

But there is a noticeable difference in selecting a record (table from a fgdb, can't test in an egdb right now):

from timeit import timeit

timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "IntegerField = 172649"), number=100)
#301.5261236000006

timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "OBJECTID = 172649"), number=100)
#244.7347675000001

A query on the non-indexed field took 3.01 seconds on average. On the indexed field it took 2.44 seconds, that's a 20% speedup.

Have a great day!
Johannes

View solution in original post

DanPatterson · ‎08-29-2022

Attribute indexes in the geodatabase—ArcGIS Pro | Documentation

every little bit helps to locate data even if you have many unique records

... sort of retired...

JohannesLindner · ‎08-29-2022

Short, non-technical answer: Indexes on a field make queries based on that field faster. ArcGIS takes care of most of the indexes (e.g. on OBJECTID fields and on fields used in relationship classes).

More explanation: Database Indexes Explained - Essential SQL

Have a great day!
Johannes

JamalNUMAN · ‎08-29-2022

If indexing is all about enhancing the performance in searching the data stored in a table, then how to prove that this performance is enhanced as indexing is applied? How it can be felt?

For example, in the screenshot below, what is the difference between searching the parcel #73 with and without indexing?

----------------------------------------
Jamal Numan
Geomolg Geoportal for Spatial Information
Ramallah, West Bank, Palestine

JoshuaBixby · ‎08-29-2022

Honestly, I am not even sure if the Find or Find & Replace dialogs rely on indexes. Indexes are used by database engine software to determine optimal execution plans. In the case of the Find tool, it might just be using a cursor to iterate through records and checking field values that match.

JohannesLindner · ‎08-29-2022

Indexes speed up everything that uses WHERE statements. I don't know if the search feature does that, it's possible that you won't get a speed boost here.

But there is a noticeable difference in selecting a record (table from a fgdb, can't test in an egdb right now):

from timeit import timeit

timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "IntegerField = 172649"), number=100)
#301.5261236000006

timeit(lambda: arcpy.management.SelectLayerByAttribute("TestTable", "NEW_SELECTION", "OBJECTID = 172649"), number=100)
#244.7347675000001

A query on the non-indexed field took 3.01 seconds on average. On the indexed field it took 2.44 seconds, that's a 20% speedup.

Have a great day!
Johannes

JamalNUMAN · ‎08-30-2022

I tried the script on my data but ends with the error below:

What could be the issue here?

---------------------

from timeit import timeit

timeit(lambda: arcpy.management.SelectLayerByAttribute("Parcels", "NEW_SELECTION", "ParcelNumber = 72"), number=100)

---------------------------------

----------------------------------------
Jamal Numan
Geomolg Geoportal for Spatial Information
Ramallah, West Bank, Palestine

JohannesLindner · ‎08-30-2022

Your ParcelNumber is probably a text field. SelectLayerByManagement takes a SQL where clause. In SQL, you have to enclose text values with single quotes:

timeit(lambda: arcpy.management.SelectLayerByAttribute("Parcels", "NEW_SELECTION", "ParcelNumber = '72'"), number=100)

Have a great day!
Johannes

JamalNUMAN · ‎08-30-2022

It works fine but the time is not displayed.

How to display the time?

----------------------------------------
Jamal Numan
Geomolg Geoportal for Spatial Information
Ramallah, West Bank, Palestine

JohannesLindner · ‎08-31-2022

Huh, it just prints out the time for me...

t = timeit(lambda: arcpy.management.SelectLayerByAttribute("Parcels", "NEW_SELECTION", "ParcelNumber = '72'"), number=100)
print(t)

Have a great day!
Johannes