Filter Data by Database View or Definition Query for Better Performance

12-22-2022 12:56 AM
ChristianJungnickl
New Contributor II

Hello,

I am currently looking into factors influencing performance within our organization's geodata infrastructure (ArcSDE in SQL Server, ArcMap 10.6). Looking through our ArcMap projects, I found that we exclusively use definition queries when filtering data from database tables.

Now I wonder if it would make any difference to filter the data before loading it into ArcMap, by creating database views. Does anybody know whether we can expect a gain in performance (e.g., when loading the MXD document) when using database views instead of definition queries?

Happy Holidays, everybody!

Accepted Solution

Bud
Notable Contributor

Assorted uneducated thoughts:

 

A while back, I opened a WORKORDER table from an external COTS system in ArcMap. The table had over 200 columns. I was surprised how slow the table was in ArcMap, despite the fact that it only had a few thousand rows.

I remember getting better performance after paring the table down to just a handful of columns, either in a view or a query layer. So paring down the number of columns ahead of time might be something to consider. (Definition queries don't omit columns; only proper queries, like views or query layers, can omit columns.) A sketch of such a view is below.
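As a rough illustration, a minimal T-SQL sketch of such a view (the column names here are made up for the example):

    -- Expose only the handful of columns ArcMap actually needs,
    -- instead of all 200+ columns in the source table.
    -- (Column names are hypothetical.)
    CREATE VIEW dbo.WORKORDER_SLIM AS
    SELECT WORKORDER_ID,   -- key column, needed by the query layer
           STATUS,
           ASSIGNED_TO,
           DATE_CREATED
    FROM dbo.WORKORDER;

ArcMap then only ever asks the database for those four columns, no matter how wide the underlying table is.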

 

Idea: Don't SELECT * when fields are disabled in properties


Also, a view can use proper joins, whereas I've wondered whether "joins" in ArcGIS are done as subqueries rather than proper joins; proper joins are often faster. For example:
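A hedged sketch of what that could look like (the ASSET table and its columns are invented for the example):

    -- The join happens once, in the database, inside the view,
    -- rather than as an ArcMap layer join.
    -- (Table and column names are hypothetical.)
    CREATE VIEW dbo.WORKORDER_WITH_ASSET AS
    SELECT w.WORKORDER_ID,
           w.STATUS,
           a.ASSET_TYPE,
           a.INSTALL_DATE
    FROM dbo.WORKORDER AS w
    INNER JOIN dbo.ASSET AS a
        ON a.ASSET_ID = w.ASSET_ID;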

 

And then there are attribute indexes. I don't know whether indexes would improve performance of SELECT queries with WHERE clauses or not, but I think indexes would help with other things, like JOINs. See: SQL Performance Explained
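If you wanted to experiment, creating an attribute index is a one-liner in T-SQL (column names are again hypothetical):

    -- Index the columns that show up in WHERE clauses and join conditions.
    CREATE INDEX IX_WORKORDER_STATUS
        ON dbo.WORKORDER (STATUS);     -- supports WHERE STATUS = ...

    CREATE INDEX IX_WORKORDER_ASSET_ID
        ON dbo.WORKORDER (ASSET_ID);   -- supports the join to ASSET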


You could also make sure that your WHERE clauses are structured to be as performant as possible. For example (a sketch contrasting a slow and a fast query follows the list):

  • LIKE, especially with a leading wildcard such as '%TEXT', is slow; =, IN, and EXISTS are usually faster.
  • If a column is wrapped in a function such as UPPER(), then an index can't be used when querying that column.
  • Subqueries can be slow.
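To make the first two points concrete, a made-up contrast (LAST_EDITED_BY and the literal values are hypothetical, and the two queries aren't exactly equivalent; they just illustrate the pattern):

    -- Slow: the function call and the leading wildcard defeat any
    -- index on LAST_EDITED_BY, forcing a full scan.
    SELECT WORKORDER_ID
    FROM dbo.WORKORDER
    WHERE UPPER(LAST_EDITED_BY) LIKE '%SMITH%';

    -- Faster: a bare column compared against exact values can use an index.
    SELECT WORKORDER_ID
    FROM dbo.WORKORDER
    WHERE LAST_EDITED_BY IN ('SMITH', 'Smith');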

 

In newer versions of ArcGIS Pro, you can see what queries are being sent to the database by looking at the Diagnostic Monitor. That might help you troubleshoot slow tables.

 

It would be nice if we could control how a table or layer gets cached in ArcGIS Desktop. For example, if a table is slow in ArcGIS Pro, tell Pro to cache it in RAM and only refresh it every n minutes. And refresh it in the background; don't make me wait. That might be better than refreshing the table after every click the user makes in the map or attribute table.

Idea: Control caching settings for better performance in event layers & query layers

 

You could also look into compressing versioned data, refreshing database statistics, spatial indexes, and the blog post Geoprocessing, Resolution, Tolerance, and Hair. Also, it's occurred to me that organizations might see performance improvements after upgrading their enterprise geodatabase to a newer database version; that might "refresh" the data, as a last resort. But I don't have expertise in any of those things.
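At the plain SQL Server level, refreshing statistics and rebuilding an index are single commands. In an enterprise geodatabase you'd more likely use the Analyze Datasets and Rebuild Indexes geoprocessing tools instead, so treat this only as a sketch (table and index names are made up):

    -- Refresh the query optimizer's statistics for one table.
    UPDATE STATISTICS dbo.PARCELS;

    -- Rebuild an index (spatial or attribute) on that table.
    ALTER INDEX S_PARCELS_IDX ON dbo.PARCELS REBUILD;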

 

You could look into indexed views in SQL Server: https://stackoverflow.com/questions/3986366/how-to-create-materialized-views-in-sql-server

Or other ways of pre-computing queries, like exporting a subset of a table on a schedule, and then using that exported table in ArcMap instead, which might be faster.
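The mechanics in SQL Server, as a hedged sketch (the view, table, and column names are invented, and indexed views come with restrictions, e.g. schema binding and specific SET options, that I'm glossing over):

    -- An indexed ("materialized") view: SCHEMABINDING is required.
    CREATE VIEW dbo.OPEN_WORKORDERS
    WITH SCHEMABINDING
    AS
    SELECT WORKORDER_ID, STATUS, ASSET_ID
    FROM dbo.WORKORDER
    WHERE STATUS = 'OPEN';
    GO

    -- The unique clustered index is what makes SQL Server actually
    -- materialize and maintain the view's result set.
    CREATE UNIQUE CLUSTERED INDEX IX_OPEN_WORKORDERS
        ON dbo.OPEN_WORKORDERS (WORKORDER_ID);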

 

Or even a simple data cleanup exercise might help. Delete old and unneeded columns.

 

Edit: 

Scale ranges for layers and labels might help too. I.e., make layers or labels visible only at certain scales; some layers/labels might not be needed when zoomed far out or far in.

 

For polygon and polyline layers, consider removing superfluous vertices via the "Simplify" geoprocessing tools, etc.

 

A few posts related to automation:

I haven't actually implemented that stuff. I was just looking for clean ArcGIS automation solutions as a non-IT person.
I haven't really found what I'm looking for yet.


Replies

ChristianJungnickl
New Contributor II

Thank you for adding your thoughts!

I hadn't even considered the ability to omit columns when using views. I definitely have to look into that aspect.

ChristianJungnickl
New Contributor II

So far, I had only considered the (to me) more obvious factors influencing performance, like RAM, number of layers, complexity of symbols, labeling properties, and a few others.

We also run a weekly script that recalculates table statistics and rebuilds indexes. But other than that, I never questioned our current data management setup, which is centered on ArcMap's definition queries. I am going to try out indexed views with additional pre-selection of only the needed columns before loading data into ArcMap. These measures seem promising for gaining performance in our GIS projects (which include roughly 180 layers, each having at least 30 columns).

Thanks a lot for your insights, Bud. They opened up a new perspective for me.