Filegroups in SQL Server

KevinDunlop · ‎06-27-2017

All,

I have a questions about filegroups in SQL Server and geodatabases. Is there any performance or other benefit to have multiple filegroups for things like vector data, indexes, A/D tables compare to have 1 filegroup with many files in it? I have check the ESRI help document and search the forums but it mute on what is better. It tells you how to set the dbtune keywords to use different filegroups but not if you should use them or how many. If you go into some of the extensions such as data reviewer and workflow manager, they say to use 10 different filegroups (see Creating data files for the Reviewer workspace in SQL Server—Help | ArcGIS Desktop ).

So anyone have experience with what is a best practice? Should I use 1 filegroup with say 10 files. Or should I have small set of filegroups with 2-3 files each for things like vector, indexes, and versioning tables. Or should I use all 10 different filegroups as describe in the help doc above?

Some notes about our system. We are using SQL Server 2016. We are different drives for the data files and log files but all data files will be on the same drive. We are using synchronize AlwaysOn with 2 replicas. We will be using ArcGIS 10.4.1. Based on some tests we estimate our database will be roughly 45-60GB when everything is load. We will have versioning, archiving, and topologies in the database.

Thanks,

Kevin

VinceAngelo · ‎06-27-2017

The purpose for filegroups (back when Sybase created them in the mists of time) was to distribute data across devices within one container of the RDBMS. At the time, disk volumes were tiny (50Mb-200Mb on a workstation, 900Mb on a server), so "large" tables (1-2Gb) required multiple devices. Database admins would spend days and weeks organizing the striping for data and indexes across volumes in order to get optimal performance.

Given the size of modern devices, where the controller cache is measured in gigabytes, and the block sizes may be 64-256k, and the device itself is composed of clusters of NVRAM with submillisecond seek time, the purpose of filegroup use has languished in significance, to the point that it may not matter. It doesn't yet make sense for the RDBMS provider to remove the capability, but it certainly won't be emphasized in vendor training.

More often than not, customizing filegroups will only encourage fragmentation, hurting overall performance. Let's say you create ten data files on two independent disks (five each) for ONE large table. If the database is adding rows to block groups in round-robin order, then subsequent pages are likely to be 1/5th of the device storage away from each other, causing increased access latency over two files (one per disk). When you take auto-growth in file allocation and volume optimization into account, you're completely at the mercy of load order and lack of file defragmentation. This is likely why PostgreSQL has abandoned RDBMS control over disk access, and just uses the filesystem as-is, with a "tablespace" only being a mount point for the data to be written, and relying on defragmentation for the rest.

However, if you have TEN small tables, accessed somewhat independently, as in the documentation you referenced, then you might get a benefit, if the data files aren't too fragmented. Unfortunately, the best way to tell if that optimization technique works in your context is to try two or more different configurations and decide if the difference is worth the effort of altering the DBTUNE. If you don't want to do the testing yourself, then using the recommended configuration should probably be the default.

- V

View solution in original post

JakeSkinner · ‎06-27-2017

Hi Kevin,

I found the following help document on Geodatabase configuration and it discusses FileGroups:

Geodatabase configuration keywords and disk configuration in SQL Server—ArcGIS Help | ArcGIS Desktop

KevinDunlop · ‎06-27-2017

Thanks, that is similar information to what I have found with other extensions. I was hoping there was someone who has had experience with this that could tell me what works best for the overall geodatabase.

VinceAngelo · ‎06-27-2017

The purpose for filegroups (back when Sybase created them in the mists of time) was to distribute data across devices within one container of the RDBMS. At the time, disk volumes were tiny (50Mb-200Mb on a workstation, 900Mb on a server), so "large" tables (1-2Gb) required multiple devices. Database admins would spend days and weeks organizing the striping for data and indexes across volumes in order to get optimal performance.

Given the size of modern devices, where the controller cache is measured in gigabytes, and the block sizes may be 64-256k, and the device itself is composed of clusters of NVRAM with submillisecond seek time, the purpose of filegroup use has languished in significance, to the point that it may not matter. It doesn't yet make sense for the RDBMS provider to remove the capability, but it certainly won't be emphasized in vendor training.

More often than not, customizing filegroups will only encourage fragmentation, hurting overall performance. Let's say you create ten data files on two independent disks (five each) for ONE large table. If the database is adding rows to block groups in round-robin order, then subsequent pages are likely to be 1/5th of the device storage away from each other, causing increased access latency over two files (one per disk). When you take auto-growth in file allocation and volume optimization into account, you're completely at the mercy of load order and lack of file defragmentation. This is likely why PostgreSQL has abandoned RDBMS control over disk access, and just uses the filesystem as-is, with a "tablespace" only being a mount point for the data to be written, and relying on defragmentation for the rest.

However, if you have TEN small tables, accessed somewhat independently, as in the documentation you referenced, then you might get a benefit, if the data files aren't too fragmented. Unfortunately, the best way to tell if that optimization technique works in your context is to try two or more different configurations and decide if the difference is worth the effort of altering the DBTUNE. If you don't want to do the testing yourself, then using the recommended configuration should probably be the default.

- V

KevinDunlop · ‎06-27-2017

Thanks for the great info Vince. I am going to stick one filegroup now. Do you know if there is a benefit to have multiple files (on the same drive) in that filegroup or should I just use a single large file?

VinceAngelo · ‎06-27-2017

Again, it depends. Generally, a group with multiple files within one filesystem encourages fragmentation, but it depends on whether the blocks are being allocated round-robin or "fill one first".

- V