Multiple GDBs vs Single GDB

mpboyle · ‎01-19-2016

Our production enterprise system is currently 1 sde gdb (Sql Server 2014) with several (45 to be exact) datasets and 9 editable versions (based on department). The sheer number of datasets makes it a little cumbersome for users to find what they may be searching for.

I was wondering if anyone has tried breaking down a large sde gdb into several smaller gdbs, perhaps 1 sde gdb per department? If so, is there is a performance advantage of having 1 large sde gdb versus several smaller sde gdbs? Is response time better for smaller gdb versus larger gdbs? Would reducing the number of versions have a benefit on performance?

I'm hoping to get some pros / cons of having 1 large sde gdb which is like a one-stop-shop for any GIS needs or several smaller sde gdbs meant for departmental needs.

Thanks!

George_Thompson · ‎02-09-2016

Here is blog that covers that information:

Simplify Your Life: Working with Schema Changes in Replicas | Support Services Blog

--- George T.

ShannonPugh · ‎01-28-2016

42 datasets

613 feature classes

JamesCrandall · ‎01-19-2016

Perhaps break it down by usage type instead (this is extremely simplified).

Publish (read-only, non-versioned)

Edit (versioned workspace)

The Publish data would be highly-available data that is not affected by versioning/editing overhead and easily consumed by data viewers and probably a better option if you intend to publish services off of the data.

The edit data would only be available to editors and internal applications that require edit and/or versioned processes.

You'd need to have some extract-load processes in place to move data between the two environments, but you'd de-couple the versioning/editing from the published data for enterprise usage. This way db admin tasks that require more involved/time consuming processes will not affect published, widely used data).

mpboyle · ‎01-19-2016

James Crandall,

We have thought about this approach as well, and already kind of have it implemented in our 1 production sde gdb. We have 'Publish' datasets that are non-versioned and read-only. These are mainly datasets that do not change over time or are feature classes that are produced from other business systems through views.

Our web side of things is a replicated file gdb on our app server. This is meant to separate our web environment from our production environment and has dramatically reduced the number of connections to our production sde gdb.

JacobBoyle · ‎01-19-2016

In the 10 years I've been doing SDE administration, I've seen pros and cons for both configurations.

In a SQL environment, I prefer multiple databases. I find that it makes everything easier for the end user to understand. It gives you an additional level of organization for data as well. I also find user management easier by only adding who needs read/write to specific datasets to only the databases they need. Everything else is maintained with a read-only group. It really doesn't have an impact on performance as long as you right-size your environment. We are currently running over a dozen Enterprise Geodatabases.

AlexanderNohe1 · ‎01-19-2016

Dan Patterson - That's why we create backups. You could then store these backups elsewhere and then load the backup on a new machine if you need to.

Matthew Boyle - I think the main advantage to having it all on one SDE server is that it is centrally located and you do not have departments with duplicate files floating around in their database. This means that the water department can modify and update their mains while the planning department can use the latest datasets from the water department within their maps. You can furthermore, set permissions on the users of the database so they cannot modify datasets that their department does not have explicit permission over. As in my previous example, the planning department cannot modify the water mains, they can only view them. This allows the water department to be in charge of the creation and management of water mains while other departments can only use them.

As far as a performance increase from one database versus multiple, I cannot speak upon this. If its all on the same machine, I would assume that it would decrease performance but I am not sure.

Hope this helps!

DanPatterson_Retired · ‎01-19-2016

Duh... I know about backups, hence there is no reason to pile everything into one in the first place (this of course assumes that people verify the validity of their backups... something I have seen "go-south" when a backup was needed)

mpboyle · ‎01-19-2016

Dan Patterson - haha, unfortunately we've been bitten by the backup validity as well

ChrisDonohue__GISP · ‎01-19-2016

Adding some more ESRI enterprise geodatabase folks in case they want to weigh in:

Jake Skinner

Vince Angelo

Asrujit SenGupta

Chris Donohue, GISP

VinceAngelo · ‎01-19-2016

Forty-five databases is quite a few. The rule of thumb I'd recommend is to focus on use patterns -- Keep stuff which is used together near to one another.

RDBMSes are designed to work well with lots of rows, but not so much with lots of tables. Databases are an excellent way to isolate the table count between objects that are never used together, but I expect there's a Law of Diminishing Returns with respect to Too Many Databases. You may find that bumping up the RAM on the database server is enough to keep up with a large pool of databases, but at some point, the additional worker processes are going to start impacting your available CPU.

Please do keep a close eye on best practices with respect to feature datasets: Using FDS as a "folder" solely for UI presentation is a burden on your geodatabase. There are a number of places where Desktop performance is linked to O(number_of_featuredatasets) and O(number_of_featuredatasets^2). I realize that presenting oodles of tables can be a burden as well, but clever naming policies can handle many of the "flat" presentation issues.

- V