Our production enterprise system is currently one SDE geodatabase (SQL Server 2014) with several datasets (45, to be exact) and 9 editable versions (one per department). The sheer number of datasets makes it a little cumbersome for users to find what they are searching for.
I was wondering if anyone has tried breaking down a large SDE gdb into several smaller gdbs, perhaps one SDE gdb per department? If so, is there a performance advantage to one large SDE gdb versus several smaller ones? Is response time better for a smaller gdb than a larger one? Would reducing the number of versions benefit performance?
I'm hoping to get some pros / cons of having 1 large sde gdb which is like a one-stop-shop for any GIS needs or several smaller sde gdbs meant for departmental needs.
A great question Mathew, and one that I believe is pertinent to many of us - thanks for asking it, I've learnt a lot from reading this series of posts.
We're a small-to-mid-size city with multiple SDE gdbs in an Oracle environment, split into public works, planning, etc. We don't have many desktop GIS users, but even we within the GIS department find it difficult to locate certain data tables. With that in mind, we're moving back to a single SDE gdb (plus a backup/testing SDE gdb), with adherence to a new naming policy that promotes logical, obvious, and clear names for our data. Specifically, feature datasets will only be used for networks and the like; however, I'm considering using them for project data too (i.e., periodic projects where the data is project-specific). Additionally, using the metadata to clearly describe the data and adding clear tags aids in returning appropriate results via the Search tool.
FYI, an ESRI rep once informed me that ESRI's original idea was to have the local government model within one gdb; however, most organizations adopted a multiple-gdb approach, and ESRI has been supportive of it.
A last note: within Oracle we can create an Oracle user and then grant rights to layers within multiple SDE gdbs. This lets us issue a single SDE connection file through which the user can access the multiple gdbs.
Hello Matt, there are lots of great comments on your topic - good question!
For making feature classes easier to identify or find, we use the ISO codes. For example, in i18_CalTrans_MjrHwy_15: i18 is the transportation ISO code, CalTrans the author of the data, MjrHwy the subject, and 15 the year of publication. Then, if additional resources/information are needed, you can create a dictionary/catalog of your data that includes metadata and use restrictions.
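To make the convention concrete, here is a rough sketch of building and parsing names under that pattern. The ISO codes and part names are just illustrative, not an official standard:

```python
# Hypothetical sketch of the naming convention described above:
# <ISO prefix>_<author>_<subject>_<year>, e.g. i18_CalTrans_MjrHwy_15.
# The codes and field names here are illustrative only.

def build_name(iso_code, author, subject, year2):
    """Compose a feature class name from its parts."""
    return "_".join([iso_code, author, subject, year2])

def parse_name(name):
    """Split a conforming name back into its parts."""
    iso_code, author, subject, year2 = name.split("_")
    return {"iso": iso_code, "author": author,
            "subject": subject, "year": year2}

print(build_name("i18", "CalTrans", "MjrHwy", "15"))  # i18_CalTrans_MjrHwy_15
print(parse_name("i18_CalTrans_MjrHwy_15")["iso"])    # i18
```

A parser like this also makes it easy to audit a whole gdb for names that don't follow the policy.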
Good luck and be sure to provide your method (if not proprietary, of course).
Perhaps break it down by usage type instead (this is extremely simplified).
Publish (read-only, non-versioned)
Edit (versioned workspace)
The Publish data would be highly available, unaffected by versioning/editing overhead, and easily consumed by data viewers; it is probably the better option if you intend to publish services off of the data.
The edit data would only be available to editors and internal applications that require edit and/or versioned processes.
You'd need some extract-load processes in place to move data between the two environments, but you'd decouple the versioning/editing from the published data for enterprise usage. That way, DB admin tasks that require more involved/time-consuming processes will not affect published, widely used data.
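One simple form of that extract-load step is a truncate-and-append from the Edit gdb into the Publish gdb. Here's a hedged sketch; the connection paths and feature class names are placeholders, and it assumes arcpy (ArcGIS) when actually run against the geodatabases:

```python
# Sketch only: nightly/weekly sync from the Edit gdb to the Publish gdb.
# EDIT_SDE / PUBLISH_SDE are placeholder connection-file paths.

EDIT_SDE = "\\\\server\\connfiles\\edit.sde"
PUBLISH_SDE = "\\\\server\\connfiles\\publish.sde"

def sync_pairs(feature_classes):
    """Pair each edit-side feature class with its publish-side target."""
    return [("{}\\{}".format(EDIT_SDE, fc), "{}\\{}".format(PUBLISH_SDE, fc))
            for fc in feature_classes]

def run_sync(feature_classes):
    import arcpy  # deferred: only needed on the ArcGIS machine
    for src, dst in sync_pairs(feature_classes):
        arcpy.management.TruncateTable(dst)        # wipe the published copy
        arcpy.management.Append(src, dst, "TEST")  # reload from the edit side
```

A real process would also handle schema changes and run inside a scheduled task, but the point is that publishing never touches the versioned workspace.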
I am pleased with the single-gdb local government model and I would recommend it. We have 29 datasets, 500+ feature classes, and 70+ standalone tables/relationship classes. Logical naming prefixes make it easy to browse feature datasets and feature classes by department and division (e.g., WaterDataset, SewerDataset, FloodDataset). Versioning and privileges are also easily managed at the feature dataset level. I make sure to have redundant backups through (1) a SQL maintenance plan and (2) a nightly Python script copying the SDE database to a local file gdb. I have a backup validation process in place as well. I do keep static data such as aerials and DEMs in a separate SDE database on a less frequent backup schedule.
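For what it's worth, a minimal sketch of that nightly "copy SDE to a local file gdb" script might look like the following. Paths are placeholders, and the arcpy import is deferred so it only matters on the ArcGIS machine:

```python
# Sketch of a nightly SDE-to-file-gdb backup. Not our exact script;
# folder paths and the name prefix are illustrative.
import datetime
import os

def backup_name(prefix="gdb_backup"):
    """Dated file gdb name, e.g. gdb_backup_20240101.gdb."""
    return "{}_{}.gdb".format(prefix,
                              datetime.date.today().strftime("%Y%m%d"))

def nightly_backup(sde_connection, backup_folder):
    import arcpy  # deferred: only needed when run against ArcGIS
    name = backup_name()
    out_gdb = os.path.join(backup_folder, name)
    arcpy.management.CreateFileGDB(backup_folder, name)
    # Copy each feature dataset and standalone table across.
    arcpy.env.workspace = sde_connection
    for fd in arcpy.ListDatasets(feature_type="Feature"):
        arcpy.management.Copy(fd, os.path.join(out_gdb, fd))
    for tbl in arcpy.ListTables():
        arcpy.management.Copy(tbl, os.path.join(out_gdb, tbl))
```

Pair it with a validation step (e.g., comparing record counts between source and backup) so a silently failed copy doesn't go unnoticed.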
In SQL Server I have found that performance depends more on the data storage type and the version of SQL Server you are using (SDEBINARY vs. SQL GEOMETRY), but that's another topic: see "Poor performance with SDE on SQL Server 2012".
Just my two cents... good luck.
Thought you might be interested in our experience.
We have been managing an enterprise geodatabase (Oracle SDE) since SDE v3.0; it has grown to cater for 9 Queensland (Australia) state government departments with > 2000 datasets. As a result of replacing the database hardware in 2014, we undertook a complete redesign of the management structure within the DB. Our original plan was to use multiple SDE schemas within a 'parent' SDE, but several issues made this unacceptable in our situation. We then considered multiple databases, each with its own SDE; however, the number of DBs required meant there would be resource issues on the DB server (even though it is a 1/8th-rack Exadata).
To give you a feel for the size of some of these datasets, the Queensland Cadastre and drainage datasets each contain > 3 million records.
Where we ended up was:
A read only query SDE database
A read/write update SDE database (with replication to the query database through python/manual processes)
A read/write application SDE database (provides r/w access to web based applications)
A read/write imagery SDE database (contains the mosaic datasets for all of our georeferenced imagery and elevation models)
We use schema 'owners' to identify which department/group has responsibility for the datasets, with roles (viewer & editor) for each schema as well as a general 'viewer' role to manage the datasets. All of our web services run off the query DB and are published with AGS to the DMZ via an F5 gateway and token security.
Have there been issues? Plenty, but the major one has been providing an enterprise search mechanism that makes it easy for people to find what they are chasing. Even though we employ a naming convention (which not everyone follows) and require fairly strict metadata (which, again, not everyone follows), it is fairly daunting to search through > 2000 datasets. We attempted to use the enterprise search functionality of AGS; however, it bypasses our security model, meaning anyone using the search connects to the DB as the 'search user', which is not desirable in our situation. We are now working towards using GeoPortal to provide a search mechanism. Once we are happy with the standard of the metadata, we may integrate these as secure records in our enterprise GeoPortal (http://qldspatial.information.qld.gov.au/catalogue/custom/index.page ).
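At its core, the search problem is just matching keywords against dataset names and metadata tags, which is why the naming and metadata discipline matters so much. A toy illustration (the catalog entries below are invented, not our actual datasets):

```python
# Toy sketch of keyword search over a dataset catalog. In practice the
# catalog would come from harvested metadata (e.g., via GeoPortal).

CATALOG = [
    {"name": "CADASTRE_QLD", "tags": {"cadastre", "parcels", "boundaries"}},
    {"name": "DRAINAGE_QLD", "tags": {"drainage", "water", "network"}},
]

def search(keyword, catalog=CATALOG):
    """Return dataset names whose name or tags contain the keyword."""
    kw = keyword.lower()
    return [d["name"] for d in catalog
            if kw in d["name"].lower() or any(kw in t for t in d["tags"])]

print(search("drainage"))  # ['DRAINAGE_QLD']
print(search("parcels"))   # ['CADASTRE_QLD']
```

The point is that a search layer like this can sit outside the DB and respect your security model, unlike the AGS 'search user' approach.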
Hope this is of some help and we are more than happy to share our experiences.
At the moment I have 2 "read only" gdbs in SQL Server 2012 (a university setup with more than 100 desktop users, all connecting to this common data source through layer files). One contains around 1000 feature classes and the other around 250 raster datasets. The problem is that the connection time seems to increase to a level that annoys the users: making a connection to the gdb containing 1000 feature classes takes up to 30 seconds. This has forced us to split our gdb into several smaller databases containing fewer than 300 feature classes each; the connection time then decreases to less than 4 seconds.
On a side note, it seems many of us keep rasters in the gdb.
Since ArcGIS started using GDAL, we have abandoned raster-in-db and started using .vrt mosaics that point to the individual files, all kept on a file share.
This is as good as a database, because we can update the individual files when needed. The only drawback is still the rebuilding of pyramids, which we have not solved: if a single image file is updated, we have to rebuild the entire pyramids file (.ovr), whereas in a gdb we can update a portion of the raster dataset and partially rebuild the pyramids. On the upside, we rarely need to update files. Luckily, what really happens is that when we get new imagery it usually comes with a new year/date and we create a new mosaic, so rebuilding pyramids is really a non-issue.
The advantages are that there is no RDBMS maintenance and no server resources allocated to serving raster data. Disk access and network bottlenecks still apply, just as with raster-in-db. And backups are easier?
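For anyone curious, the .vrt-mosaic approach can be scripted roughly like this: gather the TIFFs on the file share, build one VRT over them, then build overviews (the .ovr pyramids) on the VRT. Paths are placeholders, and it assumes the GDAL Python bindings when actually run:

```python
# Sketch of building a .vrt mosaic plus overviews with GDAL.
# Folder paths and overview levels are illustrative choices.
import os

def list_tifs(folder, names):
    """Filter a directory listing down to the image files to mosaic."""
    return [os.path.join(folder, n) for n in sorted(names)
            if n.lower().endswith((".tif", ".tiff"))]

def build_mosaic(folder, vrt_path):
    from osgeo import gdal  # deferred: needs GDAL installed
    tifs = list_tifs(folder, os.listdir(folder))
    gdal.BuildVRT(vrt_path, tifs)
    # This rebuilds overviews for the whole mosaic, which is exactly
    # the drawback noted above when a single source file changes.
    ds = gdal.Open(vrt_path)
    ds.BuildOverviews("AVERAGE", [2, 4, 8, 16])
```

Since the VRT is just a small XML file, regenerating it after adding new imagery is cheap; only the overview build is expensive.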
I am surprised so many people are storing rasters in SDE vs. building caches. I brought our imagery catalog down from 600 GB to 60 GB with a 10x increase in speed and no quality loss. Building the caches did take 100+ hours of processing time, but it's only once a year. Caches are also much easier to distribute to the 10 remote offices: no changing paths, just copy and paste. They're also easy to upload to ArcGIS Online. For updates the import tool can be used, so hopefully no need to rebuild it all.
We use feature datasets also, but the Esri guys told us flat is better. We run a Default and an Edits version, where Edits is the current live set; each week I compare Default to Edits looking for any major mistakes (someone deleted 500 records, for example). I have an Excel pivot table that shows me record counts for every DB on every server, so it's easy to check. If it all looks good I run the weekly reconcile, post, compress, and rebuild/analyze. So a weekly mini QA.
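The record-count sanity check above can be sketched in a few lines. The feature class names and counts below are made up; in practice the counts would come from the pivot-table export or a count query per feature class:

```python
# Sketch of the weekly record-count QA: compare this week's counts per
# feature class against last week's and flag any suspiciously large drop.

def flag_big_drops(last_week, this_week, threshold=100):
    """Return feature classes whose count fell by more than `threshold`."""
    return sorted(fc for fc, prev in last_week.items()
                  if prev - this_week.get(fc, 0) > threshold)

last = {"Roads": 10500, "Parcels": 42000, "Hydrants": 900}
now = {"Roads": 10510, "Parcels": 41400, "Hydrants": 898}
print(flag_big_drops(last, now))  # ['Parcels']  (dropped by 600)
```

Anything flagged gets a human look before the weekly reconcile/post/compress runs, so a bad edit never makes it into Default.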
The main pain with a Default and Edits version is having to use Set Current Version in SQL all the time. And some programs cannot link to a version other than Default, which causes issues.
For access we store the sde connection files in a common network location (no need to install for each user as mentioned). Also use layer files but they break a lot. We also store our toolboxes on a network drive which has been a life changer in keeping everyone using the most current toolbox. Note this can be done for python addins also.
We also run backups out to a geodb which works well. I also build a script to backup our Hosted FS to a local copy using ArcREST. (note there is a bug in this so no backup over 32,000 records).
We are considering going to one or two big DBs instead of the 10 we have, and then just replicating based on office extent; that way not too much gets replicated out to the offices. We have 10 DBs with 10 feature datasets in each and 5-7 feature classes in each of those datasets. Speed is not great, but the server is also old.
Also caches are so much easier to distribute to the 10 remotes offices.
I have to second this statement. That is how I distribute to remote offices with questionable internet connections too. No need to go through the service once the cache is created (for those offices; local users have the option).
Doug, a question... I have tried storing my connection file on a local network drive, and have tried embedding it in a Python add-in, which when moved to another machine gives the connection a long/nasty strange name in the user's profile folder. This is probably a non-common use of the connection file... but I'm wondering if you ever ran into that (not wanting to hijack this thread, so if you have, I can break this off into another discussion).
I build my caches with ArcMap/scripts, not with Server; no need to even have Server. But yes, they have been a life saver, especially with our 1-inch imagery. I reduced the size 97% with no quality loss. Getting 2 TB everywhere would have been a nightmare.
Yes we have seen weird strings get into the mxd. Even adding from the same connection file twice can result in 1 pointing to the file and 1 pointing to a weird long string. We have never found any rhyme or reason to this. Not sure why you say you copy them to a local machine though. We just leave them on the network where everyone can hit them - for example \\ad.you.com\office\connfiles (which is added as a folder connection in catalog) then we have a folder for Edits and for Default versions. Also we point all our python scripts and toolboxes there. That way if something changes we can change them in just one spot. (but of course the mxd will not change).
I have also pointed to them in a Python add-in with no problem. Maybe you have issues because you are copying them over vs. pointing to the network, and the new machine cannot see that path? I would avoid copying anything, ever, since then you have a million copies all over the place. We have been going to a single network spot for everything, even over a WAN. So a user in another office actually points to a toolbox on the main office network; we have seen no speed issues with this at all. We do this for add-ins also: point to one central main-office network location for all 10 offices. We do, however, keep connection files local at each office, since they point to the local office SQL Server.
Hope that makes sense; it's hard to explain in text.
Hi Doug, thanks for your thoughts on the issue. I think the long connection file name came into play when I deployed the add-in and ran one of the tools. I was pointing to a common network location when I built the add-in, and the connection had a nice, user-friendly-looking name. Come to think of it, I think it has to do with using the connection file as a "default" in my tool's parameter GUI; not so pretty when the user ran the tool. Instead, deploying the add-in to the new machine put it in their AppData/Roaming and/or with the add-in files themselves. Anyway, I moved on, and that will be a discussion for another time, I think (and maybe a new thread in the Python area).