Different ways to manage layers in an organization

08-23-2016 05:39 AM
Occasional Contributor II


There is a problem I keep running into when it comes to serving customers in my organization.

I find it very confusing to balance how our geographic data is managed and then published for different needs.

Three common stored data types for layers we use are:

- File system based - inside FGDB files.

- SQL based - SDE layers.

- Portal based - an internal MongoDB data store which is unmanageable at this point.

So we decided to have all of our business data, which is considered to be more dynamic than other layers, to be stored inside our SDE.

Some layers we receive once a year from an external source and they are all stored in an FGDB.

From time to time, some employees request different geographic analyses, and as an output they receive a "temp" layer which is stored in our Portal.

We published a few map services, divided into different information groups to represent different data layers. When applications are built for customers, the layers can be consumed either through those "chunk" services that hold the required layers, or as individual layers through the Portal after linking each one to a layer item.

So each time a client requests a few layers from the DB, the same questions arise:

Should we publish another service to hold his layers all together, consuming valuable CPU resources and creating duplication between services?

Should we just gather separate layers as items from the Portal into a web map, without having them sorted into groups?

Should some of them be consumed directly as FGDB files?

Is there a clear way to manage the data and avoid duplications?

Are there any other organizations that are experiencing this and can share their wisdom?



10 Replies
MVP Frequent Contributor

My 2c, but I am not an expert, I just know a few experts.

If feature access is not needed (i.e. web editing), we keep everything in an FGDB. Even if the data is in SDE and is getting edited via Desktop, we use scripts on a regular basis to copy it across. If a layer is to be edited via a web map/app, then it stays in the enterprise DB.
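A scheduled copy script like the one described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual script: the layer names and paths are hypothetical, and the stand-in `copy_layer` function marks where a real deployment would call into arcpy (e.g. `arcpy.management.CopyFeatures`), which isn't shown here so the orchestration logic stays self-contained.

```python
# Sketch of a nightly EGDB -> FGDB sync. Layer names and paths below
# are hypothetical examples, not from the thread.

LAYERS_TO_SYNC = [
    # (name in the target FGDB, source feature class in the EGDB)
    ("parcels", "sde.gis.parcels"),
    ("roads", "sde.gis.roads"),
]

def copy_layer(source, target):
    """Stand-in for the real copy step; in production this would be
    an arcpy call such as arcpy.management.CopyFeatures(source, target)."""
    return True  # pretend the copy succeeded

def sync_to_fgdb(sde_conn, fgdb_path, layers, copy=copy_layer):
    """Copy each read-only layer from the EGDB into the FGDB.

    Returns a list of (layer, ok) results so a scheduler can log
    failures without aborting the whole run.
    """
    results = []
    for name, source_fc in layers:
        source = f"{sde_conn}/{source_fc}"
        target = f"{fgdb_path}/{name}"
        try:
            ok = copy(source, target)
        except Exception:
            ok = False  # log and move on; one bad layer shouldn't stop the sync
        results.append((name, ok))
    return results
```

The per-layer try/except matters in practice: a locked or renamed feature class should show up in the log, not kill the whole nightly run.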

You correctly state that it is the service itself which is expensive on server resources. So we try and minimise those.

So there will be a set of base services (usually focused around the business groups), but also the individual layers added as separate items. That way you can always throw a WebMap/App together using these as the source.

Occasional Contributor II

Thanks Neil. It just gets messy so easily when clients start requesting layers in an application, and quickly after that comes the request for groups... which requires a new service.

So all your services read FGDB files, and you try to minimise the amount with just several group services? That's pretty much what we're trying to maintain, except we decided to use the SDE as the layer storage rather than FGDB files. Why would you prefer having it all as FGDB, if you don't mind me asking? From what I know, SDE supports editing and provides faster performance.

MVP Frequent Contributor

The main reason for the FGDB is that it's faster. That's why most of the data gets copied across.

Editable layers (feature access) have to be in the enterprise SDE. And we try to group all these into one service for publishing, then use the individual items inside the various WebMap/Apps.

But, yes, it can become messy. There is lots of room for improvement in Portal. It would be nice to be able to share just the WebMap/App to a certain group without exposing the underlying feature layers as well.

Occasional Contributor III

We utilize a format very similar (identical?) to Neil's.  I'll lay it out in some detail in case it helps your thinking.

Our enterprise geodatabase (EGDB) is composed of Feature Data Sets (FDS) that delineate various groups of data.

These FDSs are then typically published as a group layer in an mxd that is then published as a map service.

The mxd, like Neil's, points at a file.gdb up on the ArcGIS Server.

This allows for speed and guarantees no data corruption of the EGDB and keeps the data read only.

(Feature Services are another game and require an EGDB, as pointed out by Neil.  In our case, we want these for offline work and the data can remain read only.  So we're replicating to a secondary read-only EGDB where we then create Feature Services.)

We are in essence duplicating our EGDB into a file.gdb with Python scripts.

We've been at this for over a decade, so pretty much all of our layers are already in the file.gdb and follow the same FDS layout.  And almost all the layers are in mxds and published via a map service.

We do have two large file.gdbs that are published.  One is all of our assets (pipes, valves, etc...) and the other is all of our 220,000+ service points (or meters.)  These are split this way basically due to how we obtain and process the underlying data and its data sources.  We are considering splitting our large asset file.gdb into smaller separate ones.  e.g. one for water assets, one for wastewater, etc...  We think this might make sense with Portal.

I think a big key here is proper use of the FDS to keep data organized.

We also deal with a lot of external data from related tables that is typically brought in and updated via links/views using scripts and ArcObjects code.  There are numerous tables and even Feature Classes that are not inside an FDS.

But the more organized you can keep your data, the easier it is to keep your map services well defined and organized.

But there are still new layers or unpublished layers that occasionally come up.

Typically though they fit into our existing FDS layout.

So adding them to a published map service is nothing more than adding them onto an mxd and republishing it.

We also have a lot of data from other local entities that is scripted and stored in our EGDB.  Since we don't edit this data, I am considering pulling it out of the EGDB and keeping it in file.gdbs for both the servers and the local data editors.  The sticking point is the large number of existing mxds that would need modification.

I think that having a standard set of map services is what you are looking for to keep things organized.  Don't create a new map service when someone wants a new layer.  Just add it to an existing map service; unless it's a new layer that doesn't fit your existing layouts.  We just had such a case so we had to create a new map service which is publishing previously unpublished data from an existing FDS.

At the end of the day, it's really all about the layout of your EGDB.  If you have a well organized one, then the organization of your published services should flow from it.

I have heard others describe the Data Store as sort of a wild wild west approach to a database and think that's a fairly good analogy.  Since it's a black box to us, we're leaving it for end users to create and publish their own smaller unique data sets.  Any data that is considered to be critical to the organization belongs in our EGDB as the system of record.

Our Portal is new, and I have been struggling with how to provide our users access to our normal data sets.

I could tie into our existing 10.1 ArcGIS Servers (AGS) but have decided that the best long term way to go is to create another Federated AGS that is part of the Portal site.  The purpose of which is to publish our standard map and geoprocessing services that are then made available to Portal users.

MVP Frequent Contributor

I think the only thing I could add to that very detailed account is that if you add another layer to an existing mxd/map service and republish, make sure it gets added at the bottom of the TOC. You have to move it down because the default is to add it at the top. Otherwise all the pointers to your layers/feature items etc. will get highly messed up in your map services.

Occasional Contributor III

That is an excellent and critical point to make Neil!

And is also one of the hurdles of republishing Feature Services.

If you have moved or renamed them on a map/web app, and then republish the FS, things go back to the original setups.

I believe that issue is being cleaned up with the 10.5 release.  (I may not have described the FS issue all that well.)

Occasional Contributor III

I actually got on here to describe an alternate format that I have seen used by a GIS group that was really very good but got sidetracked.  😉

This organization had started out with a Microsoft SQL EGDB.  But that is a fair bit of overhead and cost in licensing, hardware (even virtual) and personnel.  You now need a server (or 2-3 for Dev & Test), a DBA, someone versed in GIS and SDE, etc...  This group was finding it a fair bit of work to deal with the EGDB.  The manager also noticed that the individual Feature Classes (FCs) were each typically edited by a single user.

The manager made the decision to move away from an EGDB, so they split the data out into individual file geodatabases owned by the individual editors.  When an editor has a new data release ready to go, they copy the file.gdb into a staging folder and set a semaphore file.  Scripts running in the background see the semaphore pop up, which triggers a script to copy the staged file.gdb into another staging area where the data is QC'd.  When the QC is successful, another semaphore is set, and that night a script sees the new semaphore and copies the now-QC'd data onto the production ArcGIS Servers.

Obviously what makes this scenario viable is that your data sets can be set up in such a way that they are owned and edited by a single user.  I guess there are ways to deal with multi-user file.gdbs, but if you're going that route, you should be on an EGDB (seems to me anyway...)

I've explained this scenario to some long term GIS folks who are very suspicious of it (understandably.) 

And I think many folks think it might work for smaller organizations but not a big one.  However, the group that was using this method was a very large county with some very big data sets.

But again, at the end of the day, it was all about how the underlying system of record is laid out.

Occasional Contributor II

Thanks for all those great insights from both of you, very interesting and surprising I must say!

I didn't expect to hear that the FGDB method would be faster than having the data in an EGDB.

That's mostly because at our organization we use only VMs, and I have no praise for our network bandwidth when it comes to read/write.

Also, the first thing that comes to mind with the FGDB method is obviously the concern of serving multiple users trying to read from the same resource... But you mentioned a big enterprise that already took that approach and finds it works well, so I can't argue with that.

Another con that was already mentioned is the duplicated data across servers. If you get better performance, I guess it might be worth it.

Regarding Neil's critical point on the layer ID pointers when adding to an existing mxd project - I saw that Esri added an option to set a permanent ID for layers in an mxd project. It might only be available on version 10.3+, though.

So I understand we all share the same problem of keeping track of so much data across different storage places: EGDB, FGDB, Data Store/black box, published as map services and/or Portal items...

I was thinking of creating a logger script to scan all services and the layers they contain, to report back on duplications and changes, do Portal item verification and such... some kind of maintenance script.
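The duplicate-detection core of such a maintenance script could look something like this. It is only a sketch: the inventory dict here is built by hand, whereas in a real script it would be populated by walking the ArcGIS Server REST services directory (`.../arcgis/rest/services?f=json` and each service's layer list); the service and layer names in the test data are invented.

```python
# Sketch of the duplicate-layer report for a services maintenance
# script. The inventory mapping would come from the ArcGIS Server
# REST API in practice; here it is supplied directly.
from collections import defaultdict

def find_duplicate_layers(service_inventory):
    """Report layers that appear in more than one service.

    `service_inventory` maps service name -> list of layer names.
    Returns {layer: [services containing it]} for duplicates only.
    """
    seen = defaultdict(list)
    for service, layers in service_inventory.items():
        for layer in layers:
            seen[layer].append(service)
    return {layer: services
            for layer, services in seen.items()
            if len(services) > 1}
```

Running this nightly and diffing the output against the previous run would also catch the "changes" part: any layer that moves between services shows up as a changed service list.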

Until we have such a script, we pretty much rely on the person who's in charge of creating and publishing all of our geo data, either as a new service or as another layer in an existing service among many, and it's all documented in a SharePoint list. Kind of like the wild wild west Paul mentioned, but with a frustrated sheriff.


Occasional Contributor III

Hi Shay: 

To comment and clarify: the FGDB is typically faster because it's usually set up as a file directly attached to the server.  I mean, it will sit somewhere on the root drive of the server, so it's being accessed as a local file.  Of course, in a VM world this can mean something different.  Our VMs are able to access our shares at backplane speeds, or close to it anyway.

An EGDB is typically accessed across a network, with the network transports in the way as well as the DB drivers, etc.  You then have at least 3 potential areas of delay: the network, the DB driver and the DB server (and probably some other delays I'm not considering.)

With multiple users reading an FGDB via a server, you're accessing the data through the map service, which typically has caching going on.  In our case, we put a copy of the FGDB onto each server.  At one time I was considering a file share in order to keep just one copy of the data, but our Esri rep convinced me that a copy per server is the way to go for speed.

As I mentioned, I have been considering moving data from other sources out of the EGDB and into a FGDB scenario for our desktop users.  And I too have wondered about the access.  But like you point out, the county is (or at least was, not sure of current status) accessing the FGDB data for the desktop users.  I believe that was around 30 concurrent users.

There were some issues when they first went to FGDB, but that was stuff related to NetApp and drivers, etc...

Eventually it all worked out and the response times were quite good.  The external data published to the public is also via FGDB, map services and a website app built with GeoCortex. 

I think a lot of how you set up to handle this is based on organization structure, # of users, etc... Some places have distributed GIS groups while others have one centralized group (or maybe, like us, two centralized groups: the GIS Analyst side and the IT side, but we work together.)