Synchronising data between Hosted Feature Services and File Geodatabases?

12-04-2022 02:57 AM
MVP Regular Contributor

We're looking for ways to synchronise Hosted Feature Services with the geodatabases on our main file server. One approach may be to do away with hosted services and use Enterprise GDBs instead. However, I'm wondering whether anyone has come up with a novel way to synchronise the hosted data with the existing File GDBs within our projects.

The aim of the game is one-truth data management, so we're reluctant to operate with a Hosted environment due to other geodata requirements across the business.

..Maps with no limits..
3 Replies
MVP Esteemed Contributor

All but 2 of our services are hosted copies of authoritative data that lives elsewhere, and we never really have to think about it. The sync process relies heavily on the ArcGIS API for Python. Here's how we do it:

  1. Layers
    1. Source Data
      1. Can be file-based, some service, a DB, etc
      2. Has editor tracking enabled
        1. Can work without, but is much easier with it
      3. Has GlobalIDs
    2. Destination layer
      1. Hosted in Portal / AGOL
      2. Has a "sourceGUID" field
    3. Auxiliary layer
      1. Table names
      2. "last updated" timestamps
  2. Scripts
    1. Timestamp based
      1. For a given table, pulls the "last updated" timestamp from auxiliary layer
      2. Queries source data for features edited since that timestamp
      3. Queries destination layer for features whose "sourceGUID" field matches any globalID from the source query.
      4. Merges destination globalid / objectid with source attributes
      5. Pushes source attributes back to destination layer, editing features in place
    2. Comparison based (when editor tracking is not on source layer)
      1. Query entire source layer to dataframe, set globalid as index
      2. Query entire destination layer to dataframe, set sourceGUID as index
      3. Use pandas library compare function
        1. Identifies rows with attribute edits
      4. Compare indices
        1. Identifies rows in source not in destination (adds)
        2. Identifies rows in destination not in source (deletes)
      5. Submit adds, updates, and deletes to destination layer
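
As a rough illustration of the comparison-based script, here is a minimal pandas sketch. It assumes both layers have already been queried into dataframes indexed by the shared key (globalid on the source, sourceGUID on the destination) with unique keys and the same attribute columns. The function name `diff_layers` and the sample data are made up for the example; querying the layers and submitting the edits are left out.

```python
import pandas as pd


def diff_layers(source: pd.DataFrame, dest: pd.DataFrame):
    """Return (adds, updates, delete_keys) needed to make dest match source.

    Hypothetical helper: both frames are indexed by the shared key and
    carry the same attribute columns.
    """
    # Rows in source but not in destination are adds.
    adds = source.loc[source.index.difference(dest.index)]
    # Rows in destination but not in source are deletes.
    delete_keys = dest.index.difference(source.index)

    # DataFrame.compare needs identically labelled frames, so restrict both
    # sides to the shared keys, same column order, same row order.
    shared = source.index.intersection(dest.index)
    src_shared = source.loc[shared].sort_index()
    dst_shared = dest.loc[shared, source.columns].sort_index()
    dst_shared = dst_shared.rename_axis(src_shared.index.name)

    # compare() keeps only the rows where at least one attribute differs.
    changed = src_shared.compare(dst_shared)
    updates = source.loc[changed.index]
    return adds, updates, delete_keys


# Made-up sample data standing in for the two layer queries.
source = pd.DataFrame(
    {"name": ["Oak St", "Elm St", "Main St"], "lanes": [2, 2, 4]},
    index=pd.Index(["a", "b", "c"], name="globalid"),
)
dest = pd.DataFrame(
    {"name": ["Oak St", "Elm Street", "Old Rd"], "lanes": [2, 2, 1]},
    index=pd.Index(["a", "b", "d"], name="sourceGUID"),
)

adds, updates, delete_keys = diff_layers(source, dest)
print(list(adds.index), list(updates.index), list(delete_keys))
```

Here row "c" is an add, row "b" is an update and key "d" is a delete; row "a" is identical on both sides and is never touched.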

It sounds like a lot, but at the end of the day you only have to edit the features that were actually edited in the source. We keep a number of layers with 100k+ features up to date with their sources nightly, and the process takes less than a minute per layer.
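
To make the timestamp-based script a bit more concrete, here is a small sketch in plain Python. Lists of dicts stand in for the query results, and the field name `last_edited`, the function `build_updates` and the sample rows are assumptions for the example only. In practice the rows would come from the source and destination queries, and the returned timestamp would be written back to the auxiliary layer.

```python
from datetime import datetime, timezone


def build_updates(source_rows, dest_rows, last_updated):
    """Build in-place edits for destination features whose source row changed.

    Hypothetical helper. source_rows carry "globalid", "last_edited" and the
    attributes; dest_rows carry "objectid" and "sourceGUID".
    """
    # Source features edited since the stored "last updated" timestamp.
    edited = {
        row["globalid"]: row
        for row in source_rows
        if row["last_edited"] > last_updated
    }

    updates = []
    for dest in dest_rows:
        src = edited.get(dest["sourceGUID"])
        if src is None:
            continue  # destination feature has no edited source row
        # Merge the destination identifiers with the source attributes.
        attrs = {k: v for k, v in src.items() if k not in ("globalid", "last_edited")}
        attrs["objectid"] = dest["objectid"]
        attrs["sourceGUID"] = dest["sourceGUID"]
        updates.append(attrs)

    # Newest edit seen, to store as the next "last updated" value.
    newest = max((r["last_edited"] for r in edited.values()), default=last_updated)
    return updates, newest


last_run = datetime(2022, 12, 1, tzinfo=timezone.utc)
source_rows = [
    {"globalid": "a", "last_edited": datetime(2022, 11, 20, tzinfo=timezone.utc), "name": "Oak St"},
    {"globalid": "b", "last_edited": datetime(2022, 12, 3, tzinfo=timezone.utc), "name": "Elm St"},
]
dest_rows = [
    {"objectid": 1, "sourceGUID": "a"},
    {"objectid": 2, "sourceGUID": "b"},
]

updates, newest = build_updates(source_rows, dest_rows, last_run)
print(updates, newest)
```

Only the feature edited after the last run ends up in `updates`, which is what keeps the nightly job small.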

I recently presented on this topic at a regional GIS conference. You're welcome to look at the notes here:

It can be complex to set up, but the end result is totally worth it. You could take it a step further and use the compare function to identify not just rows with edits, but columns as well, further paring down the data exchanged during the sync process.

- Josh Carlson
Kendall County GIS
MVP Regular Contributor

Have a look at this:

Scott Tansley
MVP Regular Contributor

@jcarlson really nice approach, that. As you say, complex to set up, but ultimately a very elegant solution. I'm going to weigh up the options (to EGDB or not to EGDB) before delving into this one.
