Hello, I have a bit of a conundrum. I have asset GIS data sitting in an ESRI geodatabase, and the same data sitting separately in an asset management database. My job is to keep the two databases in sync. It has been some time since the initial copy was made, so the data has drifted and the two have fallen out of sync. The asset management database is the point of truth for attribute data, while the GIS database is the point of truth for geometry data.
The ultimate goal is to have this data displayed on web maps, so when changes are made in the asset management database, a daily scripted FME process will write the attribute information to the ESRI table.
There are a number of ways I am considering approaching this. One approach is to create a new dataset from the spatial data containing only the primary key, perform a table join against the attribute data (imported from the asset management database as a CSV), and publish the joined datasets as a map image.
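For illustration, the join step in that first approach amounts to matching attribute rows to geometry records on the primary key. A minimal sketch (field names like `asset_id` are hypothetical stand-ins, not from any actual schema):

```python
import csv

# Spatial records: primary key -> geometry (simplified stand-ins for
# what would really be geodatabase features)
spatial = {"A1": "POINT (1 2)", "A2": "POINT (3 4)"}

# Attribute rows as exported from the asset management database to CSV
attr_csv = "asset_id,status,owner\nA1,active,Water\nA2,retired,Parks\n"
attributes = {row["asset_id"]: row for row in csv.DictReader(attr_csv.splitlines())}

# Join on the primary key; geometry stays authoritative on the GIS side
joined = {
    key: {**attributes[key], "geometry": geom}
    for key, geom in spatial.items()
    if key in attributes
}
```

The catch with this approach is that the join runs (or is re-published) every time, regardless of whether anything actually changed.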
Another method could be using change detection in FME and updating the ESRI table whenever changes are made in the asset management database.
I am sure there are many options, but I am unsure how best to approach this; if anyone has any information or advice, it would be greatly appreciated.
"Best" approaches don't generally exist, but storing only geometry in one database, and everything else in another, then linking between them in real time is an anti-pattern that approaches the worst case.
The least-worst solution set has all the data for each table in one place. There are many ways to get there, but they all require more details than are really appropriate for a public forum (and often involve multi-month implementation contracts).
I have implemented change detection solutions that use a hash of each data row's contents to identify which records have changed over time, so that only the records which need an update are updated. This lets me keep table collections exceeding 160 million rows in sync, with about 20 minutes of processing for 200k-800k (often redundant) change messages a day. (Ironically, the system generating the change messages takes 4-5 hours to identify the change candidates, and most of those 20 minutes are due to transmission delay across a wide area network.)
Suffice it to say that "maintenance" and "publication" databases are in your near future. If you calculate a reasonably secure hash during data ingest into a staging table, and preserve the hash of the existing data as you load it, you can drive a simple UPDATE statement into the publishing tables in nearly no time, without any publishing downtime.
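The hash-comparison step described above can be sketched roughly as follows. This is an illustrative sketch only, with made-up row data; the `row_hash` helper and the dict-based "tables" are stand-ins for real staging and publishing tables:

```python
import hashlib

def row_hash(row: dict) -> str:
    """Stable hash of a row's attribute values (fixed key order)."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hashes preserved from the currently published data, keyed by primary key
published = {"A1": row_hash({"id": "A1", "status": "active"})}

# Freshly ingested staging rows
staging = {
    "A1": {"id": "A1", "status": "active"},    # unchanged
    "A2": {"id": "A2", "status": "retired"},   # new or changed
}

# Only rows whose hash differs (or is missing) need an UPDATE/INSERT
changed = [
    key for key, row in staging.items()
    if published.get(key) != row_hash(row)
]
```

Comparing one stored hash per row is what lets the final UPDATE touch only the handful of rows that actually changed, rather than rewriting the whole table.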
- V
I just recently did a project like this between a customer information system and GIS using FME. I used the CRCCalculator for change detection and then deleted, inserted, and updated my target feature class in GIS. Then I set the FME workspace up as a job on a server that runs every night.
https://docs.safe.com/fme/html/FME-Form-Documentation/FME-Transformers/Transformers/crccalculator.ht...
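As a rough analogue of what a CRC-per-feature check does (this is not FME's actual implementation, just a stdlib sketch with made-up rows), you compute a checksum over each row's attribute values and compare old against new:

```python
import zlib

def row_crc(row: dict) -> int:
    """CRC32 checksum over a row's values, in fixed key order."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return zlib.crc32(payload.encode("utf-8"))

old = {"id": "A1", "status": "active"}
new = {"id": "A1", "status": "retired"}

# A differing checksum flags the row as needing an update
needs_update = row_crc(old) != row_crc(new)
```

In an FME workspace the transformer attaches the checksum as an attribute on each feature, so the comparison can drive which features go to the delete/insert/update writers.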
In another project, with Cityworks and GIS, I kept only the spatial data in GIS and then joined the attribute data via an Oracle database link to the non-spatial system. This worked okay too, but it is a little slow and clunky to work with.
Great to hear you have used this method before. I am thinking of going the change detection route for speed, rather than taking on the overhead of a joined table.
What was behind your decision for using the CRC calculator rather than just the general ChangeDetector transformer in FME?
I did try the ChangeDetector but it was a little bit unstable and slow with my data.
So, in the end I went with the CRCCalculator. I did have to sacrifice updates on my main target table (I have a few derivative tables that do get updates based on the output of other transformers). So, if there is any change in a row, it just deletes the row and inserts it back.
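The delete-then-insert pattern described here is effectively a replace: instead of computing a field-level UPDATE for a changed row, you remove the old row and write the new one in full. A minimal sketch with a dict standing in for the target feature class (keys and rows are made up):

```python
# Target "table" keyed by primary key (stand-in for the feature class)
target = {"A1": {"status": "active"}, "A3": {"status": "old"}}

# Changed or new rows coming out of change detection
incoming = {"A1": {"status": "retired"}, "A2": {"status": "new"}}

for key, row in incoming.items():
    target.pop(key, None)   # delete the existing row if present
    target[key] = row       # insert the replacement row
# Rows not flagged as changed (like "A3") are left untouched
```

The trade-off is exactly the one noted above: it is simpler and more robust than per-field updates, at the cost of rewriting an entire row for any change.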