
2 SDE DBs replication strategy

04-09-2013 01:30 AM
VaL
Occasional Contributor
Hi all,

We have a practical problem here with replication/synchronization of some SDE DBs.

Here is what we have:

Office1 has SDEDB1 with FC1 containing 100 features; this is the parent DB, running on Oracle.
Office2 has SDEDB2 with no FCs to start with.

Offices 1 and 2 are in different countries but have a (slow) LAN connection between them.

What happens is that FC1 in SDEDB1 doesn't get edited, i.e. we don't change the number of features in it or edit feature attributes.
From time to time we receive a completely new FC1.

Here are some questions:

1. If FC1 participates in a replica, then after the initial sync between SDEDB1 and SDEDB2, SDEDB2 should have the same FC1 in it. Is this the case?
2. If we delete FC1 from SDEDB1, import another, new FC1, and resynchronize, what will be sent to SDEDB2?
3. What is the best way to keep SDEDB1 and SDEDB2 in sync when we substitute FCs rather than edit them?

Thanks.
5 Replies
VinceAngelo
Esri Esteemed Contributor
ArcGIS replication uses versioning.  If your layer gets completely replaced
with each update, you should consider using a different update methodology.
Then again, 100 features is a trivial number, so the bandwidth and processing
cost is negligible.

I would recommend that, if you want to use replication, you use it as
designed (edit the existing table) rather than breaking the replication by
dropping and re-adding the feature class.
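
If the incoming FC1 has the same schema, something along these lines would do
it (a rough, untested arcpy sketch; every path and name below is a placeholder,
and depending on your versioning setup you may need to run it against an edit
version and reconcile/post afterwards):

# Rough sketch only: refresh the replicated FC1 in place so the change is
# recorded as edits to the existing feature class, which replication can sync.
import arcpy

sde_conn = r"C:\connections\office1_sdedb1.sde"          # SDE connection file (placeholder)
target_fc = sde_conn + r"\SDEDB1.GIS.FC1"                # existing, replicated FC1
incoming_fc = r"C:\incoming\delivery.gdb\FC1"            # newly received FC1

arcpy.DeleteRows_management(target_fc)                   # remove the old features
arcpy.Append_management(incoming_fc, target_fc, "TEST")  # load the new ones (schemas must match)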

- V
VaL
Occasional Contributor
Thanks,
I know replication uses versioning. Didn't mention it in the example to keep it simple.
For the same reason, the example uses 1 FC of 100 features; in reality it will be around 100 FCs totaling some 500 MB of data.
So it seems bandwidth might be an issue (although the link speed is ~128 KBytes/s).
Just out of interest, is there something that will merge the new FC1 into the existing FC1, passing the changes from the new FC1 to the old one, so that the old FC1 can be replicated?
The only other option I see is scripted copying of FC1 from SDEDB1 to SDEDB2.
Am I missing some options?
VinceAngelo
Esri Esteemed Contributor
You probably shouldn't simplify the problem statement while looking
for a replication strategy.  That complexity must be reflected in all
possible solutions.

Is your networking 128 kiloBITS/sec or 128 kiloBYTES/sec?  The order
of magnitude difference is significant.
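
For a sense of scale (rough numbers, assuming the full ~500 MB is re-sent each
time and ignoring any SDE processing overhead): 500 MB at 128 kBytes/s is about
500*1024/128 = 4,000 seconds, a bit over an hour; at 128 kbits/s it is roughly
eight times that, most of a working day.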

Frequency of update is the other big factor.  Solutions which take a day
to complete are more attractive on a monthly change schedule than
on a twice daily change schedule.

Oracle supports a number of replication technologies, though not many
of them are low-bandwidth solutions.

It's certainly possible to do change detection as an application solution,
but doing it for 100 tables is 100 times more difficult than doing it
for one table with 100 times the rows.  How much time do you have
to dedicate to getting this solution working?

Change detection is as simple as using a cryptographic hash across row
sets, and identifying insert, update, and delete change events.  But to
pull it off, you need a stable set of primary keys and one state table
{key_val,hash_val} for each change-detected table, plus temporary
storage for the update table and a delta table.  I created three views
for the one table I needed to replicate, though you might get away
with queries to do the same outer joins necessary for insert/delete
detection.
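
As a very rough illustration of the bookkeeping (plain Python rather than the
views I used; it assumes each table has a stable primary key and that you can
query the relevant columns per key):

import hashlib

def row_hash(values):
    # Checksum of one row's relevant column values (column order must be stable).
    joined = "|".join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def detect_changes(old_state, new_rows):
    # old_state: {key: hash} saved from the previous run (the {key_val,hash_val} table)
    # new_rows:  {key: (col1, col2, ...)} queried from the current table
    new_state = {key: row_hash(vals) for key, vals in new_rows.items()}
    inserts = [k for k in new_state if k not in old_state]
    deletes = [k for k in old_state if k not in new_state]
    updates = [k for k in new_state
               if k in old_state and new_state[k] != old_state[k]]
    return inserts, updates, deletes, new_state

The remaining work is pulling the actual changed rows for those keys, shipping
them, and applying them (plus the new state) on the other side.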

- V
VaL
Occasional Contributor


Is your networking 128 kiloBITS/sec or 128 kiloBYTES/sec? 

Frequency of update is the other big factor. 

It's certainly possible to do change detection as a application solution,
but doing ...... How much time do you have
to dedicate to getting this solution working?

- V


Hi
The connection speed is in KBYTES, with a Y.
Actually I have seen it even faster (~2 MBytes/s, but I assume this is just a peak/burst), so let's consider it slow. Also, I have seen very fast networks where data import into SDE still takes far more time than simply copying the same amount of data over in Windows; that is understood, though.
I don't think I will need to update (and replicate) data more than once a week. In any case, I think if I run the copying process from SDEDB1 to SDEDB2 overnight it will be finished by the morning.

There is another consideration here: there are some relationship classes in SDEDB1. I can't seem to be able to copy RCs between two databases. Is this right?

I am not quite clear what your idea is about change detection; could you elaborate more? How do you implement it (Python?)? All the tables are fairly small, so if I understand what you are saying I might give it a try, regardless of timing.
VinceAngelo
Esri Esteemed Contributor
I don't have the time to explain fully, but the condensed version is thus: I had
a client that was processing a million row table for several hours every night
using SQL.  Every row was updated, but often no change took place. Most
normal replication methods would propagate these changes, but that was not
acceptable, because the update bandwidth was very small (a hand-carried CD,
with a limit of 50Mb/day).  I added a DIGEST method to the 'asc2sde'  loader,
such that a query of the relevant columns (rowid plus a few dozen others)
generated a table of {rowid,checksum}, then compared the new checksums
to the old (using OUTER JOIN views), generating update, insert, and delete
text files, which could then be transferred to the second instance. I then
updated the checksum archive to reflect the transmitted changes. There were a
number of maintenance utilities, so that the receiving system could transmit
a current checksum report to force resync, and this system operated for years
until a new processing methodology was devised.
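
A stripped-down sketch of just the comparison step, in Python/cx_Oracle rather
than the views and asc2sde I actually used (table, column, and connection names
here are made up, and it writes only the keys, not the full delta rows):

import cx_Oracle

conn = cx_Oracle.connect("user/password@orcl")   # placeholder connect string
cur = conn.cursor()

# new_checksum holds {row_key, checksum} for the current data;
# old_checksum is the archive of the last transmitted state.
queries = {
    "inserts.txt": """SELECT n.row_key FROM new_checksum n
                      LEFT OUTER JOIN old_checksum o ON n.row_key = o.row_key
                      WHERE o.row_key IS NULL""",
    "deletes.txt": """SELECT o.row_key FROM old_checksum o
                      LEFT OUTER JOIN new_checksum n ON o.row_key = n.row_key
                      WHERE n.row_key IS NULL""",
    "updates.txt": """SELECT n.row_key FROM new_checksum n
                      JOIN old_checksum o ON n.row_key = o.row_key
                      WHERE n.checksum <> o.checksum""",
}

for filename, sql in queries.items():
    cur.execute(sql)
    with open(filename, "w") as out:
        for (row_key,) in cur:
            out.write("%s\n" % row_key)

conn.close()

The actual delta files would need to carry the changed rows themselves, and the
old_checksum archive gets refreshed from new_checksum once the transfer is
confirmed.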

The thing is, to implement this methodology, you'd need a checksum table
for each data table, and a number of scripts to wrap the column lists...  It
would not scale well to hundreds of tables, and with much larger bandwidth,
you can use database replication techniques.

- V