I don't have the time to explain fully, but the condensed version is this: I had
a client that was processing a million-row table for several hours every night
using SQL. Every row was updated, but often no change took place. Most
normal replication methods would propagate these changes, but that was not
acceptable, because the update bandwidth was very small (a hand-carried CD,
with a limit of 50Mb/day). I added a DIGEST method to the 'asc2sde' loader,
such that a query of the relevant columns (rowid plus a few dozen others)
generated a table of {rowid,checksum}, then compared the new checksums
to the old (using OUTER JOIN views), generating update, insert, and delete
text files, which could then be transferred to the second instance. I then
updated the checksum archive to reflect the transmitted changes. There were a
number of maintenance utilities, so that the receiving system could transmit
a current checksum report to force a resync, and this system operated for years
until a new processing methodology was devised.
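
A minimal sketch of the comparison step, with hypothetical table and column
names (the real implementation wrapped the column lists in OUTER JOIN views,
and the checksum function would be whatever the RDBMS provides):

    -- Build the new {rowid, checksum} table from the relevant columns.
    -- checksum() is a stand-in for the RDBMS's hash function.
    CREATE TABLE new_checksums AS
    SELECT rowid_col,
           checksum(col_a || '|' || col_b || '|' || col_c) AS cksum
    FROM   source_table;

    -- Compare against the archived checksums: a row missing on the old side
    -- is an insert, missing on the new side is a delete, and a differing
    -- checksum is an update.
    SELECT n.rowid_col AS new_id, o.rowid_col AS old_id, n.cksum, o.cksum
    FROM   new_checksums n
    FULL OUTER JOIN old_checksums o
         ON n.rowid_col = o.rowid_col
    WHERE  o.rowid_col IS NULL
       OR  n.rowid_col IS NULL
       OR  n.cksum <> o.cksum;

The three cases can then be spooled out as the update, insert, and delete text
files for transfer.
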
The thing is, to implement this methodology, you'd need a checksum table
for each data table, and a number of scripts to wrap the column lists... It
would not scale well to hundreds of tables, and with much more bandwidth
available you could simply use standard database replication techniques.
- V