Compress & statistics - best practices again?

MartinAmeskamp · ‎10-08-2014

Hi, we (and our customers) have been using versioned databases, mainly on Oracle, for a long time now, and we used to employ and recommend a pattern of analyze - compress - analyze, with a heavy emphasis on the benefits of a full compress. The pattern of version usage varies, some sites use work order versions that are created and edited by one person, and then checked and posted by another, so there's typically a backlog of versions (e.g. 100-200) around to prevent a full compress. Other sites use "personal" versions that are frequently reconciled, posted and reused, and then deleted on a Friday afternoon to achieve a "clean" database for the weekend. In many cases, SDE administrators check the number of states after compress, and if it is significantly greater than 1, they get rather nervous. Lately, replication has become more common, and the hidden replica versions make admins anxious because achieving a full compress gets increasingly difficult (there's that very helpful white paper "Compressing ArcSDE Geodatabases That Contain Replicas" that describes the situation and what to do about it).

Now, I find that a full compress, followed by updating statistics, may be harmful rather than helpful: Oracle believes that the delta tables are empty, and while they are indeed empty, this is fine, but in the course of an editing day, they start to accumulate data, and while statistics don't reflect this, access to delta tables may well be suboptimal. A typical example is the SDE_STATE_ID index being used rather than the SDE_STATE_ID, OBJECTID one, even though both values are available for use. Yes, Oracle will detect bad execution plans eventually and realize that the tables aren't empty any more, but that may take some time.

So, my recommendation would be to go one of two ways:

If you want to achieve full compresses, make sure your statistics lie and reflect populated delta tables: Populated statistics for empty table are a lot less harmful than empty statistics for populated tables. The easiest way of doing this is to analyze delta tables in a populated phase and then lock statistics.
The other way is to forget about full compress and to strive for a "stationary" database where there's always a reasonable number of versions around, and delta tables always have some data in them. Of course, the usual maintenance operations, such as soft reconcile scripts and most of the best practices of the white paper mentioned above still apply - you don't want "old versions" around that haven't been reconciled for a long time and prevent a lot of edits from being compressed.

Any thoughts or experiences about this?

Thanks a lot, Matin