I am new to PostgreSQL. I am migrating our production GIS data from SQL Server to Postgres. I have completed all of the testing I can think of to ensure our migrated data will perform exactly the same. Everything looks good.
I am now at the step of ensuring I have a solid backup strategy in place before doing the 'true' migration of all our data and republishing all the things, etc. Our standard backup policy is to keep 2 days of SQL Server database backups locally (the number of backups per day depends on the database in question), plus 90 days of backups offsite; those offsite backups are also duplicated to a second offsite location.
These offsite backup locations de-duplicate data, but as I understand it, de-duplication will not work on compressed data.
My enterprise geodatabases contain many GB of feature class attachment data (mostly images) that is important, but I am thinking of backing up the attachment data on a separate schedule from the other data. My current backup strategy is to pg_dump my egdb database(s) to a directory format (compressed by default), so that I can pg_restore using the necessary steps/order of operations for proper egdb restoration. My testing so far shows that this process works correctly, and that certain egdb functionality/ownership breaks when those steps are not followed.
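For reference, the base dump command looks roughly like this (the output path and database name are placeholders for my actual values):

# directory-format dump; compressed by default, restorable with pg_restore
pg_dump -Fd -d egdb -f /backups/egdb_full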
Anyway, I am thinking of having two separate pg_dump schedules: one that excludes attachment data and one that includes only attachment data. The 'no attachment' backup uses the pg_dump argument
--exclude-table-data=*.*__attach*
Since all attachment table names include "__attach" (with a leading double underscore), this seems like a safe way to filter out the data from any attachment tables owned by any schema in the database. Note that the attachment tables themselves are still dumped (which I assume is important to preserve the table relationships); only their data is excluded. My testing so far shows this works correctly: the *__attach* tables exist after restore, but have no records.
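So the full 'no attachment' command looks something like this (same placeholder path/database as above; I quote the pattern so the shell doesn't try to glob-expand it):

# schema for all tables, data for everything except *__attach* tables
pg_dump -Fd -d egdb --exclude-table-data='*.*__attach*' -f /backups/egdb_noattach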
On the other side, the backup that only includes attachment tables + data uses the argument
-t *.*__attach* -a
Here, -t specifies that only tables matching this pattern should be dumped, and -a restricts the dump to data only (no schema). This also seems to work as expected: pg_restore recognizes it as a data-only dump and does not raise errors about the tables already existing in the database; it just loads the data.
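Put together, the attachment-only dump and its restore look roughly like this (placeholders as before):

# data-only dump of just the attachment tables
pg_dump -Fd -a -t '*.*__attach*' -d egdb -f /backups/egdb_attach

# restore loads rows into the already-existing attachment tables
pg_restore -d egdb /backups/egdb_attach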
Does anyone have experience with something like this, and know of any concerns with this kind of strategy? Since my attachment data changes much less frequently than my GIS data, I figure the potential loss of a handful of images in a database failure is an acceptable trade-off, compared to the overhead of dumping tens of GB of attachment data in every backup (likely two to four times a day) versus some hundreds of MB of GIS data per dump. I'm not sure what the actual sizes are, so I'm just ballparking.
Actually, I wonder if I should pg_dump the attachment data in an uncompressed format, so that our offsite server can de-dupe nearly all of it, assuming each image BLOB is read as identical blocks each time... but that would still mean transferring many gigabytes of backup data if the attachments are backed up multiple times a day, since de-duping happens after transfer to our offsite server, rather than before. In the grand scheme of things, though, I believe we transfer TBs of backup data multiple times each day across our organization's various databases, so the GIS portion may be a drop in the bucket, relatively speaking. Even so, if I have the ability to lighten that load, it's probably good practice, as I assume the load will only increase with time.
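If I go that route, disabling compression should just be a matter of adding -Z 0 to the attachment dump (placeholders as before):

# -Z 0 turns off the directory format's default compression,
# leaving the BLOB data as-is so block-level de-duplication can work on it
pg_dump -Fd -Z 0 -a -t '*.*__attach*' -d egdb -f /backups/egdb_attach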