Migrating from MSSQL to PostgreSQL

MKF62 · ‎03-15-2021

My organization who piggybacks off software from our umbrella organization may be moving to a new umbrella organization. They use PostgreSQL while we are currently using MSSQL. Unfortunately, I will no longer have access to the database server once we move, thus my "migration" is really more like a restore operation. I'll have .bak files from MSSQL and I've also placed our data in FGDB and CSV files for preservation. I'm looking for strategies to successfully migrate our databases without doing a massive rebuild from scratch. Has anyone done this; is this possible? Any advice?

jcarlson · ‎03-15-2021

If both databases are registered enterprise geodatabases and you just want a 1:1 migration with no adjustments to schema, etc., a simple Copy could work.

You can specify an entire database as the input and output.

- Josh Carlson
Kendall County GIS

MKF62 · ‎03-15-2021

Thanks - I forgot to put the most important part of this all in my post (will edit), but I will not have access to our old server during migration. Unfortunately, all I'll have to work with is the backups I take with me (.bak files from MSSQL and the stuff I'm porting to fgdb/csv). I guess maybe once I mint the new PostgreSQL database as a enterprise geodatabase, then I just copy/paste layers over from my FGDB. That might be the only way to do it, I guess.

jcarlson · ‎03-15-2021

Ahhh, I see! Well in that case, you should be able to use Copy first from Old SDE → FGDB, then later from FGDB → New SDE.

I'd just run it wholesale and not bother with segregating the tables at all. Introducing a filetype into the process that has no enforced datatype (CSV) opens the door to error when the loading process attempts (perhaps incorrectly) to infer what datatypes are in the CSV.

Alternatively, you could just get psql running on your local machine and temporarily move it to a database there. Then you'd be able to preview how your data will look post-migration. But that may be more work than it's worth.

- Josh Carlson
Kendall County GIS

MKF62 · ‎03-15-2021

Good call. Thanks for the tip about the CSVs too.

jcarlson · ‎03-15-2021

Any time! And good luck! Postgres is the best. Working with your DB via pgAdmin is so much nicer than SQL Server Manager, I think.

- Josh Carlson
Kendall County GIS

VinceAngelo · ‎03-15-2021

Keeping a database backup isn't going to be of much use if you won't have a SQL Server instance into which to restore it (sort of like making floppy backups before taking delivery of a new ultra-thin laptop).

I usually have the SQL used to create my tables available, so I've been able to create a new empty template in the PG database, then use SQL to optimize the table organization for spatial index order, or at least somewhat spatially defragmented (e.g., counties left to right within states in UTM Zone order), before re-registering with the geodatabase. If you're going to have to migrate, you might as well get some optimization out of the deal.

Sometimes you can create views for the data export to add an optimal ORDER BY when loading the FGDB out of the database, other times it easier to FeatureClassToFeatureClass into a staging schema, then use

INSERT INTO finalschema.table1(
       {fieldlist})
SELECT {fieldlist}
FROM   tempschema.table1
ORDER  BY sort_seq_added_earlier

Integrating as many other moderately costly updates into your design as the migration effort can support will manifest in reduced runtime in the new instance. Conversely, not taking the time to optimize can hobble the performance of the new environment indefinitely.

- V

MKF62 · ‎03-15-2021

I apologize in advance for what are probably dumb questions/interpretations, but I am not a DBA of any sort and (extremely unfortunately) am wildly out of my realm of expertise on this project. I'm the only person they have that has any resemblance of database knowledge, thus I've been tasked with this migration project (hurray...). So, all that said, I would still like to try to understand what you're describing if you don't mind explaining further.

I'm going to start at the "ORDER BY" part of your post first; are you saying that you're reordering/restructuring the fields of your feature class before you export to the FGDB for optimization purposes? Then, if you back up to the first part of your post about the spatial index optimization, it sounds like you use the SQL you have to recreate an empty table in the PG database, but you're reordering fields(?) again to create a more efficient spatial index (and potentially match the adjustments made when you exported data to your FGDB). After doing that, you register the table with the geodatabase and load your data into the empty table from the now-optimized data you exported to the FGDB.

Am I way off (I am assuming I am)?

VinceAngelo · ‎03-15-2021

Oh, dear. It's not very fair to assign this task to a non-DBA. Database transition is the time when inexperience can introduce inefficiencies which are much harder to eliminate down the line.

ORDER BY is the database's "sort on" directive. It's a row thing not a column thing (though reorganizing columns that have been added in dribs and drabs is a good example of a low-cost update during transition).

Spatial fragmentation occurs when data is added to a table in an unordered fashion, such that, when a basic draw request at full extent is made, the features seem to draw in random order. Spatial defragmentation is the process of organizing row order such that draw requests result in a systematic fill, such that, when the extent is at a much more restrictive scale, the features that are drawn are from at most several nearby clusters of rows (reading fewer rows to meet a query request means a faster draw time, and less space consumed in the cache for any one query, which means other queries can also be faster). In extreme cases, spatial defragmentation of, say, 10 million lighting strike records, can result in simple query times dropping from minutes to sub-second. In regular use, it might mean the difference between 700-800 and 200-300 milliseconds during a JOIN -- still significant, but you're much less likely to get rose petals scattered before you and cheering crowds forming spontaneously on your way back to your car.

- V

MKF62 · ‎03-15-2021

Tell me about it. I am terrified to be doing this!!

I am happy to say I do know what an "order by" clause is at least; I saw the word "schema" and my brain latched on to that, making me think about reorganization of the table structure instead of application to rows. It's rather late at night here, I think I need to go to bed, haha.

Thanks for the explanation on the spatial fragmentation; I always thought drawing order was somewhat arbitrary. If it's the case that they can be organized at something like a state level, where you order all the rows by state, moving from left to right and top to bottom (e.g. start the table with features in Washington, end with features in Florida), I could easily rewrite the feature classes with python (more my speed 😅) to be in such an order. Then I assume a newly calculated spatial index would be more efficient, as you've described?

Anyway, thanks for the guidance! Our datasets are fairly small so I think for now, I better stick to having this all culminate in a functional implementation vs an optimized one, given my inexperience and short timeframe to figure it all out. Just trying to understand all you are describing for my own edification 🙂