With crunch time approaching and ArcMap set to become a thing of the past, many GIS users who have migrated to ArcGIS Pro for the majority of their work have still held on to ArcMap as a way of accessing datasets stored in personal geodatabases (.mdb). Once ArcMap is gone, there will be no way to view this data, as ArcGIS Pro does not support personal geodatabases. If you only have a few .mdb's, this isn't a big deal: you can manually create a new .gdb and migrate your datasets across using the tools already at your disposal in ArcMap.
But if you have many .mdb's, this soon becomes infeasible. We fell into this category: we had many archived and backup datasets stored as .mdb's that we may want to access in the future. With ArcMap due to be uninstalled from all machines across our organisation later this year, we decided it was time to convert these to file geodatabases before that happened, to guarantee future accessibility.
To achieve this, I used ArcMap's Python window with two pairs of scripts, as detailed below.
NOTE: When running the below scripts against data on our network, almost every task was very slow. Moving the data (make a copy!) to the local C drive of my machine sped up the processing time immensely (many hours down to minutes!) and reduced the number of failed copy tasks to almost zero.
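If you want to script that staging step as well, something like the following can copy the network folder down to the local drive first. This is a minimal sketch; the network and local paths are hypothetical placeholders for your own.

import shutil

# Hypothetical paths - replace with your network share and a local working folder
network_folder = r"\\server\gis\ArchivedMDBs"
local_folder = r"C:\temp\FoldersWithMDBs"

# Copy the entire folder tree (including the .mdb's) to the local drive.
# Note: shutil.copytree requires that the destination folder does not already exist.
shutil.copytree(network_folder, local_folder)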
Script 1: This script is copied and pasted into ArcMap's Python window. Update the input_folder parameter with the parent folder containing your personal geodatabases; the script loops through all subfolders and the .mdb's they contain. For each .mdb found, the loop passes its containing folder to the second script. Update the second_script_path parameter with the path to the second script.
Script 2: Called by Script 1 and saved in a location of your choosing, this script creates a matching .gdb for each .mdb in the folder it is passed and copies across any datasets, feature classes, or tables found.
The process was split into two scripts, rather than run as a single script, because I found that after the first folder in the loop, the data copy would fail for all .mdb's found in subsequent folders, returning many error messages. The second script is executed once per .mdb found, which resets all the inputs and removes whatever issue caused the copies to fail when looped within the same instance.
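As an aside, rather than pasting the whole of Script 1 into the Python window each time, you can also run the saved file directly from the window. ArcMap runs Python 2, so execfile is available (the path below is just an example):

# Run Script 1 from the ArcMap Python window instead of pasting it in
execfile(r"C:\Temp\Script1.py")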
Script 1
import os
import subprocess
from datetime import datetime

def iterate_folders(input_folder, second_script_path):
    # Create a log file
    log_file = os.path.join(r"C:\temp", "MDB_to_GDB_log_{}.txt".format(datetime.now().strftime('%Y%m%d_%H%M%S')))

    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    # Iterate through all subfolders and files
    for root, dirs, files in os.walk(input_folder):
        for file in files:
            # Check if the file is a .mdb file
            if file.endswith(".mdb"):
                mdb_path = os.path.join(root, file)
                log_message("Processing MDB: " + mdb_path)
                # Call the second script to create a GDB and copy items
                subprocess.call(["python", second_script_path, root, log_file])

# Example usage
input_folder = r"C:\temp\FoldersWithMDBs"
second_script_path = r"C:\Temp\Script2.py"  # Specify the full path to the second script

if not os.path.exists(input_folder):
    print("The path " + input_folder + " does not exist.")
else:
    iterate_folders(input_folder, second_script_path)
Script 2
import os
import sys
import arcpy

def create_gdb_and_copy_items(folder_path, log_file):
    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    # Iterate through all files in the folder
    for file in os.listdir(folder_path):
        # Check if the file is a .mdb file
        if file.endswith(".mdb"):
            mdb_path = os.path.join(folder_path, file)
            gdb_name = os.path.splitext(file)[0] + ".gdb"
            gdb_path = os.path.join(folder_path, gdb_name)
            log_message("Processing MDB: " + mdb_path)

            # Check if the GDB already exists
            if not arcpy.Exists(gdb_path):
                try:
                    # Create a new file geodatabase using arcpy
                    arcpy.management.CreateFileGDB(folder_path, gdb_name)
                except Exception as e:
                    log_message("Error creating GDB {}: {}".format(gdb_path, str(e)))
                    continue
            else:
                log_message("GDB already exists: " + gdb_path)

            try:
                # List all datasets in the .mdb
                arcpy.env.workspace = mdb_path
                datasets = arcpy.ListDatasets("*", "All") or []
                log_message("Datasets: " + str(datasets))
            except Exception as e:
                log_message("Error listing datasets in {}: {}".format(mdb_path, str(e)))
                datasets = []

            # Copy each dataset to the new .gdb
            for dataset in datasets:
                try:
                    if not arcpy.Exists(os.path.join(gdb_path, dataset)):
                        arcpy.management.Copy(dataset, os.path.join(gdb_path, dataset))
                        log_message(str(dataset) + " copied")
                    else:
                        log_message(str(dataset) + " already exists in GDB")
                except Exception as e:
                    log_message("Error copying dataset {}: {}".format(dataset, str(e)))
                    continue

            try:
                # List all feature classes in the .mdb
                arcpy.env.workspace = mdb_path
                fcs = arcpy.ListFeatureClasses("*", "All") or []
                log_message("Feature classes: " + str(fcs))
            except Exception as e:
                log_message("Error listing feature classes in {}: {}".format(mdb_path, str(e)))
                fcs = []

            # Copy each feature class to the new .gdb
            for fc in fcs:
                try:
                    if not arcpy.Exists(os.path.join(gdb_path, fc)):
                        arcpy.management.Copy(fc, os.path.join(gdb_path, fc))
                        log_message(str(fc) + " copied")
                    else:
                        log_message(str(fc) + " already exists in GDB")
                except Exception as e:
                    log_message("Error copying feature class {}: {}".format(fc, str(e)))
                    continue

            try:
                # List all tables in the .mdb
                arcpy.env.workspace = mdb_path
                tables = arcpy.ListTables("*", "All") or []
                log_message("Tables: " + str(tables))
            except Exception as e:
                log_message("Error listing tables in {}: {}".format(mdb_path, str(e)))
                tables = []

            # Copy each table to the new .gdb, excluding those starting with 'dbo_'
            for table in tables:
                if not table.startswith("dbo_"):
                    try:
                        if not arcpy.Exists(os.path.join(gdb_path, table)):
                            arcpy.management.Copy(table, os.path.join(gdb_path, table))
                            log_message(str(table) + " copied")
                        else:
                            log_message(str(table) + " already exists in GDB")
                    except Exception as e:
                        log_message("Error copying table {}: {}".format(table, str(e)))
                        continue
                else:
                    log_message("Skipping table: " + str(table))

            log_message("Converted " + mdb_path + " to " + gdb_path)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python create_gdb_and_copy_items.py <folder_path> <log_file>")
    else:
        folder_path = sys.argv[1]
        log_file = sys.argv[2]
        create_gdb_and_copy_items(folder_path, log_file)
Script 2 notes: You may have more than just datasets, feature classes, and tables to migrate, and may need to customise the script to suit your needs. Also, I've had to filter out tables starting with dbo_, as these were connections to long-lost databases and caused the script to get stuck trying to access them.
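For example, if your .mdb's also contain rasters, a loop in the same pattern as the table loop could be added inside Script 2's per-.mdb block. This is a sketch only, using arcpy.ListRasters, and assumes the same mdb_path, gdb_path, and log_message already in scope:

# List and copy any rasters found in the .mdb
arcpy.env.workspace = mdb_path
rasters = arcpy.ListRasters("*", "All") or []
for raster in rasters:
    try:
        if not arcpy.Exists(os.path.join(gdb_path, raster)):
            arcpy.management.Copy(raster, os.path.join(gdb_path, raster))
            log_message(str(raster) + " copied")
        else:
            log_message(str(raster) + " already exists in GDB")
    except Exception as e:
        log_message("Error copying raster {}: {}".format(raster, str(e)))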
The next two scripts were intended mainly as a double-checking process. Once the first scripts had run, I would occasionally see error messages where a feature class or table had failed to copy. These failures were caught and logged while the script continued on, and reviewing the log file helped identify how many had occurred. If there were only one or two, you might choose to fix them manually.
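A quick way to tally those failures is to scan the log for the error lines the scripts write. A minimal sketch (the log file name below is hypothetical; use the one your run created):

# Count and print the error lines written to a log file
log_file = r"C:\temp\MDB_to_GDB_log_20240101_120000.txt"  # hypothetical file name
with open(log_file) as log:
    errors = [line.strip() for line in log if line.startswith("Error")]
print("{} failed copies".format(len(errors)))
for line in errors:
    print(line)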
Script 3: Very similar to Script 1 in that it loops through folders and personal geodatabases. The difference here is that it also generates the path of the .gdb that should now exist alongside each located .mdb and passes both paths to Script 4. It is executed the same way, by pasting into ArcMap's Python window, and calls Script 4 via the referenced path.
Script 4: This script lists all datasets, feature classes, and tables in both the input .mdb and the .gdb for comparison. If the lists match, nothing happens; if they don't, it attempts to copy the missing items across to complete the list.
NOTE: It has the same filter as Script 2 to exclude "dbo_" tables.
Script 3
import os
import subprocess
from datetime import datetime

def iterate_folders(input_folder, second_script_path):
    # Create a log file
    log_file = os.path.join(r"C:\temp", "MDB_to_GDB_log_{}.txt".format(datetime.now().strftime('%Y%m%d_%H%M%S')))

    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    # Iterate through all subfolders and files
    for root, dirs, files in os.walk(input_folder):
        for file in files:
            # Check if the file is a .mdb file
            if file.endswith(".mdb"):
                mdb_path = os.path.join(root, file)
                gdb_name = os.path.splitext(file)[0] + ".gdb"
                gdb_path = os.path.join(root, gdb_name)
                log_message("Processing MDB: " + mdb_path)
                # Call the second script to compare and copy items
                subprocess.call(["python", second_script_path, mdb_path, gdb_path, log_file])

# Example usage
input_folder = r"C:\temp\FolderWithMDBs"
second_script_path = r"C:\Temp\Script 4.py"  # Specify the full path to the second script

if not os.path.exists(input_folder):
    print("The path " + input_folder + " does not exist.")
else:
    iterate_folders(input_folder, second_script_path)
Script 4
import os
import sys
import arcpy

def compare_and_copy_items(mdb_path, gdb_path, log_file):
    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    try:
        # Ensure the GDB exists
        if not arcpy.Exists(gdb_path):
            log_message("GDB does not exist: " + gdb_path)
            return

        # List all datasets, feature classes and tables in the .mdb
        arcpy.env.workspace = mdb_path
        mdb_datasets = arcpy.ListDatasets("*", "All") or []
        log_message("MDB Datasets: " + str(mdb_datasets))
        mdb_fcs = arcpy.ListFeatureClasses("*", "All") or []
        log_message("MDB Feature Classes: " + str(mdb_fcs))
        mdb_tables = arcpy.ListTables("*", "All") or []
        log_message("MDB Tables: " + str(mdb_tables))

        # List all datasets, feature classes and tables in the .gdb
        arcpy.env.workspace = gdb_path
        gdb_datasets = arcpy.ListDatasets("*", "All") or []
        log_message("GDB Datasets: " + str(gdb_datasets))
        gdb_fcs = arcpy.ListFeatureClasses("*", "All") or []
        log_message("GDB Feature Classes: " + str(gdb_fcs))
        gdb_tables = arcpy.ListTables("*", "All") or []
        log_message("GDB Tables: " + str(gdb_tables))

        # Compare and copy missing datasets
        for dataset in mdb_datasets:
            if dataset not in gdb_datasets:
                try:
                    arcpy.management.Copy(os.path.join(mdb_path, dataset), os.path.join(gdb_path, dataset))
                    log_message("Copied dataset: " + dataset)
                except Exception as e:
                    log_message("Error copying dataset {}: {}".format(dataset, str(e)))

        # Compare and copy missing feature classes
        for fc in mdb_fcs:
            if fc not in gdb_fcs:
                try:
                    arcpy.management.Copy(os.path.join(mdb_path, fc), os.path.join(gdb_path, fc))
                    log_message("Copied feature class: " + fc)
                except Exception as e:
                    log_message("Error copying feature class {}: {}".format(fc, str(e)))

        # Compare and copy missing tables, excluding those starting with 'dbo_'
        for table in mdb_tables:
            if not table.startswith("dbo_") and table not in gdb_tables:
                try:
                    arcpy.management.Copy(os.path.join(mdb_path, table), os.path.join(gdb_path, table))
                    log_message("Copied table: " + table)
                except Exception as e:
                    log_message("Error copying table {}: {}".format(table, str(e)))
            else:
                log_message("Skipping table: " + table)
    except Exception as e:
        log_message("Error processing {}: {}".format(mdb_path, str(e)))

if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python compare_and_copy_items.py <mdb_path> <gdb_path> <log_file>")
    else:
        mdb_path = sys.argv[1]
        gdb_path = sys.argv[2]
        log_file = sys.argv[3]
        compare_and_copy_items(mdb_path, gdb_path, log_file)
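If you want a deeper check than matching names, you could also compare record counts between each copied pair using arcpy's GetCount tool. A sketch with a hypothetical feature class pair:

import arcpy

# Hypothetical pair to check - a feature class in the .mdb and its copy in the .gdb
mdb_fc = r"C:\temp\FoldersWithMDBs\Archive.mdb\Roads"
gdb_fc = r"C:\temp\FoldersWithMDBs\Archive.gdb\Roads"

# GetCount returns a Result object; getOutput(0) is the row count as a string
mdb_count = int(arcpy.management.GetCount(mdb_fc).getOutput(0))
gdb_count = int(arcpy.management.GetCount(gdb_fc).getOutput(0))
if mdb_count != gdb_count:
    print("Count mismatch for Roads: {} vs {}".format(mdb_count, gdb_count))
else:
    print("Row counts match: {}".format(mdb_count))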
Conclusion
I hope this may be of use to someone in their migration efforts. It took a fair bit of trial and error to make it work and to consider all the unique situations that arise from differing data management strategies over time, so it will likely need to be customised for your particular needs. Hopefully, though, a solid starting point and methodology will make your efforts a lot simpler!