With crunch time approaching and ArcMap set to become a thing of the past, many GIS users who have migrated to ArcGIS Pro for the majority of their work have still held on to ArcMap as a way of accessing datasets stored in personal geodatabases (.mdb). Once ArcMap is gone, there will be no way to view this data, as ArcGIS Pro does not support personal geodatabases. If you only have a few .mdb's, this isn't a big deal: you can manually create a new .gdb and migrate your datasets across using the tools already at your disposal in ArcMap.
But if you have many .mdb's, this soon becomes infeasible. We fell into this category: we had many archived and backup datasets stored as .mdb's that we may want to access in the future. With ArcMap due to be uninstalled from all machines across our organisation later this year, we decided it was time to convert these to file geodatabases before that happened, to guarantee future accessibility.
To achieve this, I used ArcMap's Python window with two pairs of scripts, as detailed below.
NOTE: When running the below scripts against data on our network, almost every task was very slow. Moving the data (make a copy!) to the local C drive of my machine sped up the processing time immensely (many hours down to minutes!) and reduced the number of failed copy tasks to almost zero.
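If you want to script that staging step as well, something like the following can copy the network folder down to the local drive first. This is a minimal sketch; the network and local paths are hypothetical placeholders for your own.

import shutil

# Hypothetical paths - replace with your network share and a local working folder
network_folder = r"\\server\gis\ArchivedMDBs"
local_folder = r"C:\temp\FoldersWithMDBs"

# Copy the entire folder tree (including the .mdb's) to the local drive.
# Note: shutil.copytree requires that the destination folder does not already exist.
shutil.copytree(network_folder, local_folder)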
Script 1: This script is copied and pasted into ArcMap's Python window. Update the input_folder parameter with the parent folder containing your personal geodatabases; the script loops through all subfolders and the .mdb's they contain. For each .mdb found, the loop passes its containing folder to the second script. Update the second_script_path parameter with the path to the second script.
Script 2: Called by Script 1 and saved in a location of your choosing, this script creates a matching .gdb for each .mdb in the folder it is passed and copies across any datasets, feature classes, or tables found.
The process was split into two scripts, rather than run as a single script, because I found that after the first folder in the loop, the data copy would fail for all .mdb's found in subsequent folders, returning many error messages. The second script is executed once per .mdb found, which resets all the inputs and removes whatever issue caused the copies to fail when looped within the same instance.
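As an aside, rather than pasting the whole of Script 1 into the Python window each time, you can also run the saved file directly from the window. ArcMap runs Python 2, so execfile is available (the path below is just an example):

# Run Script 1 from the ArcMap Python window instead of pasting it in
execfile(r"C:\Temp\Script1.py")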
Script 1
import os
import subprocess
from datetime import datetime

def iterate_folders(input_folder, second_script_path):
    # Create a log file
    log_file = os.path.join(r"C:\temp", "MDB_to_GDB_log_{}.txt".format(datetime.now().strftime('%Y%m%d_%H%M%S')))

    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    # Iterate through all subfolders and files
    for root, dirs, files in os.walk(input_folder):
        for file in files:
            # Check if the file is a .mdb file
            if file.endswith(".mdb"):
                mdb_path = os.path.join(root, file)
                log_message("Processing MDB: " + mdb_path)
                # Call the second script to create a GDB and copy items
                subprocess.call(["python", second_script_path, root, log_file])

# Example usage
input_folder = r"C:\temp\FoldersWithMDBs"
second_script_path = r"C:\Temp\Script2.py"  # Specify the full path to the second script

if not os.path.exists(input_folder):
    print("The path " + input_folder + " does not exist.")
else:
    iterate_folders(input_folder, second_script_path)
Script 2
import os
import sys
import arcpy

def create_gdb_and_copy_items(folder_path, log_file):
    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    # Iterate through all files in the folder
    for file in os.listdir(folder_path):
        # Check if the file is a .mdb file
        if file.endswith(".mdb"):
            mdb_path = os.path.join(folder_path, file)
            gdb_name = os.path.splitext(file)[0] + ".gdb"
            gdb_path = os.path.join(folder_path, gdb_name)
            log_message("Processing MDB: " + mdb_path)

            # Check if the GDB already exists
            if not arcpy.Exists(gdb_path):
                try:
                    # Create a new file geodatabase using arcpy
                    arcpy.management.CreateFileGDB(folder_path, gdb_name)
                except Exception as e:
                    log_message("Error creating GDB {}: {}".format(gdb_path, str(e)))
                    continue
            else:
                log_message("GDB already exists: " + gdb_path)

            try:
                # List all datasets in the .mdb
                arcpy.env.workspace = mdb_path
                datasets = arcpy.ListDatasets("*", "All") or []
                log_message("Datasets: " + str(datasets))
            except Exception as e:
                log_message("Error listing datasets in {}: {}".format(mdb_path, str(e)))
                datasets = []

            # Copy each dataset to the new .gdb
            for dataset in datasets:
                try:
                    if not arcpy.Exists(os.path.join(gdb_path, dataset)):
                        arcpy.management.Copy(dataset, os.path.join(gdb_path, dataset))
                        log_message(str(dataset) + " copied")
                    else:
                        log_message(str(dataset) + " already exists in GDB")
                except Exception as e:
                    log_message("Error copying dataset {}: {}".format(dataset, str(e)))
                    continue

            try:
                # List all feature classes in the .mdb
                arcpy.env.workspace = mdb_path
                fcs = arcpy.ListFeatureClasses("*", "All") or []
                log_message("Feature classes: " + str(fcs))
            except Exception as e:
                log_message("Error listing feature classes in {}: {}".format(mdb_path, str(e)))
                fcs = []

            # Copy each feature class to the new .gdb
            for fc in fcs:
                try:
                    if not arcpy.Exists(os.path.join(gdb_path, fc)):
                        arcpy.management.Copy(fc, os.path.join(gdb_path, fc))
                        log_message(str(fc) + " copied")
                    else:
                        log_message(str(fc) + " already exists in GDB")
                except Exception as e:
                    log_message("Error copying feature class {}: {}".format(fc, str(e)))
                    continue

            try:
                # List all tables in the .mdb
                arcpy.env.workspace = mdb_path
                tables = arcpy.ListTables("*", "All") or []
                log_message("Tables: " + str(tables))
            except Exception as e:
                log_message("Error listing tables in {}: {}".format(mdb_path, str(e)))
                tables = []

            # Copy each table to the new .gdb, excluding those starting with 'dbo_'
            for table in tables:
                if not table.startswith("dbo_"):
                    try:
                        if not arcpy.Exists(os.path.join(gdb_path, table)):
                            arcpy.management.Copy(table, os.path.join(gdb_path, table))
                            log_message(str(table) + " copied")
                        else:
                            log_message(str(table) + " already exists in GDB")
                    except Exception as e:
                        log_message("Error copying table {}: {}".format(table, str(e)))
                        continue
                else:
                    log_message("Skipping table: " + str(table))

            log_message("Converted " + mdb_path + " to " + gdb_path)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python create_gdb_and_copy_items.py <folder_path> <log_file>")
    else:
        folder_path = sys.argv[1]
        log_file = sys.argv[2]
        create_gdb_and_copy_items(folder_path, log_file)
Script 2 notes: You may have more than just datasets, feature classes, and tables to migrate, and may need to customise the script to suit your needs. Also, I've had to filter out tables starting with dbo_, as these were connections to long-lost databases and caused the script to get stuck trying to access them.
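For example, if your .mdb's also contain rasters, a loop in the same pattern as the table loop could be added inside Script 2's per-.mdb block. This is a sketch only, using arcpy.ListRasters, and assumes the same mdb_path, gdb_path, and log_message already in scope:

# List and copy any rasters found in the .mdb
arcpy.env.workspace = mdb_path
rasters = arcpy.ListRasters("*", "All") or []
for raster in rasters:
    try:
        if not arcpy.Exists(os.path.join(gdb_path, raster)):
            arcpy.management.Copy(raster, os.path.join(gdb_path, raster))
            log_message(str(raster) + " copied")
        else:
            log_message(str(raster) + " already exists in GDB")
    except Exception as e:
        log_message("Error copying raster {}: {}".format(raster, str(e)))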
The next two scripts were intended mainly as a double-checking process. Once the first scripts had run, I would occasionally see error messages where a feature class or table had failed to copy. These failures were caught and logged while the script continued on, and reviewing the log file helped identify how many had occurred. If there were only one or two, you might choose to fix them manually.
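A quick way to tally those failures is to scan the log for the error lines the scripts write. A minimal sketch (the log file name below is hypothetical; use the one your run created):

# Count and print the error lines written to a log file
log_file = r"C:\temp\MDB_to_GDB_log_20240101_120000.txt"  # hypothetical file name
with open(log_file) as log:
    errors = [line.strip() for line in log if line.startswith("Error")]
print("{} failed copies".format(len(errors)))
for line in errors:
    print(line)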
Script 3: Very similar to Script 1 in that it loops through folders and personal geodatabases. The difference here is that it also generates the path of the .gdb that should now exist alongside each located .mdb and passes both paths to Script 4. It is executed the same way, by pasting into ArcMap's Python window, and calls Script 4 via the referenced path.
Script 4: This script lists all datasets, feature classes, and tables in both the input .mdb and the .gdb for comparison. If the lists match, nothing happens; if they don't, it attempts to copy the missing items across to complete the list.
NOTE: It has the same filter as Script 2 to exclude "dbo_" tables.
Script 3
import os
import subprocess
from datetime import datetime

def iterate_folders(input_folder, second_script_path):
    # Create a log file
    log_file = os.path.join(r"C:\temp", "MDB_to_GDB_log_{}.txt".format(datetime.now().strftime('%Y%m%d_%H%M%S')))

    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    # Iterate through all subfolders and files
    for root, dirs, files in os.walk(input_folder):
        for file in files:
            # Check if the file is a .mdb file
            if file.endswith(".mdb"):
                mdb_path = os.path.join(root, file)
                gdb_name = os.path.splitext(file)[0] + ".gdb"
                gdb_path = os.path.join(root, gdb_name)
                log_message("Processing MDB: " + mdb_path)
                # Call the second script to compare and copy items
                subprocess.call(["python", second_script_path, mdb_path, gdb_path, log_file])

# Example usage
input_folder = r"C:\temp\FolderWithMDBs"
second_script_path = r"C:\Temp\Script 4.py"  # Specify the full path to the second script

if not os.path.exists(input_folder):
    print("The path " + input_folder + " does not exist.")
else:
    iterate_folders(input_folder, second_script_path)
Script 4
import os
import sys
import arcpy

def compare_and_copy_items(mdb_path, gdb_path, log_file):
    def log_message(message):
        print(message)
        with open(log_file, 'a') as log:
            log.write(message + '\n')

    try:
        # Ensure the GDB exists
        if not arcpy.Exists(gdb_path):
            log_message("GDB does not exist: " + gdb_path)
            return

        # List all datasets, feature classes and tables in the .mdb
        arcpy.env.workspace = mdb_path
        mdb_datasets = arcpy.ListDatasets("*", "All") or []
        log_message("MDB Datasets: " + str(mdb_datasets))
        mdb_fcs = arcpy.ListFeatureClasses("*", "All") or []
        log_message("MDB Feature Classes: " + str(mdb_fcs))
        mdb_tables = arcpy.ListTables("*", "All") or []
        log_message("MDB Tables: " + str(mdb_tables))

        # List all datasets, feature classes and tables in the .gdb
        arcpy.env.workspace = gdb_path
        gdb_datasets = arcpy.ListDatasets("*", "All") or []
        log_message("GDB Datasets: " + str(gdb_datasets))
        gdb_fcs = arcpy.ListFeatureClasses("*", "All") or []
        log_message("GDB Feature Classes: " + str(gdb_fcs))
        gdb_tables = arcpy.ListTables("*", "All") or []
        log_message("GDB Tables: " + str(gdb_tables))

        # Compare and copy missing datasets
        for dataset in mdb_datasets:
            if dataset not in gdb_datasets:
                try:
                    arcpy.management.Copy(os.path.join(mdb_path, dataset), os.path.join(gdb_path, dataset))
                    log_message("Copied dataset: " + dataset)
                except Exception as e:
                    log_message("Error copying dataset {}: {}".format(dataset, str(e)))

        # Compare and copy missing feature classes
        for fc in mdb_fcs:
            if fc not in gdb_fcs:
                try:
                    arcpy.management.Copy(os.path.join(mdb_path, fc), os.path.join(gdb_path, fc))
                    log_message("Copied feature class: " + fc)
                except Exception as e:
                    log_message("Error copying feature class {}: {}".format(fc, str(e)))

        # Compare and copy missing tables, excluding those starting with 'dbo_'
        for table in mdb_tables:
            if not table.startswith("dbo_") and table not in gdb_tables:
                try:
                    arcpy.management.Copy(os.path.join(mdb_path, table), os.path.join(gdb_path, table))
                    log_message("Copied table: " + table)
                except Exception as e:
                    log_message("Error copying table {}: {}".format(table, str(e)))
            else:
                log_message("Skipping table: " + table)
    except Exception as e:
        log_message("Error processing {}: {}".format(mdb_path, str(e)))

if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python compare_and_copy_items.py <mdb_path> <gdb_path> <log_file>")
    else:
        mdb_path = sys.argv[1]
        gdb_path = sys.argv[2]
        log_file = sys.argv[3]
        compare_and_copy_items(mdb_path, gdb_path, log_file)
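If you want a deeper check than matching names, you could also compare record counts between each copied pair using arcpy's GetCount tool. A sketch with a hypothetical feature class pair:

import arcpy

# Hypothetical pair to check - a feature class in the .mdb and its copy in the .gdb
mdb_fc = r"C:\temp\FoldersWithMDBs\Archive.mdb\Roads"
gdb_fc = r"C:\temp\FoldersWithMDBs\Archive.gdb\Roads"

# GetCount returns a Result object; getOutput(0) is the row count as a string
mdb_count = int(arcpy.management.GetCount(mdb_fc).getOutput(0))
gdb_count = int(arcpy.management.GetCount(gdb_fc).getOutput(0))
if mdb_count != gdb_count:
    print("Count mismatch for Roads: {} vs {}".format(mdb_count, gdb_count))
else:
    print("Row counts match: {}".format(mdb_count))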
Conclusion
I hope this may be of use to someone in their migration efforts. It took a fair bit of trial and error to make it work and to consider all the unique situations that arise from differing data management strategies over time, so it will likely need to be customised for your particular needs. Hopefully, though, a solid starting point and methodology will make your efforts a lot simpler!