Hi all,
I'm merging 4416 tables, all within a single geodatabase; each table has two records in it. I've run this simple Python script twice now: the first time it made it to table 343 after 7 hours, the second time to table 791 after 16 hours. Both runs averaged about 49 merged tables per hour, which means the script would take roughly 90 hours to finish. Is there any other tool I can run that would make this faster?
Thanks
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
tblList = arcpy.ListTables()
arcpy.Merge_management(tblList,"Merged")
Good morning Benjamin,
Do the tables all have the same schema? If not, are you doing any kind of field mapping for your merge? I am not 100% sure how the merge tool works when no field mapping info is supplied, but I am thinking it may be adding the fields of each input table into the output, which could be hurting your performance.
You may wish to create your output table ahead of time, and then use iteration to append the input tables. Something along the lines of:
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
target_table = "C:\\DeleteMe\\output\\final.gdb\\target_table"
tblList = arcpy.ListTables()
for t in tblList:
    arcpy.Append_management([t], target_table, "NO_TEST")
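If the target table doesn't exist yet, one way to create it ahead of time (just a sketch, assuming the first listed table has the schema you want and that final.gdb already exists) would be something like:
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
tblList = arcpy.ListTables()
# Create an empty table in the output geodatabase, using the first
# input table as the schema template (final.gdb is assumed to exist).
target_table = arcpy.CreateTable_management(
    "C:\\DeleteMe\\output\\final.gdb", "target_table", template=tblList[0]
)[0]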
I was using the Append tool a couple weeks ago to append 60+ feature classes to one target. I think it took about ten minutes.
Good luck!
Micah
Perfect, this saved me 89 hours of processing time.
Thanks!
I think this would be faster:
arcpy.Append_management(tblList, target_table, "NO_TEST")
I agree with Micah Babinski: consider using the Append tool if all of the schemas are the same and you just need to dump the records from all of the tables into one table. If the Append tool runs into problems, you could also use the ArcPy Data Access cursors to accomplish the task. For example, create a new empty table, open an insert cursor on it, and use search cursors on the existing tables to insert their records into the new table.
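A rough sketch of that cursor approach (assuming every table shares the same fields and the target table already exists with that schema; the paths simply reuse the ones from earlier in the thread):
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
target_table = "C:\\DeleteMe\\output\\final.gdb\\target_table"
# Read and write the same non-OID fields for every table.
fields = [f.name for f in arcpy.ListFields(target_table) if f.type != "OID"]
with arcpy.da.InsertCursor(target_table, fields) as icur:
    for tbl in arcpy.ListTables():
        with arcpy.da.SearchCursor(tbl, fields) as scur:
            for row in scur:
                icur.insertRow(row)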
Like you and others, I have found the Merge tool performs poorly when merging thousands of datasets. I ran into this problem a few years ago, gave up on Merge, and wrote the following VBA code. It was designed to run in ArcCatalog and is hardwired to specific folders and file names, so it is up to you to edit the macro to suit your needs. It works on the same principle as what Joshua Bixby was suggesting in Python. It turned what had been taking all day to merge into a matter of minutes.
If you are familiar with VBA and have it installed and licensed (the license is free), you could adapt this script.
Public Sub LoadData()
' Description: This VBA macro will search a folder for all tables beginning with zs
' and then check to see if they contain 1 row, if they do then the row is copied into
' the output Merged table.
'
' Usage: You need to create an output table with the same schema as the tables that
' will be loaded into it, this is the pTable_Merged object. All files must be in the
' same workspace, in this example this would be c:\temp\zs. The VBA macro was designed
' to run within ArcCatalog.
'
' Created: 24/9/10
' Author: Duncan D Hornby (ddh@geodata.soton.ac.uk)
' Get a handle on workspace (folder)
Dim pWorkspaceFactory As IWorkspaceFactory
Set pWorkspaceFactory = New ShapefileWorkspaceFactory
Dim pWorkspace As IWorkspace
Set pWorkspace = pWorkspaceFactory.OpenFromFile("c:\temp\zs", 0)
Dim pFeatureWorkspace As IFeatureWorkspace
Set pFeatureWorkspace = pWorkspace
' Open table to load data into, this must have the same schema
' as the tables that will be loaded into it. Note we drop the .dbf
Dim pTable_Merged As ITable
Set pTable_Merged = pFeatureWorkspace.OpenTable("merged_sed04")
' Create an insert cursor
Dim pCursor As ICursor
Set pCursor = pTable_Merged.Insert(True)
' Get a list of tables in workspace
Dim pEnumDataset As IEnumDataset
Set pEnumDataset = pWorkspace.Datasets(esriDTTable)
' Declare some objects
Dim pTable_Load As ITable
Dim pRow_load As IRow
Dim pRowBuffer As IRowBuffer
Dim pFields As IFields
Dim pDataset As IDataset
Dim i As Byte
Dim n As Long ' The number of tables processed
Dim rej As Long ' The number of tables that are rejected
Let n = 1
Let rej = 0
' Main processing loop
Set pDataset = pEnumDataset.Next
Do While Not pDataset Is Nothing
    ' If the table begins with zs then this is a table we will want to load
    If Left(pDataset.Name, 2) = "zs" Then
        ' Report table name then get a handle on it
        Debug.Print CStr(n) & " : " & pDataset.Name
        Set pTable_Load = pDataset
        ' Check if table has 1 row in it, if it does not then reject
        If pTable_Load.RowCount(Nothing) = 1 Then
            ' Get row and copy into a new row for merged table
            Set pRow_load = pTable_Load.GetRow(0)
            Set pRowBuffer = pTable_Merged.CreateRowBuffer
            Set pFields = pRow_load.Fields
            For i = 1 To pFields.FieldCount - 1
                With pRowBuffer
                    .Value(i) = pRow_load.Value(i)
                End With
            Next i
            pCursor.InsertRow pRowBuffer
        Else
            ' Increment rejected
            Let rej = rej + 1
        End If
    End If
    ' Get next table in folder
    Set pDataset = pEnumDataset.Next
    Let n = n + 1
Loop
' Clean up
pCursor.Flush
Debug.Print "Number of tables rejected: " & CStr(rej)
End Sub