Hi all,
I'm merging 4416 tables, all within a single geodatabase; each table has two records in it. I've run this simple Python script twice now: the first time it made it to table 343 after 7 hours, the second time to table 791 after 16 hours. Both runs averaged about 49 merged tables per hour, which means the script would take roughly 90 hours to finish. Is there any other tool I can run that would make this faster?
Thanks
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
tblList = arcpy.ListTables()
arcpy.Merge_management(tblList,"Merged")
Good morning Benjamin,
Do the tables all have the same schema? If not, are you doing any kind of field mapping for your merge? I am not 100% sure how the merge tool works when no field mapping info is supplied, but I am thinking it may be adding the fields of each input table into the output, which could be hurting your performance.
You may wish to create your output table ahead of time, and then use iteration to append the input tables. Something along the lines of:
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
target_table = "C:\\DeleteMe\\output\\final.gdb\\target_table"
tblList = arcpy.ListTables()
for t in tblList:
    arcpy.Append_management([t], target_table, "NO_TEST")
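If the target table doesn't exist yet, one way to create it ahead of time (just a sketch, assuming the first listed table has the schema you want and that final.gdb already exists) would be something like:
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
tblList = arcpy.ListTables()
# Create an empty table in the output geodatabase, using the first
# input table as the schema template (final.gdb is assumed to exist).
target_table = arcpy.CreateTable_management(
    "C:\\DeleteMe\\output\\final.gdb", "target_table", template=tblList[0]
)[0]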
I was using the Append tool a couple weeks ago to append 60+ feature classes to one target. I think it took about ten minutes.
Good luck!
Micah
Perfect, this saved me 89 hours of processing time.
Thanks!
I think this would be faster:
arcpy.Append_management(tblList, target_table, "NO_TEST")
I agree with Micah Babinski: consider using the Append tool if all of the schemas are the same and you just need to dump the records from all of the tables into one table. If the Append tool runs into problems, you could also use the ArcPy Data Access cursors to accomplish the task. For example, create a new empty table, open an insert cursor on it, and use search cursors on the existing tables to insert their records into the new table.
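A rough sketch of that cursor approach (assuming every table shares the same fields and the target table already exists with that schema; the paths simply reuse the ones from earlier in the thread):
import arcpy
arcpy.env.workspace = "C:\\DeleteMe\\tomerge\\tables.gdb"
target_table = "C:\\DeleteMe\\output\\final.gdb\\target_table"
# Read and write the same non-OID fields for every table.
fields = [f.name for f in arcpy.ListFields(target_table) if f.type != "OID"]
with arcpy.da.InsertCursor(target_table, fields) as icur:
    for tbl in arcpy.ListTables():
        with arcpy.da.SearchCursor(tbl, fields) as scur:
            for row in scur:
                icur.insertRow(row)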
Like you and others, I have found the Merge tool performs poorly when merging thousands of datasets. I ran into this problem a few years ago, gave up on Merge, and wrote the following VBA code. It was designed to run in ArcCatalog and is hardwired to specific folders and file names, so it is up to you to edit the macro to suit your needs. It works on the same principle as what Joshua Bixby was suggesting in Python. It turned what had been taking all day to merge into a matter of minutes.
If you are familiar with VBA and have it installed and licensed (the license is free), you could adapt this script.
Public Sub LoadData()
' Description: This VBA macro will search a folder for all tables beginning with zs
' and then check to see if they contain 1 row, if they do then the row is copied into
' the output Merged table.
'
' Usage: You need to create an output table with the same schema as the tables that
' will be loaded into it, this is the pTable_Merged object. All files must be in the
' same workspace, in this example this would be c:\temp\zs. The VBA macro was designed
' to run within ArcCatalog.
'
' Created: 24/9/10
' Author: Duncan D Hornby (ddh@geodata.soton.ac.uk)
' Get a handle on workspace (folder)
Dim pWorkspaceFactory As IWorkspaceFactory
Set pWorkspaceFactory = New ShapefileWorkspaceFactory
Dim pWorkspace As IWorkspace
Set pWorkspace = pWorkspaceFactory.OpenFromFile("c:\temp\zs", 0)
Dim pFeatureWorkspace As IFeatureWorkspace
Set pFeatureWorkspace = pWorkspace
' Open table to load data into, this must have the same schema
' as the tables that will be loaded into it. Note we drop the .dbf
Dim pTable_Merged As ITable
Set pTable_Merged = pFeatureWorkspace.OpenTable("merged_sed04")
' Create an insert cursor
Dim pCursor As ICursor
Set pCursor = pTable_Merged.Insert(True)
' Get a list of tables in workspace
Dim pEnumDataset As IEnumDataset
Set pEnumDataset = pWorkspace.Datasets(esriDTTable)
' Declare some objects
Dim pTable_Load As ITable
Dim pRow_load As IRow
Dim pRowBuffer As IRowBuffer
Dim pFields As IFields
Dim pDataset As IDataset
Dim i As Byte
Dim n As Long ' The number of tables processed
Dim rej As Long ' The number of tables that are rejected
Let n = 1
Let rej = 0
' Main processing loop
Set pDataset = pEnumDataset.Next
Do While Not pDataset Is Nothing
    ' If the table begins with zs then this is a table we will want to load
    If Left(pDataset.Name, 2) = "zs" Then
        ' Report table name then get a handle on it
        Debug.Print CStr(n) & " : " & pDataset.Name
        Set pTable_Load = pDataset
        ' Check if table has 1 row in it, if it does not then reject
        If pTable_Load.RowCount(Nothing) = 1 Then
            ' Get row and copy into a new row for merged table
            Set pRow_load = pTable_Load.GetRow(0)
            Set pRowBuffer = pTable_Merged.CreateRowBuffer
            Set pFields = pRow_load.Fields
            For i = 1 To pFields.FieldCount - 1
                With pRowBuffer
                    .Value(i) = pRow_load.Value(i)
                End With
            Next i
            pCursor.InsertRow pRowBuffer
        Else
            ' Increment rejected
            Let rej = rej + 1
        End If
    End If
    ' Get next table in folder
    Set pDataset = pEnumDataset.Next
    Let n = n + 1
Loop
' Clean up
pCursor.Flush
Debug.Print "Number of tables rejected: " & CStr(rej)
End Sub