I am seeking a method to merge table outputs from ZonalHistogram (ZH) using a Python script. These files have a simple two-column format. E.g., AAA.dbf has two fields:
Similarly, BBB.dbf has two fields, LABEL and BBB, CCC.dbf has LABEL and CCC. I am able to plot these individual table files with arcpy graph, or render them in Excel.
I am trying to merge these files into a single output with the structure:
LABEL AAA BBB CCC
0 - 100 ZH values ZH values ZH values
I started simply with:
tablesList = arcpy.ListTables()
arcpy.Merge_management(tablesList, "merged.dbf")
which gave me the four columns I was after but the output was arranged as:
rows 0 - 100: LABEL (0 - 100) ZH values 0 0
rows 101 - 201: LABEL (0 - 100) 0 ZH values 0
rows 202 - 302: Label (0 - 100) 0 0 ZH values
I have attempted to use field mapping as follows:
import arcpy
TESTDIR = <some dir>
arcpy.env.workspace = TESTDIR
arcpy.env.scratchWorkspace = TESTDIR
arcpy.env.overwriteOutput = True
tablesList = arcpy.ListTables()
inFields = [u'LABEL', u'AAA', u'BBB', u'CCC'] # Generated by arcpy.Describe()
fieldMappings = arcpy.FieldMappings()
# Deal with "LABEL" as special case (first field of first table)
fMap = arcpy.FieldMap()
fMap.addInputField(tablesList[0], inFields[0])
outField = fMap.outputField
outField.name = inFields[0]
fMap.outputField = outField
fieldMappings.addFieldMap(fMap)
# Add the second field of each file to the map
for i in range(len(tablesList)):
fMap = arcpy.FieldMap()
fMap.addInputField(tablesList[i], inFields[i + 1])
outField = fMap.outputField
outField.name = inFields[i + 1]
fMap.outputField = outField
fieldMappings.addFieldMap(fMap)
arcpy.Merge_management(tablesList, "merged.dbf", fieldMappings)
The output was the same. It seems I have merely replicated the built-in Merge_management. I would be most grateful for any advice on modifying the script to generate the merged output I am after.
Solved! Go to Solution.
I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 character
The 10 character truncation is because you your output is in shapefile or .dbf format. If you output to a geodatabase your field names will not be mangled to fit in the 10 character limit. The field names changes are those that are required to make the field name valid for the output format see the ValidateFieldName method in the help for more on that.
You may have better luck if you set the environment arcpy.env.qualifiedFieldNames to False, this will prevent your merge from trying to include the prefix in the output name. (When I think about it, this may be your main issue.)
LABEL AAA BBB CCC
0 - 100 ZH values ZH values ZH values
This suggests to me a join on the LABEL field. But is there a many to one, one to one etc.
Maybe if you have a larger sample to show, it might help.
Join with a summarize? hard to tell
It looks as if you want to join tables, not merge them. See Add Join for more information. You would join on the LABEL field.
An alternative to the join is to read the dbf files into dictionaries using a SearchCursor, combining them and saving the results. An example:
import arcpy
dbf = r"C:\path\to\dbf\AAA.dbf"
flds = ['LABEL', 'AAA']
dic1 = {r[0]: r[1] for r in arcpy.da.SearchCursor(dbf, flds)}
dbf = r"C:\path\to\dbf\BBB.dbf"
flds = ['LABEL', 'BBB']
dic2 = {r[0]: r[1] for r in arcpy.da.SearchCursor(dbf, flds)}
dbf = r"C:\path\to\dbf\CCC.dbf"
flds = ['LABEL', 'CCC']
dic3 = {r[0]: r[1] for r in arcpy.da.SearchCursor(dbf, flds)}
print "LABEL\tAAA\tBBB\tCCC"
for k, v in dic1.iteritems():
print "{}\t{}\t{}\t{}".format(k, v, dic2, dic3)
# would give something like:
'''
LABEL AAA BBB CCC
0 100 200 300
1 101 201 301
2 102 202 302
3 103 203 303
4 104 204 304
'''
Or, Add Join tables 2 and 3 to table 1 and copy the table view with Table to Table. This method has a nice advantage in that you can use a field map to control the order and names of the output fields.
Hi All,
I'm having a go at both approaches. I need to gain some experience with field mappings, so ...
After the join, tbl1 has the fieldnames:
[u'AAA.OID', u'AAA.LABEL', u'AAA.AAA', u'BBB.OID', u'BBB.LABEL', u'BBB.BBB', u'CCC.OID', u'CCC.LABEL', u'CCC.CCC']
I determine which of these I want to keep for the TableToTable_conversion with:
keepFields = [u'AAA.OID', u'AAA.LABEL', u'AAA.AAA', u'BBB.BBB', u'CCC.CCC']
and attempt to build the fieldMappings with:
fieldMappings = arcpy.FieldMappings()
for field in fieldnames:
fMap = arcpy.FieldMap()
if field in keepFields:
fMap.addInputField(tbl1, field)
outField = fMap.outputField
outField.name = field
fMap.outputField = outField
fieldMappings.addFieldMap(fMap)
which produces the error:
RuntimeError: FieldMap: Error in adding input field to field map
for line 5.
Can you suggest how I might follow this up? My limited experience can find no help in the error message at all!
Ian
I would print field in the loop so you can figure which one failed.
Yes, I had started that task when your suggestion arrived. It turns out all of them break it.
I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 characters. I was thinking this happened at this step but perhaps it occurs at the AddJoin_management? If this is the case, I can understand the addInputField might get confused.
I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 character
The 10 character truncation is because you your output is in shapefile or .dbf format. If you output to a geodatabase your field names will not be mangled to fit in the 10 character limit. The field names changes are those that are required to make the field name valid for the output format see the ValidateFieldName method in the help for more on that.
You may have better luck if you set the environment arcpy.env.qualifiedFieldNames to False, this will prevent your merge from trying to include the prefix in the output name. (When I think about it, this may be your main issue.)
Hi Curtis,
Setting the environment in this way certainly helped to make sense of what was happening with the field names. All the field names still failed so I figured I should look at the other parameter in addInputField. It turns out I was trying to use the arcpy.mapping.TableView view of the input dataset rather than the dataset name itself.
Thanks also for the 10-character limit explanation.
Ian
Thank you Dan, Randy and Curtis. I am most grateful for your ideas. I am unable to follow up on them this evening but I will let you know how I get on.