Seeking help with arcpy.Merge_management

776
13
Jump to solution
03-18-2019 03:53 AM
IanMCDONALD
New Contributor III

I am seeking a method to merge table outputs from ZonalHistogram (ZH) using a Python script. These files have a simple two-column format. E.g., AAA.dbf has two fields:

  • LABEL - the X-axis with values ranging from 0 to 100
  • AAA - the Y-axis counts produced by ZH

Similarly, BBB.dbf has two fields, LABEL and BBB, CCC.dbf has LABEL and CCC. I am able to plot these individual table files with arcpy graph, or render them in Excel.

I am trying to merge these files into a single output with the structure:
LABEL    AAA             BBB             CCC
0 - 100    ZH values    ZH values    ZH values

I started simply with:

tablesList = arcpy.ListTables()
arcpy.Merge_management(tablesList, "merged.dbf")‍‍‍‍

which gave me the four columns I was after but the output was arranged as:

rows 0 - 100:        LABEL (0 - 100)    ZH values        0                   0
rows 101 - 201:    LABEL (0 - 100)    0                      ZH values     0
rows 202 - 302:    Label (0 - 100)       0                      0                   ZH values

I have attempted to use field mapping as follows:

import arcpy
TESTDIR = <some dir>

arcpy.env.workspace = TESTDIR
arcpy.env.scratchWorkspace = TESTDIR
arcpy.env.overwriteOutput = True

tablesList = arcpy.ListTables()
inFields = [u'LABEL', u'AAA', u'BBB', u'CCC'] # Generated by arcpy.Describe()
fieldMappings = arcpy.FieldMappings()

 # Deal with "LABEL" as special case (first field of first table)
fMap = arcpy.FieldMap()
fMap.addInputField(tablesList[0], inFields[0])
outField = fMap.outputField
outField.name = inFields[0]
fMap.outputField = outField
fieldMappings.addFieldMap(fMap)

 # Add the second field of each file to the map
for i in range(len(tablesList)):
 fMap = arcpy.FieldMap()
 fMap.addInputField(tablesList[i], inFields[i + 1])
 outField = fMap.outputField
 outField.name = inFields[i + 1]
 fMap.outputField = outField
 fieldMappings.addFieldMap(fMap)
arcpy.Merge_management(tablesList, "merged.dbf", fieldMappings)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The output was the same. It seems I have merely replicated the built-in Merge_management. I would be most grateful for any advice on modifying the script to generate the merged output I am after.

0 Kudos
1 Solution

Accepted Solutions
curtvprice
MVP Esteemed Contributor

I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 character

The 10 character truncation is because you your output is in shapefile  or .dbf format. If you output to a geodatabase your field names will not be mangled to fit in the 10 character limit. The field names changes are those that are required to make the field name valid for the output format see the ValidateFieldName method in the help for more on that.

You may have better luck if you set the environment arcpy.env.qualifiedFieldNames to False, this will prevent your merge from trying to include the prefix in the output name. (When I think about it, this may be your main issue.)

View solution in original post

13 Replies
DanPatterson_Retired
MVP Esteemed Contributor

LABEL    AAA             BBB             CCC
0 - 100    ZH values    ZH values    ZH values

This suggests to me a join on the LABEL field.  But is there a many to one, one to one etc.

Maybe if you have a larger sample to show, it might help.

Join with a summarize? hard to tell

RandyBurton
MVP Regular Contributor

It looks as if you want to join tables, not merge them.  See Add Join for more information.  You would join on the LABEL field.

An alternative to the join  is to read the dbf files into dictionaries using a SearchCursor, combining them and saving the results.  An example:

import arcpy

dbf = r"C:\path\to\dbf\AAA.dbf"
flds = ['LABEL', 'AAA']
dic1 = {r[0]: r[1] for r in arcpy.da.SearchCursor(dbf, flds)} 

dbf = r"C:\path\to\dbf\BBB.dbf"
flds = ['LABEL', 'BBB']
dic2 = {r[0]: r[1]  for r in arcpy.da.SearchCursor(dbf, flds)}

dbf = r"C:\path\to\dbf\CCC.dbf"
flds = ['LABEL', 'CCC']
dic3 = {r[0]: r[1]  for r in arcpy.da.SearchCursor(dbf, flds)}

print "LABEL\tAAA\tBBB\tCCC"
for k, v in dic1.iteritems():
    print "{}\t{}\t{}\t{}".format(k, v, dic2, dic3)

# would give something like:
'''
LABEL	AAA	BBB	CCC
0	100	200	300
1	101	201	301
2	102	202	302
3	103	203	303
4	104	204	304
'''‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
curtvprice
MVP Esteemed Contributor

Or, Add Join tables 2 and 3 to table 1 and copy the table view with Table to Table.  This method has a nice advantage in that you can use a field map to control the order and names of the output fields. 

IanMCDONALD
New Contributor III

Hi All,

I'm having a go at both approaches.  I need to gain some experience with field mappings, so ...

After the join, tbl1 has the fieldnames:

[u'AAA.OID', u'AAA.LABEL', u'AAA.AAA', u'BBB.OID', u'BBB.LABEL', u'BBB.BBB', u'CCC.OID', u'CCC.LABEL', u'CCC.CCC']

I determine which of these I want to keep for the TableToTable_conversion with:

keepFields = [u'AAA.OID', u'AAA.LABEL', u'AAA.AAA', u'BBB.BBB', u'CCC.CCC']

and attempt to build the fieldMappings with:

fieldMappings = arcpy.FieldMappings()
for field in fieldnames:
 fMap = arcpy.FieldMap()
 if field in keepFields:
  fMap.addInputField(tbl1, field)
  outField = fMap.outputField
  outField.name = field
  fMap.outputField = outField
 fieldMappings.addFieldMap(fMap)‍‍‍‍‍‍‍‍‍

which produces the error:

RuntimeError: FieldMap: Error in adding input field to field map

for line 5.

Can you suggest how I might follow this up?  My limited experience can find no help in the error message at all!

Ian

0 Kudos
curtvprice
MVP Esteemed Contributor

I would print field in the loop so you can figure which one failed. 

IanMCDONALD
New Contributor III

Yes, I had started that task when your suggestion arrived.  It turns out all of them break it.

I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 characters.  I was thinking this happened at this step but perhaps it occurs at the AddJoin_management?  If this is the case, I can understand the addInputField might get confused.

0 Kudos
curtvprice
MVP Esteemed Contributor

I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 character

The 10 character truncation is because you your output is in shapefile  or .dbf format. If you output to a geodatabase your field names will not be mangled to fit in the 10 character limit. The field names changes are those that are required to make the field name valid for the output format see the ValidateFieldName method in the help for more on that.

You may have better luck if you set the environment arcpy.env.qualifiedFieldNames to False, this will prevent your merge from trying to include the prefix in the output name. (When I think about it, this may be your main issue.)

IanMCDONALD
New Contributor III

Hi Curtis,

Setting the environment in this way certainly helped to make sense of what was happening with the field names.  All the field names still failed so I figured I should look at the other parameter in addInputField.  It turns out I was trying to use the arcpy.mapping.TableView view of the input dataset rather than the dataset name itself.  

Thanks also for the 10-character limit explanation.

Ian

0 Kudos
IanMCDONALD
New Contributor III

Thank you Dan, Randy and Curtis.  I am most grateful for your ideas.  I am unable to follow up on them this evening but I will let you know how I get on.

0 Kudos