Seeking help with arcpy.Merge_management

IanMCDONALD · ‎03-18-2019

I am seeking a method to merge table outputs from ZonalHistogram (ZH) using a Python script. These files have a simple two-column format. E.g., AAA.dbf has two fields:

LABEL - the X-axis with values ranging from 0 to 100
AAA - the Y-axis counts produced by ZH

Similarly, BBB.dbf has two fields, LABEL and BBB, CCC.dbf has LABEL and CCC. I am able to plot these individual table files with arcpy graph, or render them in Excel.

I am trying to merge these files into a single output with the structure:
LABEL AAA BBB CCC
0 - 100 ZH values ZH values ZH values

I started simply with:

tablesList = arcpy.ListTables()
arcpy.Merge_management(tablesList, "merged.dbf")‍‍‍‍‍‍

which gave me the four columns I was after but the output was arranged as:

rows 0 - 100:        LABEL (0 - 100)    ZH values        0                   0
rows 101 - 201:    LABEL (0 - 100)    0                      ZH values     0
rows 202 - 302:    Label (0 - 100)       0                      0                   ZH values

I have attempted to use field mapping as follows:

import arcpy
TESTDIR = <some dir>

arcpy.env.workspace = TESTDIR
arcpy.env.scratchWorkspace = TESTDIR
arcpy.env.overwriteOutput = True

tablesList = arcpy.ListTables()
inFields = [u'LABEL', u'AAA', u'BBB', u'CCC'] # Generated by arcpy.Describe()
fieldMappings = arcpy.FieldMappings()

 # Deal with "LABEL" as special case (first field of first table)
fMap = arcpy.FieldMap()
fMap.addInputField(tablesList[0], inFields[0])
outField = fMap.outputField
outField.name = inFields[0]
fMap.outputField = outField
fieldMappings.addFieldMap(fMap)

 # Add the second field of each file to the map
for i in range(len(tablesList)):
 fMap = arcpy.FieldMap()
 fMap.addInputField(tablesList[i], inFields[i + 1])
 outField = fMap.outputField
 outField.name = inFields[i + 1]
 fMap.outputField = outField
 fieldMappings.addFieldMap(fMap)
arcpy.Merge_management(tablesList, "merged.dbf", fieldMappings)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The output was the same. It seems I have merely replicated the built-in Merge_management. I would be most grateful for any advice on modifying the script to generate the merged output I am after.

curtvprice · ‎03-22-2019

I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 character

The 10 character truncation is because you your output is in shapefile or .dbf format. If you output to a geodatabase your field names will not be mangled to fit in the 10 character limit. The field names changes are those that are required to make the field name valid for the output format see the ValidateFieldName method in the help for more on that.

You may have better luck if you set the environment arcpy.env.qualifiedFieldNames to False, this will prevent your merge from trying to include the prefix in the output name. (When I think about it, this may be your main issue.)

View solution in original post

DanPatterson_Retired · ‎03-18-2019

LABEL AAA BBB CCC
0 - 100 ZH values ZH values ZH values

This suggests to me a join on the LABEL field. But is there a many to one, one to one etc.

Maybe if you have a larger sample to show, it might help.

Join with a summarize? hard to tell

RandyBurton · ‎03-18-2019

It looks as if you want to join tables, not merge them. See Add Join for more information. You would join on the LABEL field.

An alternative to the join is to read the dbf files into dictionaries using a SearchCursor, combining them and saving the results. An example:

import arcpy

dbf = r"C:\path\to\dbf\AAA.dbf"
flds = ['LABEL', 'AAA']
dic1 = {r[0]: r[1] for r in arcpy.da.SearchCursor(dbf, flds)} 

dbf = r"C:\path\to\dbf\BBB.dbf"
flds = ['LABEL', 'BBB']
dic2 = {r[0]: r[1]  for r in arcpy.da.SearchCursor(dbf, flds)}

dbf = r"C:\path\to\dbf\CCC.dbf"
flds = ['LABEL', 'CCC']
dic3 = {r[0]: r[1]  for r in arcpy.da.SearchCursor(dbf, flds)}

print "LABEL\tAAA\tBBB\tCCC"
for k, v in dic1.iteritems():
    print "{}\t{}\t{}\t{}".format(k, v, dic2, dic3)

# would give something like:
'''
LABEL	AAA	BBB	CCC
0	100	200	300
1	101	201	301
2	102	202	302
3	103	203	303
4	104	204	304
'''‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

curtvprice · ‎03-18-2019

Or, Add Join tables 2 and 3 to table 1 and copy the table view with Table to Table. This method has a nice advantage in that you can use a field map to control the order and names of the output fields.

IanMCDONALD · ‎03-21-2019

Hi All,

I'm having a go at both approaches. I need to gain some experience with field mappings, so ...

After the join, tbl1 has the fieldnames:

[u'AAA.OID', u'AAA.LABEL', u'AAA.AAA', u'BBB.OID', u'BBB.LABEL', u'BBB.BBB', u'CCC.OID', u'CCC.LABEL', u'CCC.CCC']

I determine which of these I want to keep for the TableToTable_conversion with:

keepFields = [u'AAA.OID', u'AAA.LABEL', u'AAA.AAA', u'BBB.BBB', u'CCC.CCC']

and attempt to build the fieldMappings with:

fieldMappings = arcpy.FieldMappings()
for field in fieldnames:
 fMap = arcpy.FieldMap()
 if field in keepFields:
  fMap.addInputField(tbl1, field)
  outField = fMap.outputField
  outField.name = field
  fMap.outputField = outField
 fieldMappings.addFieldMap(fMap)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

which produces the error:

RuntimeError: FieldMap: Error in adding input field to field map

for line 5.

Can you suggest how I might follow this up? My limited experience can find no help in the error message at all!

Ian

curtvprice · ‎03-21-2019

I would print field in the loop so you can figure which one failed.

IanMCDONALD · ‎03-22-2019

Yes, I had started that task when your suggestion arrived. It turns out all of them break it.

I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 characters. I was thinking this happened at this step but perhaps it occurs at the AddJoin_management? If this is the case, I can understand the addInputField might get confused.

curtvprice · ‎03-22-2019

I see when the TableToTable conversion is complete, the dbf has had its field names mangled with trailing "_" and digits and restricting them to 10 character

The 10 character truncation is because you your output is in shapefile or .dbf format. If you output to a geodatabase your field names will not be mangled to fit in the 10 character limit. The field names changes are those that are required to make the field name valid for the output format see the ValidateFieldName method in the help for more on that.

You may have better luck if you set the environment arcpy.env.qualifiedFieldNames to False, this will prevent your merge from trying to include the prefix in the output name. (When I think about it, this may be your main issue.)

IanMCDONALD · ‎03-23-2019

Hi Curtis,

Setting the environment in this way certainly helped to make sense of what was happening with the field names. All the field names still failed so I figured I should look at the other parameter in addInputField. It turns out I was trying to use the arcpy.mapping.TableView view of the input dataset rather than the dataset name itself.

Thanks also for the 10-character limit explanation.

Ian

IanMCDONALD · ‎03-19-2019

Thank you Dan, Randy and Curtis. I am most grateful for your ideas. I am unable to follow up on them this evening but I will let you know how I get on.