Take identical fields and concatenate additional field

JeremyCrowley · ‎03-25-2014

I have been having an issue that I cannot seem to solve simply. Any help would be much appreciated.
I have a table (over 3000 rows) with a field "Map_ Code" (about 800 unique codes) which has multiple
"DataSources" with each code. I am trying to get a table of unique Map_Codes along with all the
DataSources for that Map_Code and the summed "Frequency". I have played with pivot table, concatenate
rows, update cursors etc.

Here is an example:

OID   Frequency   Map_Code DataSource
0              81              !@f         DSP14
1              20              !@f         DSP77
2              9                !@f         DSP81
3              102            !@s         DSP19
4              20              !@s         DSP27

What I would like to have is

OID Frequency   Map_Code             DataSource
0          110             !@f          DSP14;DSP77;DSP81
1          122             !@s          DSP19;DSP27

XanderBakker · ‎03-26-2014

Hi Jeremy,

Take a look at the code below. Basically I reference a table (could also be a feature class) and specify the fields. In the loop through the table of feature class I check to see if the mapcode (mc) is already in the dictionary. If so the frequency is added (summed) and the datasource is added to the list. If the mapcode is not in the dictionary a new entry is created with the frequency and a list holding a single datasource.

At the end I create a unique list of the datasources (per mapcode) with the set command and I join the list by putting a semicolon between each datasource.

import arcpy  # your data: tbl = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb\aTable2' fld_freq = 'Frequency' fld_mapcode = 'Map_Code' fld_datasource = 'DataSource'  # create empty dictionary to store results dct = {}  # loop through table flds = (fld_freq, fld_mapcode, fld_datasource) with arcpy.da.SearchCursor(tbl, flds) as curs:     for row in curs:         freq = row[0]         mc = row[1]         ds = row[2]         if mc in dct:             freq_tot = dct[mc][0] + freq             lst_ds = dct[mc][1]             lst_ds.append(ds)             dct[mc] = (freq_tot, lst_ds)         else:             dct[mc] = (freq, [ds])  # print results for mc, val in dct.items():     lst_ds = sorted(set(val[1])) # make a unique sorted list     freq_tot = val[0]     print freq_tot,  mc, ";".join(lst_ds)

The result is this:
122.0 !@s DSP19;DSP27
110.0 !@f DSP14;DSP77;DSP81

Kind regards,

Xander

View solution in original post

XanderBakker · ‎03-26-2014

Hi Jeremy,

Take a look at the code below. Basically I reference a table (could also be a feature class) and specify the fields. In the loop through the table of feature class I check to see if the mapcode (mc) is already in the dictionary. If so the frequency is added (summed) and the datasource is added to the list. If the mapcode is not in the dictionary a new entry is created with the frequency and a list holding a single datasource.

At the end I create a unique list of the datasources (per mapcode) with the set command and I join the list by putting a semicolon between each datasource.

import arcpy  # your data: tbl = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb\aTable2' fld_freq = 'Frequency' fld_mapcode = 'Map_Code' fld_datasource = 'DataSource'  # create empty dictionary to store results dct = {}  # loop through table flds = (fld_freq, fld_mapcode, fld_datasource) with arcpy.da.SearchCursor(tbl, flds) as curs:     for row in curs:         freq = row[0]         mc = row[1]         ds = row[2]         if mc in dct:             freq_tot = dct[mc][0] + freq             lst_ds = dct[mc][1]             lst_ds.append(ds)             dct[mc] = (freq_tot, lst_ds)         else:             dct[mc] = (freq, [ds])  # print results for mc, val in dct.items():     lst_ds = sorted(set(val[1])) # make a unique sorted list     freq_tot = val[0]     print freq_tot,  mc, ";".join(lst_ds)

The result is this:
122.0 !@s DSP19;DSP27
110.0 !@f DSP14;DSP77;DSP81

Kind regards,

Xander

JeremyCrowley · ‎03-26-2014

Wow, Xander, that is phenomenal. There are a lot of tricks in there that I will start using. Thank you very much for the excellent solution! You saved me quite a bit of time and fumbling.

Jeremy

JeremyCrowley · ‎03-26-2014

Another question: How would I modify the last piece of code to output loop results to a table? I tried append and copy rows.

This didn't work (RuntimeError: Object: Error in executing tool)

arcpy.CreateTable_management("C:/Output", "mmc2.dbf", "")
arcpy.AddField_management("mmc2.dbf", "Map_Code", "Text", 9, "", 10, "", "NULLABLE", "")
arcpy.AddField_management("mmc2.dbf", "Frequency", "LONG", 9, 0, "", "", "NULLABLE", "")
arcpy.AddField_management("mmc2.dbf", "DataSource", "Text", 9, "", 255, "", "NULLABLE", "")

for mc, val in dct.items():
    lst_ds = sorted(set(val[1])) # make a unique sorted list
    freq_tot = val[0]
    arcpy.CopyRows_management([freq_tot, mc, ";".join(lst_ds)], mmc2)

XanderBakker · ‎03-26-2014

Hi Jeremy,

I normally prefer using an insert cursor, although James Crandall once showed me a more elegant way using panda and numpy. Below the code using the insert cursor:

I added a reference to the os module

two extra variables to indicate the output workspace (path of fgdb) and the output table name

I copied your code for creating the empty table

I changed adding the fields slightly (now uses the same variables used earlier in the script)

The insert cursor create a row (tuple in the same order of fields as the 'flds' variable)

and finally insert the row into the cursor

import arcpy, os
arcpy.env.overwriteOutput = True

# your data:
tbl = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb\aTable2'
fld_freq = 'Frequency'
fld_mapcode = 'Map_Code'
fld_datasource = 'DataSource'

# extra vars for output table
out_ws = r'C:\Output' # r'C:\Project\_Forums\_maxDict\fgdb\test.gdb'
tbl_name = 'mmc2.dbf' # 'mmc2'

# create empty dictionary to store results
dct = {}

# loop through table
flds = (fld_freq, fld_mapcode, fld_datasource)
with arcpy.da.SearchCursor(tbl, flds) as curs:
    for row in curs:
        freq = row[0]
        mc = row[1]
        ds = row[2]
        if mc in dct:
            freq_tot = dct[mc][0] + freq
            lst_ds = dct[mc][1]
            lst_ds.append(ds)
            dct[mc] = (freq_tot, lst_ds)
        else:
            dct[mc] = (freq, [ds])


# write results to table
arcpy.CreateTable_management(out_ws, tbl_name)
tbl_out = os.path.join(out_ws, tbl_name)
arcpy.AddField_management(tbl_out, fld_mapcode, "Text", 9, "", 10, "", "NULLABLE", "")
arcpy.AddField_management(tbl_out, fld_freq, "LONG", 9, 0, "", "", "NULLABLE", "")
arcpy.AddField_management(tbl_out, fld_datasource, "Text", 9, "", 255, "", "NULLABLE", "")

# we still have the var flds
with arcpy.da.InsertCursor(tbl_out, flds) as curs:
    for mc, val in dct.items():
        lst_ds = sorted(set(val[1]))
        freq_tot = val[0]
        row = (freq_tot, mc, ";".join(lst_ds))
        curs.insertRow(row)

With the chance of confusing you a little bit here is a numpy solution:

import arcpy, os, numpy
arcpy.env.overwriteOutput = True

# your data:
tbl = r'C:\Project\_Forums\_maxDict\fgdb\test.gdb\aTable2'
fld_freq = 'Frequency'
fld_mapcode = 'Map_Code'
fld_datasource = 'DataSource'

# extra vars for output
out_ws = r'C:\Output' # r'C:\Project\_Forums\_maxDict\fgdb\test.gdb'
tbl_name = 'mmc2.dbf' # 'mmc2'

# create empty dictionary to store results
dct = {}

# loop through table
flds = (fld_freq, fld_mapcode, fld_datasource)
with arcpy.da.SearchCursor(tbl, flds) as curs:
    for row in curs:
        freq = row[0]
        mc = row[1]
        ds = row[2]
        if mc in dct:
            freq_tot = dct[mc][0] + freq
            lst_ds = dct[mc][1]
            lst_ds.append(ds)
            dct[mc] = (freq_tot, lst_ds)
        else:
            dct[mc] = (freq, [ds])


# create list for use in numpy; each item is a tuple of frequency, map code and list of datasources
lst_out = [(val[0], mc, ";".join(sorted(set(val[1])))) for mc, val in dct.items()]

# convert the list to a numpy array
npa = numpy.array(lst_out, numpy.dtype([(fld_freq, numpy.int32), (fld_mapcode, '|S9'), (fld_datasource, '|S255')]))

# store the table
arcpy.da.NumPyArrayToTable(npa, os.path.join(out_ws, tbl_name), flds)

JeremyCrowley · ‎03-27-2014

Xander,

You are a wizard! I tried multiple ways of getting the lists into a table and couldn't figure it out. Thank you very much for your help.

Cheers,

Jeremy