Script for simultaneous looping and dissolve

SalmanAhmed · ‎06-03-2015

In the attached table you can see the destination cities in the 'Destination' field. Many of them repeat (Werne, Werne, Werne, ...).

I want to create a script/model such that when run, it loops through all the rows with the same destination city (all the rows that have Werne for example) and then out of them selects the row that has the maximum value in the 'Commuters' field and then dissolves the destination city with the origin city ('Origin' field) for that row.

For example for Werne the row with the maximum value in the commuter field is the first row. So the tool should identify that and then dissolve Werne with Bergkamen in the map.

This should then be done for every destination city. So each loop should only cover the rows that share the same destination city. After dissolving that city with its origin city, a new loop should start for the next destination city in the 'Destination' field.
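The selection step described above can be sketched in plain Python, without arcpy. This is only a minimal sketch under assumptions: the rows are hand-written tuples of (destination, origin, commuters), the function name `max_origin_per_destination` is hypothetical, and all values except the Werne/Bergkamen one quoted in this thread are invented for illustration.

```python
def max_origin_per_destination(rows):
    """Return {destination: (origin, commuters)} for the row with the most commuters."""
    best = {}
    for destination, origin, commuters in rows:
        # keep the row only if it is the first one seen or beats the current maximum
        if destination not in best or commuters > best[destination][1]:
            best[destination] = (origin, commuters)
    return best


# Illustrative rows; only the Werne/Bergkamen value comes from the thread.
rows = [
    ("Werne", "Bergkamen", 1073.328),
    ("Werne", "Kamen", 640.0),      # invented
    ("Unna", "Kamen", 300.0),       # invented
    ("Unna", "Dortmund", 900.0),    # invented
]

best = max_origin_per_destination(rows)
print(best["Werne"][0])
```

With a real feature class, the same loop could be fed from an `arcpy.da.SearchCursor` over the three fields instead of a hand-written list.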

XanderBakker · ‎06-07-2015

OK, I created some code that seems to yield the correct result, but it is pretty ugly and unreadable... The result looks like this (thick black lines):

... and the attribute lists the Gemeindes that were used to create the merged polygons.

I have attached the resulting shapefile, so you can have a look and check the result.

The code used to create the output is listed below:

def main():
    import arcpy
    arcpy.env.overwriteOutput = True

    # input fc
    fc = r"C:\Forum\Gemeinde\NRW_Gemeinde_SpatiaJoin.shp"
    fld_from = "GEN"
    fld_to = "GEN_1"
    fld_val = "Value"

    # output field (will be added to input fc)
    fld_out = "Gemeindes"

    # output fc
    fc_out = r"C:\Forum\Gemeinde\NRW_Gemeinde_diss01.shp"

    # determine the max commuters per from gemeinde
    dct_max = {}
    with arcpy.da.SearchCursor(fc, (fld_from, fld_to, fld_val)) as curs:
        for row in curs:
            from_g = row[0]
            to_g = row[1]
            val = row[2]
            # check that from_Gemeinde is not equal to to_Gemeinde
            if from_g != to_g:
                if from_g in dct_max:
                    # check if value is higher
                    tpl = dct_max[from_g]
                    max_to_g = tpl[0]
                    max_val = tpl[1]
                    if val > max_val:
                        # update entry in dct
                        dct_max[from_g] = (to_g, val)
                else:
                    # insert value
                    dct_max[from_g] = (to_g, val)

    lst_init = []
    for from_g, v in sorted(dct_max.items()):
        to_g = v[0]
        merged = False
        for i in range(0, len(lst_init)):
            lst = lst_init[i]
            if from_g in lst or to_g in lst:
                # merge
                lst.extend([from_g, to_g])
                lst_init[i] = lst
                merged = True
        if merged == False:
            lst_init.append([from_g, to_g])

    lst_init = [list(set(lst)) for lst in lst_init]
    lst_all = []
    for lst in lst_init:
        for a in lst:
            lst_all.append(a)
    lst_all = list(set(lst_all))

    dct = {}
    for a in lst_all:
        lst_ids = []
        for i in range(0, len(lst_init)):
            lst = lst_init[i]
            if a in lst:
                lst_ids.append(i)
        dct[a] = lst_ids

    lst_res = []
    for gem, lst_ids in sorted(dct.items()):
        # print gem, lst_ids
        lst_ids2 = lst_ids
        for ids in lst_ids:
            lst = lst_init[ids]
            for gem2 in lst:
                ids = dct[gem2]
                lst_ids2.extend(ids)
                lst_ids2 = list(set(lst_ids2))
        lst_res.append(lst_ids2)

    lst_res = sorted(lst_res)
    lst_res2 = []
    res = []
    for i in range(0, len(lst_res)):
        prev_res = res
        res = sorted(lst_res[i])
        if res != prev_res:
            lst_res2.append(res)

    for i in range(len(lst_res2)-1, -1, -1):
        res = lst_res2[i]
        for j in range(0, i+1):
            res2 = lst_res2[j]
            if len(list(set(res) & set(res2))) > 0:
                res.extend(res2)
                lst_res2[j] = sorted(list(set(res)))
                if i != j:
                    lst_res2.pop(i)
                    # entry i is gone now; stop scanning so we don't index past it
                    break

    lst_missing = []
    for i in range(0, len(lst_init)):
        bln_found = False
        for lst in lst_res2:
            if i in lst:
                bln_found = True
                break
        if bln_found == False:
            lst_missing.append(i)
    lst_res2.append(lst_missing)

    lst_fin = []
    for res in sorted(lst_res2):
        lst_elem = []
        for ids in res:
            lst_gem = lst_init[ids]
            lst_elem.extend(lst_gem)
        lst_fin.append(list(set(lst_elem)))

    dct_fin = {}
    for fin in lst_fin:
        for g in fin:
            dct_fin[g] = ",".join(fin)

    if len(arcpy.ListFields(fc, wild_card=fld_out)) == 0:
        arcpy.AddField_management(in_table=fc, field_name=fld_out, field_type="TEXT", field_length=255)

    flds = (fld_from, fld_out)
    with arcpy.da.UpdateCursor(fc, flds) as curs:
        for row in curs:
            gem = row[0]
            if gem in dct_fin:
                row[1] = dct_fin[gem]
            else:
                row[1] = gem
            curs.updateRow(row)

    # dissolve gemeindes
    arcpy.Dissolve_management(in_features=fc, out_feature_class=fc_out,
                              dissolve_field=fld_out, statistics_fields="",
                              multi_part="MULTI_PART")

if __name__ == '__main__':
    main()


XanderBakker · ‎06-03-2015

What happens when, for example, Unna has been joined with Dortmund, but when evaluating Schwerte it turns out that Dortmund (big city) also has the highest value? Should this city be joined twice, or should it only be joined to the highest value in Commuters? Or should the Destination be joined to the highest value of the Origin and then exclude that origin for the next destination?

Are the geometries of the cities polygons?

SalmanAhmed · ‎06-03-2015

I think in that case an ideal output will be Unna, Dortmund and Schwerte, all 3 combined into one shape. Similarly, if Schwerte later has the highest value for some other municipality, then that municipality will also be combined with these 3. So it's a kind of chain formation. This is a rationalization project, so I want to form single bigger regions combining the little municipalities.
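The chain formation described here amounts to grouping cities that are linked directly or indirectly. A minimal sketch of one way to do that is a union-find (disjoint-set) structure in plain Python. This is not the approach used in the accepted script; the function name `merge_chains` is hypothetical, and the pairs are just the examples mentioned in this thread.

```python
def merge_chains(pairs):
    """Group names that are linked, directly or through a chain, by (a, b) pairs."""
    parent = {}

    def find(name):
        # register unseen names as their own root, then walk up to the root
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps trees shallow
            name = parent[name]
        return name

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# (destination, origin with the most commuters), taken from the examples in this thread
pairs = [("Unna", "Dortmund"), ("Schwerte", "Dortmund"), ("Werne", "Bergkamen")]
regions = merge_chains(pairs)
print(regions)
```

Each resulting group could then be written to a text field as one shared label and used as the dissolve field.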

SalmanAhmed · ‎06-03-2015

Okay so dissolve still only gives the maximum value for each city. For example it only returns the row with Unna and Bergkamen and removes all other Unna rows. That is the first step but the second and more important thing I want to do is to then make Unna merge with Bergkamen into one polygon somehow. Any ideas about that?

BlakeTerhune · ‎06-03-2015

The Dissolve should make one Destination polygon out of all Werne rows, with a single Commuter value of 1073.328 (which is the max value, belonging to Bergkamen). If you want to include the Origin (or better yet, the OriginID) for that Commuter value, try adding a second statistic field with the FIRST option (LAST would also give the same result).

SalmanAhmed · ‎06-03-2015

First thing - I think a statistic field with the FIRST or LAST option for OriginID would not work, because if the first name in the origin field for a destination is not the one with the maximum commuters, it would select that first name but give the max commuter value from some other name. The same goes for the LAST option. So there will be a mismatch between origin and commuter value.
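The mismatch can be illustrated with a few lines of plain Python. This is only a sketch that mimics a FIRST and a MAX statistic with ordinary list operations; it does not call the Dissolve tool, and the row order and the Kamen value are invented for illustration.

```python
# Rows for one destination as (origin, commuters).
# Only the Bergkamen value comes from the thread; the rest is invented.
rows_for_werne = [
    ("Kamen", 300.0),
    ("Bergkamen", 1073.328),
]

# what a FIRST-style statistic on the origin would report
first_origin = rows_for_werne[0][0]
# what a MAX-style statistic on the commuters would report
max_commuters = max(value for _, value in rows_for_werne)
# the origin that actually belongs to the maximum value
origin_of_max = max(rows_for_werne, key=lambda row: row[1])[0]

print(first_origin, max_commuters, origin_of_max)
```

The two statistics are computed independently per column, so the reported origin and the reported maximum only agree when the maximum row happens to come first.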

Second thing - yes this can be done but any idea how I would then combine origin and destination polygons for each row after this?

XanderBakker · ‎06-04-2015

Would it be possible to attach (a part of) the dataset? I am not convinced that Dissolve is the option, but to be sure, I need to take a closer look at the data.

SalmanAhmed · ‎06-05-2015

Dropbox - Dataset.rar

Here you go. I'm sorry I don't know how I can attach a rar file here directly.

So the spatial join shapefile is the one in which the dissolve is to be done. And the selection shapefile has the attribute table that shows which polygon(destination) needs to be dissolved with which other(origin).

XanderBakker · ‎06-05-2015

Hi Salman Ahmed,

I have attached a PDF to aid me in trying to explain the questions I have. The PDF looks something like this:

You have a lot of polygons (Gemeindes) that are included multiple times in your shapefile. Each polygon contains the from Gemeinde geometry and indicates the number of commuters to other Gemeindes.

If I look at Gemeinde Bergheim in the upper left corner, I see that the highest number of commuters goes to Kerpen (lower left), as indicated by the blue arrow pointing south. For Kerpen the highest number goes to Frechen (which does not occur in the selection shapefile, hence the missing hatching). For Frechen the highest number goes to Köln, and from Köln the highest number goes to Leverkusen.

What should exactly be dissolved into what and what to do with those polygons that will not be merged, but still exist in the shapefile?

Why are there polygons excluded in the selection shape?

As a side note, the arrows were generated with the Python code listed below:

def main():
    import arcpy
    fc = r"D:\Xander\GeoNet\Gemeinde\NRW_Gemeinde_SpatiaJoin.shp"
    fld_from = "GEN"
    fld_to = "GEN_1"
    fld_val = "Value"

    fc_out = r"D:\Xander\GeoNet\Gemeinde\max_con_v01.shp"

    # create a dictionary of all the polygons, with key from gemeinde
    dct_pol = {r[0]: r[1] for r in arcpy.da.SearchCursor(fc, (fld_from, "SHAPE@"))}

    # determine the max commuters per from gemeinde
    dct_max = {}
    with arcpy.da.SearchCursor(fc, (fld_from, fld_to, fld_val)) as curs:
        for row in curs:
            from_g = row[0]
            to_g = row[1]
            val = row[2]
            if from_g in dct_max:
                # see if value is higher
                tpl = dct_max[from_g]
                max_to_g = tpl[0]
                max_val = tpl[1]
                if val > max_val:
                    # update entry in dct
                    dct_max[from_g] = (to_g, val)
                else:
                    # don't update, leave as is
                    pass
            else:
                # insert value
                dct_max[from_g] = (to_g, val)

    # create the connection lines
    lst_lines = []
    sr = arcpy.Describe(fc).spatialReference
    for from_g, tpl in dct_max.items():
        to_g = tpl[0]
        pnt_f = dct_pol[from_g].trueCentroid
        pnt_t = dct_pol[to_g].trueCentroid
        polyline = arcpy.Polyline(arcpy.Array([pnt_f, pnt_t]), sr)
        lst_lines.append(polyline)

    # write lines to output
    arcpy.CopyFeatures_management(lst_lines, fc_out)

if __name__ == '__main__':
    main()

SalmanAhmed · ‎06-05-2015

Hello,

In your example I would then want Bergheim, Kerpen, Frechen and Koln all dissolved into one polygon. It is a kind of chain that is formed. From Bergheim the highest no. of commuters goes to Kerpen, so dissolve Bergheim with Kerpen. From Kerpen the highest no. goes to Frechen, so dissolve Kerpen with Frechen. But Kerpen was already dissolved with Bergheim in the first step, so now the polygon that was dissolved before (Bergheim + Kerpen) will further dissolve with Frechen. And this chain will continue for Koln and further. I do believe that at some point this chain will break: when, for example, the highest no. of commuters goes from Koln back to Frechen, the chain will stop. Other chains have to be created in the same way. Hope you get the idea.
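As a rough sketch of this chain idea in plain Python: follow the "highest commuters" link from one city to the next until a city repeats or no link exists. The function name `follow_chain` and the dictionary `max_dest` are hypothetical; the links are the ones from the example above, including the assumed Koln back to Frechen link that ends the chain.

```python
def follow_chain(start, max_dest):
    """Follow max-commuter links from start until a city repeats or has no link."""
    chain = [start]
    seen = {start}
    current = start
    while current in max_dest:
        nxt = max_dest[current]
        if nxt in seen:
            break  # the chain loops back on itself, so it stops here
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


# city -> city receiving its highest number of commuters (example values from this thread)
max_dest = {
    "Bergheim": "Kerpen",
    "Kerpen": "Frechen",
    "Frechen": "Koln",
    "Koln": "Frechen",  # assumed link back, as in the example
}

chain = follow_chain("Bergheim", max_dest)
print(chain)
```

All cities in one chain would end up in the same dissolved polygon.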

And the polygons contain the origin geometry, but it's better to look at the destinations in the attribute table, because in the destination field each name comes only once, whereas in the origin field it may come multiple times.