Dynamic segmentation with respect to a control factor

ChukwuemekaEzeiruaku · ‎05-07-2015

Is there a way either with python scripts or other solution to dynamically segment the following table?

Input table (not a feature class)

Item ID	Begin	End
1000	0	1
1001	0	5
1002	1	7
1003	4	5
1004	3	5

Each segmented section (from-to) should have all the item numbers written to a field that is within a certain range. From the input table, the final table should look like the one below:

Output table

Begin	End	Item ID	Item ID
0	1	1000	1001
1	3	1001	1001; 1002
3	4	1002	1001; 1002; 1004
4	5	1003	1001; 1002; 1003; 1004
5	7	1004	1002

Thanks.

BlakeTerhune · ‎05-07-2015

I tested it with your sample data, but I don't know how well it will work with more a more complex dataset. Note that the output field that lists all of the ItemID values has to have a limit, which I arbitrarily set to 255. Of course, you'll also need to change the paths and names accordingly.

def main():
    import arcpy
    import os

    # Local variables
    mygdb = r"N:\TechTemp\BlakeT\Work\TEMP.gdb"
    mytable = os.path.join(mygdb, "SegmentSequence")

    # Read input table into dictionary
    itemsdict = {}
    with arcpy.da.SearchCursor(mytable, ["ItemID", "Begin", "End"]) as s_cursor:
        for item, begin, end in s_cursor:
            itemsdict[item] = (begin, end)

    # Identify segments
    allsegments = [i[0] for i in itemsdict.values()] + [i[1] for i in itemsdict.values()]
    segments = tuple(sorted(set(allsegments)))
    del allsegments

    # Create output table
    arcpy.CreateTable_management(mygdb, "SegmentSequence_output")
    mytable_out = os.path.join(mygdb, "SegmentSequence_output")
    arcpy.AddField_management(mytable_out, "Begin", "SHORT")
    arcpy.AddField_management(mytable_out, "End", "SHORT")
    arcpy.AddField_management(mytable_out, "SegmentItems", "TEXT", field_length=255)

    # Write segment items to output table
    with arcpy.da.InsertCursor(mytable_out, ["Begin", "End", "SegmentItems"]) as i_cursor:
        for enum, seg in enumerate(segments):
            try:
                begin = seg
                end = segments[enum + 1]
            except IndexError:
                break
            seg_items = [k for k, v in itemsdict.items() if v[0] <= begin and v[1] >= end]
            row = (begin, end, str(seg_items).strip('[]'))
            i_cursor.insertRow(row)


if __name__ == '__main__':
    main()

View solution in original post

BlakeTerhune · ‎05-07-2015

Are all of the the "Item ID" values unique in the source table?

ChukwuemekaEzeiruaku · ‎05-07-2015

Yes, they are all unique

BlakeTerhune · ‎05-07-2015

In the output table, how did you get a Begin range value of 5 when there was no Begin value of 5 in the input table?

ChukwuemekaEzeiruaku · ‎05-07-2015

there's an end value of 5 which the next range starts from.

BlakeTerhune · ‎05-07-2015

So it's kind of like this?

               1004__________
                    1003_____
     1002______________________________
1001_________________________
1000_____
    0    1    2    3    4    5    6    7

So the segment from 0 to 1 includes items 1000, 1001

1 to 2 includes items 1001, 1002

2 to 3 includes items 1001, 1002

3 to 4 includes items 1001, 1002, 1004

4 to 5 includes items 1001, 1002, 1003, 1004

5 to 6 includes items 1002

6 to 7 includes items 1002

But it looks like you dissolve the segments that have the same items. So 1-2 and 2-3 become 1-3? Same with 5-6 and 6-7 becoming 5-7?

ChukwuemekaEzeiruaku · ‎05-07-2015

Ooops, the output table should look like:

Begin	End	Item ID
0	1	1001
1	3	1001; 1002
3	4	1001; 1002; 1004
4	5	1001; 1002; 1003; 1004
5	7	1002

Sorry about that.

BlakeTerhune · ‎05-07-2015

I tested it with your sample data, but I don't know how well it will work with more a more complex dataset. Note that the output field that lists all of the ItemID values has to have a limit, which I arbitrarily set to 255. Of course, you'll also need to change the paths and names accordingly.

def main():
    import arcpy
    import os

    # Local variables
    mygdb = r"N:\TechTemp\BlakeT\Work\TEMP.gdb"
    mytable = os.path.join(mygdb, "SegmentSequence")

    # Read input table into dictionary
    itemsdict = {}
    with arcpy.da.SearchCursor(mytable, ["ItemID", "Begin", "End"]) as s_cursor:
        for item, begin, end in s_cursor:
            itemsdict[item] = (begin, end)

    # Identify segments
    allsegments = [i[0] for i in itemsdict.values()] + [i[1] for i in itemsdict.values()]
    segments = tuple(sorted(set(allsegments)))
    del allsegments

    # Create output table
    arcpy.CreateTable_management(mygdb, "SegmentSequence_output")
    mytable_out = os.path.join(mygdb, "SegmentSequence_output")
    arcpy.AddField_management(mytable_out, "Begin", "SHORT")
    arcpy.AddField_management(mytable_out, "End", "SHORT")
    arcpy.AddField_management(mytable_out, "SegmentItems", "TEXT", field_length=255)

    # Write segment items to output table
    with arcpy.da.InsertCursor(mytable_out, ["Begin", "End", "SegmentItems"]) as i_cursor:
        for enum, seg in enumerate(segments):
            try:
                begin = seg
                end = segments[enum + 1]
            except IndexError:
                break
            seg_items = [k for k, v in itemsdict.items() if v[0] <= begin and v[1] >= end]
            row = (begin, end, str(seg_items).strip('[]'))
            i_cursor.insertRow(row)


if __name__ == '__main__':
    main()

ChukwuemekaEzeiruaku · ‎05-07-2015

this is awesome Blake, i will test it and get back to you.

Thanks

XanderBakker · ‎05-07-2015

Yep, Blake T answer is the correct one. I did a test and it yields the correct result.

Thought it was fun so I tried something too. See code below:

def main():
    import arcpy
    import numpy

    tbl_in = r'C:\Forum\TableFun\data.gdb\tbl158276'
    tbl_out = r'C:\Forum\TableFun\data.gdb\tbl158276_out'
    dct = {r[0]:(r[1], r[2]) for r in arcpy.da.SearchCursor(tbl_in, ('Item_ID', 'Begin_', 'End_'))}
    lst_tot = list(set([v[0] for v in dct.values()] + [v[1] for v in dct.values()]))

    dct_ranges = {}
    for i in range(len(lst_tot)-1):
        key_range = (lst_tot, lst_tot[i+1])
        dct_ranges[key_range] = []
        for key, tpl_range in sorted(dct.items()):
            if isInRange(tpl_range[0], tpl_range[1], key_range):
                dct_ranges[key_range].append(key)

    lst_numpy = []
    for k, v in sorted(dct_ranges.items()):
        lst_numpy.append((k[0], k[1], ",".join([str(i) for i in v])))

    array = numpy.array(lst_numpy,
                        numpy.dtype([('Begin', numpy.int32),
                                      ('End', numpy.int32),
                                      ('Item_ID', numpy.str_, 255)]))
    arcpy.da.NumPyArrayToTable(array, tbl_out)

def isInRange(begin, end, key_range):
    return key_range[0] < end and key_range[1] > begin

if __name__ == '__main__':
    main()