Dynamic segmentation with respect to a control factor

4728
12
Jump to solution
05-07-2015 08:51 AM
ChukwuemekaEzeiruaku
New Contributor II

Is there a way either with python scripts or other solution to dynamically segment the following table?

Input table (not a feature class)

 

Item IDBegin End
100001
100105
100217
100345
100435

Each segmented section (from-to) should have all the item numbers written to a field that is within a certain range. From the input table, the final table should look like the one below:

Output table

  

Begin EndItem IDItem ID
0110001001
1310011001; 1002
3410021001; 1002; 1004
4510031001; 1002; 1003; 1004
5710041002

Thanks.

0 Kudos
1 Solution

Accepted Solutions
BlakeTerhune
MVP Regular Contributor

I tested it with your sample data, but I don't know how well it will work with more a more complex dataset. Note that the output field that lists all of the ItemID values has to have a limit, which I arbitrarily set to 255. Of course, you'll also need to change the paths and names accordingly.

def main():
    import arcpy
    import os

    # Local variables
    mygdb = r"N:\TechTemp\BlakeT\Work\TEMP.gdb"
    mytable = os.path.join(mygdb, "SegmentSequence")

    # Read input table into dictionary
    itemsdict = {}
    with arcpy.da.SearchCursor(mytable, ["ItemID", "Begin", "End"]) as s_cursor:
        for item, begin, end in s_cursor:
            itemsdict[item] = (begin, end)

    # Identify segments
    allsegments = [i[0] for i in itemsdict.values()] + [i[1] for i in itemsdict.values()]
    segments = tuple(sorted(set(allsegments)))
    del allsegments

    # Create output table
    arcpy.CreateTable_management(mygdb, "SegmentSequence_output")
    mytable_out = os.path.join(mygdb, "SegmentSequence_output")
    arcpy.AddField_management(mytable_out, "Begin", "SHORT")
    arcpy.AddField_management(mytable_out, "End", "SHORT")
    arcpy.AddField_management(mytable_out, "SegmentItems", "TEXT", field_length=255)

    # Write segment items to output table
    with arcpy.da.InsertCursor(mytable_out, ["Begin", "End", "SegmentItems"]) as i_cursor:
        for enum, seg in enumerate(segments):
            try:
                begin = seg
                end = segments[enum + 1]
            except IndexError:
                break
            seg_items = [k for k, v in itemsdict.items() if v[0] <= begin and v[1] >= end]
            row = (begin, end, str(seg_items).strip('[]'))
            i_cursor.insertRow(row)


if __name__ == '__main__':
    main()

View solution in original post

12 Replies
BlakeTerhune
MVP Regular Contributor

Are all of the the "Item ID" values unique in the source table?

0 Kudos
ChukwuemekaEzeiruaku
New Contributor II

Yes, they are all unique

0 Kudos
BlakeTerhune
MVP Regular Contributor

In the output table, how did you get a Begin range value of 5 when there was no Begin value of 5 in the input table?

0 Kudos
ChukwuemekaEzeiruaku
New Contributor II

there's an end value of 5 which the next range starts from.

0 Kudos
BlakeTerhune
MVP Regular Contributor

So it's kind of like this?

               1004__________
                    1003_____
     1002______________________________
1001_________________________
1000_____
    0    1    2    3    4    5    6    7

So the segment from 0 to 1 includes items 1000, 1001

1 to 2 includes items 1001, 1002

2 to 3 includes items 1001, 1002

3 to 4 includes items 1001, 1002, 1004

4 to 5 includes items 1001, 1002, 1003, 1004

5 to 6 includes items 1002

6 to 7 includes items 1002

But it looks like you dissolve the segments that have the same items. So 1-2 and 2-3 become 1-3? Same with 5-6 and 6-7 becoming 5-7?

0 Kudos
ChukwuemekaEzeiruaku
New Contributor II

Ooops, the output table should look like:

  

BeginEndItem ID
011001
131001; 1002
341001; 1002; 1004
451001; 1002; 1003; 1004
57

1002

Sorry about that.

0 Kudos
BlakeTerhune
MVP Regular Contributor

I tested it with your sample data, but I don't know how well it will work with more a more complex dataset. Note that the output field that lists all of the ItemID values has to have a limit, which I arbitrarily set to 255. Of course, you'll also need to change the paths and names accordingly.

def main():
    import arcpy
    import os

    # Local variables
    mygdb = r"N:\TechTemp\BlakeT\Work\TEMP.gdb"
    mytable = os.path.join(mygdb, "SegmentSequence")

    # Read input table into dictionary
    itemsdict = {}
    with arcpy.da.SearchCursor(mytable, ["ItemID", "Begin", "End"]) as s_cursor:
        for item, begin, end in s_cursor:
            itemsdict[item] = (begin, end)

    # Identify segments
    allsegments = [i[0] for i in itemsdict.values()] + [i[1] for i in itemsdict.values()]
    segments = tuple(sorted(set(allsegments)))
    del allsegments

    # Create output table
    arcpy.CreateTable_management(mygdb, "SegmentSequence_output")
    mytable_out = os.path.join(mygdb, "SegmentSequence_output")
    arcpy.AddField_management(mytable_out, "Begin", "SHORT")
    arcpy.AddField_management(mytable_out, "End", "SHORT")
    arcpy.AddField_management(mytable_out, "SegmentItems", "TEXT", field_length=255)

    # Write segment items to output table
    with arcpy.da.InsertCursor(mytable_out, ["Begin", "End", "SegmentItems"]) as i_cursor:
        for enum, seg in enumerate(segments):
            try:
                begin = seg
                end = segments[enum + 1]
            except IndexError:
                break
            seg_items = [k for k, v in itemsdict.items() if v[0] <= begin and v[1] >= end]
            row = (begin, end, str(seg_items).strip('[]'))
            i_cursor.insertRow(row)


if __name__ == '__main__':
    main()
ChukwuemekaEzeiruaku
New Contributor II

this is awesome Blake, i will test it and get back to you.

Thanks

0 Kudos
XanderBakker
Esri Esteemed Contributor

Yep, Blake T answer is the correct one. I did a test and it yields the correct result.

Thought it was fun so I tried something too. See code below:

def main():
    import arcpy
    import numpy

    tbl_in = r'C:\Forum\TableFun\data.gdb\tbl158276'
    tbl_out = r'C:\Forum\TableFun\data.gdb\tbl158276_out'
    dct = {r[0]:(r[1], r[2]) for r in arcpy.da.SearchCursor(tbl_in, ('Item_ID', 'Begin_', 'End_'))}
    lst_tot = list(set([v[0] for v in dct.values()] + [v[1] for v in dct.values()]))

    dct_ranges = {}
    for i in range(len(lst_tot)-1):
        key_range = (lst_tot, lst_tot[i+1])
        dct_ranges[key_range] = []
        for key, tpl_range in sorted(dct.items()):
            if isInRange(tpl_range[0], tpl_range[1], key_range):
                dct_ranges[key_range].append(key)

    lst_numpy = []
    for k, v in sorted(dct_ranges.items()):
        lst_numpy.append((k[0], k[1], ",".join([str(i) for i in v])))

    array = numpy.array(lst_numpy,
                        numpy.dtype([('Begin', numpy.int32),
                                      ('End', numpy.int32),
                                      ('Item_ID', numpy.str_, 255)]))
    arcpy.da.NumPyArrayToTable(array, tbl_out)

def isInRange(begin, end, key_range):
    return key_range[0] < end and key_range[1] > begin

if __name__ == '__main__':
    main()