How do I create a new sequence of a point feature based on two corresponding fields?

ChristianStallings2 · ‎02-13-2019

I have a point file that has two fields with values that represent its neighboring point. We will call them Up for the point above it and down for the point below it for this. For example Up field is 1205 and the down field is 1201. For the next point in that sequence the up field is 1201 and the down field is 991. For the next point the up is 991 and the down is 977. The down field always indicates correctly the next point in the sequence however the numbers are not sequential. Also to add to the problem, they are string values and they may include a letter such as 981a however there is always a matching value for that in the next point in the opposite field. The final issue is that sometimes the sequence stops which would represent the end of one line feature and a new sequence begins that would represent the beginning of a new line feature. I would like to build a new field that can properly sequence these points so that I can convert this point file to a line file. Ideally I would like to do this in a way that I can incorporate it into a larger model. I really appreciate any help you can provide or other suggestions on how to approach the problem.

Thanks.

RandyBurton · ‎02-14-2019

It might be possible to navigate through your point feature using a dictionary, where the value of the point (down) becomes the key for the next point (up). From your sample data, the points starting and ending a line segment, do not match other points. That is, the starting point's up value does not have a match in the down column, and the ending point's down value does not have a match the up column.

In my test, I converted your shape file to a file geodatabase feature. To the feature, I added fields for line segment and segment order. The name for the line segment is the up value for the starting point. Here's a debug printout:

OID	UP	DOWN	LINE	ORDER
1	1118	1113	1118	1
2	1113	1121	1118	2
3	1121	1115	1118	3
4	1115	1114	1118	4
5	1114	1123	1118	5
6	1123	1119	1118	6
...
20	1031	1041	1118	20
21	1041	1009	1118	21
22	1009	1035	1118	22
23	1035	1040	1118	23
24	1040	1069	1118	24
25	995	1000	1184	4
26	1000	1103	1184	5
27	1003	987	1003	1
28	987	985	1003	2
29	985	993	1003	3
...
794	156	184	259	104
795	171	149	173B	2
796	149	164	173B	3
797	164	157	173B	4
798	157	137	173B	5
799	137	155	173B	6
800	688	670	743	56
801	680	688	743	55
802	683	680	743	54
803	184	178	259	105
804	178	L-14	259	106
805	L-14	178B	259	107
806	173B	171	173B	1
807	687	704	743	52
808	704	683	743	53
809	173	192	259	79
...
851	116	94	133	18
852	94	109	133	19
853	109	115	133	20
854	115	123	133	21
855	123	131	133	22
856	L-63A	744	L-63A	1
857	744	L-57A	L-63A	2
858	L-57A	675	L-63A	3
859	675	744A	L-63A	4
860	168	159	168	1
861	159	151	168	2
862	151	158	168	3
863	158	166	168	4
864	166	145	168	5
865	145	144	168	6
866	144	120	168	7
867	120	154	168	8
868	655	630	743	112
869	630	L-60	743	113
870	L-60	630A	743	114
871	630A	666C	743	115
872	666C	635	743	116
873	635	L-24A	743	117
874	L-24A	1083	743	118‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Here's the code I used:

import arcpy
import numpy as np

sourceFC = r'C:\Path\to\example.gdb\example'

# use OID@ token
sourceFieldsList = ["JOINT___UP", "JOINT___DO", "OID@"]
updateFields = ["OID@", "LineSeg", "SegOrder" ] # LineSeg is text, SegOrder is long int

valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}

up = [] # values of JOINT___UP (the up point list)
do = [] # values of JOINT___DO (the down point list)

for k in valueDict:
    dn, id = valueDict[k] # k = key = JOINT___UP, dn = JOINT___DO, id =  OID@, LineSeg, SegOrder
    up.append(k) # up point values
    do.append(dn) # down point values

# https://stackoverflow.com/questions/41125909
# determine line start : up point doesn't have matching value in down point list
lineStart = np.setdiff1d(up,do)
# determine line end : down point doesn't have matching value in up point list
lineEnd = np.setdiff1d(do,up)

updateDict = {}
for ln in lineStart:
    # print "New line: {}".format(ln)
    seg_order = 1
    # key, next point, oid, line_seg, seg_order
    # print "\t{}\t{}\t{}\t{}\t{}".format(ln, valueDict[ln][0],valueDict[ln][1], ln, seg_order)
    updateDict[valueDict[ln][1]] = (ln, valueDict[ln][0], ln, seg_order)
    pt_next = valueDict[ln][0]

    while True:
        next_pt = pt_next
        seg_order += 1
        # print "\t{}\t{}\t{}\t{}\t{}".format(next_pt, valueDict[next_pt][0],valueDict[next_pt][1], ln, seg_order)
        updateDict[valueDict[next_pt][1]] = (next_pt, valueDict[next_pt][0], ln, seg_order)
        pt_next = valueDict[next_pt][0]    
        if valueDict[next_pt][0] in lineEnd:
            break

print # for debugging
print "OID\tUP\tDOWN\tLINE\tSEGMENT"
for k in sorted(updateDict):
    print "{}\t{}\t{}\t{}\t{}".format( k, updateDict[k][0],updateDict[k][1],updateDict[k][2],updateDict[k][3])

with arcpy.da.UpdateCursor(sourceFC, updateFields) as cursor:
    for row in cursor:
        row[1] = updateDict[row[0]][2] # line segment (starting point's JOINT___UP value)
        row[2] = updateDict[row[0]][3] # segment order
        cursor.updateRow(row)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

This will update the feature with a line segment name and a segment order. From there you should be able to use the Points to Line tool.

PointsToLine_management (Input_Features, Output_Feature_Class, {Line_Field}, {Sort_Field}, {Close_Line})

Hope this helps.

UPDATE:

The while block (lines 35-42 above) should be replaced with the following code. A line segment with only one point (because of a data entry error) would cause an error.

    while pt_next not in lineEnd:
        next_pt = pt_next
        seg_order += 1
        # print "\t{}\t{}\t{}\t{}\t{}".format(next_pt, valueDict[next_pt][0],valueDict[next_pt][1], ln, seg_order)
        updateDict[valueDict[next_pt][1]] = (next_pt, valueDict[next_pt][0], ln, seg_order)
        pt_next = valueDict[next_pt][0]‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Additionally, because of some issues with data entry errors, it is possible to not have a dictionary key for some rows. The code below (replacing lines 49-53 above) will print a warning message when a KeyError is encountered.

with arcpy.da.UpdateCursor(sourceFC, updateFields) as cursor:
    for row in cursor:
        try:
            row[1] = updateDict[row[0]][2] # line segment (starting point's JOINT___UP value)
            row[2] = updateDict[row[0]][3] # segment order
            cursor.updateRow(row)
        except KeyError:
            print "Unable to process row: {}".format(row[0])
            row[1] = 'ERROR'
            row[2] = -1
            cursor.updateRow(row)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

View solution in original post

JoshuaBixby · ‎02-13-2019

A lot to digest for one paragraph! Sometimes it helps to step back and state your requirement/need instead of asking for help with a solution because there might be a better overall approach.

Do you need to have the sequences renumbered, or is the main need to take the series of points and turn them into a line?

If you can share some examples of your data, that is most helpful, even if you make up some data that resembles the structure of your actual data.

DarrenWiens2 · ‎02-13-2019

Assuming your data looks like this (I added PRE_ORDER_ID and PRE_LINE_ID as columns indicating the desired final output):

import pandas as pd

data = [
    [30, 56, 1, 1],
    [78, 45, 5, 1],
    [45, 59, 6, 1],
    [59, 64, 7, 1],
    [12, 48, 3, 1],
    [56, 12, 2, 1],
    [48, 78, 4, 1],
    [82, 10, 3, 2],
    [84, 99, 5, 2],
    [79, 95, 1, 2],
    [95, 82, 2, 2],
    [10, 84, 4, 2],
]
df = pd.DataFrame(data, columns=['UP', 'DOWN', 'PRE_ORDER_ID', 'PRE_LINE_ID'])
df‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

...then you should be able to do something like this:

# determine if the point is a start, end, or neither
for index, row in df.iterrows():
    df.at[index,'START'] = row['UP'] not in df['DOWN'].values
    df.at[index,'END'] = row['DOWN'] not in df['UP'].values
    
counter = 0
line_counter = 1
pos_counter = 1
cur_index = 0
while counter < 100: # I'm paranoid about while loops, so change when you're comfortable
    df.at[cur_index,'LINE_ID'] = line_counter # record line number
    df.at[cur_index,'LINE_POS_ID'] = pos_counter # record position number
    if df.at[cur_index,'END']: # if it's an end point
        pos_counter = 0 # reset position counter
        line_counter += 1 # increment line counter
        cur_index = df.loc[df['START'] & df['LINE_ID'].isnull()].index[0] # set index to next available start point
    else:
        down_val = df.at[cur_index,'DOWN'] # find next point
        cur_index = df.loc[df['UP']==down_val].index[0] # set index
    pos_counter += 1
    counter +=1‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

...resulting in:

	UP	DOWN	PRE_ORDER_ID	PRE_LINE_ID	START	END	LINE_ID	LINE_POS_ID
0	30	56	1	1	True	False	1.0	1.0
1	78	45	5	1	False	False	1.0	5.0
2	45	59	6	1	False	False	1.0	6.0
3	59	64	7	1	False	True	1.0	7.0
4	12	48	3	1	False	False	1.0	3.0
5	56	12	2	1	False	False	1.0	2.0
6	48	78	4	1	False	False	1.0	4.0
7	82	10	3	2	False	False	2.0	3.0
8	84	99	5	2	False	True	2.0	5.0
9	79	95	1	2	True	False	2.0	1.0
10	95	82	2	2	False	False	2.0	2.0
11	10	84	4	2	False	False	2.0	4.0‍‍‍‍‍‍‍‍‍‍‍‍‍

...which you can use the last two columns to determine line ID, and position ID within the line.

ChristianStallings2 · ‎02-14-2019

Thank you Darren, I've uploaded an example file. Admittedly I'm having a little trouble running this. This is 100% due to my poor python skills. I'm using 2.7 in ArcMap 10.6. I don't believe I am indexing the dataframe correctly for 'UP' and 'Down' (I modified them to match their names in the shapefile) however I very well could be completely wrong. I really appreciate your help with this.

ChristianStallings2 · ‎02-14-2019

Darren,

I was able to populate my array finally using arcpy to convert a Numpy Array to a pandas DataFrame like so:

import arcpy
input = "C:\Users\cstallings\Desktop\From_James\Import_Excel_to_Database\Example_points_for_sequence_from_two_fields.shp"
arr = arcpy.da.TableToNumPyArray(input, ('JOINT___UP', 'JOINT___DO'))
data = pd.DataFrame(arr)
df = pd.DataFrame(data, columns=['JOINT___UP', 'JOINT___DO',])‍‍‍‍‍

I'm sure this is not the proper way but it did populate the DataFrame.

after running your suggested code it appears that some values came back NaN that i believe should not for example:

>>> df
    JOINT___UP JOINT___DO  START    END  LINE_ID  LINE_POS_ID
0         1118       1113   True  False      1.0          1.0
1         1113       1121  False  False      1.0          2.0
2         1121       1115  False  False      1.0          3.0
3         1115       1114  False  False      1.0          4.0
4         1114       1123  False  False      1.0          5.0
5         1123       1119  False  False      1.0          6.0
6         1119       1112  False  False      1.0          7.0
7         1112       1116  False  False      1.0          8.0
8         1116       1122  False  False      1.0          9.0
9         1122       1043  False  False      1.0         10.0
10        1043       1045  False  False      1.0         11.0
11        1045       1042  False  False      1.0         12.0
12        1042       1038  False  False      1.0         13.0
13        1038       1032  False  False      1.0         14.0
14        1032       1044  False  False      1.0         15.0
15        1044       1111  False  False      1.0         16.0
16        1111       1110  False  False      1.0         17.0
17        1110       1117  False  False      1.0         18.0
18        1117       1031  False  False      1.0         19.0
19        1031       1041  False  False      1.0         20.0
20        1041       1009  False  False      1.0         21.0
21        1009       1035  False  False      1.0         22.0
22        1035       1040  False  False      1.0         23.0
23        1040       1069  False  False      1.0         24.0
24         995       1000  False  False     16.0          4.0
25        1000       1103  False   True     16.0          5.0
26        1003        987   True  False      2.0          1.0
27         987        985  False  False      2.0          2.0
28         985        993  False  False      2.0          3.0
29         993        978  False  False      2.0          4.0
..         ...        ...    ...    ...      ...          ...
844        135        122  False  False     21.0         12.0
845        122        114  False  False     21.0         13.0
846        114        108  False  False     21.0         14.0
847        108        129  False  False     21.0         15.0
848        129        102  False  False     21.0         16.0
849        102        116  False  False     21.0         17.0
850        116         94  False  False     21.0         18.0
851         94        109  False  False     21.0         19.0
852        109        115  False  False     21.0         20.0
853        115        123  False  False     21.0         21.0
854        123        131  False   True     21.0         22.0
855      L-63A        744   True  False     22.0          1.0
856        744      L-57A  False  False     22.0          2.0
857      L-57A        675  False  False     22.0          3.0
858        675       744A  False   True     22.0          4.0
859        168        159   True  False     23.0          1.0
860        159        151  False  False     23.0          2.0
861        151        158  False  False     23.0          3.0
862        158        166  False  False     23.0          4.0
863        166        145  False  False     23.0          5.0
864        145        144  False  False     23.0          6.0
865        144        120  False  False     23.0          7.0
866        120        154  False   True     23.0          8.0
867        655        630  False  False      NaN          NaN
868        630       L-60  False  False      NaN          NaN
869       L-60       630A  False  False      NaN          NaN
870       630A       666C  False  False      NaN          NaN
871       666C        635  False  False      NaN          NaN
872        635      L-24A  False  False      NaN          NaN
873      L-24A       1083  False   True      NaN          NaN‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I'm way over my head at the moment but I think I'm pretty close to getting the values needed and populating new attribute fields from them.

Thank you again for your help!

DarrenWiens2 · ‎02-14-2019

It looks like, for some reason, the start point for the last line segment (line 57:

867        655        630  False  False

...did not get identified as a START. You can set it individually by:

df.at[867,'START'] = True

...or try to figure out why the last line segment didn't get identified as a START.

ChristianStallings2 · ‎02-14-2019

Thank you for your help Darren,

I think I will have to figure out why some segments didn't get identified as a start or a stop as there are a few in there. Lines 2 and 3 have duplicate Line_POS ID's (they could possibly actually have duplicate Line_ID's causing multiple Line_POS_ID's) Of the 874 points in the sample file, I have 64 without a LINE_D or Line_POS_ID and 51 with duplicate Line_ID & LINE_POS_ID.

RandyBurton · ‎02-14-2019

To assist in my understanding your question...

I am assuming the number in the up/down fields does not refer in any way to the OID. Could you explain the meaning of data such as L-60 and L-24A? Would these be line start/stop? Also, 630A and 666C? Would these be points used in multiple lines? Or would a line go L-60, 630A, 666C, 635, L-24A? And the next L-24A, 1083 ...etc?

ChristianStallings2 · ‎02-14-2019

Hi Randy, The up and downs do not refer to an OID they are somewhat random however they are set based on an joint of a line as part of a separate asset management system. They are all unique for the line system so none of them overlap. The L and C do not correlate with a start and a stop unfortunately. The one thing that is consistent is that for every up there is a matching down and vise versa with the exception of the start and end of a line segment. I hope that helps.

RandyBurton · ‎02-14-2019