I have a point file that has two fields with values that represent its neighboring point. We will call them Up for the point above it and down for the point below it for this. For example Up field is 1205 and the down field is 1201. For the next point in that sequence the up field is 1201 and the down field is 991. For the next point the up is 991 and the down is 977. The down field always indicates correctly the next point in the sequence however the numbers are not sequential. Also to add to the problem, they are string values and they may include a letter such as 981a however there is always a matching value for that in the next point in the opposite field. The final issue is that sometimes the sequence stops which would represent the end of one line feature and a new sequence begins that would represent the beginning of a new line feature. I would like to build a new field that can properly sequence these points so that I can convert this point file to a line file. Ideally I would like to do this in a way that I can incorporate it into a larger model. I really appreciate any help you can provide or other suggestions on how to approach the problem.
Thanks.
Solved! Go to Solution.
It might be possible to navigate through your point feature using a dictionary, where the value of the point (down) becomes the key for the next point (up). From your sample data, the points starting and ending a line segment, do not match other points. That is, the starting point's up value does not have a match in the down column, and the ending point's down value does not have a match the up column.
In my test, I converted your shape file to a file geodatabase feature. To the feature, I added fields for line segment and segment order. The name for the line segment is the up value for the starting point. Here's a debug printout:
OID UP DOWN LINE ORDER
1 1118 1113 1118 1
2 1113 1121 1118 2
3 1121 1115 1118 3
4 1115 1114 1118 4
5 1114 1123 1118 5
6 1123 1119 1118 6
...
20 1031 1041 1118 20
21 1041 1009 1118 21
22 1009 1035 1118 22
23 1035 1040 1118 23
24 1040 1069 1118 24
25 995 1000 1184 4
26 1000 1103 1184 5
27 1003 987 1003 1
28 987 985 1003 2
29 985 993 1003 3
...
794 156 184 259 104
795 171 149 173B 2
796 149 164 173B 3
797 164 157 173B 4
798 157 137 173B 5
799 137 155 173B 6
800 688 670 743 56
801 680 688 743 55
802 683 680 743 54
803 184 178 259 105
804 178 L-14 259 106
805 L-14 178B 259 107
806 173B 171 173B 1
807 687 704 743 52
808 704 683 743 53
809 173 192 259 79
...
851 116 94 133 18
852 94 109 133 19
853 109 115 133 20
854 115 123 133 21
855 123 131 133 22
856 L-63A 744 L-63A 1
857 744 L-57A L-63A 2
858 L-57A 675 L-63A 3
859 675 744A L-63A 4
860 168 159 168 1
861 159 151 168 2
862 151 158 168 3
863 158 166 168 4
864 166 145 168 5
865 145 144 168 6
866 144 120 168 7
867 120 154 168 8
868 655 630 743 112
869 630 L-60 743 113
870 L-60 630A 743 114
871 630A 666C 743 115
872 666C 635 743 116
873 635 L-24A 743 117
874 L-24A 1083 743 118
Here's the code I used:
import arcpy
import numpy as np
sourceFC = r'C:\Path\to\example.gdb\example'
# use OID@ token
sourceFieldsList = ["JOINT___UP", "JOINT___DO", "OID@"]
updateFields = ["OID@", "LineSeg", "SegOrder" ] # LineSeg is text, SegOrder is long int
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}
up = [] # values of JOINT___UP (the up point list)
do = [] # values of JOINT___DO (the down point list)
for k in valueDict:
dn, id = valueDict[k] # k = key = JOINT___UP, dn = JOINT___DO, id = OID@, LineSeg, SegOrder
up.append(k) # up point values
do.append(dn) # down point values
# https://stackoverflow.com/questions/41125909
# determine line start : up point doesn't have matching value in down point list
lineStart = np.setdiff1d(up,do)
# determine line end : down point doesn't have matching value in up point list
lineEnd = np.setdiff1d(do,up)
updateDict = {}
for ln in lineStart:
# print "New line: {}".format(ln)
seg_order = 1
# key, next point, oid, line_seg, seg_order
# print "\t{}\t{}\t{}\t{}\t{}".format(ln, valueDict[ln][0],valueDict[ln][1], ln, seg_order)
updateDict[valueDict[ln][1]] = (ln, valueDict[ln][0], ln, seg_order)
pt_next = valueDict[ln][0]
while True:
next_pt = pt_next
seg_order += 1
# print "\t{}\t{}\t{}\t{}\t{}".format(next_pt, valueDict[next_pt][0],valueDict[next_pt][1], ln, seg_order)
updateDict[valueDict[next_pt][1]] = (next_pt, valueDict[next_pt][0], ln, seg_order)
pt_next = valueDict[next_pt][0]
if valueDict[next_pt][0] in lineEnd:
break
print # for debugging
print "OID\tUP\tDOWN\tLINE\tSEGMENT"
for k in sorted(updateDict):
print "{}\t{}\t{}\t{}\t{}".format( k, updateDict[k][0],updateDict[k][1],updateDict[k][2],updateDict[k][3])
with arcpy.da.UpdateCursor(sourceFC, updateFields) as cursor:
for row in cursor:
row[1] = updateDict[row[0]][2] # line segment (starting point's JOINT___UP value)
row[2] = updateDict[row[0]][3] # segment order
cursor.updateRow(row)
This will update the feature with a line segment name and a segment order. From there you should be able to use the Points to Line tool.
PointsToLine_management (Input_Features, Output_Feature_Class, {Line_Field}, {Sort_Field}, {Close_Line})
Hope this helps.
UPDATE:
The while block (lines 35-42 above) should be replaced with the following code. A line segment with only one point (because of a data entry error) would cause an error.
while pt_next not in lineEnd:
next_pt = pt_next
seg_order += 1
# print "\t{}\t{}\t{}\t{}\t{}".format(next_pt, valueDict[next_pt][0],valueDict[next_pt][1], ln, seg_order)
updateDict[valueDict[next_pt][1]] = (next_pt, valueDict[next_pt][0], ln, seg_order)
pt_next = valueDict[next_pt][0]
Additionally, because of some issues with data entry errors, it is possible to not have a dictionary key for some rows. The code below (replacing lines 49-53 above) will print a warning message when a KeyError is encountered.
with arcpy.da.UpdateCursor(sourceFC, updateFields) as cursor:
for row in cursor:
try:
row[1] = updateDict[row[0]][2] # line segment (starting point's JOINT___UP value)
row[2] = updateDict[row[0]][3] # segment order
cursor.updateRow(row)
except KeyError:
print "Unable to process row: {}".format(row[0])
row[1] = 'ERROR'
row[2] = -1
cursor.updateRow(row)
A lot to digest for one paragraph! Sometimes it helps to step back and state your requirement/need instead of asking for help with a solution because there might be a better overall approach.
Do you need to have the sequences renumbered, or is the main need to take the series of points and turn them into a line?
If you can share some examples of your data, that is most helpful, even if you make up some data that resembles the structure of your actual data.
Assuming your data looks like this (I added PRE_ORDER_ID and PRE_LINE_ID as columns indicating the desired final output):
import pandas as pd
data = [
[30, 56, 1, 1],
[78, 45, 5, 1],
[45, 59, 6, 1],
[59, 64, 7, 1],
[12, 48, 3, 1],
[56, 12, 2, 1],
[48, 78, 4, 1],
[82, 10, 3, 2],
[84, 99, 5, 2],
[79, 95, 1, 2],
[95, 82, 2, 2],
[10, 84, 4, 2],
]
df = pd.DataFrame(data, columns=['UP', 'DOWN', 'PRE_ORDER_ID', 'PRE_LINE_ID'])
df
...then you should be able to do something like this:
# determine if the point is a start, end, or neither
for index, row in df.iterrows():
df.at[index,'START'] = row['UP'] not in df['DOWN'].values
df.at[index,'END'] = row['DOWN'] not in df['UP'].values
counter = 0
line_counter = 1
pos_counter = 1
cur_index = 0
while counter < 100: # I'm paranoid about while loops, so change when you're comfortable
df.at[cur_index,'LINE_ID'] = line_counter # record line number
df.at[cur_index,'LINE_POS_ID'] = pos_counter # record position number
if df.at[cur_index,'END']: # if it's an end point
pos_counter = 0 # reset position counter
line_counter += 1 # increment line counter
cur_index = df.loc[df['START'] & df['LINE_ID'].isnull()].index[0] # set index to next available start point
else:
down_val = df.at[cur_index,'DOWN'] # find next point
cur_index = df.loc[df['UP']==down_val].index[0] # set index
pos_counter += 1
counter +=1
...resulting in:
UP DOWN PRE_ORDER_ID PRE_LINE_ID START END LINE_ID LINE_POS_ID
0 30 56 1 1 True False 1.0 1.0
1 78 45 5 1 False False 1.0 5.0
2 45 59 6 1 False False 1.0 6.0
3 59 64 7 1 False True 1.0 7.0
4 12 48 3 1 False False 1.0 3.0
5 56 12 2 1 False False 1.0 2.0
6 48 78 4 1 False False 1.0 4.0
7 82 10 3 2 False False 2.0 3.0
8 84 99 5 2 False True 2.0 5.0
9 79 95 1 2 True False 2.0 1.0
10 95 82 2 2 False False 2.0 2.0
11 10 84 4 2 False False 2.0 4.0
...which you can use the last two columns to determine line ID, and position ID within the line.
Thank you Darren, I've uploaded an example file. Admittedly I'm having a little trouble running this. This is 100% due to my poor python skills. I'm using 2.7 in ArcMap 10.6. I don't believe I am indexing the dataframe correctly for 'UP' and 'Down' (I modified them to match their names in the shapefile) however I very well could be completely wrong. I really appreciate your help with this.
Darren,
I was able to populate my array finally using arcpy to convert a Numpy Array to a pandas DataFrame like so:
import arcpy
input = "C:\Users\cstallings\Desktop\From_James\Import_Excel_to_Database\Example_points_for_sequence_from_two_fields.shp"
arr = arcpy.da.TableToNumPyArray(input, ('JOINT___UP', 'JOINT___DO'))
data = pd.DataFrame(arr)
df = pd.DataFrame(data, columns=['JOINT___UP', 'JOINT___DO',])
I'm sure this is not the proper way but it did populate the DataFrame.
after running your suggested code it appears that some values came back NaN that i believe should not for example:
>>> df
JOINT___UP JOINT___DO START END LINE_ID LINE_POS_ID
0 1118 1113 True False 1.0 1.0
1 1113 1121 False False 1.0 2.0
2 1121 1115 False False 1.0 3.0
3 1115 1114 False False 1.0 4.0
4 1114 1123 False False 1.0 5.0
5 1123 1119 False False 1.0 6.0
6 1119 1112 False False 1.0 7.0
7 1112 1116 False False 1.0 8.0
8 1116 1122 False False 1.0 9.0
9 1122 1043 False False 1.0 10.0
10 1043 1045 False False 1.0 11.0
11 1045 1042 False False 1.0 12.0
12 1042 1038 False False 1.0 13.0
13 1038 1032 False False 1.0 14.0
14 1032 1044 False False 1.0 15.0
15 1044 1111 False False 1.0 16.0
16 1111 1110 False False 1.0 17.0
17 1110 1117 False False 1.0 18.0
18 1117 1031 False False 1.0 19.0
19 1031 1041 False False 1.0 20.0
20 1041 1009 False False 1.0 21.0
21 1009 1035 False False 1.0 22.0
22 1035 1040 False False 1.0 23.0
23 1040 1069 False False 1.0 24.0
24 995 1000 False False 16.0 4.0
25 1000 1103 False True 16.0 5.0
26 1003 987 True False 2.0 1.0
27 987 985 False False 2.0 2.0
28 985 993 False False 2.0 3.0
29 993 978 False False 2.0 4.0
.. ... ... ... ... ... ...
844 135 122 False False 21.0 12.0
845 122 114 False False 21.0 13.0
846 114 108 False False 21.0 14.0
847 108 129 False False 21.0 15.0
848 129 102 False False 21.0 16.0
849 102 116 False False 21.0 17.0
850 116 94 False False 21.0 18.0
851 94 109 False False 21.0 19.0
852 109 115 False False 21.0 20.0
853 115 123 False False 21.0 21.0
854 123 131 False True 21.0 22.0
855 L-63A 744 True False 22.0 1.0
856 744 L-57A False False 22.0 2.0
857 L-57A 675 False False 22.0 3.0
858 675 744A False True 22.0 4.0
859 168 159 True False 23.0 1.0
860 159 151 False False 23.0 2.0
861 151 158 False False 23.0 3.0
862 158 166 False False 23.0 4.0
863 166 145 False False 23.0 5.0
864 145 144 False False 23.0 6.0
865 144 120 False False 23.0 7.0
866 120 154 False True 23.0 8.0
867 655 630 False False NaN NaN
868 630 L-60 False False NaN NaN
869 L-60 630A False False NaN NaN
870 630A 666C False False NaN NaN
871 666C 635 False False NaN NaN
872 635 L-24A False False NaN NaN
873 L-24A 1083 False True NaN NaN
I'm way over my head at the moment but I think I'm pretty close to getting the values needed and populating new attribute fields from them.
Thank you again for your help!
It looks like, for some reason, the start point for the last line segment (line 57:
867 655 630 False False
...did not get identified as a START. You can set it individually by:
df.at[867,'START'] = True
...or try to figure out why the last line segment didn't get identified as a START.
Thank you for your help Darren,
I think I will have to figure out why some segments didn't get identified as a start or a stop as there are a few in there. Lines 2 and 3 have duplicate Line_POS ID's (they could possibly actually have duplicate Line_ID's causing multiple Line_POS_ID's) Of the 874 points in the sample file, I have 64 without a LINE_D or Line_POS_ID and 51 with duplicate Line_ID & LINE_POS_ID.
To assist in my understanding your question...
I am assuming the number in the up/down fields does not refer in any way to the OID. Could you explain the meaning of data such as L-60 and L-24A? Would these be line start/stop? Also, 630A and 666C? Would these be points used in multiple lines? Or would a line go L-60, 630A, 666C, 635, L-24A? And the next L-24A, 1083 ...etc?
Hi Randy, The up and downs do not refer to an OID they are somewhat random however they are set based on an joint of a line as part of a separate asset management system. They are all unique for the line system so none of them overlap. The L and C do not correlate with a start and a stop unfortunately. The one thing that is consistent is that for every up there is a matching down and vise versa with the exception of the start and end of a line segment. I hope that helps.
It might be possible to navigate through your point feature using a dictionary, where the value of the point (down) becomes the key for the next point (up). From your sample data, the points starting and ending a line segment, do not match other points. That is, the starting point's up value does not have a match in the down column, and the ending point's down value does not have a match the up column.
In my test, I converted your shape file to a file geodatabase feature. To the feature, I added fields for line segment and segment order. The name for the line segment is the up value for the starting point. Here's a debug printout:
OID UP DOWN LINE ORDER
1 1118 1113 1118 1
2 1113 1121 1118 2
3 1121 1115 1118 3
4 1115 1114 1118 4
5 1114 1123 1118 5
6 1123 1119 1118 6
...
20 1031 1041 1118 20
21 1041 1009 1118 21
22 1009 1035 1118 22
23 1035 1040 1118 23
24 1040 1069 1118 24
25 995 1000 1184 4
26 1000 1103 1184 5
27 1003 987 1003 1
28 987 985 1003 2
29 985 993 1003 3
...
794 156 184 259 104
795 171 149 173B 2
796 149 164 173B 3
797 164 157 173B 4
798 157 137 173B 5
799 137 155 173B 6
800 688 670 743 56
801 680 688 743 55
802 683 680 743 54
803 184 178 259 105
804 178 L-14 259 106
805 L-14 178B 259 107
806 173B 171 173B 1
807 687 704 743 52
808 704 683 743 53
809 173 192 259 79
...
851 116 94 133 18
852 94 109 133 19
853 109 115 133 20
854 115 123 133 21
855 123 131 133 22
856 L-63A 744 L-63A 1
857 744 L-57A L-63A 2
858 L-57A 675 L-63A 3
859 675 744A L-63A 4
860 168 159 168 1
861 159 151 168 2
862 151 158 168 3
863 158 166 168 4
864 166 145 168 5
865 145 144 168 6
866 144 120 168 7
867 120 154 168 8
868 655 630 743 112
869 630 L-60 743 113
870 L-60 630A 743 114
871 630A 666C 743 115
872 666C 635 743 116
873 635 L-24A 743 117
874 L-24A 1083 743 118
Here's the code I used:
import arcpy
import numpy as np
sourceFC = r'C:\Path\to\example.gdb\example'
# use OID@ token
sourceFieldsList = ["JOINT___UP", "JOINT___DO", "OID@"]
updateFields = ["OID@", "LineSeg", "SegOrder" ] # LineSeg is text, SegOrder is long int
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}
up = [] # values of JOINT___UP (the up point list)
do = [] # values of JOINT___DO (the down point list)
for k in valueDict:
dn, id = valueDict[k] # k = key = JOINT___UP, dn = JOINT___DO, id = OID@, LineSeg, SegOrder
up.append(k) # up point values
do.append(dn) # down point values
# https://stackoverflow.com/questions/41125909
# determine line start : up point doesn't have matching value in down point list
lineStart = np.setdiff1d(up,do)
# determine line end : down point doesn't have matching value in up point list
lineEnd = np.setdiff1d(do,up)
updateDict = {}
for ln in lineStart:
# print "New line: {}".format(ln)
seg_order = 1
# key, next point, oid, line_seg, seg_order
# print "\t{}\t{}\t{}\t{}\t{}".format(ln, valueDict[ln][0],valueDict[ln][1], ln, seg_order)
updateDict[valueDict[ln][1]] = (ln, valueDict[ln][0], ln, seg_order)
pt_next = valueDict[ln][0]
while True:
next_pt = pt_next
seg_order += 1
# print "\t{}\t{}\t{}\t{}\t{}".format(next_pt, valueDict[next_pt][0],valueDict[next_pt][1], ln, seg_order)
updateDict[valueDict[next_pt][1]] = (next_pt, valueDict[next_pt][0], ln, seg_order)
pt_next = valueDict[next_pt][0]
if valueDict[next_pt][0] in lineEnd:
break
print # for debugging
print "OID\tUP\tDOWN\tLINE\tSEGMENT"
for k in sorted(updateDict):
print "{}\t{}\t{}\t{}\t{}".format( k, updateDict[k][0],updateDict[k][1],updateDict[k][2],updateDict[k][3])
with arcpy.da.UpdateCursor(sourceFC, updateFields) as cursor:
for row in cursor:
row[1] = updateDict[row[0]][2] # line segment (starting point's JOINT___UP value)
row[2] = updateDict[row[0]][3] # segment order
cursor.updateRow(row)
This will update the feature with a line segment name and a segment order. From there you should be able to use the Points to Line tool.
PointsToLine_management (Input_Features, Output_Feature_Class, {Line_Field}, {Sort_Field}, {Close_Line})
Hope this helps.
UPDATE:
The while block (lines 35-42 above) should be replaced with the following code. A line segment with only one point (because of a data entry error) would cause an error.
while pt_next not in lineEnd:
next_pt = pt_next
seg_order += 1
# print "\t{}\t{}\t{}\t{}\t{}".format(next_pt, valueDict[next_pt][0],valueDict[next_pt][1], ln, seg_order)
updateDict[valueDict[next_pt][1]] = (next_pt, valueDict[next_pt][0], ln, seg_order)
pt_next = valueDict[next_pt][0]
Additionally, because of some issues with data entry errors, it is possible to not have a dictionary key for some rows. The code below (replacing lines 49-53 above) will print a warning message when a KeyError is encountered.
with arcpy.da.UpdateCursor(sourceFC, updateFields) as cursor:
for row in cursor:
try:
row[1] = updateDict[row[0]][2] # line segment (starting point's JOINT___UP value)
row[2] = updateDict[row[0]][3] # segment order
cursor.updateRow(row)
except KeyError:
print "Unable to process row: {}".format(row[0])
row[1] = 'ERROR'
row[2] = -1
cursor.updateRow(row)