Grouping and assigning values to items with shared values?

RPGIS · ‎01-14-2022

Hi,

I have a script that runs, but it does not produce the result I am looking for: some lines that share values are not assigned the same unique ID. Some lines get separate IDs despite sharing a common point. Here is the script I have so far:

import arcpy
import time

start_time =  'Process start time: {}'.format(time.strftime('%I:%M:%S'))
print (start_time)
#arcpy.env.overwriteOutput = True
#spointFc = r'*'
#LineFc = r'*'
spointFc = r'*'
LineFc = r'*'
workspace = r'*'

edit = arcpy.da.Editor(workspace)

# Gather fields for searching through
# and updating each feature class.
SelectspointFc_Fields = [field.name for field in arcpy.ListFields(spointFc) if field.name in ['OBJECTID', 'IsolationIDA', 'IsolationIDB']]
SelectLineFc_Fields = [field.name for field in arcpy.ListFields(LineFc) if field.name in ['OBJECTID', 'IsolationID']]

AddspointShape_Field = ['SHAPE@X'] # Set as last/or end of the list of fields
AddLineShape_Field = ['SHAPE@'] # Set as last/or end of the list of fields

spointFc_Fields = SelectspointFc_Fields + AddspointShape_Field
LineFc_Fields = SelectLineFc_Fields + AddLineShape_Field

# Define the update and search cursors for
# input features.
updatePoint = arcpy.da.UpdateCursor(spointFc, spointFc_Fields)
updateLines = arcpy.da.UpdateCursor(LineFc, LineFc_Fields)

searchpoints = arcpy.da.SearchCursor(spointFc, spointFc_Fields)
searchLines = arcpy.da.SearchCursor(LineFc, LineFc_Fields)

# Loop through the line feature class and
# update based on whether the IsolationID
# is null.
i = 10000000

Assigned_IDs = {}
assigned_uniqueIDs = []

LineValues = {}
IDValues = {}

shared_IDs = {}
UpdateIsolatedIDs_Lines = {}

valve_points = []
Non_valvePoints = []

ID_list = []

startloop = 'Starting point feature class at {}'.format(time.strftime('%I:%M:%S'))
print (startloop)

# Loop through the point feature class
# to get the Object ID and the
# floating X coordinate of the
# point feature class.   
valve_points = set()  # a set keeps the membership tests further down O(1)
with searchpoints as cursor:
    for point in cursor:
        Vpoint = float(point[-1])
        #print (int(Vpoint))
        valve_points.add(Vpoint)
del cursor

stoploop ='Finished point feature class at {}\n'.format(time.strftime('%I:%M:%S'))
print (stoploop)

startloop = 'Starting line feature class at {}'.format(time.strftime('%I:%M:%S'))
print (startloop)

with searchLines as cursor:
    for row in cursor:
        # Loop through the line feature class to
        # get the Object ID, starting and ending line
        # coordinates. Extract only the floating X
        # coordinates of the line features.
        ID = row[0]
        startpnt = row[2].firstPoint
        start = float(startpnt.X)
        
        endpnt = row[2].lastPoint
        end = float(endpnt.X)

        # Check if the ID is in the specified dictionary, then check
        # if both the start and end of the line are in the specified
        # list and assign as key and values to specified dictionary.            
        if ID not in UpdateIsolatedIDs_Lines:
            if start in valve_points and end in valve_points:
                UpdateIsolatedIDs_Lines[ID] = i
                i += 1
                
            # Check if the start of the line is in the specified list
            # and assign key and values to specified dictionary.
            elif start not in valve_points and end in valve_points:
                IDValues[ID] = [start]
                ID_list.append(ID)
                
                # Check if the start of the line is in the specified dictionary
                # and assign starting coordinate as key with ID as value.
                # Update the values if the start of the line already exists
                # as the key in dictionary.
                if start not in LineValues:
                    LineValues[start] = [ID]
                    
                else:
                    update = LineValues[start] + [ID]
                    LineValues[start] = update
                    #print (ID, update)

            # Check if the end of the line is in the specified list and set
            # ID as key in dictionary with ending coordinate as the value.
            elif start in valve_points and end not in valve_points:                
                IDValues[ID] = [end]
                ID_list.append(ID)

                # Check if the end of the line is not in the specified dictionary and
                # assign ending coordinate as key with ID as value to specified dictionary.
                if end not in LineValues:
                    LineValues[end] = [ID]
                    
                else:
                    # Update the values if the end coordinate of the line already
                    # exists in dictionary and update the same key in the dictionary.
                    #print (ID, update)
                    update = LineValues[end] + [ID]                    
                    LineValues[end] = update

            else:

                # Check if both the start and end of the line are in the specified dictionary
                # and assign ID as key in with list of starting and ending coordinate as values.
                # Then assign the starting and ending coordiates as keys with the line ID as
                # the values.
                if start not in LineValues and end not in LineValues:
                    ID_list.append(ID)
                    IDValues[ID] = [start, end]
                    LineValues[start] = [ID]
                    LineValues[end] = [ID]
                    
                # Check if the start of the line is in the specified dictionary and assign
                # the ID as key with a list of starting and ending coordinate as value.
                # Then assign the starting and ending coordinates as keys with the line ID as
                # the values. Update the values if the end coordinate already exists in
                # dictionary as the key in the dictionary.
                elif start not in LineValues and end in LineValues:
                    ID_list.append(ID)
                    IDValues[ID] = [start, end]
                    LineValues[start] = [ID]

                    update = LineValues[end] + [ID]
                    LineValues[end] = update
                    #print (end, update)

                # Check if the start of the line is in the specified dictionary and assign
                # the ID as key with a list of starting and ending coordinate as value.
                # Then assign the starting and ending coordinates as keys with the line ID as
                # the values. Update the values if the start coordinate already exists in
                # dictionary as the key in the dictionary.
                elif start in LineValues and end not in LineValues:
                    ID_list.append(ID)
                    IDValues[ID] = [start, end]
                    LineValues[end] = [ID]

                    update = LineValues[start] + [ID]
                    LineValues[start] = update
                    #print (end, update)

                else:
                    # Assign the starting and ending coordiates as keys with the line ID as
                    # the values. # Update the values if the start coordinate of the line
                    # already exists in the dictionary and the same key in the dictionary.
                    ID_list.append(ID)
                    IDValues[ID] = [start, end]
                    
                    update = LineValues[start] + [ID]
                    LineValues[start] = update
                    
                    update = LineValues[end] + [ID]
                    LineValues[end] = update
del cursor

stoploop = 'Finished line feature class at {}\n'.format(time.strftime('%I:%M:%S'))
print (stoploop)

startloop = 'Starting Compiling Main Update Dictionary At {}'.format(time.strftime('%I:%M:%S'))
print (startloop)

for ID in ID_list:
    #print ('{} is the Object ID of the line'.format(ID))
    unassigned_IDs = []

    # Check if the ID is not in UpdateIsolatedIDs_Lines dictionary. Then check if ID is a key in the IDValues dictionary.
    # If the ID is a key in dictionary, loop through all values in the dictionary. Check if the values of that dictionary
    # are keys in LineValues dictionary. Loop through those values and append all values that are not equal to the ID to the
    # the unassigned_values list.
    if ID not in UpdateIsolatedIDs_Lines:        
        if IDValues[ID]:
            print ('{} is key in {} dictionary with floating {} X coordinates as values'.format(ID, 'IDValues',IDValues[ID]))
        
            for end in IDValues[ID]:
                if LineValues[end]:
                    print ('The floating {} is the key in {} dictionary with {} line IDs as values'.format(end, 'LineValues', LineValues[end]))

                    for x in LineValues[end]:
                        if x not in UpdateIsolatedIDs_Lines:
                            unassigned_IDs.append(x)
                        
        sorted_list = sorted(unassigned_IDs)
        #print (ID, sorted_list)
        first = sorted_list[0]

        if len(sorted_list) > 1:
            if first == ID and first not in UpdateIsolatedIDs_Lines:
                UpdateIsolatedIDs_Lines[first] = i
                print ('{} has been assigned {} as value.'.format(first, UpdateIsolatedIDs_Lines[first]))
                i += 1

                for x in sorted_list[1:]:
                    UpdateIsolatedIDs_Lines[x] = UpdateIsolatedIDs_Lines[first]
                    print ('{} has been assigned {} as value.'.format(x, UpdateIsolatedIDs_Lines[first]))
                    
            elif first in UpdateIsolatedIDs_Lines:
                UpdateIsolatedIDs_Lines[ID] = UpdateIsolatedIDs_Lines[first]
                for x in sorted_list[1:]:
                    UpdateIsolatedIDs_Lines[x] = UpdateIsolatedIDs_Lines[first]
                    print ('{} has been assigned {} as value.'.format(x, UpdateIsolatedIDs_Lines[first]))
        else:
            UpdateIsolatedIDs_Lines[ID] = i
            print ('{} has been assigned {} as value.'.format(first, UpdateIsolatedIDs_Lines[first]))
            i += 1

    else:
        print ('{} is already assigned {} as isolation ID.'.format(ID, UpdateIsolatedIDs_Lines[ID]))
        if IDValues[ID]:
            print ('{} is key in {} dictionary with floating {} X coordinates as values'.format(ID, 'IDValues',IDValues[ID]))
        
            for end in IDValues[ID]:
                if LineValues[end]:
                    print ('The floating {} is the key in {} dictionary with {} line IDs as values'.format(end, 'LineValues', LineValues[end]))

                    for x in LineValues[end]:
                        UpdateIsolatedIDs_Lines[x] = UpdateIsolatedIDs_Lines[ID]
                        print ('{} has been assigned {} as value.'.format(x, UpdateIsolatedIDs_Lines[ID]))
    #print ('\n\n')

finishloop = 'Finished Compiling Main Update Dictionary At {}\n'.format(time.strftime('%I:%M:%S'))
print (finishloop)

AssignedID_checklist = {}

for OID, AssignedID in UpdateIsolatedIDs_Lines.items():
    if AssignedID not in AssignedID_checklist:
        AssignedID_checklist[AssignedID] = [OID]
    else:
        AssignedID_checklist[AssignedID] += [OID]

startloop_updating = 'Starting ID Line Updates at {}'.format(time.strftime('%I:%M:%S'))
print (startloop_updating)

edit.startEditing(False, True)
edit.startOperation()

with updateLines as cursor:
    for row in cursor:
        if row[0] in UpdateIsolatedIDs_Lines:
            IsolationID = UpdateIsolatedIDs_Lines[row[0]]
            print (row)
            print (IsolationID)
            if IsolationID:
                # Only the IsolationID field changes; OBJECTID and SHAPE@ stay as read.
                row[1] = IsolationID
                cursor.updateRow(row)
            print (row)
            print ('\n')

del cursor

edit.stopOperation()
edit.stopEditing(True)

finishloop_updating = 'Finished ID Line Updates at {}\n'.format(time.strftime('%I:%M:%S'))
print (finishloop_updating)



finish_time = 'Process finish time: {}'.format(time.strftime('%I:%M:%S'))
print (finish_time)
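
A note on the line loop above: the four-way if/elif chain that fills IDValues and LineValues can probably be collapsed with dict.setdefault. The sketch below is only an illustration under assumptions, not the script's actual behaviour in every edge case: build_endpoint_index and its (ID, start, end) tuple input are hypothetical names, the valve coordinates are assumed to be held in a set, and the sample values are made up. It does not touch arcpy at all.

```python
def build_endpoint_index(lines, valve_points):
    """Index lines by their non-valve end coordinates.

    lines        -- iterable of (ID, start, end) tuples (hypothetical input shape)
    valve_points -- set of coordinates that have a valve on them
    """
    id_values = {}    # line ID -> list of its non-valve end coordinates
    line_values = {}  # coordinate -> list of line IDs that end there
    isolated = []     # lines with a valve on both ends

    for line_id, start, end in lines:
        open_ends = [p for p in (start, end) if p not in valve_points]
        if not open_ends:
            isolated.append(line_id)
            continue
        id_values[line_id] = open_ends
        for coord in open_ends:
            # setdefault creates the list on first sight of a coordinate.
            ids = line_values.setdefault(coord, [])
            if line_id not in ids:
                ids.append(line_id)
    return id_values, line_values, isolated


# Made-up sample: valves at 0.0 and 9.0, four lines.
valves = {0.0, 9.0}
sample = [(1, 0.0, 9.0), (2, 0.0, 5.0), (3, 5.0, 9.0), (4, 5.0, 7.0)]
id_values, line_values, isolated = build_endpoint_index(sample, valves)
print(id_values)
print(line_values)
print(isolated)
```

Lines returned in isolated correspond to the case where both ends sit on a valve, which the script above gives its own ID straight away.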

DanPatterson · ‎01-14-2022

Can you also share your current output along with an example of the expected output?

... sort of retired...

Anonymous User · ‎01-14-2022

I'd suggest you step through it with the debugger and watch for the values that are not assigned an ID or are assigned a different one. For example, if 12345 is one that is a problem, add a conditional that checks whether your script is working on that value, so that execution stops there and you can step through the code to see where, when, and what is causing the result to differ from what you expect. If I remember correctly, this is a large dataset, so set the conditional up so that processing continues whenever the known problem ID is not being worked on. When you figure out what's causing it, write code to handle it, and when it's all fixed, remove the additional debugging code.

elif start not in LineValues and end in LineValues:
    if ID == 12345:
        pass  # set a breakpoint on this line and step through with the debugger
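
The same idea can be wrapped in a small helper so the check is easy to add and remove. This is a minimal sketch, assuming Python 3.7+ (for the built-in breakpoint()); stop_if_watched and its parameters are hypothetical names, and the demo swaps in a recording hook so it runs without actually pausing.

```python
def stop_if_watched(line_id, watched_ids, on_hit=breakpoint):
    """Call on_hit() only when line_id is one of the IDs under investigation."""
    if line_id in watched_ids:
        on_hit()  # by default this drops into the debugger
        return True
    return False


# Demo with a recording hook instead of the real debugger.
hits = []
watched = {12345}
for line_id in (12344, 12345, 12346):
    stop_if_watched(line_id, watched, on_hit=lambda i=line_id: hits.append(i))
print(hits)  # [12345]
```

With the default on_hit the debugger stops inside the helper, so one step out is needed to reach the calling loop. Every other ID passes through without pausing, which matters on a large dataset.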

RPGIS · ‎01-18-2022

Hi,

Here is what I have for some of my print statements.

10017531 [27087, 27127, 27187, 27190, 27111, 27128]
10017532 [27088, 27154]
10017533 [27089, 27254, 27255]
10017534 [27090, 27332]
10017535 [27091, 27134, 27135]
10017536 [27092, 90968]
10017538 [27093, 27124, 90965, 27095, 27125, 27133]
10017541 [27094, 27100, 27101, 27103, 27105]
10017540 [27130, 27099, 90956, 27129, 90957, 90962]
10017539 [27096, 27126]
10017542 [27102]

Here is a screenshot for one of the dirtier areas.

Anonymous User · ‎01-18-2022

Without knowing what the numbers in the print statements represent (IDs and assigned IDs? start and end values?), and without seeing any of the duplicated assigned numbers that prompted this question, I still think you will need to step through your code with a debugger to figure out when, why, and how the assignment is happening.

Replace these numbers with the numbers that are 'wrong':

if ID == 10017531:
    pass  # put a breakpoint here and step through it

# or
if start in [27087, 27127, 27187, 27190, 27111, 27128]:
    pass  # put a breakpoint here and step through it

RPGIS · ‎01-18-2022

Hi @Anonymous User,

Here is the output from an updated print statement. As you suggested, I am also working on debugging with breakpoints. I haven't used them before, so I am still figuring out how to set them up.

93169 is already assigned 10040713 as isolation ID.
93169 is key in IDValues dictionary with floating [2203858.366813004, 2203850.9137439206] X coordinates as values
The floating 2203858.366813004 is the key in LineValues dictionary with [93168, 93169] line IDs as values
93168 has been assigned 10040713 as value.
93169 has been assigned 10040713 as value.
The floating 2203850.9137439206 is the key in LineValues dictionary with [93166, 93169] line IDs as values
93166 has been assigned 10040713 as value.
93169 has been assigned 10040713 as value.

The previous print output came from a dictionary in which the IDs and assigned IDs are inverted. Hopefully this gives more detail. In the meantime, I am working on using breakpoints and debugging as I go.

RPGIS · ‎01-20-2022

Hi @Anonymous User,

1 [2134344.1859597526, 2134044.6763881706]
2 [2134716.9647417553, 2134344.1859597526]
3 [2134854.214795336, 2134716.9647417553]
4 [2127143.862105839]
5 [2127231.4413270056]
6 [2127231.4413270056]
7 [2127232.538437672]
8 [2127231.4413270056, 2128202.311432503]
9 [2128202.311432503]
10 [2128489.532314755]
11 [2128489.532314755]
12 [2128489.532314755]
13 [2128488.698326919]
14 [2128544.1339115873]
15 [2128544.1339115873, 2128859.5250130855]
16 [2128859.5250130855]
17 [2128859.80782092]
18 [2128859.5250130855, 2129739.475024838]
19 [2129739.475024838, 2130065.4704358354]
20 [2130065.4704358354]
21 [2130052.789030753]
22 [2130065.4704358354, 2130320.071304586]
23 [2134044.6763881706, 2133934.3380261697]
24 [2133934.3380261697]
25 [2133955.591264505]

2134344.1859597526 [1, 2, 27]
2134044.6763881706 [1, 23]
2134716.9647417553 [2, 3, 29]
2134854.214795336 [3, 31, 33, 37]
2127143.862105839 [4]
2127231.4413270056 [5, 6, 8]
2127232.538437672 [7]
2128202.311432503 [8, 9]
2128489.532314755 [10, 11, 12]
2128488.698326919 [13]
2128544.1339115873 [14, 15]
2128859.5250130855 [15, 16, 18]
2128859.80782092 [17]
2129739.475024838 [18, 19]
2130065.4704358354 [19, 20, 22]
2130052.789030753 [21]
2130320.071304586 [22, 124]
2133934.3380261697 [23, 24, 115247]
2133955.591264505 [25, 26]
2133945.8530950025 [26, 17025, 17026]
2134369.3020512536 [28]
2134720.549380254 [30]
2134848.2853452526 [32]
2134866.82139742 [34, 35]
2134877.9821362533 [36]

Here is where I was having specific issues with my script. There are two dictionaries: one with the ObjectID of each line as the key and that line's ends (as floating-point X coordinates) as the values, and the other with each line end as the key and the ObjectIDs of the lines at that end as the values. I was struggling to figure out how to make this accurate and efficient. After some tinkering, and using your suggestion, here is what I have come up with.

assigned_uniqueIDs = [i]
#IsoID_groups = {}

def correlatingValues(ID):
    corrIDs = []
    for xcoord in IDValues[ID]:
        for OID in LineValues[xcoord]:
            if OID != ID:
                corrIDs.append(OID)

    return corrIDs

for assigned in assigned_uniqueIDs:
    if ID_list:
        A = [ID_list[0]]
        B = 0
        
        while A:
            for a in A:
                UpdateIsolatedIDs_Lines[a] = assigned
                values = correlatingValues(a)
                check = [x for x in values if x not in A]
                if check:
                    A = A + check
                if a in ID_list:
                    ID_list.remove(a)
            B += 1
            
            if len(A) == B:
                break
                
        #IsoID_groups[assigned] = sorted(A)
        #print (assigned, A)
        
        i += 1
        assigned_uniqueIDs.append(i)

This seems to work very well and gets me really close to what I am after, but I don't know whether there is a more efficient way. I have noticed there may be some issues with the geometries themselves, which will take some time to fix, but this is the closest I could come up with.
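
On the efficiency question, one commonly used alternative is a breadth-first search over the two dictionaries, so each line is visited once and every membership test hits a dict rather than a list. The sketch below is an illustration under assumptions, not a drop-in replacement: group_connected_lines and its parameter names are hypothetical, the inputs are assumed to have the same shape as IDValues and LineValues, and the sample coordinates are made up.

```python
from collections import deque


def group_connected_lines(id_values, line_values, first_id=10000000):
    """Give every connected group of lines one shared ID (breadth-first search).

    id_values   -- line ID -> list of end coordinates (same shape as IDValues)
    line_values -- coordinate -> list of line IDs (same shape as LineValues)
    """
    assigned = {}  # line ID -> group ID
    next_id = first_id
    for seed in id_values:
        if seed in assigned:
            continue  # already reached from an earlier seed
        assigned[seed] = next_id
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            # .get() guards against IDs that appear only in line_values.
            for coord in id_values.get(current, ()):
                for neighbor in line_values.get(coord, ()):
                    if neighbor not in assigned:
                        assigned[neighbor] = next_id
                        queue.append(neighbor)
        next_id += 1
    return assigned


# Made-up sample: lines 1-2-3 form a chain, 4-5 touch, 6 stands alone.
sample_ids = {1: [10.0, 20.0], 2: [20.0, 30.0], 3: [30.0],
              4: [50.0], 5: [50.0, 60.0], 6: [70.0]}
sample_lines = {10.0: [1], 20.0: [1, 2], 30.0: [2, 3],
                50.0: [4, 5], 60.0: [5], 70.0: [6]}
groups = group_connected_lines(sample_ids, sample_lines)
print(groups)
```

The returned dictionary has the same role as UpdateIsolatedIDs_Lines. Because assigned doubles as the visited set, there is no need to remove items from ID_list or to rescan the growing list A on every pass.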

Anonymous User · ‎01-20-2022

Knowing how to use the debugger is great. Glad to see you were able to make progress. It also helps to paste a segment of the dictionary into a code comment, so the reader can visualize what the dictionary looks like at each step and you, as the developer, have a line to reference while you work through your logic. It could also prompt other ideas for how to work through the processing. Give it a try. It's easy to copy the value from the debugger as you step through the code for these kinds of comments.

# assigned_uniqueIDs = {1: [2127231.4413270056, 2128202.311432503], 8: [2127231.4413270056, 2128202.311432503], 9: [2127231.4413270056, 2128202.311432503] ... }
for assigned in assigned_uniqueIDs:
    if ID_list:  # what is ID_list?
       ...