Creating chaining dictionary for coinciding endpoints?

RPGIS · ‎12-10-2021

Hi,

I have been working on this python script for a good while. For the most part, I am able to get it to work on a smaller set of data. However, I have noticed that it takes an exceptionally long time to execute for the entire database. I am not sure if I wrote this in an efficient manner, but I am quite perplexed as to how long it takes. I have tried splitting up the dictionaries to simplify matters and to have smaller sections of code execute. I am hoping this will speed up the process a bit but at this point I have run out of options.

import arcpy
import time

spointFc = r'*'
LineFc = r'*'
workspace = r'*'

edit = arcpy.da.Editor(workspace)

startloop_pointFC = 'Starting point feature class at {}'.format(time.strftime('%I:%M:%S'))
print (startloop_pointFC)

with searchpoints as cursor:
    for point in searchpoints:
        #Vpoint = [x for x in point[-1]]
        Vpoint = int(point[-1])
        #print (int(Vpoint))
        valve_points.append(Vpoint)

del cursor

stoploop_pointFC ='Finished point feature class at {}\n'.format(time.strftime('%I:%M:%S'))
print (stoploop_pointFC)

startloop_lineFC = 'Starting line feature class at {}'.format(time.strftime('%I:%M:%S'))
print (startloop_lineFC)

edit.startEditing(False, True)
edit.startOperation()

with updateLines as cursor:
    for row in cursor:
        ID = row[0]

        startpnt = row[2].firstPoint
        #start = [startpnt.X, startpnt.Y]
        start = int(startpnt.X)
        
        endpnt = row[2].lastPoint
        #end = [endpnt.X, endpnt.Y]
        end = int(endpnt.X)
        
        start_end = [start, end]

        # Check if start or end coordinates are shared
        # with any of the point coordinates (X)
        A = 0
        B = 0
            
        for valve in valve_points:
            if start/valve == 1:
                A += 1
            elif end/valve == 1:
                B += 1

        #if start in valve_points and end in valve_points:
        if A == 1 and B == 1:
            #print (row)
            row[0] = row[0]
            row[1] = i
            row[2] = row[2]
            updateLines.updateRow(row)
            #print (row)
            i += 1
                            
        #elif start in valve_points and end not in valve_points:
        if A == 1 and B == 0:
            #if end not in NonValvePoints:
            NonValvePoints.append(end)
            LinesWithValve[ID] = end
                    
        #elif end in valve_points and start not in valve_points:
        if A == 0 and B == 1:
            #if start not in NonValvePoints:
            NonValvePoints.append(start)
            LinesWithValve[ID] = start

        else:
            # If neither start or end coordinates are
            # in the list of non-point coordinates

            C = 0
            D = 0
            
            for Nvalve in NonValvePoints:
                if start/Nvalve == 1:
                    C += 1 
                elif end/Nvalve == 1:
                    D += 1
                    
            # Check if C and D change in vales.
            # If C and/or D change values, add ID as
            # key with line ends as assigned values
            if C == 1 and D == 1:
                LinesWithoutValve[ID] = start_end

            elif C == 1 and D == 0:
                NonValvePoints.append(end)
                LinesWithoutValve[ID] = start_end

            elif C == 0 and D == 1:
                NonValvePoints.append(start)
                LinesWithoutValve[ID] = start_end

            else:
                NonValvePoints + start_end
                LinesWithoutValve[ID] = start_end

del cursor
    
edit.stopOperation()
edit.stopEditing(True)

stoploop_lineFC = 'Finished line feature class at {}\n'.format(time.strftime('%I:%M:%S'))
print (stoploop_lineFC)

startloop_nonvalves = 'Starting dictionary at {}'.format(time.strftime('%I:%M:%S'))
print (startloop_nonvalves)

# Run through dictionary and run through same dictionary
# to find any coinciding values for lines without valves.
for init_ID, End in LinesWithoutValve.items():
    CoincidingIDs = []

    for other_ID, OtherEnd in LinesWithoutValve.items():
        if init_ID != other_ID:
            isMatch = [e for e in End if e in OtherEnd]

            if isMatch:
                CoincidingIDs.append(other_ID)

    if CoincidingIDs:
        SharedIDs[init_ID] = CoincidingIDs
        #print (init_ID, CoincidingIDs)

# Run through dictionary and run through other dictionary
# to find any coinciding values for lines with valves.      
for init_ID, End in LinesWithoutValve.items():
    CoincidingIDs = []
    
    for lv_ID, line_valveEnd in LinesWithValve.items():
        for E in End:
            if E == line_valveEnd:
                CoincidingIDs.append(lv_ID)

    if init_ID in SharedIDs:
        shared_updates = SharedIDs[init_ID] + CoincidingIDs
        SharedIDs[init_ID] = shared_updates

finishloop_dict = 'Finished dictionary at {}\n'.format(time.strftime('%I:%M:%S'))
print (finishloop_dict)

If anyone can provide any advice or somehow show a more efficient way to construct this; I would be greatly appreciated.

Thanks

Anonymous User · ‎12-16-2021

what is searchpoints? Take it you removed some cursor code for posting? Looks like you are just creating a list and that can be handled with list comprehension.

valve_points = [int(point[-1]) for point in searchpoints]

Same with updateLines.

You are iterating over the valve_point list each row iteration to check if the start (row) number divided by that (valve) value is 1. Would checking if start is in valve_points work? Then check if end is in valve_points and then work out the four scenarios based on true true, false true, true false, false false?

Instead of math to determine equality, why not just compare equality? If start/valve is 1/2 != 1, or not equal, start != valve is the doing the same but with less steps involved. I can't think of any situation where that would no be true and direct comparing not work.

I think there can be a better way to iterate over the LinesWithoutValve dictionary to find matches. Here, you are saying if the keys do not match, iterate over the values to find a match. And since you are iterating over the dictionary in two loops, you are doing this a lot of times.

if init_ID != other_ID:
    isMatch = [e for e in End if e in OtherEnd]

say there are 5 items in dictionary.

loop 1:

inner loop is done 5 times, 4 times it will compare [e for e in End if e in OtherEnd]

...

loop 5:

inner loop is done another 5 times, 4 times it will compare [e for e in End if e in OtherEnd]

You did 5 outer loops and 20 inner compare loops, and 20 loops over the End list for each e that is in there.

If you restructure your dictionary to have keys that could be used to hold values, such as 'start': start and 'end': end, instead of tying everything to the ID, that could help with comparing matches in a more efficient manner. You could then iterate over the list of End and lookup the value by matching the key. fcId can still be the ID.

updated example:

dict = {'fcId': {'start': 1, 'end': 3},
        'fcId2': {'start': 3, 'end': 7},
        'fcId3': {'start': 3, 'end': 9}}

endList = [3, 5, 7, 7]
otherEnd = [1, 6, 9, 5]
CoincidingIDs = []

for e in [e for e in endList if e not in otherEnd]:
    for k, v in dict.items():
        if v['end'] == e:
            CoincidingIDs.append(k) if k not in CoincidingIDs else print(f'{k} already in list')

print(CoincidingIDs)

RPGIS · ‎12-16-2021

Hi @Anonymous User,

I reworked the script where instead of creating two sets of two dictionaries to try and iterate through, I combined all like dictionaries into two main dictionaries. The first of which is comprised of the ID as the key with the line ends as the value(s) and the other which is comprised of the line ends as the key and coinciding IDs as values. This is in addition to a list of IDs.

Rather than looping through multiple dictionaries like I tried previously, instead I resorted to looping through a single list of IDs, indexing the keys and values, loop through values, and index those values as keys to find all correlating values. The modified script came really close, and it runs far more efficiently than what I had previously. The script before took roughly 4 hours to run, but this slight modification runs in roughly 15 minutes.

Here is the slightly modified script:

import arcpy
import time

start_time =  'Process start time: {}'.format(time.strftime('%I:%M:%S'))
print (start_time)

spointFc = r'*'
LineFc = r'*'
workspace = r'*'

edit = arcpy.da.Editor(workspace)

# Gather fields for searching through
# and updating each feature class.
SelectspointFc_Fields = [field.name for field in arcpy.ListFields(spointFc) if field.name in ['OBJECTID', 'IsolationIDA', 'IsolationIDB']]
SelectLineFc_Fields = [field.name for field in arcpy.ListFields(LineFc) if field.name in ['OBJECTID', 'IsolationID']]

AddspointShape_Field = ['SHAPE@X']
AddLineShape_Field = ['SHAPE@']

spointFc_Fields = SelectspointFc_Fields + AddspointShape_Field
LineFc_Fields = SelectLineFc_Fields + AddLineShape_Field

# Define the update and search cursors for
# input features.
updatePoint = arcpy.da.UpdateCursor(spointFc, spointFc_Fields)
updateLines = arcpy.da.UpdateCursor(LineFc, LineFc_Fields)

searchpoints = arcpy.da.SearchCursor(spointFc, spointFc_Fields)
searchLines = arcpy.da.SearchCursor(LineFc, LineFc_Fields)

# Loop through the line feature class and
# update based on whether the IsolationID
# is null.
i = 10000000

Assigned_IDs = {}
assigned_uniqueIDs = [i]

LineValues = {}
IDValues = {}

shared_IDs = {}
UpdateIsolatedIDs_Lines = {}

valve_points = []
Non_valvePoints = []

ID_list = []

startloop = 'Starting point feature class at {}'.format(time.strftime('%I:%M:%S'))
print (startloop)

with searchpoints as cursor:
    for point in searchpoints:
        #Vpoint = [x for x in point[-1]]
        Vpoint = float(point[-1])
        #print (int(Vpoint))
        if Vpoint not in valve_points:
            valve_points.append(Vpoint)
del cursor

stoploop ='Finished point feature class at {}\n'.format(time.strftime('%I:%M:%S'))
print (stoploop)

startloop = 'Starting line feature class at {}'.format(time.strftime('%I:%M:%S'))
print (startloop)

with searchLines as cursor:
    for row in cursor:
        ID = row[0]
        startpnt = row[2].firstPoint                                    # Capture floating start point of line.
        start = float(startpnt.X)                                       # Capture floating X coordinate of line.
        
        endpnt = row[2].lastPoint                                       # Capture floating end point of line.
        end = float(endpnt.X)                                           # Capture floating X coordinate of line.
        
        if ID not in UpdateIsolatedIDs_Lines:                           # Check if ID is in the specified dictionary.
            if start in valve_points and end in valve_points:           # Check if both the start and end of the line are in the specified list.
                UpdateIsolatedIDs_Lines[ID] = i                         # Assign key and values to specified dictionary.
                i += 1                                                  # Increment i by 1
                assigned_uniqueIDs.append(i)                            # Append i to list of used unique IDs

            elif start not in valve_points and end in valve_points:     # Check if the start of the line is in the specified list.
                IDValues[ID] = [start]                                  # Assign key and values to specified dictionary.
                ID_list.append(ID)
                    
                if start not in LineValues:                             # Check if the start of the line is in the specified dictionary.
                    LineValues[start] = [ID]                            # Assign starting coordinate as key with ID as value to specified dictionary.
                else:
                    update = LineValues[start] + [ID]                   # Update the values if the start of the line already exists in dictionary.
                    LineValues[start] = update                          # Update the key of the line already in dictionary.
                    #print (ID, update)

            elif start in valve_points and end not in valve_points:     # Check if the end of the line is in the specified list.
                IDValues[ID] = [end]                                    # Set ID as key in specified dictionary with list of ending coordinate as values
                ID_list.append(ID)                                      # Append ID to specified list.
                
                if end not in LineValues:                               # Check if the end of the line is not in the specified dictionary.
                    LineValues[end] = [ID]                              # Assign ending coordinate as key with ID as value to specified dictionary.
                else:
                    update = LineValues[end] + [ID]                     # Update the values if the end coordinate of the line already exists in dictionary.
                    LineValues[end] = update                            # Update the key of the line already in dictionary.
                    #print (ID, update)

            else:

                if start not in LineValues and end not in LineValues:   # Check if both the start and end of the line are in the specified dictionary.
                    ID_list.append(ID)                                  # Append ID to specified list.
                    IDValues[ID] = [start, end]                         # Set ID as key in specified dictionary with list of starting and ending coordinate as values
                    
                    LineValues[start] = [ID]                            # Assign starting coordinate as key with ID as value to specified dictionary.
                    LineValues[end] = [ID]                              # Assign ending coordinate as key with ID as value to specified dictionary.
                    

                elif start not in LineValues and end in LineValues:     # Check if the start of the line is in the specified dictionary.
                    ID_list.append(ID)
                    IDValues[ID] = [start, end]
                    LineValues[start] = [ID]
                    
                    update = LineValues[end] + [ID]
                    LineValues[end] = update
                    #print (end, update)

                elif start in LineValues and end not in LineValues:
                    ID_list.append(ID)
                    IDValues[ID] = [start, end]
                    LineValues[end] = [ID]
                    
                    update = LineValues[start] + [ID]
                    LineValues[start] = update
                    #print (end, update)

                else:
                    ID_list.append(ID)
                    IDValues[ID] = [start, end]
                    update = LineValues[start] + [ID]
                    LineValues[start] = update
                    
                    update = LineValues[end] + [ID]
                    LineValues[end] = update
del cursor

stoploop = 'Finished line feature class at {}\n'.format(time.strftime('%I:%M:%S'))
print (stoploop)

startloop = 'Starting Compiling Main Update Dictionary At {}'.format(time.strftime('%I:%M:%S'))
print (startloop)

for ID in ID_list:
    #print (ID)
    needs_assigning = []

    if IDValues[ID]:
        #print ('{} is key in {} dictionary with {} as values'.format(ID, 'IDValues',IDValues[ID]))
        for end in IDValues[ID]:
            if LineValues[end]:
                #print ('{} is key in {} dictionary with {} as values'.format(end, 'LineValues', LineValues[end]))
                for x in LineValues[end]:
                    if x not in needs_assigning and x != ID:
                        needs_assigning.append(x)
                        
    shared_IDs[ID] = needs_assigning
    print ('{} is unique list of unassigned values'.format(needs_assigning))
    
    if ID == needs_assigning[0]:
        if needs_assigning[0] not in UpdateIsolatedIDs_Lines:

            UpdateIsolatedIDs_Lines[ID] = i
            #print ('{} has {} assigned as the unique ID'.format(needs_assigning[0], i))

            i += 1

    if UpdateIsolatedIDs_Lines[ID]:
        for assign in needs_assigning:
            UpdateIsolatedIDs_Lines[assign] = UpdateIsolatedIDs_Lines[ID]
            #print ('{} has been assigned {} as unique ID'.format(assign, UpdateIsolatedIDs_Lines[ID]))
  
    print ('\n')

finishloop = 'Finished Compiling Main Update Dictionary At {}'.format(time.strftime('%I:%M:%S'))
print (finishloop)
    
startloop = 'Starting Reading Dictionary At {}'.format(time.strftime('%I:%M:%S'))
print (startloop)

Assigned_IDGroups = {}

for ID, Assigned_ID in UpdateIsolatedIDs_Lines.items():
    #print ('{} is assigned {} as a common ID'.format(ID, Assigned_ID))
    
    if Assigned_ID not in Assigned_IDGroups:
        Assigned_IDGroups[Assigned_ID] = [ID]
    else:
        update = Assigned_IDGroups[Assigned_ID] + [ID]
        Assigned_IDGroups[Assigned_ID] = update

for UniqueIDs, grouped_IDs in Assigned_IDGroups.items():
    print ('{} is assigned to this {} group of IDs.'.format(UniqueIDs, grouped_IDs))

finishloop = 'Finished Group Assignment IDs Dictionary At {}'.format(time.strftime('%I:%M:%S'))
print (finishloop)

This script gets me close to what I am trying to accomplish, but I will give your suggestion a try and see how that performs. There are a few instances that raised an issue with either values being skipped, not grouped fully, or grouped in another group with another ID assigned.

Anonymous User · ‎12-16-2021

15 minutes is a lot better than 4 hours. I'm sure you can find a way to figure out those fringe cases.

RPGIS · ‎12-17-2021

I am trying. For the most part, things seem to work . However, I am struggling a bit with the logic handling. For some reason, I either assign the wrong ID to the wrong group, assign two IDs to two separate groups when those groups should be one group, or skip something entirely.