Fastest approach to analyzing two lists comprised of coordinates?

RPGIS · ‎04-10-2023

Hi,

I have several options that I discovered that can analyze thousands of point locations and return a list of ones that did not match and a dictionary of IDs where the points fell within a certain radius. However, I wanted to know if there are other alternatives that could be slightly faster. I have tried several other options, but I can't seem to find one that could potentially be faster. Any help on this would be greatly appreciated.

# ________________________________________ Import modules ______________________________________________
import arcpy, os, datetime, numpy

from arcpy import ListFields, CreateFileGDB_management as CreateGDB, Point, Sort_management as Sorting
from arcpy.da import Editor as Editing, SearchCursor as Searching, InsertCursor as Inserting, UpdateCursor as Updating, Walk
from arcpy.management import CopyFeatures as ReplicateFeature, CreateFeatureDataset as MakeDataset, CreateRelationshipClass as MakeRelation, CreateTable, AddField, TruncateTable as EmptyFeature
from arcpy.conversion import ExportTable as ReplicateTable

from datetime import datetime, timedelta
from os import chdir as SetDirectory, mkdir as CreateDirectory
from os.path import basename as RootName, join as Combine, exists as Existing, dirname as ParentDirectory

import numpy as np

# ______________________________________ Define functions ______________________________________________
# Get list of items that exist in the database
def GetDatabaseItems(InputDatabase):return [Combine(root, filename) for root, directory, filenames in Walk(InputDatabase) for filename in filenames]

# Get current timestamp
def TimeStamp(*args):
    GetStamp = {x: time.strftime(y) for x,y in zip(['Year','Month','Day','Date','Time'], ["%Y","%B","%A","%B %d, %Y",'%I:%M:%S %p'])}
    DT, Values = None, [GetStamp[arg] for arg in args if arg in GetStamp]
    if all(Values): DT = Values
    elif any(Values): DT = GetStamp[Values[0]]
    return DT

# Get point distance
def Distance(PointA, PointB): return (pow(PointA[0]-PointB[0], 2) + pow(PointA[1]-PointB[1], 2))**(1/2)

# Get matching point features
def SpatialIdentification(InFeatureCoords, RefFeatureCoords):
    SortedInFeature, SortedRefFeature = sorted(InFeatureCoords, key=lambda x: x[0]), sorted(RefFeatureCoords, key=lambda x: x[0])
    All = SortedInFeature + SortedRefFeature
    n = 1
    Matching = {}
    NonMatching = []
    InCheck, RefCheck = [], []
    while SortedInFeature:
        for A, B in zip(SortedInFeature, SortedRefFeature):
            Ax, Ay, Bx, By = list(A) + list(B)
            if Distance(A,B) <= n:
                Matching[InFeatureCoords[A]] = RefFeatureCoords[B]
                if A in SortedInFeature: SortedInFeature.remove(A)
                if B in SortedRefFeature: SortedRefFeature.remove(B)
            elif Distance(A,B) > n:
                if Ax < Bx:
                    if InCheck:
                        for X in InCheck:
                            if Distance(X,B) <= n:
                                Matching[InFeatureCoords[A]] = RefFeatureCoords[B]
                                SortedRefFeature.remove(B)
                                InCheck.remove(X)
                                NonMatching.remove(X)
                    NonMatching.append(A)
                    InCheck.append(A)
                    SortedInFeature.remove(A)
                    SortedInFeature = sorted(SortedInFeature, key=lambda x: x[0])
                    break
                elif Ax > Bx:
                    if RefCheck:
                        for Y in RefCheck:
                            if Distance(A,Y) <= n:
                                Matching[InFeatureCoords[A]] = RefFeatureCoords[Y]
                                SortedInFeature.remove(A)
                                RefCheck.remove(Y)
                                NonMatching.remove(Y)
                    NonMatching.append(B)
                    RefCheck.append(B)
                    SortedRefFeature.remove(B)
                    SortedRefFeature = sorted(SortedRefFeature, key=lambda x: x[0])
                    break
    if SortedRefFeature: NonMatching + SortedRefFeature
    return Matching, NonMatching

# Get time lapse
def LapsedTime(start, end):
    hours, rem = divmod(end-start, 3600)
    minutes, seconds = divmod(rem, 60)
    lapse_time = ("{:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
    return lapse_time

# ______________________________________ Important Variables ___________________________________________

# Get specific featureclasses and tables and all the fields from each
# Hydrant feature service (hosted)
HydrantFeatureService = r'*'
# Hydrant featureclass from water database
RefFeature = r'*'
# Create subfolders, geodatabases, etc for data to be copied over
MainFolder = r'*'

# __________________________________________ Main Script _______________________________________________
date, start, end, year, month, day, timestamp = ['Date','Start','End','Year','Month','Day','Time']

# Create a new filegeodatabase if it does not exist
DatabaseName = 'StagingDatabase' + '.gdb'
LocalDatabase = Combine(MainFolder, DatabaseName)
if not Existing(LocalDatabase): LocalDatabase = CreateDirectory(LocalDatabase)

# Get list of items in the database
DatabaseItems = GetDatabaseItems(LocalDatabase)

# Copy data from one database to another database in a newly created dataset
# Featureclass
Name = 'Test'
InFeature = Combine(LocalDatabase, Name)
if InFeature not in DatabaseItems: ReplicateFeature(HydrantFeatureService, InFeature)

# Create costants to test various options
Fields = ['OID@', 'SHAPE@XY']
InFeatureIDShape = {row[-1]:row[0] for row in Searching(InFeature, Fields)}
RefFeatureIDShape = {row[-1]: row[0] for row in Searching(RefFeature, Fields)}
SortedInFeature, SortedRefFeature = sorted(InFeatureIDShape, key=lambda x: x[0]), sorted(RefFeatureIDShape, key=lambda x: x[0])#sorted(RefFeatureIDShape)

n = 25
# First Option
print ('Option 1')
start = TimeStamp('Time')[0]
print (start)
start = time.time()
i = 0
Matched, UnMatched = SpatialIdentification(InFeatureIDShape, RefFeatureIDShape)
if Matched or UnMatched: print (f'{len(Matched)} matched features | {len(UnMatched)} unmatched features')
for A in sorted(Matched):
    if i < n:
        print (A, Matched[A])
    i += 1
for Point in sorted(UnMatched):
    print (Point)
end = TimeStamp('Time')[0]
print (end)
end = time.time()
lapse = LapsedTime(start, end)
print (lapse)
print ('\n')

# Second option
print ('Option 2')
start = TimeStamp('Time')[0]
print (start)
start = time.time()
i = 0
InFeatureMPoint, RefFeatureMPoint = arcpy.Multipoint(arcpy.Array([arcpy.Point(*x) for x in SortedInFeature])), arcpy.Multipoint(arcpy.Array([arcpy.Point(*x) for x in SortedRefFeature]))
UnMatched = [(Point.X, Point.Y) for Point in list(InFeatureMPoint - RefFeatureMPoint) + list(RefFeatureMPoint - InFeatureMPoint)]
UnMatched = sorted(UnMatched, key=lambda x: x[0])
Matched = dict(zip([InFeatureIDShape[x] for x in SortedInFeature if x not in UnMatched], [RefFeatureIDShape[x] for x in SortedRefFeature if x not in UnMatched]))
if Matched or UnMatched: print (f'{len(Matched)} matched features | {len(UnMatched)} unmatched features')
for A in sorted(Matched):
    if i < n:
        print (A, Matched[A])
    i += 1
for Point in UnMatched:
    print (Point)
end = TimeStamp('Time')[0]
print (end)
end = time.time()
lapse = LapsedTime(start, end)
print (lapse)
print ('\n')

dslamb2022 · ‎04-10-2023

I always found using an index to compare sets of points is the fastest way to compare them. You can find the k nearest or the nearest points within a distance. There are a couple of libraries for python (RTree), but generally use scipy kdtree since scipy is a default library in ArcGIS Pro.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html

For example (not tested):

import numpy as np
from scipy.spatial import KDTree

points1 = np.array([row[0] for row in arcpy.da.SearchCursor(featureClass,["SHAPE@XY"])])
points2 = np.array([row[0] for row in arcpy.da.SearchCursor(featureClass2,["SHAPE@XY"])])
tree = KDTree(points1)
dd, ii = tree.query(points2, k=[1])
epsilon = .00002
ii[dd>epsilon] #or something like that to find the points that have a distance greater than some cutoff.

RPGIS · ‎04-10-2023

Thank you very much @dslamb2022 for the example.

I was trying to test a third option to see if it would run faster than the first two. I tried using numpy but could not get it to work.

I forgot to mention, when both tests were ran, the first option ran in between 7-8 seconds and the second option ran between 2-3 seconds. Both options gave me the result that I needed, but I wanted to reach out to see if there were any other possibilities that I did not think of that could also work.

DavidPike · ‎04-10-2023

I might be overly simplifying but it might be worth using a tool such as https://pro.arcgis.com/en/pro-app/latest/tool-reference/analysis/generate-near-table.htm

I'd recommend you create the best spatial index you can for both features prior to running the tool.

If it's faster then you can then just run a SeachCursor on the table to create whatever dictionaries you need.

RPGIS · ‎04-10-2023

Thanks @DavidPike for the help. My other goal was to run a script without needing to create another feature to either search through or reference to get the information. So far both solutions seem to work, option 2 is faster than the other, but I was looking to see if there were other options out there that are similar and run even faster than the solutions I came up with.

The first solution takes 7-8 seconds and the other 2-3 seconds. I just thought to reach out and see if there were other solutions that I had not thought of.