I have bus location information that is coming in daily (approx 50K of points a day) that I need to verify that the location cited is using the correct bus stop. In order to determine the nearest bus stop I was attempting to use SQL Server Spatial, but that's not as fast as I would like it to be.
I came across an older post about this (https://community.esri.com/message/939175-re-python-near-analysis-no-advanced-licence?commentID=9391...) and decided that Numpy became an option and I've built the code below to compare 2 different tables and return the id from the first one, and the corresponding id/minimum distance from the second one. I've worked the code below to get me the ID of the closest stop, but I'm looking for help in making sure this is the right way to go. This is what I'm requesting:
1. When I graduated from using input test featureclasses (now remarked out) to examples of the SQL Server tables I will be hitting, I noticed that the index for the minimum distance was always off by 1 (so I just added 1 to get the right one). I'm sure something is wrong there so any help in getting the right index of the minimum distance stop would help.
2. Also, I was told that I didn't need Pandas (by Dan Patterson) but I wasn't generating lat/longs yet as this is just a piece of the overall daily processing (most of it using SQL Server) and it doesn't need to be pushed to a featureclass.
3. I came across the current way of using numpy which seems to be just a little faster than the numpy.einsum function, but maybe not if it was all cleaned up more. All opinions are welcomed as I'm interested in knowing the best way to do this.
If you can provide any help I would sure appreciate it.
import os, sys
import numpy as np
import arcpy
import numpy.lib.recfunctions, scipy.spatial, random, cProfile
import pandas as pd
np.set_printoptions(edgeitems=3, linewidth=80, precision=6, suppress=True, threshold=10)
def npNear(f1,f2):
f1shp_fields = ["DilaxID","LAT", "LON"]
f2shp_fields = ["OID@", "SHAPE@XY"]
a = arcpy.da.TableToNumPyArray(f1, f1shp_fields)
b = arcpy.da.FeatureClassToNumPyArray(f2, f2shp_fields)
origs = np.array(pd.DataFrame(a[:][['LON', 'LAT']]))
dests = b['SHAPE@XY'][:]
P = np.add.outer(np.sum(origs**2, axis=1), np.sum(dests**2, axis=1))
N = np.dot(origs, dests.T)
dist = np.sqrt(P - 2*N)
out = np.zeros((len(dist),3))
out[:,0] = a['DilaxID'][:]
out[:,1] = np.argmin(dist, axis=1) + 1
out[:,2] = np.amin(dist, axis=1)
return out
if __name__=="__main__":
f1 = os.path.join(sys.path[0], r"Ridership on W-SQL01.sde\Ridership.dbo.DilaxBusTest")
f2 = os.path.join(sys.path[0],r"GTFS on W-SQL01.sde\GTFS.dbo.BusStopsWM")
outPath = os.path.join(sys.path[0], r"Ridership on W-SQL01.sde\Ridership.dbo.DilaxBusDistTest")
start = time.time()
out = npNear(f1, f2)
if arcpy.Exists(outPath):
arcpy.Delete_management(outPath)
struct_out = numpy.core.records.fromarrays(
out.transpose(),
numpy.dtype([('dOID', numpy.int32), ('bOID', numpy.int32), ('distance', '<f8')]))
arcpy.da.NumPyArrayToTable(struct_out, outPath)
end = time.time()
print(out)
print(end - start)