I have bus location information that is coming in daily (approx 50K of points a day) that I need to verify that the location cited is using the correct bus stop. In order to determine the nearest bus stop I was attempting to use SQL Server Spatial, but that's not as fast as I would like it to be.

I came across an older post about this (https://community.esri.com/message/939175-re-python-near-analysis-no-advanced-licence?commentID=939175&et=watches.email.thread#comment-939175) and decided that Numpy became an option and I've built the code below to compare 2 different tables and return the id from the first one, and the corresponding id/minimum distance from the second one. I've worked the code below to get me the ID of the closest stop, but I'm looking for help in making sure this is the right way to go. This is what I'm requesting:

1. When I graduated from using input test featureclasses (now remarked out) to examples of the SQL Server tables I will be hitting, I noticed that the index for the minimum distance was always off by 1 (so I just added 1 to get the right one). I'm sure something is wrong there so any help in getting the right index of the minimum distance stop would help.

2. Also, I was told that I didn't need Pandas (by Dan Patterson) but I wasn't generating lat/longs yet as this is just a piece of the overall daily processing (most of it using SQL Server) and it doesn't need to be pushed to a featureclass.

3. I came across the current way of using numpy which seems to be just a little faster than the numpy.einsum function, but maybe not if it was all cleaned up more. All opinions are welcomed as I'm interested in knowing the best way to do this.

If you can provide any help I would sure appreciate it.

`##Script: busStopDistances.py`

##Author: Borrowed from Dan.Patterson@carleton.ca, and others

##Purpose: Numpy Repository stuff

##References:

## https://community.esri.com/thread/175646

## https://numpy.org/doc/stable/reference/generated/numpy.einsum.html

## https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347

## https://en.wikipedia.org/wiki/Einstein_notation

## https://pro.arcgis.com/en/pro-app/arcpy/data-access/tabletonumpyarray.htm

##

import os, sys

import numpy as np

import arcpy

import numpy.lib.recfunctions, scipy.spatial, random, cProfile

import pandas as pd

np.set_printoptions(edgeitems=3, linewidth=80, precision=6, suppress=True, threshold=10)

def npNear(f1,f2):

f1shp_fields = ["DilaxID","LAT", "LON"] # DilaxBusTest SQL Server table

#f1shp_fields = ["OID@", "SHAPE@XY", "POINT_Y", "POINT_X"] # RandomPoints FC

f2shp_fields = ["OID@", "SHAPE@XY"] # Bus stop locatons FC

a = arcpy.da.TableToNumPyArray(f1, f1shp_fields)

#a = arcpy.da.FeatureClassToNumPyArray(f1, f1shp_fields)

b = arcpy.da.FeatureClassToNumPyArray(f2, f2shp_fields)

origs = np.array(pd.DataFrame(a[:][['LON', 'LAT']])) # How do w/o Pandas?

#origs = a['SHAPE@XY'][:] # for RandomPoints FC

dests = b['SHAPE@XY'][:]

# Distance calculations using Numpy Einsum

## subts = origs[:,None,:] - dests

## dist = np.sqrt(np.einsum('ijk,ijk->ij',subts,subts))

# Distance calculations using other Numpy functions

P = np.add.outer(np.sum(origs**2, axis=1), np.sum(dests**2, axis=1))

N = np.dot(origs, dests.T)

dist = np.sqrt(P - 2*N)

# print(dist) # for testing

# Generate output array

out = np.zeros((len(dist),3))

out[:,0] = a['DilaxID'][:] # the initial table's ID field (f1, a, origs)

out[:,1] = np.argmin(dist, axis=1) + 1 # minimum value from the distance calc

# Adding 1 to get the correct bus stop

out[:,2] = np.amin(dist, axis=1) # minimum value's id, to be associated to later

return out

if __name__=="__main__":

# Featureclasses for testing

#f1 = r"d:\Projects\GTFS_Update_Checker\RandomPoints.shp"

#f2 = r"d:\Projects\GTFS_Update_Checker\Random2.shp"

# SQL Server table/fc for testing

f1 = os.path.join(sys.path[0], r"Ridership on W-SQL01.sde\Ridership.dbo.DilaxBusTest")

f2 = os.path.join(sys.path[0],r"GTFS on W-SQL01.sde\GTFS.dbo.BusStopsWM")

# Final table for results

outPath = os.path.join(sys.path[0], r"Ridership on W-SQL01.sde\Ridership.dbo.DilaxBusDistTest")

start = time.time() # for determining processing time

out = npNear(f1, f2) # Call to sub-routine for determining calculations

# Send to output table in SQL Server

if arcpy.Exists(outPath):

arcpy.Delete_management(outPath)

struct_out = numpy.core.records.fromarrays(

out.transpose(),

numpy.dtype([('dOID', numpy.int32), ('bOID', numpy.int32), ('distance', '<f8')]))

arcpy.da.NumPyArrayToTable(struct_out, outPath)

end = time.time() # end processing time

print(out)

print(end - start)