# Finding furthest distances between two sets of points

03-18-2015 10:14 AM
New Contributor

I have two sets of points- list A and list B. I need to efficiently select the points from list B that are furthest from points in list A.

The reason Near Table isn't working out is because of the number of (redundant) records showing the distance of each point on list B to every single point on list A. That's 9000 entries (A) x 500 entries (B) for a total of 4,500,000 records generated by the near table.

I would like to end up with an ordered list of points from list B that are furthest away from any points on list A.

It seems like a simple thing to ask but not so simple to do.

How can I approach this?

26 Replies
New Contributor

Thank you everyone for all your help.

Since there is a limited number of sites and facilities, we can order all the sites by distance from their closest waste facility.

In practice you would not really have identical distances between points.

So at the very top of the list you would have one construction site that you could say is further from all environmental hazards than any other site you have on the list, right?

MVP Esteemed Contributor

OK, I get it, you want to find the nearest hazard site to each construction site, and then rank/order all the construction sites in descending order of that value.  So, the construction site at the top has the largest minimum distance to a hazard site.
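The ranking described here can be sketched in plain NumPy. This is a minimal illustration with made-up coordinates (the `hazards`/`sites` names are mine, not from the thread): compute every site-to-hazard distance, take each site's minimum, then sort descending.

```python
import numpy as np

# Hypothetical coordinates: hazard sites (list A) and construction sites (list B)
hazards = np.array([[0.0, 0.0], [100.0, 0.0]])
sites = np.array([[10.0, 0.0], [50.0, 0.0], [90.0, 0.0]])

# All pairwise site-to-hazard distances via broadcasting: shape (sites, hazards)
diff = sites[:, None, :] - hazards[None, :, :]
dist = np.hypot(diff[..., 0], diff[..., 1])

# Each site's distance to its closest hazard
min_dist = dist.min(axis=1)      # [10., 50., 10.]

# Site indices ranked largest-minimum-distance first
ranking = np.argsort(-min_dist)  # [1, 0, 2]
```

The site at `ranking[0]` is the one whose nearest hazard is farthest away, which is exactly what the Near tool plus a descending sort on NEAR_DIST gives you.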

New Contributor

That's it.

I realize that my original question may have been a bit off.

I was originally using Near Table and was trying to efficiently get rid of the non-relevant records. (Non-relevant meaning those that do not help identify points with the largest minimum distance to a hazard)

I didn't realize I could do this with just the Near tool as Darren has suggested.

MVP Regular Contributor

I've done something similar and the Near table worked for me. When I was doing this, I also found it frustrating that the Near table would only show the ObjectID of the features, requiring additional joins to actually see what features the OIDs were for. I created this Python script that will create a Near table with a key field of your choice in the output. I haven't tested it much but maybe you can get some use out of it.

```
import arcpy
import os

try:
    # Set local variables
    sourceGDB = r"C:\temp\mytempsource.gdb"
    outputGDB = r"C:\temp\mytempoutput.gdb"
    ## IN and NEAR features can be the same
    in_fc = "FeatureClassIn_Name"
    in_fc_keyfield_name = "KeyField_Name"
    in_fc_keyfield_type = "Long"
    in_fc_path = os.path.join(sourceGDB, in_fc)
    near_fc = "FeatureClassNear_Name"
    near_fc_keyfield_name = "KeyField_Name"
    near_fc_keyfield_type = "Long"
    near_fc_path = os.path.join(sourceGDB, near_fc)

    # Create near table and make as TableView
    neartable_inmem = os.path.join("in_memory", "neartable_inmem")
    arcpy.GenerateNearTable_analysis(
        in_fc_path,       ## in_features
        near_fc_path,     ## near_features
        neartable_inmem,  ## out_table
        "",               ## search_radius (empty = no limit)
        "NO_LOCATION",    ## location
        "NO_ANGLE",       ## angle
        "ALL",            ## closest
        "0",              ## closest_count
        "PLANAR"          ## method
    )
    arcpy.MakeTableView_management(neartable_inmem, "neartable_view")
    print "Near table view created"

    # Add new IN and NEAR key fields to near table
    in_kf = "IN_{}".format(in_fc_keyfield_name)
    arcpy.AddField_management(
        "neartable_view",    ## in_table
        in_kf,               ## field_name
        in_fc_keyfield_type  ## field_type
    )
    near_kf = "NEAR_{}".format(near_fc_keyfield_name)
    arcpy.AddField_management(
        "neartable_view",      ## in_table
        near_kf,               ## field_name
        near_fc_keyfield_type  ## field_type
    )

    # Join and calc IN key field
    arcpy.AddJoin_management(
        "neartable_view",  ## in_layer_or_view
        "IN_FID",          ## in_field
        in_fc_path,        ## join_table
        "OBJECTID",        ## join_field
        "KEEP_COMMON"      ## join_type
    )
    arcpy.CalculateField_management(
        "neartable_view",                    ## in_table
        in_kf,                               ## field
        "!{}!".format(in_fc_keyfield_name),  ## expression
        "PYTHON_9.3"                         ## expression_type
    )
    arcpy.RemoveJoin_management("neartable_view")
    print "{} field populated".format(in_kf)

    # Join and calc NEAR key field
    arcpy.AddJoin_management(
        "neartable_view",  ## in_layer_or_view
        "NEAR_FID",        ## in_field
        near_fc_path,      ## join_table
        "OBJECTID",        ## join_field
        "KEEP_COMMON"      ## join_type
    )
    arcpy.CalculateField_management(
        "neartable_view",                      ## in_table
        near_kf,                               ## field
        "!{}!".format(near_fc_keyfield_name),  ## expression
        "PYTHON_9.3"                           ## expression_type
    )
    arcpy.RemoveJoin_management("neartable_view")
    print "{} field populated".format(near_kf)

    # Export final near table
    kf_near_table = "{}_NEAR".format(in_fc.replace('.', '_'))  ## Replace any dot with underscore and add _NEAR suffix
    kf_near_table_path = os.path.join(outputGDB, kf_near_table)
    arcpy.CopyRows_management(
        "neartable_view",   ## in_rows
        kf_near_table_path  ## out_table
    )
    print "Final near table output at {}".format(kf_near_table_path)

except Exception as err:
    print err
    ## If there was an arcpy error, print all messages returned by the last tool
    if arcpy.GetMessages(2):
        print arcpy.GetMessages()

finally:
    # Cleanup
    arcpy.Delete_management("in_memory")
    arcpy.ClearWorkspaceCache_management()
```
New Contributor

Good god. I have no experience with Python yet, so I'm going to take this down and chew on it later ^^

MVP Regular Contributor

Haha! Well, you did post in the Python section so I thought I would go for it. It's not as complicated as it looks.

1. Create Near table
2. Add two new fields for in and near key fields
3. Join the original feature classes to field calc the in and near key field values
4. Export final near table
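If you'd rather avoid the AddField/AddJoin/CalculateField round-trip in steps 2-3, the same key attachment can be done with dictionary lookups. Here is a plain-Python sketch with hypothetical OIDs, key values, and rows; in real use the two dicts would be built with `arcpy.da.SearchCursor` over each feature class.

```python
# Hypothetical OID -> key-field lookups for the IN and NEAR feature classes
in_keys = {1: "SITE_A", 2: "SITE_B"}
near_keys = {10: "HAZ_X", 11: "HAZ_Y"}

# Hypothetical near-table rows: (IN_FID, NEAR_FID, NEAR_DIST)
near_table = [(1, 10, 42.0), (2, 11, 7.5)]

# One pass replaces both join-and-calc steps
keyed = [(in_keys[i], near_keys[n], d) for i, n, d in near_table]
# keyed -> [('SITE_A', 'HAZ_X', 42.0), ('SITE_B', 'HAZ_Y', 7.5)]
```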
MVP Esteemed Contributor

If you were interested in the farthest instead of nearest, then you would have to use Point Distance or Generate Near Table in your workflow, unless you wanted to roll your own NumPy-based function.  When working with data sets in the tens of thousands, which isn't much from a practical standpoint, I have found the Point Distance tool to be very slow, and Generate Near Table sometimes fails with unspecified errors.

For situations where I want farthest points, or quicker outputs similar to Point Distance, I have rolled my own NumPy-based functions.  For example:

```
def stat_point(in_features, other_features, stat='MINIMUM'):
    import arcpy
    import numpy

    stats = {
        'MINIMUM': {'FID': 'MIN_FID',
                    'DIST': 'MIN_DIST',
                    'INDEX': lambda x: 0},
        'MAXIMUM': {'FID': 'MAX_FID',
                    'DIST': 'MAX_DIST',
                    'INDEX': lambda x: x - 1},
        'MEDIAN_HIGH': {'FID': 'MEDH_FID',
                        'DIST': 'MEDH_DIST',
                        'INDEX': lambda x: x // 2},
        'MEDIAN_LOW': {'FID': 'MEDL_FID',
                       'DIST': 'MEDL_DIST',
                       'INDEX': lambda x: x // 2 - 1 if x % 2 == 0 else x // 2}
    }

    # Add the output FID/DIST fields for the chosen stat
    arcpy.AddField_management(in_features, stats[stat]['FID'], 'LONG')
    arcpy.AddField_management(in_features, stats[stat]['DIST'], 'DOUBLE')

    desc = arcpy.Describe(in_features)
    SR = desc.spatialReference

    desc = arcpy.Describe(other_features)
    OID_name_other = desc.OIDFieldName
    shape_name_other = desc.shapeFieldName
    narr_other = arcpy.da.FeatureClassToNumPyArray(
        other_features,
        [OID_name_other, shape_name_other],
        spatial_reference=SR
    )
    xy_other = narr_other[shape_name_other]
    idx = stats[stat]['INDEX'](numpy.shape(xy_other)[0])

    with arcpy.da.UpdateCursor(
        in_features,
        ["SHAPE@XY", stats[stat]['FID'], stats[stat]['DIST']]
    ) as cur:
        for xy, FID, DIST in cur:
            xy_in = numpy.array(xy)
            # Vectorized planar distances from this point to every other point
            d0 = numpy.subtract.outer(xy_in[0], xy_other[:, 0])
            d1 = numpy.subtract.outer(xy_in[1], xy_other[:, 1])
            dist = numpy.hypot(d0, d1)
            # Index of the point at the requested order statistic
            i = dist.argsort()[idx]
            FID = narr_other[OID_name_other][i]
            DIST = dist[i]
            cur.updateRow([xy, FID, DIST])
```

When used with "MINIMUM", the stat_point function mimics the Near tool.  The Near tool is more performant than using stat_point to find the minimum distance point, but stat_point is orders of magnitude faster at finding non-nearest points compared to Point Distance or Generate Near Table.  Of course, stat_point is doing planar measurements so that is something to be aware of.  I threw MEDIAN parameters in as an example of how simple it is to extend this methodology beyond minimum or maximum points.

The NumPy heavy lifting (the subtract.outer and hypot lines) comes from Alex Martelli in response to a Stack Overflow post:  Euclidean distance between points in two different Numpy arrays, not within...
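The order-statistic idea in stat_point can also be seen in isolation with a NumPy-only sketch (the coordinates below are made up): once the full distance matrix exists, nearest, farthest, or median neighbours are just different columns of a row-wise sort.

```python
import numpy as np

# Hypothetical coordinates: 4 "in" points and 3 "other" points
xy_in = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
xy_other = np.array([[1.0, 1.0], [9.0, 9.0], [20.0, 0.0]])

# Full planar distance matrix via broadcasting, shape (4, 3)
d = np.hypot(xy_in[:, None, 0] - xy_other[None, :, 0],
             xy_in[:, None, 1] - xy_other[None, :, 1])

# Any order statistic falls out of one row-wise sort
order = np.sort(d, axis=1)
nearest = order[:, 0]    # what the Near tool reports
farthest = order[:, -1]  # farthest neighbour at no extra cost
```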

MVP Honored Contributor

I should probably read this more closely, but I've used scipy.spatial.distance.pdist for approximately what I think the subtract.outer/hypot lines are doing (pairwise distances?).

MVP Esteemed Contributor

I stuck with NumPy because SciPy isn't shipped with ArcGIS Desktop, yet, and SciPy isn't part of our standard data center Python deployment either.  From a performance perspective, it would be interesting to compare the two approaches.

MVP Honored Contributor

Just ran a quick test between stat_point() and the following function for 1,000 points and the results are: stat_point() - 29s, numpyNear() - 0.5s. Almost all of the stat_point() time is due to calls to arcpy.Describe (14.1s) and arcpy.AddField (14.7s), so the numpy stuff isn't the problem.

```
import random

import arcpy
import numpy
import numpy.lib.recfunctions
import scipy.spatial.distance

def numpyNear(fc):
    npArray = arcpy.da.FeatureClassToNumPyArray(fc, ["OID@", "SHAPE@XY"])
    # Full pairwise distance matrix in one SciPy call
    distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'], 'euclidean')
    sqDistArray = scipy.spatial.distance.squareform(distArray)
    # Column 0 of each sorted row is the zero self-distance,
    # so column 1 holds the nearest neighbour
    nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:, 1]]
    npAppend = numpy.lib.recfunctions.append_fields(npArray, 'NearFID', nearFID)
    nearDist = numpy.sort(sqDistArray)[:, 1]
    npAppend = numpy.lib.recfunctions.append_fields(npAppend, 'NearDist', nearDist)
    outFC = r'in_memory\outPts' + str(int(random.random() * 1000))  # int() keeps the name valid
    arcpy.da.NumPyArrayToFeatureClass(npAppend, outFC, "SHAPE@XY")
    arcpy.CopyFeatures_management(outFC, r'in_memory\numpy_Pts')
```