Finding furthest distances between two sets of points

8742
26
Jump to solution
03-18-2015 10:14 AM
MarleneSanchez
New Contributor

I have two sets of points- list A and list B. I need to efficiently select the points from list B that are furthest from points in list A.

The reason Near Table isn't working out is because of the number of (redundant) records showing the distance of each point on list B to every single point on list A. That's 9000 entries (A) x 500 entries (B) for a total of 4,500,000 records generated by the near table.

I would like to end up with an ordered list of points from list B that are furthest away from any points on list A.

It seems like a simple thing to ask but not so simple to do.

How can I approach this?

0 Kudos
26 Replies
JoshuaBixby
MVP Esteemed Contributor

It isn't really an apples to apples test, is it?  Where did you test?  ArcMap interactive Python window?  Running Background processing?

The arcpy.Describe call is needed to get spatial references so that you can ensure data in different projections is exported to NumPy with the same spatial reference.

I can install SciPy sometime soon and do a more apples-to-apples test.

0 Kudos
DarrenWiens2
MVP Honored Contributor

No, you're right, it isn't apples-to-apples (I meant to say so). stat_points() is much more fully functional, but that comes at a cost. My main observation was that your pure numpy method is likely done behind the scenes using scipy, which comes at its own cost.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

stat_point is primarily deployed on point sets with millions or tens of millions of comparisons, so an extra few seconds of overhead here or there really is negligible compared to the overall compute times.  That said, an apples-to-apples comparison would be interesting.  I am also interested in improving performance, and SciPy is rumored to be installed automatically with ArcGIS 10.3.1.

0 Kudos
DarrenWiens2
MVP Honored Contributor

Hmmm, well my numpyNear() function gives 'MemoryError' at around 10,000 points. Whether that's due to SciPy or my setup, I don't know.

edit: I see you said comparisons, so I guess that's about in line with 10,000 points compared to itself: 100,000,000 distance comparisons.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Kicked the tires on SciPy.  At first I got memory errors, but I was able to work around it.  Iterating over a 10,000 point feature class against a 50,000 point feature class, instead of comparing all 500,000,000 combinations at once, the SciPy method was ~15% faster than my original straight NumPy approach.  Assuming the memory errors are manageable, SciPy does offer a performance improvement in this case.

It is good that Esri will be packaging and automatically installing SciPy with future ArcGIS releases.

0 Kudos
DarrenWiens2
MVP Honored Contributor

Just curious, are you using 32- or 64-bit Python? I can't seem to shake these Memory Errors with 32-bit. Any more than 10,000 x 10,000 comparisons returns an error, but I may be sloppy managing memory.

edit: or when you say you worked around it, do you mean you processed chunks at a time?

JoshuaBixby
MVP Esteemed Contributor

32-bit.  It would be interesting whether 64-bit SciPy would make a difference.  And yes, in chunks.  My original NumPy code worked in chunks anyways to make the memory footprint lower at the expense of slightly slower code.

0 Kudos