Right now, I have a few hundred points and 20+ million lines. Over and over, I need to select the lines that are closest to the points. If I use Spatial Join, it takes an absurdly long time. If I try to narrow it down by using Select by Location first, it also takes an absurdly long time, reading in every single line envelope that falls within the bounding box of my points. All of them. This problem has been happening for at least a year (10.0), and it cripples performance. We work around it by scripting Select by Location with one point selected at a time, but this is not acceptable.
Will there *ever* be a fix for this?
Is there a better workaround?
Both sets of features are in a local file geodatabase. I'm on 10.1 SP1 now.
Frankly, these tools are about as good as they come, and ESRI has done their due diligence and tuned them about as well as they can be.
What sort of hardware are you running this on? Local disks, right?
If you use the 64-bit background geoprocessing (v10.1 only), you should be able to load these large FCs into the in_memory workspace and see a performance boost there (be sure to write the output table/fc there as well).
The only other thing would be to write some fancy Python subprocess/multiprocessing script to break the job up into smaller pieces and run them concurrently, which is typically what you have to do to really boost performance of large jobs like this.
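To give a rough idea of the "break the job up" approach, here is a minimal sketch using only the standard library. The names chunk, process_chunk and run_parallel are made up for illustration, and process_chunk is a placeholder: in a real script it would be the place where each process selects its own points and runs the geoprocessing step, which is not shown here.

```python
from multiprocessing import Pool


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(point_ids):
    """Hypothetical worker: a stand-in for the per-chunk geoprocessing call.

    A real worker would select these points, run the near/join step
    against the lines, and write to its own output.
    """
    return [(pid, "done") for pid in point_ids]


def run_parallel(point_ids, chunk_size=50, workers=4):
    """Run process_chunk on each piece concurrently and flatten the results."""
    pieces = chunk(point_ids, chunk_size)
    pool = Pool(processes=workers)
    try:
        # map() preserves the order of the pieces in the result list.
        results = pool.map(process_chunk, pieces)
    finally:
        pool.close()
        pool.join()
    return [row for part in results for row in part]
```

Call run_parallel(list_of_point_ids) from under an `if __name__ == "__main__":` guard so the worker processes can start cleanly. Each worker should write to a separate output to avoid lock conflicts.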
reading in every single line envelope that falls within the bounding box of my points
Perhaps you can whittle the number of lines down by doing some conditional distance pre-processing (exclude any lines that are more than x distance from every point)? Also, if you are scripting this stuff - remember to take advantage of the arcpy.env.extent property. Most tools honor it (SpatialJoin, GenerateNearTable, etc.).
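The distance pre-processing idea can be sketched in plain Python, outside of arcpy. This is only an illustration under some assumptions: lines are lists of (x, y) tuples, points are (x, y) tuples, and the function names are made up. It tests each line envelope against a small box around each point separately, rather than against one bounding box around all the points.

```python
def envelope(line):
    """Return (xmin, ymin, xmax, ymax) for a line given as [(x, y), ...]."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    return min(xs), min(ys), max(xs), max(ys)


def near_any_point(env, points, dist):
    """True if the envelope overlaps a box of half-width `dist` around any point."""
    xmin, ymin, xmax, ymax = env
    for px, py in points:
        if (xmin <= px + dist and xmax >= px - dist and
                ymin <= py + dist and ymax >= py - dist):
            return True
    return False


def prefilter(lines, points, dist):
    """Keep only lines whose envelope comes within `dist` of some point.

    The box test is conservative: it can keep a line that is slightly
    farther away than `dist`, but it never drops one that is within `dist`.
    """
    return [ln for ln in lines if near_any_point(envelope(ln), points, dist)]
```

With two points at opposite corners of the data, a line in the middle is dropped because it is near neither point, even though it sits inside their combined bounding box.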
Another idea... Instead of using the lines themselves, use the line's start/centroid/end points (multipoint style)... which will be much faster than using the entire line geometry (but less precise of course). Then you can get, say, the closest 100 results - and then use the actual line geometry (of the closest results) to make the final determination. You could do all sorts of other high pass filtering too... Like using hull rectangles.... or sampling every other line vertex.
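Here is a rough sketch of that two-pass idea in plain Python, not arcpy. The assumptions: each line is a list of at least two (x, y) tuples, the middle vertex stands in for the centroid, and all function names are made up for illustration. The first pass ranks lines by the cheap distance to three sample points and keeps the k best; the second pass measures the true point-to-line distance for those candidates only.

```python
import heapq
import math


def sample_points(line):
    """Start, middle vertex (a stand-in for the centroid), and end of a line."""
    return [line[0], line[len(line) // 2], line[-1]]


def coarse_distance(p, line):
    """Cheap estimate: distance from p to the nearest of the three sample points."""
    return min(math.hypot(p[0] - x, p[1] - y) for x, y in sample_points(line))


def point_segment_distance(p, a, b):
    """Exact distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = float(dx * dx + dy * dy)
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its endpoints.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def exact_distance(p, line):
    """Exact distance from p to a polyline with at least two vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def closest_line(p, lines, k=100):
    """Return the index of the closest line among the k best coarse candidates."""
    candidates = heapq.nsmallest(
        k, range(len(lines)), key=lambda i: coarse_distance(p, lines[i]))
    return min(candidates, key=lambda i: exact_distance(p, lines[i]))
```

The result is only as good as the candidate list: a long line whose sample points are all far from the point can be missed if k is too small, which is the "less precise" trade-off mentioned above.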
Thanks for the various suggestions. I cannot comment on the performance of Generate Near Table, since I am running with a Basic license.
My complaint is this: why does ArcGIS seem to include all the lines inside the bounding box of the selecting points, even if the search distance is small? If one point is used, the process is very fast. Add a second point diagonally across the extent of the streets and all of them appear to be examined.