Any workarounds or plans to fix Select By Location and Spatial Join performance?

HarryBowman · ‎06-24-2013

Right now, I have a few hundred points and 20+ million lines. Over and over, I need to select the lines that are closest to the points. If I use Spatial Join, it takes an absurdly long time. If I try to narrow it down by using Select by Location first, it takes an absurdly long time, reading in every single line envelope that falls within the bounding box of my points. All of them. This problem has been happening for at least a year (since 10.0), and it cripples performance. We work around it by scripting Select by Location to run with one point selected at a time, but this is not acceptable.

Will there *ever* be a fix for this?

Is there a better workaround?

Both sets of features are in a local file geodatabase. I'm on 10.1 SP1 now.

HarryBowman · ‎06-24-2013

The same Select Layer by Location is now being run with 115,000 points and 20+ million streets. It looks like ArcGIS is trying to bring in the entire dataset - 3 GB of memory usage and climbing.

KevinHibma · ‎06-26-2013

I'd give this blog a read... http://blogs.esri.com/esri/arcgis/2013/05/20/are-you-sure-intersect-is-the-right-tool-for-the-job/
While I don't think any of the particular cases cover your scenario, it does present different ways to tackle similar problems, which might give you insight into your own "new" solution.

MarkBoucher · ‎06-26-2013

If your data is on a network, try putting it on your local PC. The processes will run faster.

ChrisSnyder · ‎06-26-2013

Curious if the GenerateNearTable tool (http://resources.arcgis.com/en/help/main/10.1/index.html#//00080000001n000000) would have any better performance than your other methods.

Frankly, these tools are about as good as they come, and ESRI has done their due diligence and tuned them about as well as they can be.

What sort of hardware are you running this on? Local disks, right?

If you use the 64-bit background geoprocessing (v10.1 only), you should be able to load these large FCs into the in_memory workspace and see a performance boost there (be sure to write the output table/fc there as well).

The only other thing would be to write some fancy Python subprocess/multiprocessing script to break the job up into smaller pieces and run them concurrently, which is typically what you have to do to really boost performance of large jobs like this.
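A minimal sketch of the "break the job into pieces and run them concurrently" idea, in plain Python rather than arcpy. Everything here is an assumption for illustration: `chunked`, `nearest_for_chunk` and `run_parallel` are hypothetical names, each line is reduced to a single representative (x, y) coordinate, and the brute-force search inside the worker only stands in for whatever geoprocessing call a real job would make per chunk.

```python
import math
from multiprocessing import Pool


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def nearest_for_chunk(args):
    """Hypothetical worker: index of the nearest line for each point in one chunk.

    Each line is represented by one (x, y) coordinate (an assumption made
    to keep the sketch short); a real job would do its heavy work here.
    """
    points, lines = args
    result = []
    for px, py in points:
        best = min(
            range(len(lines)),
            key=lambda i: math.hypot(lines[i][0] - px, lines[i][1] - py),
        )
        result.append(best)
    return result


def run_parallel(points, lines, chunk_size=1000, workers=4):
    """Process point chunks in separate processes and merge results in input order."""
    jobs = [(chunk, lines) for chunk in chunked(points, chunk_size)]
    with Pool(workers) as pool:
        parts = pool.map(nearest_for_chunk, jobs)
    return [idx for part in parts for idx in part]
```

On Windows, `run_parallel` should be called from under an `if __name__ == "__main__":` guard, since multiprocessing re-imports the script in each worker process. Chunking by point (rather than by line) keeps each worker's result independent, so the merge is a simple concatenation.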

ChrisSnyder · ‎06-26-2013

"reading in every single line envelope that falls within the bounding box of my points"

Perhaps you can whittle the number of lines down by doing some conditional distance pre-processing (exclude any lines that are more than x distance from a point)? Also, if you are scripting this stuff, remember to take advantage of the arcpy.env.extent property. Most tools honor it (SpatialJoin, GenerateNearTable, etc.).
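The distance pre-processing idea can be sketched in plain Python, independent of arcpy. This is only an illustration under assumed data structures: points are (x, y) tuples, lines are lists of (x, y) vertices, and `expand`, `line_envelope`, `envelopes_overlap` and `prefilter` are hypothetical helper names. The test is deliberately conservative: it compares envelopes, so it never drops a line that lies within the search distance, but it may keep some that are slightly farther away.

```python
def expand(point, dist):
    """Envelope (xmin, ymin, xmax, ymax) around a point, grown by dist."""
    x, y = point
    return (x - dist, y - dist, x + dist, y + dist)


def line_envelope(line):
    """Envelope of a line given as a list of (x, y) vertices."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) envelopes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prefilter(points, lines, dist):
    """Indices of lines whose envelope overlaps at least one expanded point envelope."""
    boxes = [expand(p, dist) for p in points]
    keep = []
    for i, line in enumerate(lines):
        env = line_envelope(line)
        if any(envelopes_overlap(env, box) for box in boxes):
            keep.append(i)
    return keep
```

Only the surviving lines would then be passed to the expensive nearest-feature step. The loop is linear in points times lines, so for tens of millions of lines a spatial index or the chunking approach would still be needed.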

ChrisSnyder · ‎06-26-2013

Another idea... Instead of using the lines themselves, use the line's start/centroid/end points (multipoint style)... which will be much faster than using the entire line geometry (but less precise of course). Then you can get, say, the closest 100 results - and then use the actual line geometry (of the closest results) to make the final determination. You could do all sorts of other high pass filtering too... Like using hull rectangles.... or sampling every other line vertex.
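A rough sketch of that two-stage idea in plain Python (not arcpy), assuming each line is a list of at least two (x, y) vertices. The function names are hypothetical, and the "centroid" is approximated by the middle vertex, which is an assumption made for brevity. Stage one ranks lines by distance to their start/middle/end vertices and keeps the closest `k`; stage two measures the exact point-to-line distance only for those candidates.

```python
import math


def point_segment_distance(p, a, b):
    """Exact distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both ends coincide
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def line_distance(p, line):
    """Exact distance from p to a polyline (minimum over its segments)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def coarse_distance(p, line):
    """Cheap estimate: distance to the start, middle and end vertices only."""
    samples = [line[0], line[len(line) // 2], line[-1]]
    return min(math.hypot(p[0] - x, p[1] - y) for x, y in samples)


def nearest_line(p, lines, k=100):
    """Keep the k best lines by coarse distance, then choose by exact distance."""
    ranked = sorted(range(len(lines)), key=lambda i: coarse_distance(p, lines[i]))
    return min(ranked[:k], key=lambda i: line_distance(p, lines[i]))
```

The choice of `k` is the precision trade-off mentioned above: a point sitting beside the middle of a long line can be far from that line's sampled vertices, so too small a `k` may discard the true nearest line before the exact check runs.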

HarryBowman · ‎06-27-2013

Thanks for the various suggestions. I cannot comment on the performance of Generate Near Table, since I am running with a Basic license.

My complaint is this: why does ArcGIS seem to include all the lines inside the bounding box of the selecting points, even if the search distance is small? If one point is used, the process is very fast. Add a second point diagonally across the extent of the streets and all of them appear to be examined.
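To make the effect concrete, here is a small plain-Python illustration. It does not claim to show how ArcGIS works internally; it only reproduces the arithmetic of the behavior described, using a synthetic 100 x 100 grid of unit-square "street" envelopes. All names are hypothetical. It counts how many envelopes pass a filter built from one box around all points versus a filter built from a small box around each point.

```python
def overlaps(a, b):
    """True if two (xmin, ymin, xmax, ymax) envelopes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def combined_box(points, dist):
    """One envelope around all points, grown by the search distance."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - dist, min(ys) - dist, max(xs) + dist, max(ys) + dist)


def count_candidates(points, envelopes, dist):
    """Return (hits against the combined box, hits against per-point boxes)."""
    big = combined_box(points, dist)
    small = [(x - dist, y - dist, x + dist, y + dist) for x, y in points]
    combined = sum(1 for e in envelopes if overlaps(e, big))
    per_point = sum(1 for e in envelopes if any(overlaps(e, s) for s in small))
    return combined, per_point


# Synthetic data: 10,000 unit-square envelopes tiling a 100 x 100 extent
envelopes = [(x, y, x + 1, y + 1) for x in range(100) for y in range(100)]

one_point = count_candidates([(0.5, 0.5)], envelopes, 0.25)
two_points = count_candidates([(0.5, 0.5), (99.5, 99.5)], envelopes, 0.25)
```

With a single point both filters keep 1 envelope. With a second point in the opposite corner, the combined box keeps all 10,000 envelopes while the per-point boxes keep 2, which matches the pattern reported here and would explain why selecting one point at a time is so much faster.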