Spatial join with HUGE datasets

07-16-2015 04:34 AM
ManuelFrias
New Contributor III

How can I do a spatial join in ArcGIS with these HUGE datasets?

Target features: almost 3 million 500 x 500 m squares (the attribute table is basically empty)

Join features: 12,000 polylines

Operation: one to many

Match: intersect

All my attempts have failed:

From the Spatial Join tool in the Toolbox: after 15 minutes the output is a feature class in which no join has been made (the Join_Count field is 0 in every row).

From the dialog box: it seems to work, but by my estimate it would take 4 days to complete, with no guarantee that the output will be correct.

It works when I do a Spatial Join of a small area.

Any ideas on how to do this properly?

I am using a file geodatabase, ArcGIS 10.2, background processing and 23 GB of RAM.

(The idea is to overlay those polylines on the grid of 500 x 500 m squares. I will then run a Dissolve to find out how many lines are in each square, and finally make a density map.)
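The number being asked for (distinct lines per 500 m square) can also be computed without a general-purpose spatial join. For each line segment, compute the few grid cells its bounding box covers, then run an exact segment/box test on only those cells. Below is a minimal pure-Python sketch of that swapped-in technique (grid bucketing plus a Liang-Barsky clipping test). It is not an ArcGIS tool, and it assumes the following: lines are vertex lists in the grid's projected units (metres), `origin` is the grid's lower-left corner, and a line touching a cell only along its edge counts as intersecting, in the spirit of the INTERSECT match option. The function names are hypothetical.

```python
import math
from collections import defaultdict


def segment_hits_box(p, q, box):
    """Liang-Barsky test: does segment p->q touch the closed box (xmin, ymin, xmax, ymax)?"""
    (x0, y0), (x1, y1) = p, q
    xmin, ymin, xmax, ymax = box
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0
    for pk, qk in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if pk == 0:
            if qk < 0:
                return False  # parallel to this edge and outside it
        else:
            r = qk / pk
            if pk < 0:
                t_enter = max(t_enter, r)  # entering through this edge
            else:
                t_exit = min(t_exit, r)  # leaving through this edge
            if t_enter > t_exit:
                return False
    return True


def count_lines_per_cell(lines, origin, cell=500.0):
    """Return {(col, row): number of distinct line ids touching that cell}."""
    ox, oy = origin
    hits = defaultdict(set)
    for line_id, pts in lines.items():
        for p, q in zip(pts, pts[1:]):
            # Candidate cells from the segment's bounding box; ceil(...) - 1 keeps
            # the neighbouring cell when a coordinate sits exactly on a cell edge.
            c0 = math.ceil((min(p[0], q[0]) - ox) / cell) - 1
            c1 = math.floor((max(p[0], q[0]) - ox) / cell)
            r0 = math.ceil((min(p[1], q[1]) - oy) / cell) - 1
            r1 = math.floor((max(p[1], q[1]) - oy) / cell)
            for col in range(c0, c1 + 1):
                for row in range(r0, r1 + 1):
                    box = (ox + col * cell, oy + row * cell,
                           ox + (col + 1) * cell, oy + (row + 1) * cell)
                    if segment_hits_box(p, q, box):
                        hits[(col, row)].add(line_id)  # a set, so each line counts once
    return {cell_id: len(ids) for cell_id, ids in hits.items()}
```

Because only cells inside each segment's bounding box are tested, the work grows with total line length rather than with the 3 million squares. Cells that no line touches simply do not appear in the result, and you would treat them as 0.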

ManuelFrias
New Contributor III

Hi.

I think I'm going to follow Dan Patterson's idea. Vince Angelo's strategy sounds good, but I still need to work with a 500 x 500 m cell size, and I don't know how I would fit the raster to the polygon grid anyway.

I created another grid with 9 big cells that cover the whole area. The idea is to run the spatial join in each big cell and then merge everything.

[Screenshot: Capture.PNG]
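The 9-cell tiling is safest when each tile edge falls on an edge of the 500 x 500 m squares. That way every small square belongs to exactly one tile, and merging the per-tile results creates no duplicates. The pure-Python sketch below assumes the polygon grid is a regular block of `ncols` x `nrows` squares starting at (`xmin`, `ymin`). `make_tiles` is a hypothetical helper, not an ArcGIS tool.

```python
def make_tiles(xmin, ymin, ncols, nrows, cell=500.0, tiles_x=3, tiles_y=3):
    """Split an ncols x nrows grid of cell-sized squares into tiles_x * tiles_y
    rectangles whose edges lie on cell boundaries, so no square is cut in two."""

    def split(n, k):
        # Distribute n whole cells over k tiles as evenly as possible.
        base, extra = divmod(n, k)
        edges = [0]
        for i in range(k):
            edges.append(edges[-1] + base + (1 if i < extra else 0))
        return edges

    cx, cy = split(ncols, tiles_x), split(nrows, tiles_y)
    tiles = []
    for j in range(tiles_y):
        for i in range(tiles_x):
            tiles.append((xmin + cx[i] * cell, ymin + cy[j] * cell,
                          xmin + cx[i + 1] * cell, ymin + cy[j + 1] * cell))
    return tiles
```

For example, a 10 x 7 grid gives tile widths of 4, 3 and 3 cells and tile heights of 3, 2 and 2 cells, so every tile edge is a multiple of 500 m from the origin.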

This model works like a charm for one big cell:

[Screenshot: Capture.PNG]

It works fine for one rectangle (only 24 min!), but how do I iterate to do the same for the rest of the cells? I exported the model from ModelBuilder (attached) and tried a for loop, but to no avail.

Any ideas?

GabrielUpchurch1
Occasional Contributor III

Since you are already using ModelBuilder, I would just insert a "Feature Selection" iterator before the Make Feature Layer step and use the 9-cell grid feature class as its input. The output from the iterator is then used as the input to the Make Feature Layer tool.

Some documentation on iterators:  A quick tour of using iterators—Help | ArcGIS for Desktop

You will also need to use "inline variable substitution" to properly name the outputs from each iteration.  Here is some documentation:  A quick tour of using inline variable substitution—Help | ArcGIS for Desktop
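Inline variable substitution matters because each iteration must write to a differently named output. Otherwise every pass overwrites the previous one, and the merge at the end then has nothing to combine. ModelBuilder performs this substitution itself. The sketch below only imitates the idea in plain Python, assuming an iterator variable such as %Value% holds the current big-cell ID. The `SpJoin_%Value%` template and both helper functions are hypothetical. The sketch also checks, at merge time, that no small square appears in two tiles.

```python
def substitute(template, variables):
    """Imitate ModelBuilder-style inline substitution by replacing %Name% tokens."""
    out = template
    for name, value in variables.items():
        out = out.replace("%" + name + "%", str(value))
    return out


def merge_tile_results(results):
    """Merge per-tile {cell_id: count} dicts, refusing cell ids seen in an earlier tile."""
    merged = {}
    for name, counts in results.items():
        overlap = merged.keys() & counts.keys()
        if overlap:
            raise ValueError(f"{name} repeats cells already merged: {sorted(overlap)[:5]}")
        merged.update(counts)
    return merged


# One output name per big cell, e.g. SpJoin_1 ... SpJoin_9.
output_names = [substitute("SpJoin_%Value%", {"Value": v}) for v in range(1, 10)]
```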

VinceAngelo
Esri Esteemed Contributor

The Environments... settings of the Kernel Density tool (and Line Density, and all other tools) provide a way to specify the origin of the Processing Extent:

[Screenshot: Environments.jpg]

You may need to enter the top/right values before the left/bottom, since the UI enforces a valid envelope (right > left, top > bottom) at all times.
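For the Kernel Density output to line up with the polygon grid, two conditions must hold. The Processing Extent must be a valid envelope, and each of its edges must sit on a 500 m boundary of the polygon grid. The hedged sketch below checks only that arithmetic. `check_extent` and the example coordinates are hypothetical, and you would still enter the values in the Environments dialog yourself.

```python
def check_extent(left, bottom, right, top, grid_origin, cell=500.0):
    """Return (ncols, nrows) if the extent is a valid envelope aligned to the
    polygon grid whose lower-left corner is grid_origin; raise ValueError otherwise."""
    if not (right > left and top > bottom):
        raise ValueError("invalid envelope: need right > left and top > bottom")
    ox, oy = grid_origin
    for name, value, origin in (("left", left, ox), ("right", right, ox),
                                ("bottom", bottom, oy), ("top", top, oy)):
        offset = (value - origin) / cell
        if abs(offset - round(offset)) > 1e-9:
            raise ValueError(f"{name}={value} is not on a {cell} m cell boundary")
    return round((right - left) / cell), round((top - bottom) / cell)
```

If the check passes, a raster with a 500 m cell size and this extent has exactly one pixel per polygon square.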

I can't recommend spending days on a vector analysis that raster completes in seconds.

- V

ManuelFrias
New Contributor III

Hi Vince, thanks for your interest,

I guess the main problem is not fitting the raster to a 500 x 500 m grid but answering the original question: how many lines are in each cell?

The Kernel Density and Line Density tools don't solve this; at least, the tests I made don't directly answer that question.

[Screenshot: kernel.PNG]

Result:

[Screenshot: kernelResult.PNG]

The Intersect tool mentioned earlier was a good idea, but it took 19 hours to finish.

VinceAngelo
Esri Esteemed Contributor

To limit values to actual line crossings, your search radius should be between sqrt(2) and 1.5 times the pixel radius (354-375 m for 500 m pixels); a value equal to the pixel size increases the density in neighboring cells.
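The 354-375 m range follows from the 500 m cell size. The pixel radius is half the cell size (250 m), and sqrt(2) times that is the distance from a cell's centre to its corners. A small sketch of the arithmetic (`search_radius_bounds` is a hypothetical helper):

```python
import math


def search_radius_bounds(cell_size):
    """Rule of thumb from the thread: between sqrt(2) and 1.5 times the pixel radius."""
    pixel_radius = cell_size / 2.0
    return math.sqrt(2) * pixel_radius, 1.5 * pixel_radius


low, high = search_radius_bounds(500)
print(f"{low:.0f} - {high:.0f} m")  # 354 - 375 m
```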

You'll need to divide each density pixel by the average value generated by one pixel to get a scaled feature density, then apply "Plus" 0.5 and "Int" to get approximate counts. Since it only takes two seconds to generate a ~4M-pixel grid, there's plenty of time to try various search radius values.

Or you can use the EXPECTED_COUNTS option at 10.3, which pre-scales the results (you just need int(val+0.5)).

[Screenshot: kdensity.jpg]
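The scaling step is simple per-pixel arithmetic. Divide by a reference value, add 0.5, and truncate with Int. Densities are non-negative, so truncation after adding 0.5 rounds to the nearest whole count. In the Raster Calculator this would be an expression along the lines of Int("density" / unit + 0.5), where "density" and unit are placeholders. The pure-Python sketch below works on a grid held as nested lists. `unit_value` stands for the per-pixel reference value described above. How you measure that value (for example, from a test run with a single line) is an assumption left to you.

```python
def density_to_counts(density, unit_value):
    """Scale each density pixel by the single-feature reference value, then
    add 0.5 and truncate (the "Plus" 0.5 / "Int" step) to get approximate counts."""
    return [[int(v / unit_value + 0.5) for v in row] for row in density]


# With EXPECTED_COUNTS output the values are already scaled, so unit_value = 1.0.
counts = density_to_counts([[0.0, 2.1, 3.9]], 1.0)
```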

ManuelFrias
New Contributor III

Hi Vince Angelo

I tried it in ArcGIS 10.3, but it crashes.

Back in ArcGIS 10.2 I tried different search radii, but I can't get the exact number of lines crossing each cell. The cell values after the kernel process usually show more lines than there actually are; it's close but not exact.

I got confused by your second paragraph. I guess you're talking about the Raster Calculator, but I don't really see how to do it.

I am in the desperate phase where nothing seems to work. Vince's suggestion seems the most rational if I can get it to work. Meanwhile, my attempts to do it with a polygon grid have failed so far. The suggestion by Gabriel Upchurch was great.

[Screenshot: ModelBuilder.PNG]

The problem is that Spatial Join is unable to join the polygons and lines of the first part, which is about 600,000 polygons and only 4,500 lines. It runs out of memory!

[Screenshot: polyAndLines.PNG]

This happens when I use INTERSECT in Select Layer by Location (2), between the lines and the GridDivision.

If I use WITHIN, the process gets as far as grid cell 7. The problem is that with WITHIN not all lines get selected.

[Screenshot: polyAndLinesNote.PNG]
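The missing lines under WITHIN are expected. A line that crosses the border between two big cells is not completely inside either of them, so it is selected for no tile at all. INTERSECT sends such a line to every tile it touches. That is harmless for the per-square counts, because each small square still belongs to exactly one tile. The sketch below shows the difference in plain Python, using axis-aligned bounding boxes as a stand-in for the real geometries. Bounding-box overlap is a coarser test than a true INTERSECT and can over-select bent lines. The tile and line names are hypothetical.

```python
def within(bbox, tile):
    """Line bbox lies entirely inside the tile (like WITHIN)."""
    x0, y0, x1, y1 = bbox
    tx0, ty0, tx1, ty1 = tile
    return tx0 <= x0 and ty0 <= y0 and x1 <= tx1 and y1 <= ty1


def intersects_bbox(bbox, tile):
    """Line bbox overlaps or touches the tile (a coarse stand-in for INTERSECT)."""
    x0, y0, x1, y1 = bbox
    tx0, ty0, tx1, ty1 = tile
    return x0 <= tx1 and tx0 <= x1 and y0 <= ty1 and ty0 <= y1


def select_by(lines, tiles, predicate):
    """Return {tile_name: set of line names selected for that tile}."""
    return {t: {n for n, box in lines.items() if predicate(box, tb)} for t, tb in tiles.items()}


tiles = {"west": (0, 0, 1000, 1000), "east": (1000, 0, 2000, 1000)}
lines = {"short": (100, 100, 400, 100), "crossing": (800, 500, 1300, 500)}
by_within = select_by(lines, tiles, within)             # "crossing" lands in no tile
by_intersect = select_by(lines, tiles, intersects_bbox)  # "crossing" lands in both tiles
```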

I read that even on a 64-bit OS I can only work in 32-bit. Is ModelBuilder using 64-bit or 32-bit? (I do have Background Geoprocessing (64-bit) installed.)

GabrielUpchurch1
Occasional Contributor III

To answer your last question first: the 64-bit Background Geoprocessing applies only to geoprocessing tools executed in the background. It applies whether the tools are run directly, from ModelBuilder or from Python. ArcGIS for Desktop is a 32-bit application, but when you install the patch and execute the tools in the background, they run in 64-bit.

When you run the model, does the progress dialog open? If so, the model is still running in the foreground. How are you executing the model? If you are executing it from the ModelBuilder window, I think it always runs in the foreground, so you need to execute it as you would any other geoprocessing tool. If the model is running in the foreground, do the following:

1. Right-click on the model and open the "Properties" dialog.

2. Under the "General" tab, uncheck the setting to "Always run in the foreground" and apply the change.

3. Now execute the model by double-clicking on it (not by opening the ModelBuilder window).

If the model is already running in the background and you are still getting "out of memory" errors, then you have two main choices:

1. Process the data in smaller subsets (this should be straightforward since you already have the model); or

2. Try to get Vince's suggested approach to work.

GabrielUpchurch1
Occasional Contributor III

To check whether the 64-bit patch is installed, go to the Help menu in ArcMap (or ArcCatalog) and select "About ArcMap." If it is installed, you should see something like "Background Geoprocessing (64-bit)..." listed.

VinceAngelo
Esri Esteemed Contributor

If you have Spatial Analyst available, you're wasting your time doing this as a vector modelling exercise. In fact, raster is a much better way to model the quality of AIS (though 500 m increments are far too small for reality; 5 km would be better).

It would take much less than 19 hours to learn how to generate a raster with the same origin as your polygon set (I had never done so from the UI before, and it took seconds), and only minutes to run dozens of model variants until you find a Kernel Density parameter set that produces a useful raster. From there, all you need to do is reclassify the zeros to NODATA, scale the results to integer, and convert to polygon with Raster to Polygon.
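The last steps (zeros to NoData, then polygons) can be pictured with the pure-Python sketch below. It assumes the raster has already been scaled to integer counts and is held as row-major nested lists, with row 0 at the top and `origin` at the upper-left corner. Zero cells are skipped, as NoData would be. Each remaining cell becomes one 500 m square carrying its count. This is a deliberate simplification: ArcGIS Raster to Polygon merges contiguous cells with the same value into a single polygon, whereas here every cell stays its own square so it maps one-to-one onto the original polygon grid. `count_grid_to_squares` is a hypothetical helper.

```python
def count_grid_to_squares(counts, origin, cell=500.0):
    """Turn a row-major grid of integer counts into (ring, count) records,
    skipping zero cells (the reclassify-zeros-to-NoData step)."""
    left, top = origin
    records = []
    for r, row in enumerate(counts):
        for c, n in enumerate(row):
            if n == 0:
                continue  # treated as NoData: no polygon for this cell
            x0 = left + c * cell
            y1 = top - r * cell  # y decreases as the row index grows
            ring = [(x0, y1), (x0 + cell, y1), (x0 + cell, y1 - cell),
                    (x0, y1 - cell), (x0, y1)]  # closed ring: first vertex repeated
            records.append((ring, n))
    return records
```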

- Vince