Spatial join with 50 million observations...

08-03-2011 02:24 PM
AndrewTschirhart
New Contributor
Hello,

I am using ArcGIS 9.3.1 and am trying to match 50 million addresses, which I have already geocoded, to a polygon feature class.  I am using the Spatial Join geoprocessing tool, but after testing on a tiny subset of my data, I estimate that the whole dataset could take a month.

Does anyone know of techniques that could speed up the process?  I have 8 processors and 16 GB of RAM on my machine, so I am thinking of splitting the two tables I'm joining into 8 pieces each and running 8 simultaneous spatial joins in separate ArcMap windows, one for each thread.  Or would the "Add Spatial Index" geoprocessing tool help me here?  I am relatively new to ArcGIS and only need to perform this process once, so I am trying to avoid writing a Python script, though if someone has a premade solution for 9.3.1 that I could tweak, that would be very helpful.

Thank you very much for your time,

Andrew Tschirhart
BruceHarold
Esri Regular Contributor
Hello Andrew

A framework for parallelizing a workflow like yours exists in an ArcGIS 10 sample script tool here:
http://resources.arcgis.com/gallery/file/geocoding/details?entryID=A284F7D9-1422-2418-7F50-BA718224C...
If you upgrade to ArcGIS 10, altering the script to perform a Spatial Join isn't a big job.

If you need to stay on 9.3.1, you'll need to create a Model with an input parameter that selects a subset of addresses, for example using modulo arithmetic: OBJECTID mod 8 = 0, then 1, 2, 3, 4, 5, 6, 7, which gives 8 non-overlapping selections that together cover every address.  Then run this Model in 8 concurrent ArcGIS sessions and merge the results afterwards.
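As a rough illustration of the modulo selections, here is a small Python sketch that only builds the 8 selection expressions as strings; it does not call any ArcGIS tool. The function name `modulo_where_clauses` and the `dialect` switch are made up for this example, and the SQL forms are assumptions: I believe file geodatabases accept `MOD("OBJECTID", 8) = 0` and personal geodatabases accept `[OBJECTID] MOD 8 = 0`, but check the SQL reference for your data source before relying on either.

```python
def modulo_where_clauses(n_parts, field="OBJECTID", dialect="fgdb"):
    """Build one selection expression per partition (hypothetical helper).

    dialect "fgdb": assumed file geodatabase syntax, MOD("FIELD", n) = r
    dialect "pgdb": assumed personal geodatabase syntax, [FIELD] MOD n = r
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    clauses = []
    for remainder in range(n_parts):
        if dialect == "fgdb":
            clauses.append('MOD("%s", %d) = %d' % (field, n_parts, remainder))
        elif dialect == "pgdb":
            clauses.append('[%s] MOD %d = %d' % (field, n_parts, remainder))
        else:
            raise ValueError("unknown dialect: %r" % (dialect,))
    return clauses


if __name__ == "__main__":
    # One expression per concurrent session; remainders 0..7 never overlap.
    for clause in modulo_where_clauses(8):
        print(clause)
```

Each expression would be supplied as the selection parameter of one Model run, so that every address falls into exactly one of the 8 sessions.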

Regards
AndrewTschirhart
New Contributor
Bruce,

Thank you very much for your help; I greatly appreciate it.

I have just one follow-up question: do you know whether using the framework for ArcGIS 10 is faster or more efficient than running 8 simultaneous sessions in ArcGIS 9.3.1?  I am trying to decide whether I should wait for my IT department to install ArcGIS 10, which is planned for the next few months.

Thanks,
Andrew Tschirhart
BruceHarold
Esri Regular Contributor
Andrew

The two approaches are the same in terms of how the geoprocessing functions execute; the ArcGIS 10 script simply handles all the split/process/merge legwork for you.  The script will also help you with the modulo arithmetic details, so take a look anyway.
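To make the split/process/merge idea concrete, here is a minimal pure-Python sketch of the pattern. It is not the sample script and it does not use ArcGIS at all: `fake_spatial_join` is a hypothetical stand-in for the real Spatial Join step, and `map_fn` is an assumed hook where a parallel map (for example from a process pool) could be swapped in.

```python
def split_by_modulo(object_ids, n_parts):
    """Split ids into n_parts non-overlapping groups by id mod n_parts."""
    parts = [[] for _ in range(n_parts)]
    for oid in object_ids:
        parts[oid % n_parts].append(oid)
    return parts


def split_process_merge(object_ids, n_parts, worker, map_fn=map):
    """Split the ids, run worker on each group, then merge the results.

    map_fn defaults to the built-in (serial) map; a parallel map with the
    same calling convention could be passed in instead.
    """
    parts = split_by_modulo(object_ids, n_parts)
    merged = []
    for chunk in map_fn(worker, parts):
        merged.extend(chunk)
    return merged


def fake_spatial_join(oids):
    """Hypothetical placeholder for the real join: pairs each id with a label."""
    return [(oid, "polygon_for_%d" % oid) for oid in oids]


if __name__ == "__main__":
    merged = split_process_merge(range(1, 21), 8, fake_spatial_join)
    # Every input id should appear exactly once after the merge.
    print(sorted(oid for oid, _ in merged) == list(range(1, 21)))
```

The point of the sketch is only that the partitions cover every record once, so merging the per-partition outputs gives the same rows as a single run would.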

Regards
AndrewTschirhart
New Contributor
Bruce,

Thank you very much for your help; I am very grateful.  I will definitely take a look at the script.

Andrew Tschirhart