I am going to try not to fill this post with expletives.
Preliminaries: ArcGIS 10.3.1 running on the latest (2015) MacBook Pro using Parallels, Windows 7, 8 GB memory, no other user-level processes running.
Using: Arc Toolbox => Analysis Tools => Overlay => Spatial Join to join points (715 features) to polygons (469 features).
The two datasets have the same coordinate system, although the point dataset has a join, and the polygon dataset
has a huge number of fields because it was originally a Business Analyst dataset, albeit pared down to
the geographic region of interest.
The tool fires up. I enter the name of the target feature dataset (polygons), then the name of the join features... 205 seconds later the tool acknowledges the name of the dataset (no work yet, mind you; just responding that I entered the name of a dataset).
Then I wish to join 1:1 and to merge fields of the point dataset, so I delete all the fields which cannot meaningfully be joined.
Each deletion takes about 10 seconds to register, for a net total of 270 seconds.
Then the remaining 4 fields I tell it to aggregate as a mean. Another 20-odd seconds.
I enter the name of the output dataset. Incredible: it seems to respond almost immediately; what can be wrong?
I have now spent 495 seconds just to enter the data.
I now hit the run button. It executes, I get the usual progress dialogue at the bottom of the screen, and within a few seconds the tool finishes,
claiming to have run without error. Only there is no output dataset created.
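For readers who have not used the tool: a 1:1 spatial join with a "mean" merge rule gives each target polygon the average of a point field over the points that fall inside it. Here is a minimal pure-Python sketch of that computation, using axis-aligned rectangles as stand-in polygons; all names and values are invented for illustration, and this is not the arcpy API.

```python
# Illustration of a 1:1 spatial join with a "mean" merge rule:
# each polygon receives the average of a value over the points it
# contains. Rectangles stand in for real polygon geometries.

def point_in_rect(pt, rect):
    """rect = (xmin, ymin, xmax, ymax); pt = (x, y)."""
    x, y = pt
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax

def spatial_join_mean(polygons, points):
    """polygons: {poly_id: rect}; points: [(pt, value), ...].
    Returns {poly_id: mean of values of contained points, or None}."""
    out = {}
    for pid, rect in polygons.items():
        vals = [v for pt, v in points if point_in_rect(pt, rect)]
        out[pid] = sum(vals) / len(vals) if vals else None
    return out

polygons = {"A": (0, 0, 10, 10), "B": (10, 0, 20, 10)}
points = [((2, 2), 4.0), ((3, 3), 6.0), ((15, 5), 10.0)]
print(spatial_join_mean(polygons, points))  # {'A': 5.0, 'B': 10.0}
```

The real tool does the same thing with true polygon containment tests; the point is simply that the computation itself is small for 715 points and 469 polygons, which makes the observed sluggishness hard to excuse.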
There is a word for software like this, which decorum dictates that I had best not use.
I have had the misfortune of being required to use ArcGIS for years;
I have lost count of the number of bugs I have personally reported; I have grown
old looking at a spinning blue wheel while the product seemingly does nothing.
That the tool did not work is bad enough, but what kind of software
engineering underlies a user interface that takes 10 minutes to specify
a handful of data items?
The behavior of this tool is replicable.
There, I got to the end without using any 4 letter words.
Now I can go out and relieve my feelings by giving the cat a good kick.
Haha... love it ... some tips that I use
Thanks for venting
If you are willing to share the data, I would like to try it and see if I get the same performance that you are experiencing. Although the outcome may be the same, perhaps "a trouble shared is a trouble halved" could save the cat...
Chris
The data is not by any means the crown jewels, so I am certainly willing
to share it.
But I am not sure how much you need.
Do you need the BA2015 datasets, or do you have those?
I am not very familiar with packaging up
a subset of data from a file geodatabase.
I would certainly be interested in what you find.
I will try to figure out how to package this stuff up.
Rob
PS. Cat OK : did not even expend one of his 9 lives.
As Dan mentioned before, that is indeed a large set of attributes. One might expect this not to have such a big impact on the spatial operations; since it does, please try what Dan Patterson suggested and see if that speeds up the process. Do generate attribute indexes on the fields used for the join when you join the attributes back.
If you want to verify whether the join with all the attributes takes as long on my system as it did on yours, you could simply select the feature classes (in the Catalog window) and paste them into a new file geodatabase. If one of the datasets covers a much larger area than the other, you could do a select by location (draw a rectangle with the selection tool) and export the feature class to the new file geodatabase. Zip the file geodatabase (including the .gdb folder name) and see if you can attach it to the thread.
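The zipping step can also be scripted, since a file geodatabase is just a folder ending in `.gdb`. A minimal sketch with Python's standard library, making sure the archive contains the `.gdb` folder name itself as requested above (all paths here are placeholders, not real data):

```python
# Zip a file geodatabase folder so the archive keeps the .gdb
# folder name at its top level. Paths are placeholders.
import os
import shutil
import tempfile

def zip_gdb(gdb_path, archive_basename):
    """Create <archive_basename>.zip containing the .gdb folder itself."""
    parent = os.path.dirname(os.path.abspath(gdb_path))
    folder = os.path.basename(gdb_path)
    return shutil.make_archive(archive_basename, "zip",
                               root_dir=parent, base_dir=folder)

# Example with a throwaway directory standing in for a real geodatabase:
tmp = tempfile.mkdtemp()
gdb = os.path.join(tmp, "demo.gdb")
os.makedirs(gdb)
open(os.path.join(gdb, "gdb"), "w").close()  # dummy content file
archive = zip_gdb(gdb, os.path.join(tmp, "demo_gdb"))
print(archive)  # prints the full path to the created demo_gdb.zip
```

Passing `base_dir=folder` is what keeps the `demo.gdb/` prefix inside the archive, so it unzips back to a valid geodatabase folder rather than loose files.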
BTW: glad the cat is OK...
Dan and Xander
Thanks for your replies. Many of the things Dan had recommended were already in place:
same projection; local machine; polygons pertinent to geography.
Now, what is true is that these Business Analyst datasets have many fields (~2000 in BA basic).
But surely, when I run one of these Python tools and simply enter the name of a dataset,
it should not take 3 minutes simply to display a list of fields, should it? I mean, there is
no real computation happening yet. There is some structure containing the field data and, paf!,
displaying it in a dialogue box ought to be instantaneous, shouldn't it?
OK, speed aside, how is it that the tool appears to run, claims to have completed successfully,
and yet does not output the new feature class I specified? If the tool is overwhelmed by computational
complexity, should it not just report that and quit with an error?
I think possibly the way out of my dilemma is along the lines Dan suggests:
1. Use the BA layers (tracts, block groups, zip codes..) to create just polygons with no data.
Store those polygons in some file geodatabase.
2. Join the data I am interested in with those polygons using a suitable unique identifier
(tract ID, block group ID, zip code, ..)
3. Then create my own custom BDS layer as outlined in this document:
https://www.esri.com/library/whitepapers/pdfs/importing-and-using-your-own-data.pdf
Another advantage of that is that one can use the BA reporting capabilities.
I am guessing that one needs these BDS layers to be able to use all the data efficiently.
Very likely normal feature classes were never intended to have so many fields.
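Step 2 of the plan above is just an attribute join on a shared key. Conceptually it looks like the following sketch, illustrated with plain Python dicts; the field names (`GEOID`, `median_income`) and values are invented for the example:

```python
# Sketch of joining a pared-down attribute table onto bare polygon
# records via a shared unique identifier (a tract/block-group ID).
# All field names and values are invented for illustration.

def attribute_join(geometries, attributes, key="GEOID"):
    """geometries / attributes: lists of dicts sharing a key field.
    Returns a left join of attributes onto geometries."""
    lookup = {row[key]: row for row in attributes}
    joined = []
    for geom in geometries:
        row = dict(geom)                       # keep geometry fields
        row.update(lookup.get(geom[key], {}))  # add matching attributes
        joined.append(row)
    return joined

geometries = [{"GEOID": "06075", "shape": "polygon-1"},
              {"GEOID": "06081", "shape": "polygon-2"}]
attributes = [{"GEOID": "06075", "median_income": 112000},
              {"GEOID": "06081", "median_income": 118000}]
print(attribute_join(geometries, attributes))
```

The appeal of this shape of workflow is exactly what the plan says: the wide attribute table only ever touches the geometry through a small key lookup, instead of the geometry dragging ~2000 fields through every spatial operation.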
That is definitely a bloat of data. I would suggest you deal with the geometry in separate files when needed, then join the attributes over if they are used for further work. Keep us posted on whether your workflow improves.