Spatial Join Problem: Returning Incorrect Counts

11-14-2013 08:21 AM
AndrewChamberlain
New Contributor
I'm trying to join 4,835 geocoded crimes into 134 Census Tracts to obtain the count of crimes in each area. Problem: the spatial join tool is returning join counts that don't match the total number of crimes.

I've attached a small file geodatabase with the two layers I'm trying to join. The target feature is Census_Tracts, and the join feature is the crime layer. I'm doing a 1-to-1 join, intersect (or "contains" -- they both produce the same error), with a 0 search radius.

In the attribute table for the resulting join feature, the sum of the "join count" column is 4,839 -- an overcount of 4.

I've tried everything I could find in the forums -- I've checked for geometry errors in the Census Tract polygon layer (no errors), I've tried using the "multipart to singlepart" tool and re-running the join (I still get an overcount of 4), and I've tried all the options in the spatial join tool.

Can anyone try this spatial join in the attached files and tell me what the problem is? These join counts are a pretty important part of an academic paper I'm working on, and I'm afraid I'm going to have to abandon ArcGIS if I can't get to the bottom of this. Thanks for your help!

(Note: I'm running ArcGIS Desktop, version 10.2.0.3348)

Accepted Solutions
DaleHoneycutt
Occasional Contributor III
The reason for the mismatch is that some of your points fall exactly on polygon boundaries and get double-counted. In addition, some of your points are not inside any polygon (0.1 meters outside a polygon). We can debate the philosophy of what to do with points exactly on a polygon boundary, but clearly in your case you'd like Spatial Join (or any overlay tool) to just assign each such point to a single polygon -- perhaps at random (choose the polygon on the left or the right -- it doesn't matter). As for the points outside the polygons, those can be handled with a search radius.

I'm working on a model now that will assign the points-on-the-border to one of the neighbor polygons and will post when done.

(Also, there is a datum mismatch between your census and crime points.  The only reason I bring this up is because projecting one of the datasets so that it matches the datum of the other dataset may resolve some of the positional problems -- points outside or on border.  I'll check into that as well.)

After running tests on your full dataset, I found 9 points that fall on boundaries and 5 points that fall outside every polygon, for a net difference of 4 (9 "extra" counts from the boundary points minus the 5 "missing" counts from the points that fall outside).
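To make that arithmetic concrete, here is a minimal Python sketch of the same bookkeeping on toy data. Everything in it is an assumption for illustration: the two rectangular "tracts", the five points, and the `matches` helper are hypothetical stand-ins, not the attached dataset or the Spatial Join tool itself. It also assumes each boundary point touches exactly two polygons, which is what makes 9 boundary points equal 9 extra counts.

```python
from collections import Counter

# Hypothetical toy data: two "tracts" sharing the edge x = 1.
# Each tract is (xmin, ymin, xmax, ymax); real tracts are arbitrary polygons.
tracts = {
    "A": (0.0, 0.0, 1.0, 1.0),
    "B": (1.0, 0.0, 2.0, 1.0),
}

points = [
    (0.5, 0.5),   # strictly inside A
    (1.5, 0.5),   # strictly inside B
    (1.0, 0.25),  # on the shared edge: matches both A and B
    (1.0, 0.75),  # on the shared edge: matches both A and B
    (2.5, 0.5),   # outside every tract
]

def matches(point, box):
    """Boundary-inclusive test, analogous to an 'intersect' match."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

join_count = Counter()
extra = 0      # surplus matches from points on shared boundaries
unmatched = 0  # points that match no tract at all
for p in points:
    hits = [name for name, box in tracts.items() if matches(p, box)]
    for name in hits:
        join_count[name] += 1
    if hits:
        extra += len(hits) - 1
    else:
        unmatched += 1

total = sum(join_count.values())
print(dict(join_count), total, extra, unmatched)

# The summed join counts differ from the point total by (extra - unmatched).
assert total == len(points) + extra - unmatched
```

With the numbers from this thread, the same identity reads 4,835 + 9 - 5 = 4,839.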

5 Replies
AndrewChamberlain
New Contributor
This afternoon I used these two layers to do the spatial join in Quantum GIS, and I got the correct join count of 4,835. So this is clearly an ESRI problem with spatial join, right? Can anyone else confirm that a spatial join between the two feature classes in the above file gives an incorrect join count?
RichardFairhurst
MVP Honored Contributor
With a shapefile, your feature class tolerance and resolution are probably too coarse to guarantee that points near a boundary are not duplicated. A file geodatabase allows for double precision, and I believe there may be tools to convert to high precision. In a sense it is correct to duplicate an incident on a boundary. Also, if your boundaries overlap, even by a very small amount, that can cause a problem.

Sloppy topology on your polygons could cause several problems.  Intersecting the polygons with themselves would show you if your topology is sloppy.  Making sure your topology is clean is crucial if you want very precise results.

If there really is polygon overlap, or points fall on a boundary, then the other program is also skewing your results by making a random choice about which polygon gets the count without consulting you. I prefer the ArcGIS behavior, because I can then run a secondary routine to find the points whose count needs to be cut in half, splitting the incident between two or more areas manually (or shift the point's position so that it is clearly assigned to a single polygon).
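As a rough illustration of that secondary routine, here is a minimal Python sketch that gives each point a total weight of 1 and splits it evenly across every polygon it matches. The rectangular tracts, the sample points, and the `matches` and `weighted_counts` helpers are all hypothetical names made up for this example; this is not an ArcGIS tool or API.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy data: two "tracts" sharing the edge x = 1,
# each given as (xmin, ymin, xmax, ymax).
tracts = {
    "A": (0.0, 0.0, 1.0, 1.0),
    "B": (1.0, 0.0, 2.0, 1.0),
}

points = [
    (0.5, 0.5),  # strictly inside A
    (1.5, 0.5),  # strictly inside B
    (1.0, 0.5),  # on the shared edge: matches both A and B
]

def matches(point, box):
    """Boundary-inclusive point-in-rectangle test."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

def weighted_counts(points, tracts):
    """Split each point's count of 1 evenly across all matching tracts."""
    counts = defaultdict(Fraction)
    for p in points:
        hits = [name for name, box in tracts.items() if matches(p, box)]
        for name in hits:
            counts[name] += Fraction(1, len(hits))
    return dict(counts)

counts = weighted_counts(points, tracts)
for name, value in sorted(counts.items()):
    print(name, float(value))

# Points that match at least one tract contribute exactly 1 in total.
assert sum(counts.values()) == len(points)
```

Using `Fraction` keeps the per-tract weights exact, so the weighted counts sum back to the number of matched points with no floating-point drift. Points that match no polygon still contribute nothing and would need a search radius or a manual fix.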
DaleHoneycutt
Occasional Contributor III
FYI:
This was such a good thread that I mined it for a blog article: "More adventures in overlay: point in polygon"
MonicaOosterman
New Contributor

I still do not know what the answer is???
