Clip, Spatial Join or Union?

06-25-2014 02:09 AM
wusirui
New Contributor
Hi all,
I have a question about which geoprocessing tool to choose in my case. I would much appreciate it if someone could guide me and explain the differences between them.

I have two maps: a world boundary map (with COUNTRY name) and a world city map (polygons only, without COUNTRY name). Both maps are in polygon format. To find out how many cities are located within a certain country, I need to use a geoprocessing tool such as Clip, Spatial Join or Union.

I just wonder which geoprocessing tool would be best for linking the two maps and getting the number of cities.

I have already tried all three tools, as follows:

Clip: I used this tool to clip the cities to the countries I need (selected from the world boundary map with "Select by Attributes") and output the result as a new file. I then opened the attribute table to read the number of cities.

Spatial Join: The world boundary map is the Target Features and the world city map is the Join Features. The "Contains" match option was used to link the two maps and obtain the result.

Union: I input both maps, and the output shows a result similar to the ones above.

However, the three results are not equal in terms of the number of cities. For example, Clip might give 2000, Spatial Join (Contains) 1850 and Union 1992. I have no idea which one is more accurate.

Please note: some cities are located on or near a country boundary, which may significantly affect the result of the geoprocessing, since these cities may or may not be counted.
If anyone has experience with this topic, please give me some suggestions and tips. Thank you.
6 Replies
RichardFairhurst
MVP Honored Contributor
None of them are accurate. The two source maps will not match exactly where their boundaries touch, and as a result none of these methods will return a true answer. If the boundaries matched without crossing, then probably all methods would yield the same result. So the critical item is boundary alignment. As long as boundaries cross each other, Spatial Join under-counts when you use the Contains option, while Union and Clip over-count because of small sliver polygons.
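To make the under-counting and over-counting concrete, here is a minimal sketch in plain Python. It does not use ArcGIS; it assumes axis-aligned rectangles stand in for the country and city polygons, and the `Rect` type, the coordinates and the city names are all made up for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains(outer: Rect, inner: Rect) -> bool:
    """True when inner lies entirely within outer (edges may touch)."""
    return (outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
            and outer.xmax >= inner.xmax and outer.ymax >= inner.ymax)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles, 0.0 if they do not overlap."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return w * h if w > 0 and h > 0 else 0.0


# Hypothetical data: one country and three city polygons.
country = Rect(0, 0, 10, 10)
cities = {
    "inside": Rect(2, 2, 3, 3),
    "straddles_border": Rect(9.5, 4, 10.1, 5),   # mostly inside the country
    "neighbour_sliver": Rect(9.99, 7, 11, 8),    # mostly in the next country
}

# A Contains-style test drops the city that pokes slightly over the border.
contains_count = sum(contains(country, c) for c in cities.values())

# A Clip/Union-style test keeps every piece with any overlap, slivers included.
overlap_count = sum(overlap_area(country, c) > 0 for c in cities.values())

print(contains_count, overlap_count)
```

A reasonable answer for this toy data would be 2 cities, but the strict containment test reports fewer and the any-overlap test reports more, which is the same pattern as the 1850 versus 1992 and 2000 counts in the question.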

I would make copies of the source feature classes and use Integrate on them first to align the boundaries better, then try the Spatial Join option with Contains. The number will likely increase. (Be sure to run Integrate on copies, not the originals, since it directly alters the source geometry. You should always keep an unaltered version of your original data until the edits have been verified.)

However, the method I would actually use with an Advanced license is to first run the Feature To Point tool with the Inside option on the cities, and then use Spatial Join on the points. Then use an attribute join of the points to the city polygons to transfer the country attribute. Before extracting the points I might test the Multipart To Singlepart tool, so that each point is centered inside a separate polygon part, and then run Summary Statistics to verify that every part of a city fell in only one country before doing the attribute join. If they did, I would do the transfer. If they didn't, I would examine each city that had a different Min and Max Country ID to find out how badly the overlap was affecting it.

Getting exact counts is one of the most difficult tasks when data is poorly aligned.  Several validation steps have to be done to verify that you have a correct result and manual intervention is almost always needed when the boundary alignment is very poor.
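The point-based workflow described above can be sketched outside ArcGIS in plain Python. This is only an illustration under stated assumptions: the country and city coordinates are hypothetical, `vertex_mean` is a stand-in for the Inside option of Feature To Point (it only lands inside convex shapes such as these rectangles), and `point_in_polygon` is a generic ray-casting test rather than the Spatial Join tool itself.

```python
from collections import Counter, defaultdict


def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertex_mean(ring):
    """Mean of the vertices; assumed to fall inside the (convex) part."""
    xs, ys = zip(*ring)
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Hypothetical countries keyed by country ID.
countries = {
    1: [(0, 0), (10, 0), (10, 10), (0, 10)],
    2: [(10, 0), (20, 0), (20, 10), (10, 10)],
}

# Hypothetical singlepart city polygons as (source city ID, ring).
city_parts = [
    (101, [(9.5, 4), (10.1, 4), (10.1, 5), (9.5, 5)]),  # pokes over the border
    (102, [(2, 2), (3, 2), (3, 3), (2, 3)]),            # part 1 of city 102
    (102, [(12, 2), (13, 2), (13, 3), (12, 3)]),        # part 2 of city 102
    (103, [(15, 5), (16, 5), (16, 6), (15, 6)]),
]

# One point per part, then record which country each point falls in.
countries_by_city = defaultdict(set)
for city_id, ring in city_parts:
    pt = vertex_mean(ring)
    for country_id, boundary in countries.items():
        if point_in_polygon(pt, boundary):
            countries_by_city[city_id].add(country_id)

# Cities whose parts landed in more than one country need manual review
# (the same check as comparing Min and Max country ID per city).
conflicts = sorted(c for c, ids in countries_by_city.items() if len(ids) > 1)

# Count each unambiguous city once for its country.
counts = Counter(next(iter(ids)) for ids in countries_by_city.values()
                 if len(ids) == 1)

print(dict(counts), conflicts)
```

City 101 is assigned to country 1 even though its polygon crosses the border slightly, and city 102 is held back for review because its two parts fall in different countries.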
wusirui
New Contributor
Hello Sir,
Thank you for your kind answers and suggestions. I have some further questions regarding your answers.

1. How do you know if data is poorly aligned? Does this always happen, or is it just common?
2. You mentioned two methods, one using Integrate and another converting polygons to points. Are these two equivalent?
3. If polygons are converted to points, are the points reliable? For example, one big polygon might be split into two points, and then the number of cities (here the number of points) would increase.
4. I don't care if the point layer cannot be turned back into polygons, since the only thing I need to know is how many cities are located in each country. Which method do you prefer?

This is very important to me. Thank you for your kind help.
RichardFairhurst
MVP Honored Contributor
1. If the data came from two independent sources, it cannot be precisely aligned. Even a single source agency may be sloppy with this kind of topology. The different numbers from the methods you used show that the topology is bad.

2. The two methods are not equal. Polygons are harder to edge match than it is to locate a point in the correct surrounding polygon. One polygon equals one point regardless of size, so a single feature would never generate 2 points. The only question arises for multi-part polygons. If you break a multi-part polygon into separate singlepart polygons, then you have to track the source city ID for the count. In general, extracting a single point with the Inside option is all you would normally need, unless you have very gerrymandered boundaries.

4. The points method is my preferred method, but it requires an Advanced license, so other options have to be applied if you have the Basic or Standard license levels.
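The remark about tracking the source city ID after splitting multi-part polygons can be illustrated with a short Python sketch. The rows below are hypothetical, and the `orig_fid` key is an assumed name for whatever field carries the original city ID through the split.

```python
from collections import defaultdict

# Hypothetical result rows: one row per singlepart polygon (or its point),
# each carrying the source city ID and the country it was matched to.
rows = [
    {"orig_fid": 1, "country": "A"},
    {"orig_fid": 1, "country": "A"},  # second part of the same city
    {"orig_fid": 2, "country": "A"},
    {"orig_fid": 3, "country": "B"},
]

row_counts = defaultdict(int)   # naive count: one per part
city_ids = defaultdict(set)     # distinct source cities per country
for row in rows:
    row_counts[row["country"]] += 1
    city_ids[row["country"]].add(row["orig_fid"])

city_counts = {country: len(ids) for country, ids in city_ids.items()}

print(dict(row_counts), city_counts)
```

Counting rows gives country A three "cities" because one city has two parts, while counting distinct source IDs gives the intended two.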
wusirui
New Contributor
Ok, I will try and hope this can be helpful. Thank you
wusirui
New Contributor
Sorry, I forgot one thing. You said the Integrate function can help correct the features, but how should I set it up in ArcGIS? For example, should I input the two feature classes together or run them separately?
RichardFairhurst
MVP Honored Contributor
The two have to be used as inputs together so that they are matched to each other, with the more accurate one having a rank of 1 and the other a rank of 2. Both will potentially move due to the tolerance you set, but the higher ranked layer should move less. You have to look at examples of slivers you got from the Union tool to estimate a decent tolerance that is large enough to take care of the slivers, but not so large that it creates unwanted distortions of your features.
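One possible way to turn "look at examples of slivers" into a number is sketched below in plain Python. This is an assumption-laden heuristic, not something the Integrate tool does for you: it approximates the width of a long, thin sliver as 2 × area / perimeter, and then caps the tolerance at a fraction of the smallest real feature so the geometry is not distorted. The sliver measurements, `smallest_feature_width` and the 10% cap are all hypothetical values.

```python
def approx_sliver_width(area: float, perimeter: float) -> float:
    """Approximate width of a long, thin polygon.

    For a rectangle of length L and width w with L much larger than w,
    area = L * w and perimeter is roughly 2 * L, so 2 * area / perimeter
    is roughly w (slightly under-estimated).
    """
    return 2.0 * area / perimeter


# Hypothetical (area, perimeter) pairs for slivers taken from a Union output,
# in map units.
slivers = [(5.0, 200.1), (12.0, 400.12), (1.5, 100.06)]

widths = [approx_sliver_width(a, p) for a, p in slivers]
widest = max(widths)

# Hypothetical guard: never let the tolerance exceed 10% of the narrowest
# real feature, to avoid collapsing or distorting genuine geometry.
smallest_feature_width = 1.0
cap = 0.1 * smallest_feature_width

tolerance = min(widest, cap)
slivers_left_over = widest > cap  # True means some slivers need manual edits

print(round(tolerance, 4), slivers_left_over)
```

If the widest sliver exceeds the cap, a single tolerance will not clean everything safely, and the remaining slivers are candidates for the manual intervention mentioned earlier in the thread.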