I have a point dataset. I want to identify the cluster area first, and then find the center of that cluster area.
I was thinking of using "K-Means", but I'm working with a single cluster, so I believe this method does not fit my case.
Now I'm thinking of using either the "Mean Center" or the "Median Center" tool after identifying the cluster area with "Optimized Hot Spot Analysis". However, I'm unsure whether to use the mean or the median. I'm dealing only with locations, without any weights, which means that I just want to find the center location of the points.
Which of the two methods is more appropriate and accurate for my case, and why?
* I read a lot about them, but I couldn't find the answer for my case.
* In the attachment, you can see the dataset and the cluster area marked by the red circle.
Thank you in advance
The mean center is the arithmetic average of the coordinates, so it will be pulled toward that outlier to the right/east more than the median will. The median can be calculated in several ways, but typically it is the middle value of the ranked X and Y coordinates; hence, outliers have less impact on it. If you want your measure of centrality to fall inside the ellipse you have identified, there is no guarantee that either measure will do so. Where you have outliers, you can use a trimmed mean, which uses 95% of the data points, with 2.5% trimmed off each extreme of the sorted lists of X and Y. This is not implemented in ArcMap.
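To make the difference concrete, here is a minimal sketch in plain Python (not ArcMap tooling) of the three measures described above. The function names and the sample coordinates are hypothetical, and the median here is the simple per-axis middle value of the ranked X and Y coordinates mentioned above, which is an assumption and may not match what a particular GIS tool computes.

```python
from statistics import mean, median


def mean_center(points):
    """Arithmetic average of the X and Y coordinates."""
    xs, ys = zip(*points)
    return (mean(xs), mean(ys))


def median_center(points):
    """Middle value of the ranked X and of the ranked Y coordinates."""
    xs, ys = zip(*points)
    return (median(xs), median(ys))


def trimmed_mean(values, trim_fraction=0.025):
    """Mean after dropping trim_fraction of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped per end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def trimmed_mean_center(points, trim_fraction=0.025):
    """Trimmed mean applied separately to X and to Y."""
    xs, ys = zip(*points)
    return (trimmed_mean(xs, trim_fraction), trimmed_mean(ys, trim_fraction))


# Hypothetical sample: a 3 x 3 block of points plus one outlier to the east.
points = [(1, 1), (2, 1), (3, 1),
          (1, 2), (2, 2), (3, 2),
          (1, 3), (2, 3), (3, 3),
          (50, 2)]

print("mean center:        ", mean_center(points))
print("median center:      ", median_center(points))
# With only 10 points, 2.5% per end trims nothing, so 10% is used here.
print("trimmed mean center:", trimmed_mean_center(points, trim_fraction=0.1))
```

With this sample, the single outlier drags the mean center's X out to 6.8, while the median stays at 2 and the trimmed mean lands close to it. With a small dataset the 2.5% trim removes nothing, so the trim fraction only matters once there are roughly 40 or more points.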
Other alternatives, although less commonly employed, would be to produce a Delaunay triangulation (TIN) and determine the area of the triangles. A sorted list of their areas would identify areas from which you could select the points as candidates for the mean or median, after trimming the triangle list so that 90% or so of the area is represented. You can do the same with successive removals of convex hulls: determine the convex hull, remove the points on the hull, and recalculate until you are left with a certain percentage of the area (perhaps 90% or even 50%).
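The convex hull peeling idea can be sketched as follows in plain Python. This is only an illustration under stated assumptions: the helper names (`convex_hull`, `polygon_area`, `peel_hulls`) and the sample points are hypothetical, the hull is built with Andrew's monotone chain, and the stopping rule (stop once the current hull's area is at or below a chosen fraction of the original hull's area) is one possible reading of "a certain percentage of area".

```python
from statistics import median


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def peel_hulls(points, keep_area_fraction=0.5):
    """Remove hull vertices layer by layer until the hull of the remaining
    points covers at most keep_area_fraction of the original hull area."""
    remaining = list(set(points))
    original_area = polygon_area(convex_hull(remaining))
    while len(remaining) > 3:
        hull = convex_hull(remaining)
        if polygon_area(hull) <= keep_area_fraction * original_area:
            break
        hull_vertices = set(hull)
        inner = [p for p in remaining if p not in hull_vertices]
        if not inner:  # nothing left inside; keep the current layer
            break
        remaining = inner
    return remaining


# Hypothetical sample: a 3 x 3 block of points plus one outlier to the east.
points = [(1, 1), (2, 1), (3, 1),
          (1, 2), (2, 2), (3, 2),
          (1, 3), (2, 3), (3, 3),
          (50, 2)]

core = peel_hulls(points, keep_area_fraction=0.5)
xs, ys = zip(*core)
print("points kept:", sorted(core))
print("median center of kept points:", (median(xs), median(ys)))
```

In this sample a single peel removes the outlier together with the four corner points, and the median of what remains sits in the middle of the block. Points that lie along a hull edge without being vertices survive a peel in this sketch; whether they should be removed as well is a design choice.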
Centrality has no 'accurate' measure, only 'best'... so in short, go with the median, or if you want to trim, use a trimmed median.
Other options are possible but more esoteric.
The Directional Distribution (Standard Deviational Ellipse) tool, described in the ArcGIS for Desktop help, might help eliminate the outliers. It also provides the centroid coordinates of the ellipse indicating the distribution.
Yes, that is one of the possibilities. Although it is affected by outliers as well, it would be a point-based approach. The reducing convex hulls would produce results similar to the SDE, but would be area-based rather than based on distance to the center. As indicated, there is no 'ONE' measure of centrality. For instance, you could construct a Spanning Tree (http://www.arcgis.com/home/item.html?id=6ce9db93533345e49350d30a07fc913a) and find the middle point of the tree, which would represent the place which minimizes the connectedness of all points.
Bounding Containers, with a central measure in a minimum-area bounding rectangle, or Voronoi/Delaunay could be used if one wanted to peel away the outer layers of a Voronoi or Delaunay triangulation and find the center of mass/area.