Identify clusters that meet a threshold value

Anonymous User · ‎05-05-2010

Original User: dzaks

Given a raster dataset, I am looking to identify clusters comprised of the smallest number of grid cells that meet a given threshold value.

Is this possible? Any ideas on how to accomplish this?

Thanks!

WilliamHuber · ‎05-05-2010

dzaks wrote: "...clusters comprised of the smallest number of grid cells that meet a given threshold value."

Could you elaborate on what this might mean? If by "meet a threshold value" you mean either "have values exceeding a given constant" or "have values equaling a given constant" (it's unclear which is intended), then just select all such grid cells: there's no question of clustering and no question of finding a "smallest number." Thus some clarification of your criterion would be helpful.

Anonymous User · ‎05-05-2010

Original User: dzaks

Sorry!

Given a raster dataset, I am looking to identify clusters of X or fewer cells whose values sum to a certain number.

For example, the output would spatially identify clusters in the dataset where each cluster comprises 20 or fewer cells and the sum of those cells is 1000.

DeeleshMandloi · ‎05-05-2010

You can use the Partitioning Tools for this workflow, but these tools work only with point features.

So first use the Raster to Point tool to convert your raster to points, carrying the cell values along as an attribute.

Then use the "Create groups based on attribute values" tool within the Partitioning Tools toolbox to get the clusters.

Finally, use the Point to Raster tool to convert the clustered points back to a raster.

Hope this helps
Deelesh

Anonymous User · ‎05-06-2010

Original User: whuber

dzaks wrote: "Given a raster dataset, I am looking to identify clusters of X or fewer cells whose values sum to a certain number.

For example, the output would spatially identify clusters in the dataset where each cluster comprises 20 or fewer cells and the sum of those cells is 1000."

Thanks; that helps. However, the problem is still not well defined, because there are (infinitely) many ways to define what a "cluster" is. Implicitly, there is a sense of nearness involved. Can 20 cells form a cluster provided they are "close enough," or do you require that they be contiguous? If so, are cells contiguous only when they share an edge, or can diagonally adjacent cells also be considered contiguous?

Once that is settled, a second problem is that the solutions will not be unique: you will potentially find many overlapping groups of cells that satisfy your criteria. Thus, you need to impose additional conditions to determine exactly which groups should be chosen.

As a point of departure for thinking (and playing around with the software if you like), consider the possibilities afforded by focal statistics. When a focal sum using a neighborhood of 20 or fewer cells exceeds 1000, you have found the center of a potential "cluster" (although there might still be contiguity questions). Thus, you could run focal sums of various small neighborhoods to detect possible clusters. That's easy and quick. You can broaden your search by looking at focal means of slightly larger neighborhoods: indeed, any neighborhood, of any size or shape, whose average value exceeds 1000/20 = 50 could be considered a "cluster."
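To make the focal-sum idea concrete, here is a minimal sketch in plain Python. It is not the ArcGIS focal statistics implementation; the function names, the square neighborhood, the tiny grid, and the threshold of 800 are all assumptions chosen for illustration.

```python
def focal_sum(grid, radius=1):
    """Sum each cell's square neighborhood (2*radius+1 wide), clipped at the grid edges."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += grid[rr][cc]
            out[r][c] = total
    return out


def candidate_centers(grid, radius, threshold):
    """Return (row, col) of every cell whose focal sum exceeds the threshold."""
    sums = focal_sum(grid, radius)
    return [(r, c)
            for r, row in enumerate(sums)
            for c, value in enumerate(row)
            if value > threshold]


# Hypothetical 4x4 grid; a 3x3 window holds 9 cells, well under the 20-cell limit.
grid = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
centers = candidate_centers(grid, radius=1, threshold=800)
print(centers)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Note that all four central cells are flagged: the windows around them overlap heavily, which is exactly the non-uniqueness described above.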

(I notice that I interpreted your "threshold" to be a value that should be exceeded by a cluster's sum. If you are looking for clusters that sum exactly to the threshold, then unless there is something special about your values or the structure of your grid or your definition of "cluster," forget it: this is an NP-hard problem and the most you can hope for is to find some research software or approximate solutions that find clusters in extremely small grids.)

Ultimately, how you formulate and solve this problem depends on how you intend to use these clusters, but since you haven't shared that information with us, I cannot recommend particular approaches or software solutions.

The old forums contain detailed discussions of similar questions. Some creative searching might ferret out additional useful information.

Finally, you should be leery of solutions that convert the grid to points (even though that approach offers some capabilities not available with grid-based analysis in ArcGIS), unless your grid is very small: such solutions are likely to take impossible amounts of computing resources and time.

Anonymous User · ‎05-06-2010

Original User: dzaks

Bill, thanks for the thoughtful response.

I intended to leave the question somewhat open-ended: the problem I am working on is flexible enough to allow for multiple approaches and a range of interpretations of what a "cluster" is.

The purpose of the analysis is to identify regions that have a high density of manure feedstocks as an input to anaerobic digesters for electricity production. Dense clusters (where transport costs would be low) would be highly suitable for a digester, while areas with feedstocks that are less dense would incur higher transport costs and be less viable. For the sake of discussion, the sum of cells within a cluster does not need to meet an exact amount, but acts as a guide to differentiate between transport costs of feedstocks from various locations.

I am in the process of trying the point-based method suggested above and after 8 hours, am about 60% done 😉

Cheers-
~David

WilliamHuber · ‎05-06-2010

The focal mean will do a good job and, provided the neighborhood is not too large, will take just seconds to compute :). You can start up a new ArcGIS process, open the grid, and get the job done while you're waiting for the remaining 40% to finish. Then you can experiment with alternative neighborhood sizes to explore a spectrum of possible solutions.

Anonymous User · ‎05-11-2010

Original User: dzaks

Deelesh-

After running the script on the set of points (converted from raster), there seem to be horizontal and diagonal striations that are artifacts of the algorithm. Can you provide any documentation on how the clusters are chosen and how the search function decides which cell to "look" at next? Below is a link to some of my output.

http://www.flickr.com/photos/davidzaks/4598359975/

(Bill - don't worry, I am trying the focal tools as well...)

Cheers-
~David

DeeleshMandloi · ‎05-12-2010

The tool uses the Spatial Order and Collocate tools (provided in the same toolbox). These two tools are Python script tools, so you can view their source to get an idea of what's going on.
First the model runs the Spatial Order tool. This tool generates a Peano curve based on the extent of the input points. More info on the Peano curve that the tool implements is at this link. The pseudocode for the algorithm is also available here.
Once the Spatial Order tool is run, it populates a field with values ranging between 0 and 1. The idea is that if you sort the table on these values, your input points will be sorted by proximity to each other. The Collocate tool then takes the sorted points and groups them until the capacity value is exceeded.

Hope this helps
Deelesh
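The two steps described above (order the points along a space-filling curve, then group them in that order until a capacity is exceeded) can be sketched roughly as follows. This is not the source of the Spatial Order or Collocate tools: it substitutes a simpler Z-order (Morton) curve for the Peano curve, and every function name and parameter here is hypothetical.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of two integer cell indices (Z-order curve)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def spatial_order(points, bits=16):
    """Return one value in [0, 1] per point; nearby points tend to get nearby values."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    span = max(max(xs) - xmin, max(ys) - ymin) or 1.0  # avoid dividing by zero
    cells = (1 << bits) - 1
    max_key = (1 << (2 * bits)) - 1
    return [morton_key(int((x - xmin) / span * cells),
                       int((y - ymin) / span * cells), bits) / max_key
            for x, y in zip(xs, ys)]


def collocate(points, values, capacity):
    """Walk the points in curve order; start a new group once the running sum exceeds capacity."""
    order = spatial_order(points)
    ranked = sorted(range(len(points)), key=lambda i: order[i])
    group_ids = [0] * len(points)
    group, running = 0, 0.0
    for i in ranked:
        group_ids[i] = group
        running += values[i]
        if running > capacity:
            group, running = group + 1, 0.0
    return group_ids


# Two hypothetical clumps of three points each, listed in mixed order.
points = [(10, 10), (0, 0), (11, 10), (1, 0), (10, 11), (0, 1)]
print(collocate(points, values=[5] * 6, capacity=12))  # [1, 0, 1, 0, 1, 0]
```

Because groups are cut purely by position along the curve, their boundaries follow the curve's path rather than the data, which may be one source of striping artifacts like those reported above.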

Anonymous User · ‎10-15-2012

Original User: ferranferrer

Hi,

I'm using the Partitioning Tools and need support.

When I run the Partitioning Tools in a model (I have added an iterator over features by the field 'geo_county'), the model stops and shows this error:

PYTHON ERRORS:
Traceback Info:
File "C:\FERRAN\SOFT\0 GIS\SIG EXTENSIONS\DISTRICTING ARCGIS\AS16021\PartitioningTools\Scripts\Collocate.py", line 128, in <module>
if count % (int(0.05 * tot_features)) == 0:

Error Info:
<type 'exceptions.ZeroDivisionError'>: integer division or modulo by zero

Is there a workaround for this?

Thanks for your attention,

Ferran
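One plausible reading of the traceback: `int(0.05 * tot_features)` evaluates to 0 whenever `tot_features` is below 20, so the modulo on line 128 fails for any county with fewer than 20 points. The sketch below illustrates a guard against that. Only `count` and `tot_features` come from the traceback; the rest of Collocate.py is not shown in the thread, so the helper names are hypothetical and the fix is an assumption, not a tested patch.

```python
def progress_step(tot_features):
    """Report roughly every 5% of features, but never use a step of zero."""
    return max(1, int(0.05 * tot_features))


def progress_points(tot_features):
    """Return the feature counts at which a progress message would fire."""
    step = progress_step(tot_features)
    return [count for count in range(1, tot_features + 1) if count % step == 0]


# The unguarded expression from the traceback fails for small inputs:
try:
    7 % int(0.05 * 7)
    failed = False
except ZeroDivisionError:
    failed = True

print(failed)              # True
print(progress_points(3))  # [1, 2, 3]
```

In the script itself, the equivalent change would be to compute the step once with `max(1, ...)` before the loop and use it in place of `int(0.05 * tot_features)`.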