Select X number of maximum value grid cells

MichaelJust · ‎05-26-2012

Hello,
I have a raster (grid) file that has about ~11,000 columns and ~ 6,000 rows. I want to select / create a new grid that contains the top 10,000 valued cells. I have done two analyses and want to compare the 'spatial overlap' between the two approaches for the top 10,000 cells. I have yet to arrive at any method, let alone a clever one, to perform this operation. Does anyone have a suggestion on how to perform such a task?

Thank you kindly,
Mike

SebastianSantibanez · ‎05-30-2012

You could try transforming the raster into an array using arcpy.RasterToNumPyArray. From there is just sorting the array and read the value contained in the position (9,999).
When you know this value you can use raster calculator in order to generate a new raster with the top 10,000 values of your raster.
I hope it helps

MichaelJust · ‎05-30-2012

Thank you this should work. However, the size of the raster exceeds Excel's (where I 'read' the array data right?) current row limits. So I think I'll need to raster to points in Arc and then use the SORT command in Arc and then do the Raster Calculator operations.

Thanks again,
Cheers,
Mike

SteveLynch · ‎05-31-2012

Mike

You could also try the following;
- open the Reclassify GP tool dialog and select your input raster
- hit the Classify... button (to the right of new values)
- on the Classification dialog, hit the % button (far left of Break Values)
- now you can drag the blue vertical lines to choose your top % (i.e. 10000 is what % of number of cells in your raster?)
- hit OK, and OK agian to do the Reclassify
Now you have a integer raster and you know what value represents the top 10000 values
- use the Con tool
- Input conditional raster = integer raster you created above
- Expression, hit the SQL button and create the "where clause", viz., "Value" = the value that represents the top 10000
- Input true raster = your initial input raster
- Hit OK

Cheers
Steve

curtvprice · ‎05-31-2012

Hello,
I have a raster (grid) file that has about ~11,000 columns and ~ 6,000 rows. I want to select / create a new grid that contains the top 10,000 valued cells. I have done two analyses and want to compare the 'spatial overlap' between the two approaches for the top 10,000 cells. I have yet to arrive at any method, let alone a clever one, to perform this operation. Does anyone have a suggestion on how to perform such a task?

Another approach that may be more automated that Steve's approach is the Slice() tool. Seems to me a better approach (for most modeling exercises) would be to locate the top 10 percent, rather than the top 1000, ie, where are the "hot areas", in a scale-independent manner:

topcells = SetNull(Slice(ingrid,100,"EQUALAREA") < 90,ingrid)

If you wanted to set this as a number of cells, you'd need an extra step to count the cells and pick a threshhold:

datacells = SetNull(IsNull(ingrid),1)
... arcpy.analysis.Statistics(datacells,"in_memory/tmpStat","COUNT SUM")
... Rows = arcpy.SearchCursor("in_memory/tmpStat")
... cellCount = Rows.next().SUM_COUNT
... arcpy.Delete_management("in_memory/tmpStat")
... # find percent cutoff for ~ 10,000 cells (if there are ties this will not be precise)
... # levels should be adjusted to match the magnitude of your percent 
... #   (this could be automated using math.log to measure the size 
... #   of your percent value, I leave that to someone else)
... lev = 100
... thresh = lev * ( 1.0 - (10000 / cellCount))
... topcells = SetNull(Slice(ingrid,lev,"EQUAL_AREA") < thresh,ingrid)

curtvprice · ‎05-31-2012

Hello,
I have a raster (grid) file that has about ~11,000 columns and ~ 6,000 rows. I want to select / create a new grid that contains the top 10,000 valued cells. I have done two analyses and want to compare the 'spatial overlap' between the two approaches for the top 10,000 cells. I have yet to arrive at any method, let alone a clever one, to perform this operation. Does anyone have a suggestion on how to perform such a task?

Another approach that may be more automated that Steve's approach is the Slice() tool. Seems to me that a more useful approach (for most modeling exercises) may be to locate the top 10 percent of cells, rather than the top 10,000, ie, where are the "hot areas", in a scale-independent manner:

from arcpy.sa import *
topcells10  = SetNull(Slice(ingrid,10,"EQUAL_AREA") <= 9,ingrid) # top 10% cell values
topcells10x = SetNull(Slice(ingrid,10,"EQUAL_AREA") <= 9,1) # top 10% cells tagged "1"
topcells01  = SetNull(Slice(ingrid,100,"EQUAL_AREA") <= 99,ingrid) # top 1%

If you decide to stick with your original approach of number of cells, you'd need an extra step to count the cells and estimate a percentage from that number:

datacells = SetNull(IsNull(ingrid),1)
tmpStat = "in_memory/tmpStat"
arcpy.analysis.Statistics(datacells,tmpStat,"COUNT SUM")
Rows = arcpy.SearchCursor(tmpStat)
cellCount = Rows.next().SUM_COUNT
arcpy.Delete_management(tmpStat)
# find percent cutoff for ~ 10,000 cells (if there are ties this will not be precise)
# "lev" value may be adjusted to match the magnitude of your percent and how
# close you want to hit your number of cells
#   (this could be automated using math.log10 to measure the size 
#   of your percent value; I leave that to someone else)
lev = 100
thresh = lev * ( 1.0 - (10000 / cellCount))
topcells = SetNull(Slice(ingrid,lev,"EQUAL_AREA") <= thresh,ingrid)

MichaelJust · ‎06-03-2012

Steve and Curt,
Thank you for the suggestions. These methods add some elegance to the procedure.

Cheers,
Mike