I'm having some strange problems with classifying certain raster images for display purposes. The histograms that show up under Properties > Symbology > Classify just don't match the actual raster, and as a result the classify options don't work very well. In particular, I'd love to be able to use quantiles to display the raster in 10 evenly balanced colors (each color representing 10% of the raster cells). But the quantile function fails, and instead the raster shows up as 95% one color (the lowest values) with only a tiny fraction for the other colors. This is a GRID file (but I'm having similar issues with a TIFF), so the sampling is, I believe, automatically set to full (no skips).
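For reference, here's a minimal sketch (plain Python on synthetic values, not the actual grid) of what a working quantile classification should produce: 9 break points dividing the cells into 10 classes of roughly 10% each.

```python
import bisect
import random
import statistics

random.seed(42)
# Synthetic stand-in for the raster's cell values (not read from a real grid).
cells = [random.gauss(100, 15) for _ in range(10_000)]

# statistics.quantiles with n=10 returns the 9 decile cut points.
breaks = statistics.quantiles(cells, n=10)

# Count how many cells fall into each of the 10 resulting classes.
counts = [0] * 10
for v in cells:
    counts[bisect.bisect_right(breaks, v)] += 1
```

If a quantile render is wildly unbalanced (like the 95%-one-color result above), the breaks are being computed from the wrong statistics rather than from the full set of cell values.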
Even though the classification stats should be based on the full raster, the mean and standard deviation listed in the Classify window don't come close to matching the mean and standard deviation on the Properties > Source tab. I'm inclined to believe the Classify numbers are the wrong ones; the mean is way too low. The histogram shows up as a single bar at the extreme low end of the scale.
Any ideas what is going on? Are these rasters just corrupted somehow? Am I missing some key setting where I could readjust the sampling of the GRID and/or redo the statistics? (I did try Calculate Statistics; it didn't change anything that I could see.)
The files are large (673 MB) floating-point GRID and TIFF rasters. I'm running ArcGIS 10.1 on a 32-bit Windows 7 machine (though we get similar results on 64-bit).
One other oddity: when I try to manually classify the images and type break points into the window on the right side of the dialog, every value I type shows up twice in the list. So if there are 8 breaks, for example, by the time I type in 4 new ones I end up with 4 pairs of identical breaks! I'm assuming this could somehow be related to the above?
Any help would be appreciated!
PS: I just did a test where I converted the raster to an integer file instead of floating point; the classification problems persisted, unfortunately...
It sounds as if NoData pixels are being treated as real pixels, with the NoData value recorded as something else. I would consider trying Reclassify or similar to set those values back to NoData, or perhaps an Extract by Attributes process to strip out the erroneous pixel values that are skewing the statistics.
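To illustrate why that would match the symptoms: a small, hedged Python sketch (synthetic numbers, not the poster's raster) showing how a NoData sentinel such as -9999, when read as a real value, drags the mean far below the true mean and inflates the standard deviation, giving exactly the "single bar at the extreme low end" histogram described above.

```python
import statistics

# Plausible real cell values: 50..149, mean exactly 99.5.
real = [50.0 + (i % 100) for i in range(1000)]

# Same cells plus a large block of -9999 NoData pixels misread as data.
with_sentinel = real + [-9999.0] * 9000

clean_mean = statistics.mean(real)
skewed_mean = statistics.mean(with_sentinel)
clean_sd = statistics.pstdev(real)
skewed_sd = statistics.pstdev(with_sentinel)
# The skewed mean collapses toward the sentinel, and nearly all real
# values pile into one histogram bin relative to the stretched range.
```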
I got some help from ESRI support on this, but may not have it figured out all the way yet.
Apparently, the default behavior of ArcGIS is to calculate raster histograms quickly using some unspecified subset of the data (even though it says the sampling pattern for GRID files is to use all the data with no skipping!). You can force the program to consider all of the unique data values using the instructions contained here: http://resources.arcgis.com/content/kbase?fa=articleShow&d=35443
But if you do that for a large raster, you run into a separate constraint on the number of values in the attribute table: 'The number of unique values reached the default limitation (>65,536). Note the number (65,536) should be read from settings.'
To fix this, from ESRI support: "To alter the default limitation you can navigate to Customize -> ArcMap Options and select the Raster tab. Navigate to the 'Maximum number of unique values to render' dialog and you can change this value to 1000000. You should then be able to alter the classified renderer properties and these mean values should match those within the raster dataset properties." (Apparently you can also change that setting using advancedarcmapoptions.exe.)
So I did both steps, and it worked beautifully for two of my 600 MB files. But then it stopped working on a set of 3 additional files that are essentially identical in size and scope to the first two. Either (a) the classification histogram seems to work, but the breaks shown for the default natural-breaks 5-class classification all show 0-0 as their range, or (b) it produces this error message: "An error occurred within this application. The cause is undetermined." Then the symbology window goes blank and you have to force the program to quit. If you set the max number of unique values to 2 billion, sometimes it works, and other times you get an "out of memory" message.
So I'm still a bit perplexed as to what is going on, and why my results are now so unstable. Could it be that even though these files are ostensibly the same (floating point, 600 MB, same extent and cell size), some of them have far more unique values than the others, and this gives ArcGIS memory fits? Or could there be some file corruption happening when these files are uploaded to Dropbox in zip or MPK containers?
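On the unique-values theory: two floating-point rasters of identical size and extent can differ enormously in their unique-value counts, and it is that count (not file size) that trips the renderer limit discussed above. A hedged stdlib sketch on synthetic values:

```python
import random

random.seed(0)
n = 100_000

# Continuous float output (e.g. from a model): nearly every cell is unique.
continuous = [random.random() * 500 for _ in range(n)]

# Quantized data of the same size collapses to a few hundred values.
quantized = [round(v) for v in continuous]

unique_continuous = len(set(continuous))  # close to n
unique_quantized = len(set(quantized))    # at most 501 distinct values here
```

Two 600 MB floats rasters could thus sit on opposite sides of a unique-value limit even though they look identical on disk.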
I'm also wondering where the program stores any histogram it calculates for a raster file. Is it in the STA.adf file for a GRID, for example? Somewhere else for a TIFF? And should the Calculate Statistics tool create a histogram for the raster that would then be used the next time I tried to classify it?
In our practice, when ArcGIS (via the raster layer property settings) is used for color enhancement and color balancing, users face many operational challenges. For example, you cannot use good, effective image-processing methods to adjust the histogram and then save the result into a lookup table, or threshold specific values for a simple classification (and then save the result as a feature class).
It looks to us that the same is true of the MD model and Image Analysis, even though color-correction functions and some adjustment methods were introduced starting with 10.0.
Anyhow, we have been looking forward to seeing improvements here for a while ...