POST
The help page for types of kriging surfaces seems to say that the quantile map for any quantile q (between 0 and 100%) is constructed as the sum of the prediction map and z times the standard error (SE) map where z is the qth quantile for the standard normal distribution. This in fact is not the case for ordinary kriging when the variogram is not a pure nugget, as you can check by constructing these three maps and comparing any GA quantile map (for q differing from 50%) to the prediction and SE maps. The discrepancies--which are both positive and negative and about correct on average--vary with location, prediction, and standard error, so this does not seem to be the result of an approximate calculation. What is the software really computing? What formula does it use? (The help page echoes material from the book Using ArcGIS Geostatistical Analyst. See especially pages 262-264. I tested using no transformations of the data and specified variograms with no measurement error.)
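For anyone checking along, here is the calculation the help page appears to describe, written out so it can be compared cell by cell with the Geostatistical Analyst output. This is only a sketch: the two small arrays stand in for exported prediction and standard-error rasters, and scipy is used solely for the normal quantile.

```python
# Naive quantile map as the help page seems to define it:
#   quantile(q) = prediction + z(q) * SE, with z(q) the standard normal quantile.
import numpy as np
from scipy.stats import norm

prediction = np.array([[10.2, 11.0], [9.7, 10.5]])   # hypothetical prediction values
std_error  = np.array([[1.1, 0.9], [1.3, 1.0]])      # hypothetical kriging standard errors

q = 0.90
z = norm.ppf(q)                                       # ~1.2816 for the 90% quantile
naive_quantile = prediction + z * std_error

# Differencing naive_quantile against the GA quantile map (exported to the same grid)
# reveals the location-dependent discrepancies described above.
```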
12-27-2013 09:43 AM | 0 | 3 | 4025

POST
...I think of density as 0-5 accidents per square mile as any other density reading but my results go way above 136 (my total number of points), so I must be wrong.

You're probably fine. As you point out, these are counts per square mile, not counts themselves. They can be extremely high over very small areas. The integral of the density ought to total 136 (or near enough). For an illustrated explanation of this in one dimension, please see http://stats.stackexchange.com/questions/4220/a-probability-distribution-value-exceeding-1-is-ok/4223#4223 (which discusses probability densities, but it's exactly the same idea).

Also for classification, what is 1/2, 1/3, and 1/4 standard deviation?

A standard deviation is just another number exactly like your data. Think of it as a data-centric unit of measurement. So, just as length can be measured in different units (Angstroms, furlongs, rods, meters, parsecs, ...), so can anything else. For instance, if you have a collection of length measurements and their standard deviation is 1.36 meters, then on a standard deviation scale your "unit" is exactly 1.36 meters long. Ergo, 1/4 SD is equivalent to 1.36/4 = 0.34 meters, 1/2 SD is 0.68 meters, and so on. It works the same way for counts per unit area as it does for length.

Why use the SD as a unit of measurement? Because for many datasets about two-thirds of the values will be within one SD (one unit) of the average, 95% will be within two SDs of the average, and the vast majority will be within three SDs of the average (the "68-95-99.7" rule). If your dataset does not behave like this, it is exceptional: and that is interesting. So data-centric units of measurement can be useful for quickly diagnosing the behavior of the data (and in the hands of an experienced data analyst, they will suggest ways of re-expressing the data to reveal more information).

As an application, when choosing regular intervals of 1/3 SD for the class breaks, you would guess that no more than ten or so classes (ten * 1/3 = 3 1/3) above the average and no more than ten or so classes below the average would be needed to display the full range of data. You would also anticipate that most of the data would fall into the middle six to ten classes, with the remaining 14 to 10 classes (respectively) devoted to displaying the upper and lower extremes.

There's also another rule: for all datasets, it is impossible for more than a quarter of the values to be more than 2 SDs from the average (the arithmetic mean), or more than a ninth to be more than 3 SDs from the average, or ... or more than 1/n^2 of the data to be more than n SDs from the average. (This is Chebyshev's Inequality.) This provides upper limits on how many data possibly could fall into various classes defined by multiples of an SD. Thus, by using various small multiples of the SD to set class breakpoints when symbolizing data, you can have foreknowledge of the possible amounts of data within those classes and thereby anticipate and control the appearance of the map.
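If it helps, here is a small numeric sketch of these rules of thumb in Python/numpy; the gamma-distributed "densities" are made up purely for illustration.

```python
import numpy as np

values = np.random.default_rng(1).gamma(2.0, 50.0, size=136)   # hypothetical density values
mean, sd = values.mean(), values.std()

# Class breaks at every 1/3 SD around the mean, out to +/- 10/3 SD (ten classes each side).
breaks = mean + sd * np.arange(-10, 11) / 3.0

# The 68-95-99.7 rule and Chebyshev's inequality let you anticipate how full the classes are:
within_1sd = np.mean(np.abs(values - mean) <= 1 * sd)   # often around 2/3
within_2sd = np.mean(np.abs(values - mean) <= 2 * sd)   # often around 95%; never below 75%
within_3sd = np.mean(np.abs(values - mean) <= 3 * sd)   # almost everything; never below 8/9
```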
05-29-2013 12:58 PM | 0 | 0 | 503

POST
A polygon "overlaps" a cell if and only if the cell's center falls inside the polygon. Many of your polygons do not contain cell centers. They will disappear in any zonal analysis. Two approaches to a solution are:

1. Replace each disappearing polygon by a point (such as its centroid) and extract the value of the grid cell beneath that point (a sketch of this appears below).
2. Resample the grid to a cell size small enough to ensure that all (or most) polygons contain at least one cell center. Redo the zonal summary.
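A minimal sketch of the first approach, assuming arcpy with a Spatial Analyst licence; all file names are placeholders, and the INSIDE option of FeatureToPoint needs an Advanced licence (CENTROID is the fallback).

```python
import arcpy
from arcpy.sa import ExtractValuesToPoints

arcpy.CheckOutExtension("Spatial")

polys = "small_polygons.shp"   # hypothetical: the polygons that drop out of the zonal table
grid  = "value_raster"         # hypothetical: the raster being summarized

# Represent each polygon by a point guaranteed to fall inside it.
arcpy.management.FeatureToPoint(polys, "poly_points.shp", "INSIDE")

# Pull the raster value beneath each point; it is written to a RASTERVALU field.
ExtractValuesToPoints("poly_points.shp", grid, "poly_values.shp")

# Join poly_values.shp back to the polygons on their original ID to complete the table.
```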
04-19-2011 09:37 AM | 0 | 0 | 332

POST
The toe of the slope is obtained in the final step, Neala, where you create a floor of 98 m. Geometrically, you are creating a collection of similar cones: their tips are pointed upwards and located at all the boundary points. The boundary of the union of all these cones includes the sides of the quarry that would be created if you could dig as deep as possible subject to the 30 degree slope limitation. Imagine doing this but then re-filling the quarry (with water, say) up to a constant elevation of 98 m. The resulting shoreline is the toe of the slopes. The filling is done with the maximum function.
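If it helps to see the construction numerically, here is a small numpy sketch; the boundary points, grid, and cell size are invented, and only the 30 degree limit and the 98 m floor come from the thread.

```python
import numpy as np

# Hypothetical boundary points (x, y, elevation) around the quarry rim.
bx = np.array([0.0, 100.0, 100.0, 0.0])
by = np.array([0.0, 0.0, 100.0, 100.0])
bz = np.array([120.0, 115.0, 118.0, 122.0])

# Hypothetical grid of cell centres (1-unit spacing).
xx, yy = np.meshgrid(np.linspace(0, 100, 101), np.linspace(0, 100, 101))

# Each boundary point generates a cone: elevation drops by tan(30 deg) per unit of
# horizontal distance.  The deepest admissible surface is the upper envelope
# (pointwise maximum) of all these cones.
slope = np.tan(np.radians(30.0))
cones = bz[:, None, None] - slope * np.hypot(xx - bx[:, None, None],
                                             yy - by[:, None, None])
deepest = cones.max(axis=0)

# "Re-fill" the excavation up to a constant 98 m: the filling is the maximum function.
filled = np.maximum(deepest, 98.0)

# The toe of the slope is the shoreline, where the deepest surface crosses 98 m.
toe = np.isclose(deepest, 98.0, atol=slope * 1.0)   # within roughly one cell of the contour
```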
01-29-2011 05:43 PM | 0 | 0 | 858

POST
In 9.x, you need to surround most operators (including "/") with white space, Lynn. A good approach is to first carry out a complex series of calculations one at a time in the Raster Calculator so you can catch such problems right where they happen. Later you can combine them into a monolithic expression for SOMA. BTW, it's simpler (and more accurate) to square the sine rather than subtract the square of the cosine from 1: remember, 1 - cos^2(x) = sin^2(x).
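A quick illustration of the accuracy point, in plain Python rather than map algebra; the tiny angle is arbitrary.

```python
import math

x = 1e-4                       # a small angle in radians
a = math.sin(x) ** 2           # about 1.0e-8, computed accurately
b = 1.0 - math.cos(x) ** 2     # the same quantity, but computed by cancellation
print(a, b, abs(a - b) / a)    # the relative error of b is noticeably larger
```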
12-03-2010 09:10 AM | 1 | 0 | 238

POST
Thanks, Donovan. However, I've never seen that link or that symbol. It's not there now: I have scoured the page for anything like it. Firefox 3.6.10, Win 7 x64.
10-14-2010 05:17 AM | 0 | 0 | 1160

POST
Is there a technique that will work on data in a linear formation?

By specifying "linear" this question was asked in exactly the right way, because almost any attempt to do the analysis in two dimensions will produce biased results. Instead, represent the locations as measured distances along the coast, perform a kernel smooth of those points in the single (measure) dimension, and transfer the smooth back to the measured coastline. The kernel smooth can be done in myriad ways with software ranging from Excel to ArcGIS to R, depending on what you're comfortable with.

Transferring it back to the coastline is not so easy. One way is to dissect the coastline into segments, compute the average smoothed value within each segment, and attribute those segments with the averages. This can then be displayed as color (or pattern) coded line segments on the map. This gives you much better cartographic control than a 2D grid, anyway.

The example offered is interesting for the cartographic distortions it introduces: by varying the apparent width of the colored coast, it alternately emphasizes and de-emphasizes the sea otter population. This width does not seem to convey any information of its own--maybe it's just a map-making artifact.
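To make the one-dimensional smoothing step concrete, here is one possible sketch in Python/numpy; the sighting positions, the 500 m bandwidth, and the 250 m segment length are invented purely for illustration.

```python
import numpy as np

events_m = np.array([120., 340., 355., 900., 4200., 4350.])  # hypothetical sightings, as
                                                              # distances along the coast (m)
coast_len = 5000.0
bandwidth = 500.0                                             # kernel bandwidth (assumed)

# Evaluate a Gaussian kernel density along the coastline measure.
s = np.linspace(0.0, coast_len, 501)
dens = np.exp(-0.5 * ((s[:, None] - events_m[None, :]) / bandwidth) ** 2).sum(axis=1)
dens /= bandwidth * np.sqrt(2.0 * np.pi)          # units: events per metre of coast

# Transfer back to the line: average the smooth within fixed coastline segments.
seg_len = 250.0
seg_id = (s // seg_len).astype(int)
seg_means = np.array([dens[seg_id == i].mean() for i in np.unique(seg_id)])
# seg_means can now be joined to the dissected coastline segments for display.
```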
09-09-2010 12:28 PM | 1 | 0 | 369

POST
Yes, but not a billion! I think you're on the low side, Stan: there are about 10^5 meters per degree and curvature is a squared unit, so failing to project your data will introduce an error of about (10^5)^2 = 10^10 = ten billion. Multiply that by 100 and you're up to an even trillion ;-). Anyway, whenever topographic calculations are off by this many orders of magnitude, it's invariably due to keeping raster data in decimal degrees instead of projecting them: that insight is what prompted Dan's initial response.
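For the record, the arithmetic (the metres-per-degree figure is the usual approximation for a degree of latitude):

```python
m_per_deg = 111_320.0      # roughly 10^5 metres in one degree of latitude
factor = m_per_deg ** 2    # curvature rests on second horizontal derivatives, so the
                           # degree-to-metre conversion enters twice
print(factor)              # ~1.24e10, i.e. on the order of ten billion, as noted above
```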
09-03-2010 07:38 AM | 0 | 0 | 1039

POST
The values should be > -0.1 (convex), -0.1 to -0.4 (planar), and < -0.4 (concave). My values are much larger as shown in an attachment. These thresholds are unusual. Curvatures always, in my experience, change sign to differentiate between some form of local convexity and local concavity, with zero always corresponding to flat.
08-30-2010 12:30 PM | 0 | 0 | 1537

POST
Yeah, these things like ratings, degree of difficulty, voting, marking as answered -- none of these things are air-tight and do contain at least some subjectivity, given the perspective and mood of the poster and the responder. But for the most part they are probably more valuable to many users than having none of them.

Having information about the quality of a thread is useful and welcome, Jim. My point, though, is that the self-ratings of difficulty in fact do not tell us anything about difficulty or quality. They are misleading in that regard. I am advocating not implementing such a feature because I believe (based on anecdotal evidence from the old forums) that it is less than useless.

You might take a tip from how peer-reviewed publications and Web pages are ranked: namely, by direct citations and links, respectively. If you made it as easy as possible for people to search for solutions in the forums and to reference them, you could build up a useful network of cross-references in the forums that would reveal which postings are worth highlighting. This could also be the nucleus of an automated FAQ, something we have been desperate to have for more than a decade.

An example of what I mean by "as easy as possible" would be an automatic background search of the text in any new thread. After the poster has drafted it, she could be presented with a synopsis of closely related materials already in the forum and asked to indicate which, if any, appear to be related. Links to those could appear with her initial message. With a good search facility I suspect many threads would stop right there because their originators would find the answers they wanted (but didn't know how to search for).
08-04-2010 06:05 AM | 0 | 0 | 919

POST
The focal mean will do a good job and, provided the neighborhood is not too large, will take just seconds to compute :). You can start up a new ArcGIS process, open the grid, and get the job done while you're waiting for the remaining 40% to finish. Then you can experiment with alternative neighborhood sizes to explore a spectrum of possible solutions.
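A minimal sketch of the focal mean in arcpy, assuming a Spatial Analyst licence; the grid name and the five-cell circular neighborhood are placeholders to experiment with.

```python
import arcpy
from arcpy.sa import FocalStatistics, NbrCircle

arcpy.CheckOutExtension("Spatial")

# Mean within a circular neighborhood of 5 cells; vary the radius to explore alternatives.
smoothed = FocalStatistics("my_grid", NbrCircle(5, "CELL"), "MEAN", "DATA")
smoothed.save("my_grid_focal5")
```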
05-06-2010 05:04 PM | 0 | 0 | 989

POST
...clusters comprised of the smallest number of grid cells that meet a given threshold value.

Could you elaborate on what this might mean? If by "meet a threshold value" you mean either "have values exceeding a given constant" or "have values equaling a given constant" (it's unclear which is intended), then just select all such grid cells: there's no question of clustering and no question of finding a "smallest number." Thus some clarification of your criterion would be helpful.
05-05-2010 01:57 PM | 0 | 0 | 989

POST
What should replace the old MVP program? I'm sure ESRI has been thinking hard about this and is in the process of implementing a new one. The idea behind creating this thread about the question is that you, the users of the old and new forums, are likely to have good ideas about the features the program ought to have. Let's discuss them here (and trust that ESRI will listen).

Begin with the objectives. Some that ESRI and all users should be interested in and value highly are:

- Enhancing the rate at which questions are usefully answered in a timely fashion.
- Encouraging people to initiate and participate in useful, interesting, and productive conversations.
- Providing feedback to help frequent contributors improve their work.
- Providing information to improve searches and the collection of FAQs.
- Promoting solutions to difficult as well as easy questions.

Note that none of these objectives specifically includes rewarding contributors. There are, however, two reasons for augmenting a ratings program with rewards: first, and foremost, an appropriate rewards system promotes the objectives; and second, the value of frequent contributors to ESRI and the user community is high--it's probably the equivalent of several people in technical support--and therefore deserves some kind of compensation and recognition.

But how to provide feedback and ratings? As a guiding principle, the user community should determine whether a question has been answered, whether the answer is useful, and how difficult the question was. The MVP program for the old forums made the originator of a thread the sole determiner of all three. This was a good start but, in my humble opinion, ultimately failed for many reasons:

- Most people with a question, especially newbies, cannot determine how difficult (or time-consuming) it might be to answer that question. (A rational user of the forums should always rate their question as the most difficult, regardless of its actual difficulty, in order to encourage responses. But that defeats the purpose of a difficulty rating.)
- In some cases highly useful solutions were offered to questions, but the originator of the question was unable to understand or appreciate the solutions (although many subsequent searchers did appreciate them).
- In the majority of cases, thread originators simply didn't bother to indicate whether they were satisfied with answers provided.
- It was possible to game the system by posing one simple question as a series of threads, each with a high difficulty rating, allowing a single respondent to garner many points for little work. (I don't believe anyone ever consciously did this, but similar situations did occur from time to time. I have been the beneficiary of a few of them.)
- Doubling the points for answers after five days of no response had the negative effect of encouraging people not to reply immediately: once a question went a couple of days without an answer, it made more sense just to wait a few more days.

A way to overcome these deficiencies exists: provide a mechanism for all forum readers, not just the originator of a thread, to rate a thread's (or posting's) usefulness. Base the MVP awards on cumulative usefulness totals. But don't do so in a linear fashion, for otherwise a single popular posting could dominate the ratings (and allow for certain forms of cheating). Here's a simple example for discussion, not fully worked out but outlined to illustrate the main ideas.
- Readers could "vote" on the usefulness or interest of any posting (including questions and comments, not just solutions to problems), with the vote being binary (don't like = 0 points, like = 1 point) or numerical (e.g., the old forums used {1, 3, 5}). (BTW, allowing negative votes--although it sounds unfriendly--could be useful for identifying misleading, wrong, or crank messages.)
- Total votes can be displayed with each message and used for prioritizing search results. The originator of the thread might get extra weight in the voting as a nod to their special interest in the responses.
- To compute MVP scores, though, the total votes for each rated message would first be transformed in a nonlinear fashion to downweight unusually high totals. E.g., a positive total could be worth one MVP point, a total of 10 or more could be worth two MVP points, 100 or more could be worth three MVP points, etc. (A small sketch of this downweighting appears at the end of this post.)
- These MVP points would be summed over all of a contestant's messages to determine their cumulative MVP points.
- The scoring for a single contest period would be determined by the difference in cumulative MVP points achieved during that period. (Thus, old posts with ongoing popularity can keep garnering points for a contestant over time. Why not? That might encourage contributions that are longer, more thoughtful, and more complete than otherwise.)
- This could be augmented by votes for other categories, such as "interest" or "difficulty," but we should be concerned that the system would become unworkably complex. (The main purpose of such auxiliary votes would be providing additional feedback to contributors.) However, ESRI could at its option selectively overweight certain votes or add bonuses, such as for threads that first identify problems in the software and provide solutions that ultimately turn into software enhancements.

Let me emphasize one special feature of this proposal: MVP awards are not directly proportional to a sum of individual points. This is what helps us promote the forum's multiple objectives, rather than emphasizing the mere garnering of "points." In particular, the downweighting of highly popular postings encourages broader participation. In the old forums this downweighting was too severe, though: no posting could ever accumulate more than the points associated with the thread's difficulty level.

Making the cumulative MVP points earned by each contributor visible to readers could make it easier to identify those whose postings tend to be helpful. (Actually, average points per posting rather than total points would be more meaningful in this regard. Using averages would also help the community identify competent newcomers more quickly.)

Finally, it seems worth remarking that if ESRI elects to continue awarding prizes in this program--as it should--it is important that whatever system is set up be clear and transparent. Let us not lose sight of the fact that the purpose of the program (if I may be so bold as to say so) is to improve the user experience and only as a secondary matter to reward frequent contributors. I welcome your thoughts and suggestions about how this can be accomplished.
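To make the downweighting concrete, here is a tiny sketch of the kind of transformation I have in mind; the break points at 1, 10, and 100 votes are simply the examples given above.

```python
import math

def mvp_points(total_votes: int) -> int:
    """Convert a message's total usefulness votes into MVP points:
    any positive total earns 1 point, 10 or more earns 2, 100 or more earns 3, and so on."""
    if total_votes <= 0:
        return 0
    return 1 + int(math.log10(total_votes))

# Summing over all of a contributor's messages gives the cumulative MVP score.
cumulative = sum(mvp_points(v) for v in [3, 0, 42, 250, 1])   # -> 1 + 0 + 2 + 3 + 1 = 7
```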
05-03-2010 09:04 AM | 0 | 1 | 3847