Geostatistical Analyst - Kriging with output in regions defined by zone data.

04-26-2010 11:42 AM
CedricWannaz
New Contributor II
I am starting a new thread based on a hint from Bill Huber in the following thread:

http://forums.arcgis.com/threads/3142-Spatial-Analyst-Zonal-Statistics-error-related-to-9.3-SP-1

Does ArcGIS Desktop (toolbox, geoprocessor) handle Kriging with output in regions defined by a polygon feature class or a raster?

As explained in the aforementioned thread, I have rasters of components of a wind field, and I want to compute the flow through edges of a multi-scale grid. I initially created buffers around the edges and was performing Zonal Statistics of each component using the buffers as zonal features. Bill mentioned that a better option would/could be to use a Kriging tool that estimates averages directly in my buffers.
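
For reference, my current workaround looks roughly like the following (a sketch only, assuming arcpy with the Spatial Analyst extension; the layer, field, and raster names are made up):

    import arcpy
    from arcpy.sa import ZonalStatisticsAsTable

    arcpy.CheckOutExtension("Spatial")

    # One rectangular buffer per grid edge, identified by an "EDGE_ID" field (hypothetical names).
    for component in ("u_pos", "u_neg", "v_pos", "v_neg"):
        ZonalStatisticsAsTable("edge_buffers.shp", "EDGE_ID",
                               arcpy.Raster(component),          # interpolated component raster
                               "zonal_%s.dbf" % component,
                               "DATA", "MEAN")                   # ignore NoData, mean per buffer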

I didn't find such an option either in the tools from the toolbox or directly using the Geostatistical Wizard.

Any help with this would be much appreciated, because Zonal Statistics has some "natural" limitations that are difficult for me to cope with.

Best regards,

Cedric
2 Replies
CedricWannaz
New Contributor II
Bill,

    Thank you for your post! Please find below a description of what I am doing, with illustrations (corresponding to the letters) in the attached file. I used a coarse resolution so it is easier to see.

A.   I have a vector field of wind velocity, whose components in the U (lon) and V (lat) directions are given on a regular (angular) grid of points.

B.   This field is given every 6 hours over a year. I am interested in the yearly average flow at each location, per direction and per sense, so I separate positive and negative components and generate yearly average components per sense: U+, U-, V+, V-. These are essentially 4 scalar fields, but they can also be seen as 4 vector fields with components along the +/- Lon and Lat axes.

C.   I want to compute flows between cells of a multi-scale grid that does not match - for obvious reasons - the aforementioned locations. I have to determine the average orthogonal wind speed along each edge, which puts me, to some extent, in an Arakawa C-grid type of situation.

D.   So my first idea, and this is what is implemented at this point, was to interpolate each component and to obtain "averages per edge". I am using Kriging for the interpolation (because it makes sense to me to converge towards a mean value far from the locations where the wind field is given).

E.   If I want the output flow for a grid cell, I need to know the average values of: U+ for the Eastern edge, U- for the Western edge, V+ for the Northern edge, and V- for the Southern edge. I don't know how to obtain an average value along an edge, so I thought that it makes sense to create a rectangular buffer around each edge (here the widths have been exaggerated) and to perform zonal statistics of each component on each corresponding edge (see the sketch after this list).

F.   To avoid overlaps that would be problematic when converting buffers to a raster for the zonal stat, I actually do it in two steps: one with horizontal rectangular buffers and one with vertical rectangular buffers.
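
To make E concrete, the bookkeeping per cell is roughly the following (a plain Python sketch; edge_mean() is hypothetical and stands for whatever gives me the average of a field along an edge, which is exactly the operation I am asking about):

    def cell_outflow_components(cell):
        # Which yearly field is relevant for the outflow through which edge.
        # With my sign conventions, U- and V- are <= 0 (westward / southward motion).
        return {
            "east":  edge_mean("U+", cell.east_edge),
            "west":  edge_mean("U-", cell.west_edge),
            "north": edge_mean("V+", cell.north_edge),
            "south": edge_mean("V-", cell.south_edge),
        }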

I think that I cannot use Green's theorem or the divergence theorem for this purpose (am I right?), because my surfaces are indeed in the z direction (orthogonal to U+, U-, V+, V-) and I assume the wind speed profile to be constant over each step of my z-discretization (sigma surfaces). I was really excited by your idea, however, because it could be very relevant for computing flows in the z direction, through cell bottoms and tops.

I have no experience with focal statistics. To some extent it seems that I am somehow emulating it by creating buffers and performing zonal statistics, but I don't know at this point how to perform focal statistics around objects of different lengths (the edges) in one or two operations. Is it obvious to you that I should use focal statistics?

Thank you again for your post and for the discussion!

Cedric
CedricWannaz
New Contributor II
Bill,

Thank you very much for your answer, and my apologies for the lack of responsiveness on my side. I am traveling now, using a satellite connection that doesn't help.

Just to get the concepts clear: (lon, lat) are coordinates ... This distinction can be important within your larger grid cells. More about this below.


Yes, I should be more careful with the terminology I am using. What I meant is that I have velocities in m/s (not rad/s for theta or phi) at the points shown in the illustration, along axes U and V (in the tangent plane TpS) that are locally (at each point) parallel to the parallels and meridians.

Continuing the previous thought, I wonder whether you might really be interested in total fluxes across each cell edge. If so, you need to account for total edge length. ...


I am actually computing the geodesic length of each edge. It is currently done in MATLAB, using the Mapping Toolbox to read the shapefile of edges. I will ultimately do it in Python, as the rest of my "engine" is written in Python. As you mention, I could perform a simpler computation based on the latitudes of the N and S sides/edges, but since I am already using the Mapping Toolbox to compute the geodesic length of irregular polylines, I use it in this simpler case as well. Total fluxes across each cell edge (if you determine them based on the yearly average velocity across each edge) would only give me the difference, not what was really exchanged between two cells, and I need to know the in- and out-fluxes for my application.

Of more concern is whether you are kriging the positive and negative parts of each component separately. I don't think you should. Krige the components first, then--if you really must--separate the interpolated components into their positive and negative parts.


At this point I am doing something like this (in a far more optimized way):
for (lon, lat) in grid_points:                       # points of the grid described in A.
    for t in timesteps_6h:                           # all 6-hour timesteps in a year
        u = wind_velocity(lon, lat, t).U
        if u > 0:
            Upos[lon, lat].value   += u
            Upos[lon, lat].counter += 1
        elif u < 0:
            Uneg[lon, lat].value   += u
            Uneg[lon, lat].counter += 1
        # ... same for Vpos, Vneg
    if Upos[lon, lat].counter:                       # average per point, skipping empty counters
        Upos[lon, lat].value /= Upos[lon, lat].counter
    if Uneg[lon, lat].counter:
        Uneg[lon, lat].value /= Uneg[lon, lat].counter
    # ... same for Vpos, Vneg


So in the end I have only four scalar fields (defined at the locations given by the initial grid) to interpolate. Is your suggestion to interpolate every 6 hours and then combine/separate components using raster algebra, for example? It would be a little lengthy, but I could imagine leaving a machine running a long time just on this task. It is interesting; I will certainly try it, at least on a subset of my "6 hrs" datasets, and compare.
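
If I understand the raster-algebra route correctly, it could look roughly like this (a sketch only; I am assuming arcpy map algebra with Spatial Analyst, and the raster names are made up):

    import arcpy
    from arcpy.sa import Con, CellStatistics

    arcpy.CheckOutExtension("Spatial")

    n_timesteps = 4 * 365                                    # 6-hourly over one year
    u_rasters = [arcpy.Raster("u_krig_%04d" % i)             # kriged U, one raster per timestep
                 for i in range(n_timesteps)]

    # Positive part of U at each timestep, then the yearly average over the
    # timesteps where U was actually positive (to mirror my per-point averaging above).
    u_pos_sum   = CellStatistics([Con(r > 0, r, 0) for r in u_rasters], "SUM")
    u_pos_count = CellStatistics([Con(r > 0, 1, 0) for r in u_rasters], "SUM")
    u_pos_mean  = Con(u_pos_count > 0, u_pos_sum / u_pos_count)   # NoData where U was never positive
    u_pos_mean.save("u_pos_mean")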

Also of concern is whether you are cokriging the two components. You definitely should be ... If it's not, cokriging is important.


I will come back to this when I return and update my post.

I worry about this approach a little, because usually one does not separate a normal velocity into a positive and negative part. But I'll accept that this operation is physically meaningful in your model.


I am actually doing this because I am interested in computing the fate of pollutants released into the air (into any medium actually, so I have other grids, but I focused on the air in this post). If you take just two compartments, one loaded with pollutant and the other free of pollutant, with equal but opposite flows between the two, the net flow is 0; however, there are transfers between the two compartments that contribute to mixing the pollutant.

... However, we already implicitly "connected" all such tangent planes ...


I completely agree.

The computer model ... so just make sure to choose the origin and cellsize of the interpolation grid so that its cell centers lie along edges.


This is somehow what I was trying to do with the buffers and the zonal statistics: the buffers were targeting cells close enough to each edge, and the zonal statistics was giving the arithmetic mean over each edge neighborhood. As I didn't know how to obtain statistics along (the vicinity of) a line, I chose to create a thin buffer. Afterwards, I thought that it was moreover a "good" thing, because it would make the arithmetic mean more "stable" with respect to artifacts that some interpolation techniques might produce.

For example, if all your edges lie along curves where the latitude or longitude are integers, then perform the kriging on a grid whose cellsize is one (decimal degree) but whose origin is offset by a half degree (for example, make the origin at (-180.5, -90.5)).


I am limited on this side because the grid is not fixed, but flexible and "user-"defined, so I don't know a priori where and how much the grid will be refined. It may be possible to create an algorithm that builds ordered lists of longitudes and latitudes and provides me with a well-suited pixel size and offset, but I will wait until I fully understand the rest of your post (and have practiced a bit to be more comfortable with this material) before going that way.
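
Just to make sure I understand the alignment idea for a fixed case: with a basic cellsize of one degree, it could look roughly like this (a sketch only; I am assuming arcpy with Spatial Analyst, and the point layer, field, and semivariogram choices are made up):

    import arcpy
    from arcpy.sa import Kriging, KrigingModelOrdinary

    arcpy.CheckOutExtension("Spatial")

    # Shift the processing extent by half a cell so that the centers of the
    # one-degree output cells fall on integer longitudes/latitudes, i.e. along
    # the edges of the model grid (as suggested above).
    arcpy.env.extent = arcpy.Extent(-180.5, -90.5, 180.5, 90.5)

    u_pos = Kriging("u_pos_points.shp", "UPOS",              # yearly U+ at the original grid points
                    KrigingModelOrdinary("SPHERICAL"), 1)    # cellsize of 1 decimal degree
    u_pos.save("u_pos_krig")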

... In your attached Figure C, the cell sizes in the grid appear to be 1, 2, 4, 8, 16, or 32 times a basic cellsize of approximately one degree. To obtain the average along a vertical edge, for instance, perform the interpolation on a finer grid of some fraction of the basic cellsize, ...


As mentioned above, grids can be arbitrarily fine, and it is up to the user of the engine (myself included) to decide up to which resolution it makes sense to refine, according to the resolution of the datasets that he/she has (even if, technically, the engine is able to process cells that are smaller than the pixels of the datasets, as described in the previous thread that actually started this one). Users can specify refinement ratios for each transition between levels of refinement (I simply chose 1:2 everywhere for the illustration), and the coordinate system in which they want to create the grid (e.g. they can choose a World/Behrmann CS if they want each level of refinement to be made of equal-area grid cells). So I cannot really base further algorithms on regularity in the grid, and it would be difficult (I guess) to resample or interpolate all the datasets so that they match the grid.

A focal mean using a neighborhood of width 1 (cell) and height 10 cells will compute the average of this vertical edge and assign the result to a cell close to its center. ...  centers of all one-degree vertical edges in your model grid you can extract the focal means for those edges. You need to do this for all four scalar fields (U+, U-, V+, V-).


I will test this when I am back on my machine. Even if the multi-scale grids do not fully match the dataset rasters (in this case, wind velocities), I can certainly find edge lengths and centers, and perform something like:
for each edge length L:
    Perform focal stat using neighborhood of width 1 pixel and height nPixels(L).
    Sample the output at locations given by the centers of edges of length L.
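
In arcpy terms I imagine this loop would look roughly like the following (a sketch only; the raster and layer names are made up, and I am assuming the edge centers are stored in one point layer per edge length):

    import arcpy
    from arcpy.sa import FocalStatistics, NbrRectangle, ExtractValuesToPoints

    arcpy.CheckOutExtension("Spatial")

    u_pos = arcpy.Raster("u_pos_krig")                       # interpolated U+ field

    # Hypothetical mapping: edge length in cells -> point layer of the corresponding edge centers.
    edge_centers_by_length = {10: "edge_centers_10.shp", 20: "edge_centers_20.shp"}

    for n_cells, centers in edge_centers_by_length.items():
        # Focal mean over a 1 x n_cells neighborhood: averages along a vertical edge
        # (a horizontal edge would use NbrRectangle(n_cells, 1, "CELL") instead).
        focal = FocalStatistics(u_pos, NbrRectangle(1, n_cells, "CELL"), "MEAN")
        # Sample the focal means at the centers of the edges of this length.
        ExtractValuesToPoints(centers, focal, "edge_means_%d.shp" % n_cells)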


But what is the advantage over a "zonal stat. on relevant buffers", which pinpoints the relevant regions more specifically? It seems to me (maybe naively) that the latter technique would be faster than a focal mean or a block mean (at least for high-resolution rasters), which seem to be computed around every pixel(?)

... If you are assuming the z-components are constant, then the net flow through any cell will be the sum of the U+ and V+ (around four edges) minus the sum of the U- and V- (around the same four edges). That net flow must equal the total contribution of all sources and sinks within the cell. (That's a restatement of the Divergence Theorem.) For this to work correctly, you must weight the vector components by the lengths of their normal edges ...


To describe more accurately what I am doing: my wind field is given by GEOS-Chem, which provides me with layers of horizontal velocities in the U and V directions at certain locations (as mentioned above). Layers are given at certain altitudes, like ~170 m, ~360 m, etc. (more accurately, they are surfaces of equal pressure). I don't have vertical velocities from GEOS-Chem (I could obtain them, but I am not able to fully process them at this time). As I have no sources or sinks of air, a very basic application of the divergence theorem (continuity equation) tells me that the vertical (z) fluxes should compensate for the imbalance of the horizontal fluxes (U, V). Starting from the ground-level grid cells, for which the vertical velocity through the bottom surface is assumed to be 0, I can determine the average vertical velocities (and fluxes) through the top surface of these cells. Repeating the process following the order of the layers, I can obtain the top and bottom vertical velocities through the cells of the subsequent layers.

Back to fluxes: I am computing them by taking the average velocity per edge as the mean of the pixel values along each edge, multiplying it by the edge's geodesic length, and multiplying again by the height of the relevant layer. Is this what you called "weighting" (of average wind speeds), or was what you had in mind more subtle than that?
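
In code, my bookkeeping amounts to something like this (a plain Python sketch of the formulas above; the names are just for illustration, and vertical fluxes are counted positive upwards):

    def edge_flux(mean_normal_speed_ms, geodesic_length_m, layer_height_m):
        # Volume flux through the face above one edge, in m^3/s: the average of the
        # interpolated component along the edge, times the geodesic edge length,
        # times the thickness of the GEOS-Chem layer.
        return mean_normal_speed_ms * geodesic_length_m * layer_height_m

    def top_flux(bottom_flux, horizontal_in_fluxes, horizontal_out_fluxes):
        # Continuity with no sources or sinks of air: the flux through the top of a
        # cell balances the flux through its bottom plus the net horizontal inflow
        # (zero vertical flux at the ground for the lowest layer). The average
        # vertical velocity through the top is then this flux divided by the cell area.
        return bottom_flux + sum(horizontal_in_fluxes) - sum(horizontal_out_fluxes)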


Best regards,

Cedric