Semivariogram Analysis

Tae-Jung_JonathanKwon · ‎04-10-2014

I am trying to use semivariograms to analyse/characterize the spatial patterns of friction measurements collected at equal intervals (every minute) by a mobile unit. The measurements are GPS-tagged, so they can be visualized on a map, as attached below (the lines represent a road network).
[Attachment 33016: map of the friction measurements along the road network]

I am trying to build a semivariogram model for each run (collected on different days) and compare their similarities/differences by examining the model parameters (i.e., sill, nugget, range).
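For readers following along, here is a minimal sketch of how an empirical semivariogram can be computed for readings taken along a route. It assumes each reading has already been converted to a 1D distance along the road, which is an assumption and not something stated in the post (the original data are GPS points). The function name, the lag binning, and the sample numbers are all illustrative.

```python
from collections import defaultdict


def empirical_semivariogram(positions, values, lag_width, max_lag):
    """Return {lag distance: semivariance} for readings along a route.

    positions: distance along the road for each reading (assumed 1D)
    values:    friction reading at each position
    """
    sq_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(positions[j] - positions[i])
            k = round(h / lag_width)  # nearest lag bin
            if k == 0 or k * lag_width > max_lag:
                continue
            sq_diff_sums[k] += (values[j] - values[i]) ** 2
            pair_counts[k] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs)
    return {
        k * lag_width: sq_diff_sums[k] / (2 * pair_counts[k])
        for k in sorted(pair_counts)
    }


# Made-up readings, purely for illustration.
positions = [0, 1, 2, 3]
values = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_semivariogram(positions, values, lag_width=1.0, max_lag=2.0)
print(gamma)
```

The sill, nugget, and range are then obtained by fitting a model (exponential, Gaussian, etc.) to these binned values, so any noise in the bins carries over into the fitted parameters.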

Here I have attached a plot of the datasets collected on two different days (the Excel file is also attached).
[Attachment 33017: friction measurements for the two days]

As you can see, the friction measurements collected on these two days are similar to each other, so I would expect their spatial patterns to be similar as well. But their semivariograms are quite different from each other.

For Day 1:
[Attachment 33018: Day 1 semivariogram]

For Day 2:
[Attachment 33019: Day 2 semivariogram]

Would anyone be able to explain why this is the case? I would expect that, for instance, their ranges should be similar, but they aren't. I understand that the range (and other parameters) can vary depending on which model (Gaussian, exponential, etc.) is fitted to the empirical semivariances, but there is still a large difference in their spatial characteristics.

Any comments/suggestions would be appreciated...

Anonymous User · ‎04-10-2014

Original User: mboeringa2010

I don't think this type of data is well suited for analysis in Geostatistical Analyst. Geostatistics more or less assumes a continuous phenomenon with 2D data, not 1D line data that is highly dependent on something like road surface type (e.g. ordinary asphalt / bitumen, open asphalt, double layered open etc.).

What assumptions for any spatial model do you have? And what are you trying to achieve by modelling the friction?

I would suggest trying to correlate friction with road surface data first (if available), in an ordinary (non-spatial) statistical software package, and see if there are any statistical relationships at all.

Tae-Jung_JonathanKwon · ‎04-10-2014

Hi, thanks for your answer.
As I said, I am running the tool to see whether there is any difference/similarity between two sets of data collected on different days. I am certain that there are factors that affect the readings (to find those, I would rather run a simple regression), but my goal is not to determine these factors. It is to verify that data collected under similar conditions (with similar friction readings) exhibit similar spatial patterns. For instance, friction measurements collected under the same weather conditions (like the examples I've used) would likely be similar, and thus the range/sill/nugget should also be similar. But this was not the case for me. Theoretically, kriging (which involves semivariogram modelling) should work in any dimension, including 1D, 2D, and 3D. I've also used R and the results were similar, so I don't think it is a software issue. It is just hard to understand why two datasets that are almost the same would produce very different semivariograms.

Anonymous User · ‎04-10-2014

Original User: Eric6346

I talked this over with a few people, and none of us are completely sure exactly why this is happening, but we have a few ideas.

First, our semivariogram estimation algorithms implicitly assume that the data can, in fact, be accurately modeled with a semivariogram. When that is true, they do a good job of estimating the parameters. Unfortunately, when the data cannot be accurately modeled with a semivariogram, the calculations can produce unintuitive results. Even small changes in the input data can manifest as big changes in the semivariogram parameters, because the algorithm is trying to fit something that fundamentally doesn't fit.

Second, your two datasets aren't as similar as you might think. The line graphs do look similar and mostly honor the same highs and lows, but their variances are quite different (Day 2 has twice the variance of Day 1). This explains the big differences in the sill estimation.
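To illustrate the point about variance: for a stationary process the sill is approximately the variance of the data, so a dataset with larger variance will tend to get a larger sill even when the shape of the profile is the same. A minimal sketch, using made-up numbers and hypothetical helper names, compares the variances and standardizes each series so that differences in overall variance no longer dominate:

```python
from statistics import mean, pvariance


def variance_ratio(day1, day2):
    """Ratio of the Day 2 variance to the Day 1 variance."""
    return pvariance(day2) / pvariance(day1)


def standardize(values):
    """Rescale a series to zero mean and unit variance."""
    m = mean(values)
    sd = pvariance(values) ** 0.5
    return [(v - m) / sd for v in values]


# Illustrative values only: day2 has the same shape as day1 but twice the amplitude.
day1 = [1.0, 2.0, 3.0, 4.0]
day2 = [2.0, 4.0, 6.0, 8.0]

print(variance_ratio(day1, day2))
print(standardize(day1))
print(standardize(day2))
```

After standardization the two illustrative series are identical, so any remaining difference between semivariograms fitted to standardized data would reflect spatial structure, not overall variance.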

I wish I could give more helpful feedback on this issue, but you may want to rethink how you're quantifying the spatial structure of this phenomenon because it looks like comparing estimated semivariogram parameters is not going to work well.

MarcoBoeringa · ‎04-11-2014

but my goal is not to determine these factors but rather to see/verify that the data collected under similar conditions (so do the friction readings) would exhibit similar spatial patterns.

Why aren't you simply plotting the two datasets against each other, since the data seem to be measured at approximately the same locations for each data point? The spread around the line day2=day1 will tell you quite a lot about "similarity".
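One way to put numbers on the spread around the day2=day1 line is sketched below. It assumes the two runs have already been paired location by location (the thread does not say how the points line up), and the function name and sample values are illustrative.

```python
from math import sqrt
from statistics import mean


def one_to_one_summary(day1, day2):
    """Summarize how closely paired readings follow the line day2 = day1."""
    if len(day1) != len(day2):
        raise ValueError("the two series must be paired point by point")
    diffs = [b - a for a, b in zip(day1, day2)]
    bias = mean(diffs)                         # average offset from the line
    rmse = sqrt(mean([d * d for d in diffs]))  # spread around the line
    m1, m2 = mean(day1), mean(day2)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(day1, day2))
    sxx = sum((a - m1) ** 2 for a in day1)
    syy = sum((b - m2) ** 2 for b in day2)
    r = sxy / sqrt(sxx * syy)                  # Pearson correlation
    return {"bias": bias, "rmse": rmse, "r": r}


# Illustrative values only.
summary = one_to_one_summary([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(summary)
```

A high correlation with a non-zero bias or RMSE would indicate the same pattern at a different level, which a comparison of semivariogram parameters alone does not separate out.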

Anonymous User · ‎04-11-2014

Original User: taejnkwon

Thanks Eric for the valuable feedback.

I think I should describe, at least briefly, what I am trying to do with semivariogram analyses.
The primary objective is to determine potential locations for road monitoring sensors, from which road surface slipperiness under different weather conditions can be monitored. It is implicitly assumed that placing a sensor at a location improves the road surface condition there by, for instance, 50%, and that further locations are then chosen sequentially. Following this logic, for one type of weather condition (e.g., snow), the benefit to road users is greatest when the first sensor is placed at the location with the lowest observed friction value (within the tested stretch of road), followed by the second sensor at the site with the second lowest observed friction value. The problem, however, is that two low friction values could be observed at sites close to each other, and placing the sensors this way would NOT be considered optimal (again, there is the issue of how to define optimality, but let's put this aside for now).

This is where the semivariogram analysis comes into play: for instance, the range of spatial autocorrelation is used so that the first sensor location also affects adjacent locations (i.e., road conditions would improve) out to a distance equal to the "range", at a decreasing rate represented by its semivariogram model. If so, the second sensor location could possibly be moved to another location. Thus, it is important to analyse and characterize the model parameters for several different road condition types (e.g., icy, snowy, wet, and dry), and if there are some similarities, I can determine potential sensor locations. I anticipated that under similar weather conditions (and thus similar friction values) the "range" would also be similar, but there were a few exceptions (like the examples I used).

Eventually, I will use the estimation error or variance and formulate the problem as an optimization problem that minimizes the total estimation error for a given number of sensors, but this would be far more complicated and require more processing time. I guess I would have to integrate R or Matlab with ArcGIS to do this. I believe many people have tried this before, especially in hydrology, where the goal is to construct an optimal groundwater quality monitoring network. Is there any option in ArcGIS that allows me to perform a similar task?

Thanks for your feedback in advance.

Tae-Jung_JonathanKwon · ‎04-11-2014

Thanks for your suggestion. I have tried this and confirmed their similarity, but I need to quantify how similar/different they are. Thanks.

Anonymous User · ‎04-11-2014

Original User: mboeringa2010

This is where the semivariogram analysis comes into play: for instance, the range of spatial autocorrelation is used so that the first sensor location also affects adjacent locations (i.e., road conditions would improve) out to a distance equal to the "range", at a decreasing rate represented by its semivariogram model. If so, the second sensor location could possibly be moved to another location. Thus, it is important to analyse and characterize the model parameters for several different road condition types (e.g., icy, snowy, wet, and dry), and if there are some similarities, I can determine potential sensor locations. I anticipated that under similar weather conditions (and thus similar friction values) the "range" would also be similar, but there were a few exceptions (like the examples I used).

Just looking at the first graph you provided, showing the two data-series graphed against distance, tells me there are significant differences in the "spatial autocorrelation" over the entire distance measured, ranging from gradually changing friction values to steeply / abruptly changing friction over short distances consistent with a major change in conditions like a change in road surface type.

These results will make it hard to estimate a single semivariogram or "range" for a given set of conditions. Semivariogram estimation works best with the kind of data you also pointed out: gradually changing continuous phenomena like groundwater levels in sediments, concentrations of pollutants in air or water etc.

Here in the Netherlands, which probably has one of the densest road sensor networks of any country in the world, I think sensor placement is mainly determined by the desire to cover crucial road network links, to get data for each section between two highway intersections, and to cover notorious locations based on road traffic accident data, which is registered in a countrywide database. There is also a lot of modelling going on for things like road traffic noise, particulate matter etc. I haven't heard of actual modelling / optimization of sensor placement on road networks, though. I have the feeling that this is still largely done through pragmatic decisions / expert knowledge.

Anyway, let's see what Eric has more to say.

Tae-Jung_JonathanKwon · ‎04-11-2014

Hi,

Yes, you are absolutely correct that only the hot-spots are considered when designing a road sensor monitoring network. Hot-spots include, as you pointed out, intersections, bridges, historically high-wind or snowy areas, etc. Here I am trying to develop an entirely new approach to this issue, as none of the existing approaches considers spatial correlation.

The graph I provided was for dry conditions; the data from snowy/slushy weather conditions show more abrupt changes throughout the test route. Thus, when modelling the semivariogram, the range for dry conditions would be larger than the one for snowy conditions. I found this trend in many cases, but in a few cases this hypothesis does not hold (like the ones I used as examples). Thanks for the feedback, though.
