I used EBK Regression Prediction (Geostatistical Analyst, ArcGIS Pro 2.8) to predict an air pollution parameter using five explanatory rasters. One of the rasters was a Euclidean distance raster based on distance from a road feature, and all of the samples were taken along that road. The prediction result closely followed the road feature. In my view that is fine, because I was estimating CO concentration along the road. As a result, the standard error is high in areas away from the road, which is again expected, since no samples were taken away from the road feature/line. What I want to find out is how the road raster was weighted (prioritized): it presumably received the highest weight, given how strongly it determined the configuration of the prediction map. I have attached a sample map for your view.
Hi @Elijah,
The tool estimates the effect of each explanatory variable with a regression-kriging equation. Each explanatory variable gets a coefficient that indicates the expected change in the dependent variable for a one-unit increase in that explanatory variable. For distance to roads, it is trying to estimate the effect on CO of moving one distance unit (maybe 1 meter) further from a road.
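To make "expected change per one-unit increase" concrete, here is a minimal Python sketch, assuming a plain ordinary least-squares fit on a single explanatory variable with made-up numbers. It is not what EBK Regression Prediction runs internally (the tool also kriges the residuals, among other steps); it only illustrates how a regression coefficient is read.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical samples: distance to road (m) and CO concentration (ppm).
distance_m = [1.0, 2.0, 3.0, 4.0, 5.0]
co_ppm = [9.0, 8.5, 8.0, 7.5, 7.0]

a, b = fit_simple_ols(distance_m, co_ppm)
# b is the expected change in CO for each additional metre away from the road.
print(f"intercept={a:.2f}, slope={b:.2f} ppm per metre")
```

With these invented values the slope is -0.5, i.e. each extra metre from the road is associated with 0.5 ppm less CO.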
This can become a problem because if all of your data are sampled on or near a road, all of the distances used to estimate the coefficient come from a narrow range of distances, likely all under 10 meters. But when you are predicting, you are predicting to areas relatively far from a road. The patterns that the tool detected in the narrow set of distances likely will not hold up for larger distances.
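A small Python sketch of the extrapolation problem, using hypothetical data: a straight line is fitted to samples that all lie within 8 m of the road and is then applied at 500 m. The linear model is an assumption for illustration only; the point is that a trend learned over a narrow range can produce physically impossible values far outside it.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical training data: every sample is within 8 m of the road.
train_dist = [2.0, 4.0, 6.0, 8.0]
train_co = [10.0, 9.0, 8.0, 7.0]
a, b = fit_simple_ols(train_dist, train_co)

near = a + b * 5.0    # inside the sampled range: a plausible value
far = a + b * 500.0   # far outside the sampled range: a negative concentration
print(f"prediction at 5 m: {near:.1f} ppm, at 500 m: {far:.1f} ppm")
```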
Generally speaking, you should be cautious when extrapolating outside the range of the explanatory variables at the input points. Specifically, if all of your points are near a road, I would not use distance to roads as an explanatory variable, because there isn't enough variation to reliably estimate and extrapolate its effect. To estimate that effect, you would need sample points at a range of distances from roads.
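One way to see where a prediction map is extrapolating is to flag every cell whose explanatory value falls outside the range observed at the sample points. The Python sketch below does this for a tiny hypothetical distance-to-road grid stored as nested lists; the function name `extrapolation_mask` and all values are invented for illustration and are not part of ArcGIS.

```python
def extrapolation_mask(train_values, raster):
    """Return a grid of booleans: True where a cell's value lies outside
    the [min, max] range of the explanatory variable at the sample points."""
    lo, hi = min(train_values), max(train_values)
    return [[not (lo <= v <= hi) for v in row] for row in raster]


# Hypothetical distance-to-road values (m) at the sample points.
sample_distances = [0.4, 3.1, 7.9]

# Hypothetical distance-to-road raster (m) for the prediction area.
distance_raster = [
    [0.0, 5.0, 50.0],
    [2.0, 7.9, 500.0],
]

mask = extrapolation_mask(sample_distances, distance_raster)
for row in mask:
    print(row)
```

Cells flagged True are where the fitted road effect is being applied to distances it never saw during fitting.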
Hope that helps,
Eric
P.S. I'm also concerned that the distances to roads you're providing aren't accurate. If all points are truly on the road, each distance should be 0. I suspect the nonzero distances at the input points are just an artifact of the Euclidean Distance raster's cells not landing exactly on the road. If so, the "distance" of each point will be essentially random and will not reflect any real relationship between roads and CO.
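The artifact described in the P.S. can be illustrated with a rough Python sketch. It assumes a simplified raster in which each cell stores the distance from its cell centre to a hypothetical straight road, and reads the value of the cell containing each sample point. This is not ArcGIS's exact Euclidean Distance algorithm (which first rasterizes the source features); the 30 m cell size and the road line are made up. Points lying exactly on the road still pick up nonzero, varying "distances".

```python
import math

CELL_SIZE = 30.0  # hypothetical raster cell size in metres

# Hypothetical straight road 0.5*x - y + 10 = 0, written as A*x + B*y + C = 0.
A, B, C = 0.5, -1.0, 10.0


def distance_to_road(x, y):
    """Perpendicular distance from (x, y) to the road line."""
    return abs(A * x + B * y + C) / math.hypot(A, B)


def raster_value_at(x, y):
    """Simplified lookup: the value of the cell containing (x, y) is the
    distance from that cell's centre to the road (an assumption)."""
    col = math.floor(x / CELL_SIZE)
    row = math.floor(y / CELL_SIZE)
    cx = (col + 0.5) * CELL_SIZE
    cy = (row + 0.5) * CELL_SIZE
    return distance_to_road(cx, cy)


# Sample points placed exactly on the road.
points = [(x, 0.5 * x + 10.0) for x in (0.0, 10.0, 20.0, 30.0, 40.0)]
true_d = [distance_to_road(x, y) for x, y in points]
raster_d = [raster_value_at(x, y) for x, y in points]

for (x, y), t, r in zip(points, true_d, raster_d):
    print(f"point ({x:5.1f}, {y:5.1f}): true {t:.2f} m, raster {r:.2f} m")
```

Under this simplification the raster value depends only on where each point falls within its cell, bounded by half the cell diagonal, so it carries no information about CO.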
What is EBK Regression Prediction?—ArcGIS Pro | Documentation
That documentation page contains a number of cautions about using this tool.