Hi there,
I've been attempting to use EBK 3D to generate subsurface temperature maps for my dissertation. I have ~400,000 surface point temperatures, derived by sampling a raster at its cell size (500 meters), and ~70,000 subsurface temperatures derived from wells. Yes, that's quite the difference in sample populations, and the overall histogram looks double-peaked. A secondary problem is that there is a known subsurface thermal anomaly, so the subsurface temperature population is itself double-peaked.
The purpose of the generated maps is to outline any thermal anomalies, and my initial results suggest there are more than just the big one I mentioned, but I'm concerned that the other anomalies are just artifacts of the parameters I've chosen for the model. The model parameters I'm running are below.
* Transformation type: None (if a transformation is used, the model looks very jagged / not geological)
* Semivariogram model type: K-Bessel
* Subset size: 20 (smallest to preserve small-scale variation / anomalies)
* Overlap factor: 1
* Number of simulations: 1000
* Elevation inflation factor: Esri Default
* Trend removal: None (if trend removal is used, the model looks very jagged / not geological)
* Search neighborhood: eight-sector, with both Max and Min neighbors per sector set to 5 (trying to balance the number of neighbors with preserving local anomalies)
* Search Neighborhood size: Esri Default
From the trainings, it seems that I should be choosing an Empirical transformation (since I have basically a three-peaked histogram) with first-order trend removal (temperature increases with depth), but when I use those parameters the model looks terrible. I guess my concern is: is the terrible-looking model more correct? Am I choosing the "right" parameters? Are my subsets too small for the number of neighbors I want? Am I choosing the "right" 3D search neighborhood? Should I manually change the elevation inflation factor? Again, I'm trying to preserve local variation as much as I can. I also don't understand whether first-order trends are removed from the population as a whole or within the search neighborhood.
Sorry if this was a little rambling and for having so many questions. I'm likely to be asked about some of this at my dissertation defense, and some of the parameters are a bit of a black box to me.
Hi @geolane93_KU,
Without seeing the data and having a better understanding of the purpose, it's difficult to give concrete recommendations. However, I do have a few thoughts that might help.
First, if you have ArcGIS Pro 3.0 or later, look into the Compare Geostatistical Layers tool. You can create several different EBK3D outputs and compare their cross-validation statistics to see which are more accurate than others. That can help with choosing a subset size, transformation, and semivariogram model.
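As a rough sketch of what I mean (the tool and parameter names below follow the arcpy documentation as I recall them, so double-check them against your version of Pro; the paths and field names are placeholders):

```python
import arcpy

# Placeholder: a point feature class with temperature and elevation fields
pts = "C:/data/thermal.gdb/temperature_points"

# Build two candidate EBK3D layers that differ only in subset size
for subset in (50, 100):
    arcpy.ga.EmpiricalBayesianKriging3D(
        in_features=pts,
        elevation_field="Elev_m",            # placeholder field name
        value_field="Temp_C",                # placeholder field name
        out_ga_layer=f"ebk3d_subset{subset}",
        semivariogram_model_type="K_BESSEL",
        transformation_type="NONE",
        subset_size=subset,
        overlap_factor=1,
        number_simulations=100,
        trend_removal="NONE")

# Compare the candidates by cross validation and keep the better one
# (arguments: input layers, output cross-validation table, output layer)
arcpy.ga.CompareGeostatisticalLayers(
    ["ebk3d_subset50", "ebk3d_subset100"],
    "C:/data/thermal.gdb/cv_comparison",
    "ebk3d_best")
```

You can add as many candidate layers as you like to the comparison (different transformations, semivariogram models, and so on), and the tool will rank them by their cross-validation statistics.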
Second, a subset size of 20 sounds quite small to me, particularly for the K-Bessel semivariogram. My experience is that you should use at least 50 points in each subset for a semivariogram model with so many parameters (and, usually, more than 100 is better).
Third, I would consider removing some of the surface points that may be playing too dominant a role in the model. The problem is alleviated somewhat by using sectored neighborhoods, but the comparatively dense sampling at the surface is likely still negatively impacting the subsurface predictions. In particular, I suspect that the estimated Elevation Inflation Factor (EIF) is being most affected here, and the EIF is an extremely important parameter for accurate results.
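If you want to try thinning the surface points, something along these lines could work (Subset Features is in the Geostatistical Analyst toolbox; the paths, field names, and the 10 percent figure are just placeholders to illustrate the idea, not a recommendation for your data):

```python
import arcpy

surface_pts = "C:/data/thermal.gdb/surface_temperatures"  # placeholder path
well_pts = "C:/data/thermal.gdb/well_temperatures"         # placeholder path

# Randomly keep roughly 10 percent of the dense surface points
arcpy.ga.SubsetFeatures(
    in_features=surface_pts,
    out_training_feature_class="C:/data/thermal.gdb/surface_thinned",
    size_of_training_dataset=10,
    subset_size_units="PERCENTAGE_OF_INPUT")

# Combine the thinned surface points with the full set of well points
arcpy.management.Merge(
    ["C:/data/thermal.gdb/surface_thinned", well_pts],
    "C:/data/thermal.gdb/temperature_points_thinned")
```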
Fourth, if the jagged edges and artifacts are far away from the input points (like in the top or bottom corner of the 3D extent), then I would not worry too much about them. EBK (2D and 3D) often produces these kinds of artifacts when you extrapolate (predicting outside the input points), but it tends to be very stable when interpolating (predicting between the input points).
Hey Eric,
Thanks for the input about the Compare Geostatistical Layers tool. This morning, I tried taking a subset of my data from one county and varying parameters one by one in the Geostatistical Wizard until I got a relatively good cross-validation result. I then took those parameters and applied them to the entire dataset, but left the EIF to be determined by default. I had just been copying and pasting the results into Excel, so having that tool in my toolbox will be nice. I've copied and pasted my model parameters and the cross-validation results below. I won't be able to check the geological nature of the model, though, until the GA Layer to Points tool finishes on the fishnet for my horizon of interest (a sketch of that step follows the cross-validation numbers).
Count: 435893
Average CRPS: 0.649415280032333
Inside 90 Percent Interval: 91.11226837779
Inside 95 Percent Interval: 95.1543153939155
Mean: -0.0118574730387143
Root-Mean-Square: 2.79174925039292
Mean Standardized: -0.0278007174477421
Root-Mean-Square Standardized: 1.35006183520992
Average Standard Error: 2.79640144034622
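For reference, the GA Layer to Points step I mentioned above looks roughly like this (layer, path, and field names are placeholders, and the parameter names are from the arcpy help as I recall them; the elevation field holds the depth of the horizon of interest at each fishnet point):

```python
import arcpy

# Export model predictions onto the fishnet points for the horizon of interest
arcpy.ga.GALayerToPoints(
    in_geostat_layer="ebk3d_model",                       # placeholder layer name
    in_locations="C:/data/thermal.gdb/horizon_fishnet",   # placeholder path
    out_feature_class="C:/data/thermal.gdb/horizon_predictions",
    elevation_field="Horizon_Elev_m")                     # placeholder field name
```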
Essentially, the purpose of the study is to assess whether any subsurface thermal anomalies are present in the data. I thought that using a small subset size would be good since it might capture more local variation. Am I losing resolution by increasing the subset size?
Also, in terms of the EIF, would it be valid to remove all the surface temperatures, check the EIF for just the subsurface data, and then use that subsurface-data EIF on the whole dataset in the Geostatistical Wizard? I would guess that would keep the surface temperatures from skewing the EIF estimate.
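In case it clarifies what I mean, here is roughly the workflow I had in mind (field, path, and layer names are placeholders, and I'm assuming the estimated EIF can be read from the subsurface-only model when I open it in the Geostatistical Wizard or the layer's method report):

```python
import arcpy

wells = "C:/data/thermal.gdb/well_temperatures"     # subsurface-only points, placeholder
all_pts = "C:/data/thermal.gdb/temperature_points"  # surface + subsurface, placeholder

# Step 1: fit EBK3D on the subsurface data only, leaving the EIF blank so it is estimated
arcpy.ga.EmpiricalBayesianKriging3D(
    in_features=wells,
    elevation_field="Elev_m",
    value_field="Temp_C",
    out_ga_layer="ebk3d_wells_only")
# ...note the elevation inflation factor estimated for this layer...

# Step 2: rerun on the full dataset, supplying that EIF explicitly
arcpy.ga.EmpiricalBayesianKriging3D(
    in_features=all_pts,
    elevation_field="Elev_m",
    value_field="Temp_C",
    out_ga_layer="ebk3d_full",
    elev_inflation_factor=25)  # placeholder: the value estimated in step 1
```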