When using Empirical Bayesian Kriging on data that has multiple samples at the same location, the Geostatistical Wizard asks whether I want to use the max / min / mean value or "include all". Can anyone please elaborate on the difference between "including all" the data and using, for example, the mean value?

If you choose Include All, all of the coincident points will be used to model the semivariogram, and each of them will be given its own weight when making predictions. There is also the issue of the number of neighbors in the searching neighborhood. Imagine that you have 10 values all sampled at the same location, you use Include All, and you set the maximum number of neighbors in the searching neighborhood to 10. In this case, when making predictions near the coincident points, the coincident points will fill the entire searching neighborhood, so the prediction draws on only that one location, and this could give strange results.
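To make the neighborhood problem concrete, here is a minimal sketch in plain Python. It assumes a simplified searching neighborhood that just takes the k nearest samples by Euclidean distance (the real ArcGIS neighborhood also supports sectors and minimum neighbor counts, which are not modeled here). The function `nearest_neighbors` and the sample coordinates are hypothetical, chosen only for illustration.

```python
import math

def nearest_neighbors(points, target, k):
    """Hypothetical simplified search neighborhood: the k samples closest to target."""
    # Rank (x, y, value) samples by straight-line distance to the prediction location.
    ranked = sorted(points, key=lambda p: math.dist((p[0], p[1]), target))
    return ranked[:k]

# Hypothetical data: 10 samples all at (0, 0), plus three samples at distinct locations.
coincident = [(0.0, 0.0, 5.0 + i) for i in range(10)]
others = [(3.0, 0.0, 2.0), (0.0, 4.0, 8.0), (-2.0, -2.0, 6.0)]
samples = coincident + others

# "Include All" with a maximum of 10 neighbors, predicting near the coincident points.
neighbors = nearest_neighbors(samples, target=(0.5, 0.5), k=10)
locations = {(x, y) for x, y, _ in neighbors}
print(len(neighbors), locations)  # 10 {(0.0, 0.0)}
```

All 10 neighbor slots are taken by the coincident samples, so the three other locations never contribute to the prediction at (0.5, 0.5).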

When you choose Mean, the location will be assigned a single value: the average of the values of the coincident points. This average will be used to model the semivariogram, and the searching neighborhood will treat it as a single value. The sample size will also be adjusted accordingly.

The logic is analogous for the Min and Max options. The Remove All option treats coincident points as if their values were Null, so none of them will be used in the semivariogram modeling or prediction stages of kriging. Again, the sample size will be adjusted accordingly.
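The options described above can be sketched as a preprocessing step in plain Python. This is only an illustration of the described behavior, not how ArcGIS implements it; the function `handle_coincident` and the method labels (`"include_all"`, `"mean"`, `"min"`, `"max"`, `"remove_all"`) are hypothetical names, and samples are assumed to be `(x, y, value)` tuples where coincident means identical `(x, y)`.

```python
from collections import defaultdict
from statistics import mean

def handle_coincident(samples, method):
    """Hypothetical sketch of coincident-point handling for (x, y, value) samples."""
    methods = {"include_all", "mean", "min", "max", "remove_all"}
    if method not in methods:
        raise ValueError(f"unknown method: {method}")
    if method == "include_all":
        # Every coincident sample is kept and later gets its own kriging weight.
        return list(samples)
    # Group values by their (x, y) location, preserving first-seen order.
    groups = defaultdict(list)
    for x, y, z in samples:
        groups[(x, y)].append(z)
    reducers = {"mean": mean, "min": min, "max": max}
    result = []
    for (x, y), values in groups.items():
        if len(values) == 1:
            # Non-coincident samples pass through unchanged.
            result.append((x, y, values[0]))
        elif method == "remove_all":
            # Coincident values treated as Null: the location is dropped entirely.
            continue
        else:
            # Collapse the coincident values to one value at that location.
            result.append((x, y, reducers[method](values)))
    return result

# Hypothetical data: three samples at (0, 0) and one at (3, 1).
samples = [(0.0, 0.0, 4.0), (0.0, 0.0, 6.0), (0.0, 0.0, 11.0), (3.0, 1.0, 7.0)]
for method in ("include_all", "mean", "min", "max", "remove_all"):
    kept = handle_coincident(samples, method)
    print(method, len(kept), kept)
```

With this data the sample size drops from 4 (Include All) to 2 for Mean, Min, and Max (Mean gives 7.0 at (0, 0)), and to 1 for Remove All, which matches the "sample size will be adjusted accordingly" behavior described above.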

In the geoprocessing tools, coincident points are handled with an environment setting. You can read more about it here: http://desktop.arcgis.com/en/desktop/latest/tools/environments/coincident-points.htm