Universal kriging is available in the Geostatistical Wizard. It's one of the six kriging types. The trend is calculated from polynomials of the (x,y) coordinates; it does not support covariates other than the spatial location. However, covariates can be used as cokriging variables.
If you have a good MLR model already, I wouldn't try to use the covariates as cokriging variables. If you want to try it anyway, in the Geostatistical Wizard, when you choose kriging on the first page, you can enter up to four datasets. The first one you enter is the variable you will interpolate, and the three additional datasets will be used as cokriging variables.
"Kriging" is often called "residual kriging," and there's a reason for this: you always perform kriging on the residuals of some model. This model can be almost anything, but "regression kriging," "kriging with external drift," "universal kriging," and "linear mixed model kriging" all generally refer to the simultaneous estimation of the covariate coefficients and the kriging parameters. However, you may find success with doing a sequential estimation: first calculate the coefficients using your MLR model. Then calculate the residuals, and perform Simple kriging on these residuals (you should use Simple kriging instead of Ordinary kriging because you know that the mean of the residuals is 0). Then add these interpolated residuals back into the MLR predictions. You'll lose some power because you are sequentially calculating parameters (rather than simultaneously estimating them), but you should still get defensible results.
As for comparing the values in one point to the average of the neighboring points, the Semivariogram/Covariance cloud is probably the best way to visualize this, but the result is a graph rather than a single correlation coefficient. If you really need to calculate the correlation coefficient (and you're ok with ignoring spatial correlation in the analysis), we have a tool called Neighborhood Selection that selects the neighbors of an input (x,y) location (use the same neighborhood parameters that you used in kriging). It will probably take a lot of work, but I'm sure you can write a Python script that will do what you're trying to do. I've never personally done this, so I don't want to try to outline an algorithm, but I'm sure all the tools are there to accomplish this task.
The steps you've outlined are correct.
Unfortunately, there's no way to make a continuous OLS surface. You'll need to make separate rasters for each resolution you want to test, but you only need to krig on the residuals once. You can export the interpolated residual surface to any cell size and extent that you want.
I now understand what you're trying to do with the correlation coefficient. However, I don't think a correlation coefficient will work here because if you want to correlate a single point to the mean of its neighbors, you'll only be able to calculate a single coefficient for the entire surface (since you need repeated samples), so it won't help you in deciding which particular locations should be given preference.
The first thing that comes to mind is the Voronoi Map tool. It's an interactive graphical tool, and if you use Standard Deviation, Entropy, or Interquartile Range, you'll get an estimate of the local variability. A small local variability indicates that the predictions are more constant in that area, so they might be good candidates for new sites because the area can be better represented by a single value. Note that you'll need to convert your rasters to points to run the tool.