Many scientific phenomena are studied via mathematical (i.e., computer) models and field experiments, simultaneously.
Computer simulations are lots cheaper, but usually not so cheap as to allow infinite exploration of the configuration space(s).
So the goal here is to build an apparatus that can harmonize the two data types
There are three processes involved:
Let \(y^F(x)\) denote a field observation under conditions \(x\), and \(y^R(x)\) denote the real output under condition \(x\).
\(R\) and \(F\) are assumed to be related as follows.
\[ y^F(x) = y^R(x) + \varepsilon, \quad \mbox{ where } \varepsilon \stackrel{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma_\varepsilon^2) \]
Let \(y^M(x, u)\) denote the (deterministic) output of a computer model run under conditions \(x\) and tuning or calibration parameters \(u\).
Some people make a big deal about the difference between the two, calling the former a tuning parameter, and the latter a calibration parameter.
Kennedy and O’Hagan (2001) proposed a framework for coupling \(M\) and \(F\).
KOH represent a real process \(R\) as
\[ \begin{aligned} y^R(x) &= y^M(x, u^*) + b(x), \\ \mbox{so that } \quad y^F(x) &= y^M(x,u^*) + b(x) + \varepsilon \end{aligned} \]
\[ -b(x) = y^M(x, u^*) - y^R(x). \]
The unknowns are \(u^*\), \(\sigma_\varepsilon^2\), and the discrepancy \(b(\cdot)\).
If the computer model \(M\) is fast, then inference is straightforward via \(N_F\) residuals between computer model outputs and field observations
\[ y^F(X^F) - y^M(X^F, u) \]
If \(M\) is slow or otherwise indirectly available,
Rather than performing inference for \(y^M\) separately, using just the \(N_M\) runs, as is typical of a computer experiment in isolation,
From a Bayesian perspective this is the coherent thing to do:
But this approach is fraught with computational challenges.
Liu, et al., (2009), proposed going “back to basics” by fitting \(\hat{y}^M(\cdot, \cdot)\) first,
Ok, enough of the high-level stuff, lets see how this works …
We wish to predict the amount of time it takes for a wiffle ball to hit the ground when dropped at a certain height.
So we perform a physical experiment:
(Many thanks to Derek Bingham and Jason Loeppky for this example.)
ball <- read.csv("ball.csv")
plot(ball, xlab="height", ylab="time")
One option is, of course, to fit the field data with a GP, be done and go home.
library(laGP)
field.fit <- newGP(as.matrix(ball$height), ball$time, d=0.1,
g=var(ball$time)/10, dK=TRUE)
mle <- jmleGP(field.fit, drange=c(eps, 10), grange=c(eps, var(ball$time)),
dab=c(3/2, 8))
Now make a prediction using this \(\hat{y}^F\) on a grid,
hr <- range(ball$height)
hs <- seq(0, 1, length=100)
heights <- hs*diff(hr)+hr[1]
p <- predGP(field.fit, as.matrix(heights), lite=TRUE)
The result is too wigly, and involves high uncertainty in the gap.
plot(ball, xlab="height", ylab="time"); lines(heights, p$mean)
lines(heights, qnorm(0.05, p$mean, sqrt(p$s2)), lty=2)
lines(heights, qnorm(0.95, p$mean, sqrt(p$s2)), lty=2)
lines(heights, 10*sqrt(p$s2)-0.6, col=2, lty=3, lwd=2)
legend("topleft", c("Fhat summary", "Fhat sd"), lty=c(1,3), col=c(1,2))
Perhaps by coupling with “known physics” we can mitigate that effect.
What does “Physics 101” say?
\[ t = \sqrt{2h/g}. \]
Somewhat realistically, we don’t know the value of \(g\) for the location where the balls were dropped.
Consider the following computer implementation of our mathematical model using coded inputs in \([0,1]^2\)
timedrop <- function(x, u, hr, gr)
{
g <- diff(gr)*u + gr[1]
h <- diff(hr)*x + hr[1]
return(sqrt(2*h/g))
}
Now lets fit the computer model on a maximin LHS in 2d of size 21.
library(lhs)
XU <- maximinLHS(21, 2) ## we're going to want to randomize over this
gr <- c(6, 14)
ym <- timedrop(XU[,1], XU[,2], hr, gr)
Now lets train a GP on those realizations.
ymhat <- newGPsep(XU, ym, d=0.1, g=1e-7, dK=TRUE)
mle <- mleGPsep(ymhat, tmax=10)
Lets visualize our computer model output over a range of heights, for particular choices of \(g\).
Some better than others, but possibly all biased.
us <- seq(0, 1, length=6)
XX <- expand.grid(hs, us)
pmhat <- predGPsep(ymhat, XX, lite=TRUE)
plot(ball); matlines(heights, matrix(pmhat$m, ncol=length(us)))
The modularized apparatus calibrates \(u\) via the discrepancy between emulated computer model output and field data runs.
bhat.fit <- function(X, Y, Ym, da, ga, clean=TRUE)
{
bhat <- newGPsep(X, Y-Ym, d=da$start, g=ga$start, dK=TRUE)
if(ga$mle) cmle <- jmleGPsep(bhat, drange=c(da$min, da$max),
grange=c(ga$min, ga$max), dab=da$ab, gab=ga$ab)
else cmle <- mleGPsep(bhat, tmin=da$min, tmax=da$max, ab=da$ab)
cmle$nll <- - llikGPsep(bhat, dab=da$ab, gab=ga$ab)
if(clean) deleteGPsep(bhat)
else cmle$gp <- bhat
return(cmle)
}
bhat.fit
combines \(\hat{b} + \hat{\sigma}_\varepsilon^2\) fits.Now we need to create an objective that we can optimize, over coded gravity \(u\)-values, to find the best setting \(\hat{u}\) estimating the unknown \(u^*\).
calib <- function(u, X, Y, ymhat, da, ga, clean=TRUE)
{
Xu <- cbind(X, matrix(rep(u, nrow(X)), ncol=length(u), byrow=TRUE))
Ym <- predGPsep(ymhat, Xu, lite=TRUE)$mean
cmle <- bhat.fit(X, Y, Ym, da, ga, clean=clean)
return(cmle)
}
Since its in 1d, lets evaluate it on a \(u\)-grid.
u <- seq(0, 1, length=100)
unll <- rep(NA, length(u))
X <- as.matrix((ball$height - hr[1])/diff(hr))
da <- darg(list(mle=TRUE), expand.grid(X[,1], u))
ga <- garg(list(mle=TRUE), ball$time)
for(i in 1:length(u)) unll[i] <- calib(u[i], X, ball$time, ymhat, da, ga)$nll
Visualizing the likelihood surface for \(u\).
plot(u, unll, type="l", xlab="u", ylab="negative log likelihood")
obj <- function(x, X, Y, ymhat, da, ga) calib(x, X, Y, ymhat, da, ga)$nll
soln <- optimize(obj, lower=0, upper=1, X=X, Y=ball$time,
ymhat=ymhat, da=da, ga=ga)
uhat <- soln$minimum; abline(v=uhat, col=2, lty=2)
Lets run back through some of the calculations to get out the estimated bias (gpi
reference) with the \(\hat{u}\) value we found.
clean=FALSE
to bhat
:bhat <- calib(uhat, X, ball$time, ymhat, da, ga, clean=FALSE)
Then we may obtain predictions over our heights grid, with full covariance for later.
p <- predGPsep(bhat$gp, as.matrix(hs))
mb <- p$mean
q1b <- qnorm(0.95, mb, sqrt(diag(p$Sigma)))
q2b <- qnorm(0.05, mb, sqrt(diag(p$Sigma)))
qr <- range(c(q1b, q2b))
The bias straddles zero.
plot(heights, mb, type="l", xlab="height", ylab="time bias", ylim=qr)
lines(heights, q1b, col=2, lty=2); lines(heights, q2b, col=2, lt=2)
First, obtain the prediction from the emulator, with full covariance structure.
pmhat <- predGPsep(ymhat, cbind(hs, uhat))
Now, for full propagation of uncertainty, lets combine sample paths from both emulator and bias processes.
library(mvtnorm)
Ym <- rmvnorm(1000, pmhat$mean, pmhat$Sigma)
Yb <- rmvnorm(1000, p$mea, p$Sigma)
Yc <- Ym + Yb
Extract quantiles from the combined sample paths.
q1c <- apply(Yc, 2, quantile, prob=0.05)
q2c <- apply(Yc, 2, quantile, prob=0.95)
Craziness!
plot(ball); lines(heights, pmhat$mean)
lines(heights, pmhat$mean + mb, col=3, lwd=2)
lines(heights, q1c, col=3, lty=2)
lines(heights, q2c, col=3, lty=2)
legend("topleft", c("yMhat", "yMhat+bhat"), col=c(1,3), lty=1, lwd=1:2)
The calibration apparatus doesn’t care about minimizing bias;
\[ \hat{Y}_{N_F}^{B|u} = Y_{N_F}^F - \hat{Y}_{N_F}^M|u \]
When it chooses the \(\hat{b}\) hyperparameters and \(\hat{u}\), via large likelihood,
In particular, our estimate of the gravitational constant, \(\hat{g}\) via \(\hat{u}\)
ghat <- uhat*diff(gr)+gr[1]
ghat
## [1] 6.454133
loses some of its physical interpretation (and its way too small).
We have to be satisfied with \(g\) as a “tuning” parameter, challenging interpretation.
If minimizing bias is really what we want, then some adjustments are needed. See
Both sacrifice prediction for enhanced interpretation.
The thing to keep in mind is that the calibration apparatus couples two highly flexible nonparametric GP models, linked by a tuning parameter \(u\).
Authors looking for more flexible GP models have deliberately deployed similar tactics outside the calibration setting.
Surprisingly, the KOH framework nests these two options, yet precedes them by more than a decade.
What happens when we remove some of that flexibility in the calibration context?
Here is how you can accomplish that with laGP
.
se2.fit <- function(X, Y, Ym, clean=TRUE)
{
gp <- newGP(X, Y-Ym, d = 0, g = 0)
cmle <- list(nll=-llikGP(gp))
if(clean) deleteGP(gp)
else cmle$gp <- gp
return(cmle)
}
bhat.fit
.We need a slightly adjusted calibration objective.
calib.nobias <- function(u, X, Y, ymhat, clean=TRUE)
{
Xu <- cbind(X, matrix(rep(u, nrow(X)), ncol=length(u), byrow=TRUE))
Ym <- predGPsep(ymhat, Xu, lite=TRUE)$mean
cmle <- se2.fit(X, Y, Ym, clean=clean)
return(cmle)
}
Again, since its in 1d, lets evaluate it on a \(u\)-grid.
unll.se2 <- rep(NA, length(u))
for(i in 1:length(u))
unll.se2[i] <- calib.nobias(u[i], X, ball$time, ymhat)$nll
plot(u, unll.se2, type="l", xlab="u", ylab="negative log likelihood")
abline(v=uhat, col=2, lty=2); legend("top", "uhat-biased", col=2, lty=2, bty="n")
obj.nobias <- function(x, X, Y, ymhat) calib.nobias(x, X, Y, ymhat)$nll
soln <- optimize(obj.nobias, lower=0, upper=1, X=X, Y=ball$time, ymhat=ymhat)
uhat.nobias <- soln$minimum
ghat.nobias <- uhat.nobias*diff(gr)+gr[1]
ghat.nobias
## [1] 8.158814
How do our predicted times look?
cmle.nobias <- calib.nobias(uhat.nobias, X, ball$time, ymhat, clean=FALSE)
se2.p <- predGP(cmle.nobias$gp, as.matrix(hs), lite=TRUE)
pmhat.nobias <- predGPsep(ymhat, cbind(hs, uhat.nobias), lite=TRUE)
q1nob <- qnorm(0.05, pmhat.nobias$mean, sqrt(pmhat.nobias$s2+se2.p$s2))
q2nob <- qnorm(0.95, pmhat.nobias$mean, sqrt(pmhat.nobias$s2+se2.p$s2))
Cleaner, but better? Maybe it under-predicts for higher balls?
plot(ball); lines(heights, pmhat.nobias$mean, col=4, lwd=2)
lines(heights, q1nob, col=4, lty=2)
lines(heights, q2nob, col=4, lty=2)
legend("topleft", c("yMhat+se2"), col=4, lty=1, lwd=2)
When we have two models and we don’t know which is best,
In what follows we collect some of the code above into stand-alone functions that can be called in a leave-one-out fashion,
Note that throughout we are conditioning on the computer model fit to the full LHS sample.
Cutting-and-pasting from earlier code.
calib.pred <- function(XX, X, Y, ymhat, da, ga, T=1000)
{
g <- unll <- u <- seq(0,1, length(100))
for(i in 1:length(u)) {
cmle <- calib(u[i], X, Y, ymhat, da, ga)
unll[i] <- cmle$nll; g[i] <- cmle$g
}
ga$mle <- FALSE; ga$start <- g[which.min(unll)]
soln <- optimize(obj, lower=0, upper=1, X=X, Y=Y, ymhat=ymhat, da=da, ga=ga)
bhat <- calib(soln$minimum, X, Y, ymhat, da, ga, clean=FALSE)
p <- predGPsep(bhat$gp, XX)
pmhat <- predGPsep(ymhat, cbind(XX, soln$minimum))
Yc <- rmvnorm(T, pmhat$mean, pmhat$Sigma) + rmvnorm(T, p$mean, p$Sigma)
mc <- pmhat$mean + p$mean; s2c <- apply(Yc, 2, var)
q1c <- apply(Yc, 2, quantile, prob=0.05)
q2c <- apply(Yc, 2, quantile, prob=0.95)
deleteGPsep(bhat$gp)
return(list(mean=mc, s2=s2c, q1=q1c, q2=q2c, uhat=soln$minimum))
}
ga <- garg(list(mle=TRUE), ball$time)
uhat <- q1 <- q2 <- m <- s2 <- rep(NA, nrow(X))
for(i in 1:nrow(X)) {
cp <- calib.pred(X[i,,drop=FALSE], X[-i,,drop=FALSE], ball$time[-i],
ymhat, da, ga)
m[i] <- cp$mean; s2[i] <- cp$s2
q1[i] <- cp$q1; q2[i] <- cp$q2
uhat[i] <- cp$uhat
}
What \(\hat{u}\) values did we get?
summary(uhat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01783 0.05225 0.05597 0.05762 0.06238 0.12770
plot(ball)
points(ball$height, m, col=2, pch=20)
segments(ball$height, q1, ball$height, q2, col=2)
calib.nobias.pred <- function(XX, X, Y, ymhat)
{
soln <- optimize(obj.nobias, lower=0, upper=1, X=X, Y=Y, ymhat=ymhat)
bhat <- calib.nobias(soln$minimum, X, Y, ymhat, clean=FALSE)
p <- predGP(bhat$gp, XX, lite=TRUE)
pmhat <- predGPsep(ymhat, cbind(XX, soln$minimum), lite=TRUE)
mc <- pmhat$mean + p$mean; s2c <- pmhat$s2 + p$s2
q1c <- qnorm(0.05, mc, sqrt(s2c)); q2c <- qnorm(0.95, mc, sqrt(s2c))
deleteGP(bhat$gp)
return(list(mean=mc, s2=s2c, q1=q1c, q2=q2c, uhat=soln$minimum))
}
Again, cutting and pasting above. Then leave-one-out prediction below.
q1nb <- q2nb <- mnb <- s2nb <- rep(NA, nrow(X))
for(i in 1:nrow(X)) {
cp <- calib.nobias.pred(X[i,,drop=FALSE], X[-i,,drop=FALSE],
ball$time[-i], ymhat)
mnb[i] <- cp$mean; s2nb[i] <- cp$s2; q1nb[i] <- cp$q1; q2nb[i] <- cp$q2
}
plot(ball)
points(ball$height, mnb, col=3, pch=20)
segments(ball$height, q1nb, ball$height, q2nb, col=3)
Comparison by proper scoring (Gneiting & Raftery, 2007; Eq (27)):
b <- mean(- (ball$time - m)^2/s2 - log(s2))
nb <- mean(- (ball$time - mnb)^2/s2nb - log(s2nb))
scores <- c(biased=b, unbiased=nb)
scores
## biased unbiased
## 4.216665 3.977722
Don’t forget that the computer experiment design was random (LHS),