Below are problem statements and solutions for homework problems on the yield
game collected from the class homework sets, starting in homework 3. Questions are in italics.
Consider a first dabble into addressing the “yield” problem from your final project. For this problem I want you to perform a first or second-order analysis (or multiple analyses) of your choosing. And to do that you will likely find that you need to perform some more runs. So choose wisely. Below is a list of mandatory and suggested items to complete for this assignment.
Mandatory:
First, let's read the data and reformat it into an ordinary design matrix.
data <- read.table("rbg_first17.txt", header=TRUE)[1:7,]
nna <- sum(!is.na(data[,9:18]))
D <- matrix(NA, nrow=nna, ncol=7)
y <- rep(NA, nna)
k <- 1
## one row of D (and one entry of y) for each non-missing replicate in each row of the data file
for(i in 1:nrow(data)) {
for(j in 1:10) {
if(is.na(data[i,8+j])) { break; }
D[k,] <- as.numeric(data[i,2:8])
y[k] <- data[i,8+j]
k <- k+1
}
}
colnames(D) <- names(data)[2:8]
Then convert into coded inputs.
r <- apply(D, 2, range)
X <- as.data.frame(D)
for(j in 1:ncol(X)) X[,j] <- 2*(X[,j] - r[1,j])/(r[2,j] - r[1,j])-1
names(X) <- paste("x", 1:ncol(r), sep="")
apply(X, 2, range)
## x1 x2 x3 x4 x5 x6 x7
## [1,] -1 -1 -1 NaN NaN NaN -1
## [2,] 1 1 1 NaN NaN NaN 1
Since three of the inputs don’t vary, let's discard them for now.
X <- X[,-(4:6)]
Now we’re ready to do a first-order fit.
fit1 <- lm(y~., data=X)
summary(fit1)
##
## Call:
## lm(formula = y ~ ., data = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81706 -0.24804 0.00062 0.24511 0.84520
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.15173 0.07714 170.485 <2e-16 ***
## x1 -0.28208 0.17250 -1.635 0.1124
## x2 -0.45669 0.29206 -1.564 0.1284
## x3 0.75971 0.35351 2.149 0.0398 *
## x7 -0.14644 0.75584 -0.194 0.8477
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3984 on 30 degrees of freedom
## Multiple R-squared: 0.8591, Adjusted R-squared: 0.8403
## F-statistic: 45.73 on 4 and 30 DF, p-value: 2.378e-12
fit1.bak <- step(fit1, scope=formula(fit1), direction="backward", trace=0)
summary(fit1.bak)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81706 -0.24194 0.01283 0.24205 0.83300
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.14562 0.06932 189.638 < 2e-16 ***
## x1 -0.31259 0.06932 -4.509 8.72e-05 ***
## x2 -0.51160 0.06932 -7.380 2.61e-08 ***
## x3 0.69259 0.06932 9.991 3.27e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3921 on 31 degrees of freedom
## Multiple R-squared: 0.8589, Adjusted R-squared: 0.8453
## F-statistic: 62.91 on 3 and 31 DF, p-value: 2.776e-13
Here is a first exploration of interactions, focusing on the first three main effects.
fit2 <- lm(y~.^2, data=X[,1:3])
fit2.bak <- step(fit2, scope=formula(fit2), direction="backward", trace=0)
summary(fit2.bak)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x1:x2 + x2:x3, data = X[, 1:3])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59580 -0.19782 -0.01166 0.17840 0.79382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.21959 0.07058 187.289 < 2e-16 ***
## x1 -0.23863 0.07058 -3.381 0.00208 **
## x2 -0.58556 0.07058 -8.296 3.81e-09 ***
## x3 0.76655 0.07058 10.860 9.83e-12 ***
## x1:x2 -0.16992 0.07058 -2.407 0.02266 *
## x2:x3 -0.12592 0.07058 -1.784 0.08489 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3645 on 29 degrees of freedom
## Multiple R-squared: 0.886, Adjusted R-squared: 0.8663
## F-statistic: 45.07 on 5 and 29 DF, p-value: 8.417e-13
One option would be to ignore the interaction terms and do our typical steepest ascent calculations. Looking at the coefficients above, that would move us in the right direction for four of the five terms. (Since \(x_1\) and \(x_2\) would both move “negative”, their product would move positive, i.e., the wrong way for the first interaction term.) Nevertheless, that could look as follows.
b <- coef(fit1.bak)[-1]
delta <- b/abs(b[3])
delta
## x1 x2 x3
## -0.451337 -0.738677 1.000000
xi.path <- path <- data.frame(matrix(NA, nrow=5, ncol=length(b)))
for(i in 1:5) {
path[i,] <- delta * (i-1)
xi.path[i,] <- (path[i,] + 1) * (r[2,1:3] - r[1,1:3])/2 + r[1,1:3]
}
path <- cbind(path, xi.path)
names(path) <- c(names(delta), colnames(D[,1:3]))
path
## x1 x2 x3 N P K
## 1 0.0000000 0.000000 0 11.000000 15.0000000 3.5
## 2 -0.4513370 -0.738677 1 6.937967 11.3066148 5.0
## 3 -0.9026739 -1.477354 2 2.875935 7.6132296 6.5
## 4 -1.3540109 -2.216031 3 -1.186098 3.9198443 8.0
## 5 -1.8053478 -2.954708 4 -5.248130 0.2264591 9.5
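As a quick check on the parenthetical above, the product of the coded \(x_1\) and \(x_2\) coordinates does indeed grow positive along this path, i.e., against its negative estimated coefficient (output omitted):
path[,1] * path[,2] ## coded x1*x2 at each step; increases from 0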
Another option could be to try to maximize the predictive surface corresponding to fit2.bak within the design region, and then maybe also search outside the design region in increments. However, the effect of \(x_1 x_2\) is going to be essentially negated by \(x_2 x_3\) as we move in \((x_1, x_2, x_3)\) space. So it is perhaps not surprising that we obtain the following outcome:
f <- function(x) {
p <- predict(fit2.bak, newdata=data.frame(x1=x[1], x2=x[2], x3=x[3]))
return(0.0-as.numeric(p))
}
optim(rep(0,3), f, method="L-BFGS-B", lower=-1, upper=1)$par
## [1] -1 -1 1
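A quick check with the fit2.bak coefficients above shows why: at that corner the interaction terms contribute about \(-0.170 \times (-1)(-1) = -0.170\) and \(-0.126 \times (-1)(1) = +0.126\), which nearly cancel, whereas the main effects contribute roughly \(0.239 + 0.586 + 0.767 \approx 1.59\). Adding a quadratic penalty \(\mu x^\top x\) to the negated objective, and then relaxing \(\mu\), instead traces out a sequence of increasingly aggressive suggestions: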
f <- function(x, mu) {
p <- predict(fit2.bak, newdata=data.frame(x1=x[1], x2=x[2], x3=x[3]))
return(0.0-as.numeric(p) + mu * t(x) %*% x)
}
mu <- 1/(1:4)
xi.path2 <- path2 <- data.frame(matrix(NA, nrow=length(mu), ncol=length(b)))
for(i in 1:length(mu)) {
path2[i,] <- optim(rep(0,3), f, method="L-BFGS-B", mu=mu[i])$par
xi.path2[i,] <- (path2[i,] + 1) * (r[2,1:3] - r[1,1:3])/2 + r[1,1:3]
}
path2 <- cbind(path2, xi.path2)
names(path2) <- c(names(delta), colnames(D[,1:3]))
path2
## x1 x2 x3 N P K
## 1 -0.09295638 -0.3102447 0.4028092 10.163393 13.448777 4.104214
## 2 -0.12451434 -0.6715797 0.8511194 9.879371 11.642102 4.776679
## 3 -0.07332405 -1.1166742 1.3607533 10.340084 9.416629 5.541130
## 4 0.10014406 -1.6990267 1.9609959 11.901297 6.504867 6.441494
This second path is pretty similar to path, actually. The biggest effect is the downplaying of \(x_1\) relative to \(x_3\).
Suggested:
(Although these are suggestions, you are expected to do something and report on what happened. Don’t forget to address the second mandatory section below.)
Just to see what happens, I will take the first three steps of both paths above. And then, to merge with 6. below, I’ll do a space-filling LHS design in the other four variables. In particular …
library(lhs)
D2 <- as.data.frame(2*randomLHS(6, 4)-1)
r[1,4:6] <- 0
r[2,4:6] <- apply(D[,4:6], 2, max)
for(i in 1:nrow(D2)) D2[i,] <- (D2[i,] + 1) * (r[2,-(1:3)] - r[1,-(1:3)])/2 + r[1,-(1:3)]
names(D2) <- colnames(D)[4:7]
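Combining those columns with the first three steps of each path, back in natural units, gives six candidate runs, e.g. along the following lines (a sketch only; the object name runs is mine, and this isn’t verbatim what I ran):
## first three steps of each path (natural units N, P, K), paired with the
## LHS draws for the remaining four inputs
runs <- cbind(rbind(xi.path[1:3,], xi.path2[1:3,]), D2)
names(runs) <- colnames(D)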
The code above shows how I generated candidates for the new runs, but the actual runs I did are different (due to the random seed). And I’m not going to actually show them to you, or say how many replicates I performed. However, I will reveal that my new runs came back with much better yield values.
newys <- as.numeric(as.matrix(read.table("rbg_first17.txt", header=TRUE)[8:12,9:18]))
newys <- newys[!is.na(newys)]
c(max(y), max(newys))
## [1] 15.4954 16.2487
By way of showing some persistent improvement, the comparison below shows that the gap between the new-group and old-group averages is even bigger.
c(mean(y), mean(newys))
## [1] 13.01825 14.33825
Before making any new suggestions, let's bring in the full set of data so far, convert to coded inputs, and then explore some new models.
data <- read.table("rbg_first17.txt", header=TRUE)[1:13,]
nna <- sum(!is.na(data[,9:18]))
D <- matrix(NA, nrow=nna, ncol=7)
y <- rep(NA, nna)
k <- 1
for(i in 1:nrow(data)) {
for(j in 1:10) {
if(is.na(data[i,8+j])) { break; }
D[k,] <- as.numeric(data[i,2:8])
y[k] <- data[i,8+j]
k <- k+1
}
}
colnames(D) <- names(data)[2:8]
r <- apply(D, 2, range)
X <- as.data.frame(D)
for(j in 1:ncol(X)) X[,j] <- 2*(X[,j] - r[1,j])/(r[2,j] - r[1,j])-1
names(X) <- paste("x", 1:ncol(r), sep="")
apply(X, 2, range)
## x1 x2 x3 x4 x5 x6 x7
## [1,] -1 -1 -1 -1 -1 -1 -1
## [2,] 1 1 1 1 1 1 1
How about a simple first-order model with interactions?
fit <- lm(y~.^2, data=X)
fit.bak <- step(fit, scope=formula(fit), direction="backward", trace=0)
summary(fit.bak)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x7 + x1:x2 + x1:x4 +
## x1:x5, data = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02496 -0.20578 0.01492 0.22544 0.82908
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.47986 0.09095 159.208 < 2e-16 ***
## x1 -1.51821 0.25003 -6.072 1.23e-07 ***
## x2 -1.16824 0.13252 -8.816 4.17e-12 ***
## x3 0.42194 0.19975 2.112 0.039215 *
## x4 1.32237 0.12352 10.706 4.58e-15 ***
## x5 -2.28353 0.22756 -10.035 4.93e-14 ***
## x7 1.01234 0.24238 4.177 0.000107 ***
## x1:x2 -0.23086 0.08418 -2.742 0.008217 **
## x1:x4 7.88253 0.80377 9.807 1.12e-13 ***
## x1:x5 -6.79517 0.64717 -10.500 9.45e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3738 on 55 degrees of freedom
## Multiple R-squared: 0.9457, Adjusted R-squared: 0.9368
## F-statistic: 106.4 on 9 and 55 DF, p-value: < 2.2e-16
There are too many relevant variables in play, and clearly we don’t have enough runs yet to trust a full second-order model (an easy thing to verify by trying to fit one), and my remaining budget this week is not sufficient to fix that. So I’m going to skip second-order modeling for now, and jump down to space filling in order to get a few more runs for the week.
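For the record, here is the sort of check alluded to above (a sketch; fit2o is just a placeholder name, and output is omitted). With only thirteen distinct design locations, the 36-coefficient full second-order model is rank deficient, and lm reports many aliased (NA) terms.
## full second-order model: 7 main effects, 21 two-way interactions, 7 quadratics
fit2o <- lm(y ~ .^2 + I(x1^2) + I(x2^2) + I(x3^2) + I(x4^2) + I(x5^2) +
  I(x6^2) + I(x7^2), data=X)
sum(is.na(coef(fit2o))) ## count of aliased coefficients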
(Skip.)
(Skip.)
(Skip.)
I’m going to do a small number of space-filling runs within the bounding box of the current cache of runs. To keep it simple, I’m going to borrow the mymaximin function from class.
library(plgp) ## for distance()
mymaximin <- function(n, m, T=100000, Xorig=NULL) {
X <- matrix(runif(n*m), ncol=m) ## initial design
X <- rbind(X, Xorig) ## This is the only change!
d <- distance(X)
d <- as.numeric(d[upper.tri(d)])
md <- min(d)
for(t in 1:T) {
row <- sample(1:n, 1)
xold <- X[row,] ## random row selection
X[row,] <- runif(m) ## random new row
dprime <- distance(X)
dprime <- as.numeric(dprime[upper.tri(dprime)])
mdprime <- min(dprime)
if(mdprime > md) { md <- mdprime ## accept
} else { X[row,] <- xold } ## reject
}
return(X)
}
Now, running to get four new locations:
Xu <- as.matrix(data[,2:8])
for(j in 1:ncol(Xu)) Xu[,j] <- 2*(Xu[,j] - r[1,j])/(r[2,j] - r[1,j])-1
Xnew <- mymaximin(4,7, Xorig=as.matrix(Xu))[1:4,]
Dnew <- Xnew
for(i in 1:nrow(Dnew)) Dnew[i,] <- (Dnew[i,] + 1) * (r[2,] - r[1,])/2 + r[1,]
I ran those four, randomly choosing the first one to have four replicates (so I can get a sense of the noise this week), and the other three to have two.
Mandatory (again):
Let's read the full data back in and convert to coded variables.
data <- read.table("rbg_first17.txt", header=TRUE)
nna <- sum(!is.na(data[,9:18]))
D <- matrix(NA, nrow=nna, ncol=7)
y <- rep(NA, nna)
k <- 1
for(i in 1:nrow(data)) {
for(j in 1:10) {
if(is.na(data[i,8+j])) { break; }
D[k,] <- as.numeric(data[i,2:8])
y[k] <- data[i,8+j]
k <- k+1
}
}
colnames(D) <- names(data)[2:8]
r <- apply(D, 2, range)
X <- as.data.frame(D)
for(j in 1:ncol(X)) X[,j] <- 2*(X[,j] - r[1,j])/(r[2,j] - r[1,j])-1
names(X) <- paste("x", 1:ncol(r), sep="")
Now, how about re-fitting our simple first-order model with interactions from 1. above?
fit <- lm(y~.^2, data=X)
fit.bak <- step(fit, scope=formula(fit), direction="backward", trace=0)
summary(fit.bak)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x1:x3 + x1:x4 +
## x1:x5 + x1:x6 + x1:x7 + x2:x3 + x2:x4 + x2:x5, data = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02400 -0.26772 0.02515 0.21014 0.83371
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.10147 0.09377 150.387 < 2e-16 ***
## x1 2.48533 0.36883 6.738 7.49e-09 ***
## x2 -1.95209 0.26149 -7.465 4.40e-10 ***
## x3 0.54782 0.21463 2.552 0.013307 *
## x4 0.34526 0.12427 2.778 0.007318 **
## x5 -2.56309 0.27757 -9.234 4.66e-13 ***
## x6 1.88808 0.23327 8.094 3.80e-11 ***
## x7 1.00866 0.25141 4.012 0.000172 ***
## x1:x3 0.54886 0.19649 2.793 0.007023 **
## x1:x4 -2.43444 0.50942 -4.779 1.21e-05 ***
## x1:x5 -1.09842 0.35059 -3.133 0.002693 **
## x1:x6 0.84277 0.28968 2.909 0.005101 **
## x1:x7 -0.65728 0.18692 -3.516 0.000848 ***
## x2:x3 -0.26368 0.15337 -1.719 0.090821 .
## x2:x4 -0.56412 0.36889 -1.529 0.131552
## x2:x5 1.22771 0.41669 2.946 0.004597 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3888 on 59 degrees of freedom
## Multiple R-squared: 0.9388, Adjusted R-squared: 0.9233
## F-statistic: 60.38 on 15 and 59 DF, p-value: < 2.2e-16
Did our max change at all?
max(y)
## [1] 16.2487
Briefly tell me what you did this week on the yield problem, and describe what you learned.
First, let's read in the data.
data <- read.table("rbg_first17.txt", header=TRUE)
nna <- sum(!is.na(data[,9:18]))
X <- matrix(NA, nrow=nna, ncol=7)
y <- rep(NA, nna)
k <- 1
for(i in 1:nrow(data)) {
for(j in 1:10) {
if(is.na(data[i,8+j])) { break; }
X[k,] <- as.numeric(data[i,2:8])
y[k] <- data[i,8+j]
k <- k+1
}
}
colnames(X) <- names(data)[2:8]
I decided on some runs to perform in the following way.
Here is our sequential mymaximin from above, modified to scale the upper end of the random draws in each coordinate.
mymaximin <- function(n, m, upper, T=100000, Xorig=NULL) {
X <- matrix(runif(n*m), ncol=m) ## initial design
for(i in 1:n) X[i,] <- X[i,] * upper
X <- rbind(X, Xorig) ## append the existing design, as before
d <- distance(X)
d <- as.numeric(d[upper.tri(d)])
md <- min(d)
for(t in 1:T) {
row <- sample(1:n, 1)
xold <- X[row,] ## random row selection
X[row,] <- runif(m)*upper ## random new row
dprime <- distance(X)
dprime <- as.numeric(dprime[upper.tri(dprime)])
mdprime <- min(dprime)
if(mdprime > md) { md <- mdprime ## accept
} else { X[row,] <- xold } ## reject
}
return(X)
}
upper <- apply(X, 2, max)*1.1
Xnew <- mymaximin(8, 7, upper, Xorig=X)[1:8,]
colnames(Xnew) <- colnames(X)
This is basically from the end of the last homework.
fit <- lm(y~.^2, data=as.data.frame(X))
fit.bak <- step(fit, scope=formula(fit), direction="backward", trace=0)
summary(fit.bak)
##
## Call:
## lm(formula = y ~ N + P + K + Na + Ca + Mg + Nx + N:K + N:Na +
## N:Ca + N:Mg + N:Nx + P:K + P:Na + P:Ca, data = as.data.frame(X))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02400 -0.26772 0.02515 0.21014 0.83371
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.895147 1.481242 6.680 9.38e-09 ***
## N 0.656722 0.111090 5.912 1.81e-07 ***
## P -0.356630 0.114135 -3.125 0.002760 **
## K 0.206572 0.206254 1.002 0.320655
## Na 0.425797 0.086658 4.914 7.46e-06 ***
## Ca -0.219995 0.046594 -4.722 1.49e-05 ***
## Mg 0.186021 0.051843 3.588 0.000678 ***
## Nx 0.120800 0.022414 5.390 1.30e-06 ***
## N:K 0.027104 0.009703 2.793 0.007023 **
## N:Na -0.025157 0.005264 -4.779 1.21e-05 ***
## N:Ca -0.006785 0.002165 -3.133 0.002693 **
## N:Mg 0.020301 0.006978 2.909 0.005101 **
## N:Nx -0.004869 0.001385 -3.516 0.000848 ***
## P:K -0.018922 0.011006 -1.719 0.090821 .
## P:Na -0.008471 0.005540 -1.529 0.131552
## P:Ca 0.011020 0.003740 2.946 0.004597 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3888 on 59 degrees of freedom
## Multiple R-squared: 0.9388, Adjusted R-squared: 0.9233
## F-statistic: 60.38 on 15 and 59 DF, p-value: < 2.2e-16
I’m still not ready to go for quadratic models with the small amount of data that I have.
p <- predict(fit.bak, newdata=as.data.frame(Xnew), se.fit=TRUE)
## sfit: predicted yields sorted increasing, with ofit giving the candidate index;
## sse: predictive standard errors sorted increasing, with ose giving the index
orders <- data.frame(sfit=sort(p$fit), ofit=order(p$fit), sse=sort(p$se.fit), ose=order(p$se.fit))
orders
## sfit ofit sse ose
## 3 8.81132 3 0.1792659 5
## 6 12.82005 6 0.3744761 6
## 7 13.74133 7 0.6485942 7
## 4 14.42335 4 0.9114264 8
## 5 15.35397 5 0.9958642 2
## 1 16.53453 1 1.1158401 4
## 8 16.87482 8 1.2660747 1
## 2 17.60224 2 1.5779876 3
So it looks like, by my criteria, the most promising candidates are in the bottom rows of this data frame. I decided to do two replicates at each of four locations to save some budget for next week.
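For instance, the four candidates with the highest predicted yield could be pulled out as follows (just an illustration, not necessarily the exact selection rule I used):
## rows of Xnew with the four largest predicted yields
Xnew[order(p$fit, decreasing=TRUE)[1:4],]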
Let's see if I got lucky and made any improvement.
data <- as.matrix(read.table("rbg_first21.txt", header=TRUE))
oldys <- as.numeric(data[1:17,9:18])
oldys <- oldys[!is.na(oldys)]
newys <- as.numeric(data[18:21,9:18])
newys <- newys[!is.na(newys)]
c(max(oldys), max(newys))
## [1] 16.24870 14.19934
Later on, after we covered the details of GP regression, I coded up a sequential search method based on the predictive mean of a GP fit to the data. The script fits a GP to the existing data, and then optim is called with the (negative of the) GP predictive mean as the objective, with the search initialized at the current best value. The bounds of the search were set ten percent larger than the current design region to start with, and then after each iteration (after optimizing and choosing to perform a yield run at that location) those bounds were narrowed by about 10%. Eventually actual progress on maximal yield outputs was obtained. To start out with, I performed several replicates so that I could learn the noise for the week; at the end I was performing just one evaluation (no further replicates). I stopped the iterative search in this fashion once no further progress was being made. The script that I used is in yield_gp.R.
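Since yield_gp.R isn’t reproduced here, below is a minimal sketch of the idea, assuming the laGP package; the function name gp.suggest and its defaults are placeholders, and the actual script differs in its details (priors, the exact bound-shrinking schedule, and bookkeeping across iterations).
library(laGP)
## one iteration of the search: fit a separable GP to the runs gathered so far,
## then maximize its predictive mean over a box slightly larger than the current
## design region, starting from the best run observed; expand began at 1.1 and
## was shrunk toward 1 by about 10% after each new run
gp.suggest <- function(X, y, expand=1.1) {
  X <- as.matrix(X)
  d <- darg(list(mle=TRUE), X)    ## lengthscale ranges/priors
  g <- garg(list(mle=TRUE), y)    ## nugget ranges/priors
  gpi <- newGPsep(X, y, d=d$start, g=g$start, dK=TRUE)
  jmleGPsep(gpi, drange=c(d$min, d$max), grange=c(g$min, g$max),
    dab=d$ab, gab=g$ab)           ## joint MLE of lengthscales and nugget
  obj <- function(x) -predGPsep(gpi, matrix(x, nrow=1), lite=TRUE)$mean
  rng <- apply(X, 2, range)       ## current design region
  pad <- (expand - 1)*(rng[2,] - rng[1,])/2
  out <- optim(X[which.max(y),], obj, method="L-BFGS-B",
    lower=rng[1,] - pad, upper=rng[2,] + pad)
  deleteGPsep(gpi)
  return(out$par)
}
Each iteration I would run the yield function at the suggested location, append the result to X and y, shrink expand toward 1, and call gp.suggest again, using several replicates early on to learn the noise and single evaluations near the end.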