Instructions

This homework is due on Thursday, November 16th at 12:30pm (the start of class). Please turn in all your work. The primary purpose of this homework is to explore C and OpenMP in R. This description may change at any time; however, notices about substantial changes (requiring more/less work) will additionally be noted on the class web page. Note that there are two prongs to submission, via Canvas and Bitbucket (in asc-repo/hwk/hw5). You don't need to use Rmarkdown, but your work should be just as polished if you want full marks.

Problem 1: Multivariate Normal Likelihood (35 pts)

Consider a multivariate normal generating mechanism defined in terms of a so-called lengthscale parameter \(\theta\)

\[ Y \sim \mathcal{N}_n (0, \Sigma) \quad \mbox{where} \quad \Sigma_{ij} = \exp \left\{- D_{ij}/\theta \right\} \]

and the \(n \times n\) matrix \(D\) contains the pairwise (Euclidean) distances, in the predictor or \(X\)-space, between the input locations associated with \(Y_i\) and \(Y_j\).

Our goal is to infer the unknown \(\theta\) parameter from data by evaluating the likelihood for a grid of \(\theta\) values

thetas <- seq(0.1, 3, length=100)

using data generated as

X <- runif(300, -1, 1)
D <- as.matrix(dist(matrix(X, ncol=1), diag=TRUE, upper=TRUE))
Sigma <- exp(-D) + diag(sqrt(.Machine$double.eps), nrow(D))
library(mvtnorm)
Y <- rmvnorm(10000, sigma=Sigma)

and report on the MLE.
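
For reference, writing \(Y_1, \ldots, Y_N\) for the \(N = 10000\) independent rows of the generated \(Y\) matrix (each of dimension \(n = 300\)), the quantity evaluated at each grid value is, up to the small diagonal jitter added for numerical stability, the joint log likelihood

\[ \ell(\theta) = \sum_{k=1}^{N} \log \mathcal{N}_n\!\left(Y_k; 0, \Sigma_\theta\right) = -\frac{N}{2} \left( n \log(2\pi) + \log |\Sigma_\theta| \right) - \frac{1}{2} \sum_{k=1}^{N} Y_k^\top \Sigma_\theta^{-1} Y_k, \]

where \(\Sigma_\theta\) has entries \(\exp\{-D_{ij}/\theta\}\). The MLE reported below is the grid value maximizing \(\ell(\theta)\); the implementations that follow differ only in how this sum is computed.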

First, let's source the R file.

source("mvnllik.R")

Your task is to

  a. Develop a C/R interface in order to utilize the C version from R, defining the function logliks in R, which takes the same arguments as logliks.R excepting ll.

The C code implementing the new functionality resides in mvnllik_sol.c. That file includes the new function logliks_R, which is called by the new R/C interface function below (after loading the new shared object).

logliks <- function(Y, D, thetas, call=c("logliks_R", "logliks_omp_R"), verb=0) 
  {
    m <- nrow(D)
    if(ncol(Y) != m) stop(" dimension mismatch")

    call <- match.arg(call)

    ret <- .C(call,
        n = as.integer(nrow(Y)),
        m = as.integer(m),
        Y = as.double(t(Y)),
        D = as.double(D),         ## symmetric, no transpose
        thetas = as.double(thetas),
        tlen = as.integer(length(thetas)),
        verb = as.integer(verb),
        llik = double(length(thetas)),
        DUP=FALSE)$llik

    return(ret)
  }

Running this code requires loading a shared object built as R CMD SHLIB -o mvnllik_sol.so mvnllik_sol.c.

dyn.load("mvnllik_sol.so")
  b. Extend the C version to utilize OpenMP parallelization. There are at least two opportunities to do this, but you only need to do one.

The code supporting this is also in the new mvnllik_sol.c file mentioned above, including new C functions logliks_omp_R and logliks_omp. The implementation parallelizes over each of the theta values with #pragma omp parallel.
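
A sketch of how the OpenMP variant might look, reusing the hypothetical mvn_llik_theta helper from the sketch above (the actual logliks_omp_R and logliks_omp in mvnllik_sol.c may be organized differently). The essential changes are per-thread scratch space, allocated with malloc because R_alloc is not thread safe, and an omp for over the independent theta evaluations.

/* Hypothetical sketch of the OpenMP variant, reusing the mvn_llik_theta
 * helper from the serial sketch above; each thread owns its scratch space
 * and the theta evaluations are split across threads. */
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

void logliks_omp_R(int *n, int *m, double *Y, double *D, double *thetas,
                   int *tlen, int *verb, double *llik)
{
  int M = *m;

  #pragma omp parallel
  {
    /* per-thread workspace (R_alloc is not thread safe, so use malloc) */
    double *L = (double *) malloc(sizeof(double) * (size_t) M * M);
    double *z = (double *) malloc(sizeof(double) * M);

    #pragma omp for
    for(int t = 0; t < *tlen; t++)
      llik[t] = mvn_llik_theta(*n, M, Y, D, thetas[t], L, z);

    free(L); free(z);
  }

  /* report from the master thread only: the R API (Rprintf) should not be
   * called from worker threads */
  if(*verb > 0)
    for(int t = 0; t < *tlen; t++)
      Rprintf("theta=%g, llik=%g\n", thetas[t], llik[t]);
}

Building with OpenMP via R CMD SHLIB typically means adding the compiler's OpenMP flag (e.g. PKG_CFLAGS="-fopenmp" PKG_LIBS="-fopenmp" with gcc), and the thread count can be controlled with OMP_NUM_THREADS or omp_set_num_threads().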

  c. Show that all four versions give the same likelihood curve, and maximizing value, over the \(\theta\) grid. I.e., show that they give exactly the same output, demonstrating correctness.

First, let's perform the calculations and record timings (for part d) for the two non-C options provided in the problem description.

verb <- 0
time.R <- system.time(ll.R <- logliks.R(Y, D, thetas, verb=verb))
time.R2 <- system.time(ll.R2 <- logliks.R(Y, D, thetas, verb=verb, ll=loglik2))

Then the two C versions …

time.C <- system.time(ll.C <- logliks(Y, D, thetas, verb=verb))
time.Comp <- system.time(ll.Comp <- logliks(Y, D, thetas, verb=verb, call="logliks_omp_R"))

The plots below show that all four versions yield the same log likelihood surface and maximizing value.

par(mfrow=c(1,4))
plot(thetas, ll.R, type="l", main="R/lib")
abline(v=thetas[which.max(ll.R)], col=2)
legend("bottomright", "mle", col=2, lty=1)
plot(thetas, ll.R2, type="l", main="R/by-hand")
abline(v=thetas[which.max(ll.R2)], col=2)
plot(thetas, ll.C, type="l", main="C")
abline(v=thetas[which.max(ll.C)], col=2)
plot(thetas, ll.C, type="l", main="C/OpenMP")
abline(v=thetas[which.max(ll.C)], col=2)

  d. Provide a timing comparison between the four versions. Be sure to note how many cores your parallel OpenMP version is using.

The (elapsed) timings are summarized as follows.

c(R=time.R[3], R2=time.R2[3], C=time.C[3], Comp=time.Comp[3])
##    R.elapsed   R2.elapsed    C.elapsed Comp.elapsed 
##       41.038      109.685       41.750        6.011

  e. Now, using an R with accelerated linear algebra (Intel MKL, OSX Accelerate, or Microsoft R/MRAN), re-perform your comparison from d. and report the results. (If your timings are not *all* much faster then you're doing it wrong.)

This entire Rmd file is re-compiled with an MKL-linked R and provided as hw5_sol_mkl.html.

Problem 2: Parallel bootstrapped linear regression (25 pts)

Parallelize bootreg with OpenMP and report timings comparing your new parallel version to the R and C versions provided in class.
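
As a reminder of what is being parallelized, each bootstrap replicate resamples the \(n\) rows of \((X, y)\) with replacement and recomputes the least-squares estimate on the resampled design \((\tilde{X}_b, \tilde{y}_b)\),

\[ \hat{\beta}^{(b)} = \left( \tilde{X}_b^\top \tilde{X}_b \right)^{-1} \tilde{X}_b^\top \tilde{y}_b, \quad b = 1, \ldots, B, \]

so the \(B\) replicates are mutually independent and the loop over \(b\) is a natural target for OpenMP.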

First we load in the R file from lecture.

source("bootreg.R")

The C code supporting this solution can be found in bootreg_sol.c and randomkit.c with randomkit.h.

The shared object may be compiled as R CMD SHLIB -o bootreg_sol.so bootreg_sol.c randomkit.c, and loaded as follows.

dyn.load("bootreg_sol.so")

The following R interface function can be used to call either the serial or the parallel (OpenMP) version on the C side.

bootols <- function(X, y, B=199, icept=TRUE, inv=FALSE, call=c("bootols_R", "bootols_omp_R"))
{
    if(icept) X <- cbind(1, X)  
    m <- ncol(X)
    n <- nrow(X)
    if(n != length(y)) stop("dimension mismatch")

    call <- match.arg(call)

    ret <- .C(call,
              X = as.double(t(X)),
              y = as.double(y),
              n = as.integer(n),
              m = as.integer(m),
              B = as.integer(B),
              inv = as.integer(inv),
              beta.hat = double(m*B),
              DUP = FALSE)

    ## must be careful to return a transposed beta.hat matrix
    return(matrix(ret$beta.hat, nrow=B, byrow=TRUE))
}
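
For reference, a minimal sketch of what the OpenMP side might look like in C, under several assumptions: the entry-point signature and data layout match the .C() call above (X passed as t(X), beta.hat stored one replicate per contiguous block), randomkit's rk_seed and rk_interval are used for the resampling, the inv flag is ignored, and the normal equations are solved with a naive hand-rolled Cholesky rather than BLAS/LAPACK, purely to keep the sketch self-contained. The actual bootreg_sol.c may be organized quite differently, and chol_solve is a hypothetical helper name.

/* Hypothetical sketch (not the actual bootreg_sol.c): OpenMP bootstrap with
 * one randomkit RNG per thread.  X arrives as t(X) from R, so row i of the
 * design is the contiguous block X[i*m .. i*m + m - 1], and beta_hat is
 * filled one replicate per contiguous block of length m. */
#include <stdlib.h>
#include <math.h>
#include "randomkit.h"
#ifdef _OPENMP
#include <omp.h>
#endif

/* naive in-place Cholesky solve of the SPD system A b = c (c overwritten by
 * the solution); a real implementation would call LAPACK (e.g. dposv) */
static void chol_solve(int m, double *A, double *b)
{
  for(int i = 0; i < m; i++)
    for(int j = 0; j <= i; j++) {
      double s = A[i*m + j];
      for(int k = 0; k < j; k++) s -= A[i*m + k]*A[j*m + k];
      if(i == j) A[i*m + i] = sqrt(s); else A[i*m + j] = s/A[j*m + j];
    }
  for(int i = 0; i < m; i++) {                 /* forward solve L y = c */
    for(int k = 0; k < i; k++) b[i] -= A[i*m + k]*b[k];
    b[i] /= A[i*m + i];
  }
  for(int i = m - 1; i >= 0; i--) {            /* back solve L' b = y */
    for(int k = i + 1; k < m; k++) b[i] -= A[k*m + i]*b[k];
    b[i] /= A[i*m + i];
  }
}

void bootols_omp_R(double *X, double *y, int *n, int *m, int *B,
                   int *inv, double *beta_hat)
{
  int N = *n, M = *m;
  (void) inv;  /* the inverse-vs-solve option is ignored in this sketch */

  #pragma omp parallel
  {
    /* per-thread scratch space and RNG state */
    double *XtX = (double *) malloc(sizeof(double) * (size_t) M * M);
    double *Xty = (double *) malloc(sizeof(double) * M);
    rk_state state;
    #ifdef _OPENMP
    rk_seed(12345 + omp_get_thread_num(), &state);  /* illustration only */
    #else
    rk_seed(12345, &state);
    #endif

    #pragma omp for
    for(int b = 0; b < *B; b++) {
      for(int i = 0; i < M*M; i++) XtX[i] = 0.0;
      for(int j = 0; j < M; j++) Xty[j] = 0.0;

      /* resample N rows with replacement, accumulating X'X (lower
         triangle) and X'y for the resampled design */
      for(int i = 0; i < N; i++) {
        int r = (int) rk_interval((unsigned long) (N - 1), &state);
        const double *x = X + (size_t) r * M;
        for(int j = 0; j < M; j++) {
          Xty[j] += x[j]*y[r];
          for(int k = 0; k <= j; k++) XtX[j*M + k] += x[j]*x[k];
        }
      }

      /* solve the normal equations and store the b-th replicate */
      chol_solve(M, XtX, Xty);
      for(int j = 0; j < M; j++) beta_hat[b*M + j] = Xty[j];
    }

    free(XtX); free(Xty);
  }
}

The key design point is that each thread owns its own rk_state and scratch matrices, so the B independent replicates can be split across threads with a plain omp for and no synchronization beyond the implicit barrier at the end of the loop.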

Now for some timings.

## generate testing data
n <- 5000
d <- 200
X2 <- 1/matrix(rt(n*d, df=1), ncol=d)
beta <- c(1:d, 0)
Y2 <- beta[1] + X2 %*% beta[-1] + rnorm(n, sd=3)

## testing bootstrapped OLS
time2.Rlm <- system.time(boot.Rlm <- bootols.R(X2, Y2, B=1000, method="lm"))
time2.Rinv <- system.time(boot.Rinv <- bootols.R(X2, Y2, B=1000, method="inv"))
time2.Rsolve <- system.time(boot.Rsolve <- bootols.R(X2, Y2, B=1000, method="solve"))
time2.Rcp <- system.time(boot.Rcp <- bootols.R(X2, Y2, B=1000, method="cp"))
time2.Cinv <- system.time(boot.Cinv <- bootols(X2, Y2, B=1000, inv=TRUE))
time2.Csolve <- system.time(boot.Csolve <- bootols(X2, Y2, B=1000))
time2.Comp <- system.time(boot.Comp <- bootols(X2, Y2, B=1000, call="bootols_omp_R"))

The (elapsed) timings are summarized as follows.

c(Rlm=time2.Rlm[3], Rinv=time2.Rinv[3], Rsolve=time2.Rsolve[3], Rcp=time2.Rcp[3],
    Cinv=time2.Cinv[3], Csolve=time2.Csolve[3], Comp=time2.Comp[3])
##    Rlm.elapsed   Rinv.elapsed Rsolve.elapsed    Rcp.elapsed   Cinv.elapsed 
##        176.170        223.726        115.159         92.973        117.005 
## Csolve.elapsed   Comp.elapsed 
##        112.043         29.450

Problem 3: Rcpp for swap (15 pts)

Revisit the swap function from problem 2 of homework 2 and write a version in Rcpp which, when called as swap.Rcpp(v, i, j), swaps entries i and j without copying the vector v. Demonstrate correctness, and compare timings with our fast swap.eval.

Rcpp works with pointers to the underlying R vector, which means no copying, and leads to a trivial C++ implementation.

library(Rcpp)
cppFunction("
void swapCpp(IntegerVector v, int i, int j){
  int tmp = v[i-1];
  v[i-1] = v[j-1];
  v[j-1] = tmp;
  }
")

First, let's check for correctness.

v <- 1:1000000000
swapCpp(v, 1, 2)
v[1:5]
## [1] 2 1 3 4 5

To check that it is fast relative to our naive and eval-based solutions, let's paste those in too.

## naive version: modifies its own copy of v (R copies on modification),
## so the swap is not visible in the caller
swap <- function(v, i, j)
{
  tmp <- v[i]
  v[i] <- v[j]
  v[j] <- tmp
}

## eval-based version: performs the swap directly in the calling
## environment, avoiding the function-argument copy
swap.eval <- function()
{
  e <- quote({tmp <- v[i]; v[i] <- v[j]; v[j] <- tmp})
  eval(e, envir=parent.frame())
}

The code below collects timings in duplicate, since under R's copy-on-modify semantics the first modification of v may trigger a full copy that subsequent calls avoid.

v <- 1:1000000000
i <- 1; j <- 2
library(microbenchmark)
time.swap <- c(system.time(swap(v, i, j))[3], system.time(swap(v, i, j))[3])
time.cpp <- c(system.time(swapCpp(v, i, j))[3], system.time(swapCpp(v, i, j))[3])
time.eval <- c(system.time(swap.eval())[3], system.time(swap.eval())[3])

The timings are summarized below.

cbind(swap=time.swap, cpp=time.cpp, eval=time.eval)
##          swap   cpp  eval
## elapsed 0.805 0.001 0.805
## elapsed 0.814 0.000 0.002

Problem 4: Rcpp for mvnllik (25 pts)

Write an Rcpp version of logliks, called logliks.Rcpp, which takes the same arguments. You may do this in one of two ways, both for extra credit.

Demonstrate correctness (i.e., show that your new version gives the same answers as the old ones) and compare timings with the other methods.

The C++ file mvnllik_sol.cpp contains an Rcpp wrapper that supports both serial and OpenMP versions via an argument omp, which is FALSE by default.

sourceCpp("mvnllik_sol.cpp")

The code below collects the output and timings under these two new variations.

time.Rcpp <- system.time(ll.Rcpp <- logliksCpp(t(Y), D, thetas, verb=verb))
time.RcppOmp <- system.time(ll.RcppOmp <- logliksCpp(t(Y), D, thetas, omp=TRUE, verb=verb))

The table below augments the one above with the new timings.

c(R=time.R[3], R2=time.R2[3], C=time.C[3], Comp=time.Comp[3], Rcpp=time.Rcpp[3], RcppOmp=time.RcppOmp[3])
##       R.elapsed      R2.elapsed       C.elapsed    Comp.elapsed 
##          41.038         109.685          41.750           6.011 
##    Rcpp.elapsed RcppOmp.elapsed 
##          41.956           5.627

As a partial solution for the second option, we can consider using RcppArmadillo to replace dmvnorm. The code linked from that option's URL has been pasted into mvnllik_arma.cpp, and the following wrapper has been written to enable looping over theta, etc.

library(RcppArmadillo)
sourceCpp("mvnllik_arma.cpp")

loglik.arma.omp <- function(Y, D, theta, nth=8) 
  {
    if(ncol(Y) != nrow(D)) stop(" dimension mismatch")
    Sigma <- exp(-D/theta) + diag(sqrt(.Machine$double.eps), nrow(D))
    return(sum(dmvnrm_arma_mc(Y, rep(0, ncol(Y)), Sigma, TRUE, nth)))
  }

Now we can pass this new loglik.arma.omp function to our logliks.R function, which was written to utilize a generic log density function.

formals(loglik.arma.omp)$nth <- 16
time.RarmaOmp <- system.time(ll.RarmaOmp <- logliks.R(Y, D, thetas, verb=verb, ll=loglik.arma.omp))

The table below augments the ones above with this new timing.

c(R=time.R[3], R2=time.R2[3], C=time.C[3], Comp=time.Comp[3], Rcpp=time.Rcpp[3], 
    RcppOmp=time.RcppOmp[3], RarmaOmp=time.RarmaOmp[3])
##        R.elapsed       R2.elapsed        C.elapsed     Comp.elapsed 
##           41.038          109.685           41.750            6.011 
##     Rcpp.elapsed  RcppOmp.elapsed RarmaOmp.elapsed 
##           41.956            5.627            8.511

The plots below verify that the new calculations are producing results identical to the old ones.

par(mfrow=c(1,3))
plot(thetas, ll.Rcpp, type="l", main="Rcpp serial")
abline(v=thetas[which.max(ll.Rcpp)], col=2)
legend("bottomright", "mle", col=2, lty=1)
plot(thetas, ll.RcppOmp, type="l", main="Rcpp/OpenMP")
abline(v=thetas[which.max(ll.RcppOmp)], col=2)
plot(thetas, ll.RarmaOmp, type="l", main="Rarma/OpenMP")
abline(v=thetas[which.max(ll.RarmaOmp)], col=2)

Results using MKL are provided in the companion document hw5_sol_mkl.html.