This homework is due on Thursday, November 16th at 12:30pm (the start of class). Please turn in all your work. The primary purpose of this homework is explore C and OpenMP in R. This description may change at any time, however notices about substantial changes (requiring more/less work) will be additionally noted on the class web page. Note that there are two prongs to submission, via Canvas and Bitbucket (in asc-repo/hwk/hw5). You don’t need to use Rmarkdown but your work should be just as pretty if you want full marks.

Problem 1: Multivariate Normal Likelihood (35 pts)

Consider a multivariate normal generating mechanism defined in terms of a so-called lengthscale parameter \(\theta\)

\[ Y \sim \mathcal{N}_n (0, \Sigma) \quad \mbox{where} \quad \Sigma_{ij} = \exp \left\{- D_{ij}/\theta \right\} \]

and the \(n \times n\) matrix \(D\) contains pairwise (Euclidean) distances between \(Y_i\) and \(Y_j\) values in a predictor, or \(X\)-space.

Our goal is to infer the unknown \(\theta\) parameter from data by evaluating the likelihood for a grid of \(\theta\) values

thetas <- seq(0.1, 3, length=100)

using data generated as

X <- runif(300, -1, 1)
D <- as.matrix(dist(matrix(X, ncol=1), diag=TRUE, upper=TRUE))
Sigma <- exp(-D) + diag(sqrt(.Machine$double.eps), nrow(D))
Y <- rmvnorm(10000, sigma=Sigma)

and report on the MLE.

Your task is to

  1. Develop a C/R interface in order to utilize the C version from R, defining the function logliks in R, which takes the same arguments as logliks.R excepting ll.
  2. Extend the C version to utilize OpenMP parallelization. There are at least two opportunities to do this, but you only need to do one.
  3. Show that all four versions give the same likelihood curve, and maximizing value, over the \(\theta\) grid. I.e., show that they give exactly the same output, demonstrating correctness.
  4. Provide a timing comparison between the four versions. Be sure to note how many cores your parallel OpenMP version is using.
  5. Now, using an R with accelerated linear algebra (Intel MKL, OSX Accelerate, or Microsoft R/MRAN), re-perform your comparison from d. and report the results. (If your timings are not all much faster then you’re doing it wrong.)

Problem 2: Parallel bootstrapped linear regression (25 pts)

Parallelize bootreg with OpenMP and report timings comparing your new parallel version to the R and C versions provided in class.

Here are some helpful suggestions/observations.

Problem 3: Rcpp for swap (15 points)

Revisit the swap function from problem 2 of homework 2 and write a version in Rcpp which, when called as swap.Rcpp(v, i, j) swaps entry i and j without copying the vector v. Demonstrate correctness, and compare timings with our fast swap.eval.

Problem 4: Rcpp for mvnllik (25 pts)

Write an Rcpp version of logliks, called logliks.Rcpp which takes the same arguments. You may do this one of two ways, both for extra credit:

Demonstrate correctness (i.e., show that your new version gives the same answers as the old ones) and compare timings with the other methods.