Department of Statistics, Virginia Tech" output: html_document --- ## Instructions This homework is **due on Thursday, November 16th at 12:30pm** (the start of class). Please turn in all your work. The primary purpose of this homework is explore `C` and `OpenMP` in R. This description may change at any time, however notices about substantial changes (requiring more/less work) will be additionally noted on the class web page. Note that there are two prongs to submission, via Canvas and Bitbucket (in `asc-repo/hwk/hw5`). You don't need to use `Rmarkdown` but your work should be just as pretty if you want full marks. ## Problem 1: Multivariate Normal Likelihood (35 pts) Consider a multivariate normal generating mechanism defined in terms of a so-called lengthscale parameter $\theta$ $$ Y \sim \mathcal{N}_n (0, \Sigma) \quad \mbox{where} \quad \Sigma_{ij} = \exp \left\{- D_{ij}/\theta \right\} $$ and the $n \times n$ matrix $D$ contains pairwise (Euclidean) distances between $Y_i$ and $Y_j$ values in a predictor, or $X$-space. - Treat $\theta$ as a positive scalar quantity. - Such a mechanism comprises the essence of Gaussian process regression. Our goal is to infer the unknown $\theta$ parameter from data by evaluating the likelihood for a grid of $\theta$ values ```{r} thetas <- seq(0.1, 3, length=100) ``` using data generated as ```{r} X <- runif(300, -1, 1) D <- as.matrix(dist(matrix(X, ncol=1), diag=TRUE, upper=TRUE)) Sigma <- exp(-D) + diag(sqrt(.Machine$double.eps), nrow(D)) library(mvtnorm) Y <- rmvnorm(10000, sigma=Sigma) ``` and report on the MLE. - The functions in [mvnllik.R](mvnllik.R) provide two versions (coded in R with and without the help of external libraries) of evaluating the likelihood via the MVN density function, and it also contains a wrapper function iterating over a vector of potential $\theta$ values. - The functions is [mvnllik.c](mvnllik.c) provide analogues in C. Your task is to a. Develop a C/R interface in order to utilize the C version from R, defining the function `logliks` in R, which takes the same arguments as `logliks.R` excepting `ll`. b. Extend the C version to utilize `OpenMP` parallelization. There are at least two opportunities to do this, but you only need to do one. c. Show that all four versions give the same likelihood curve, and maximizing value, over the $\theta$ grid. I.e., show that they give exactly the same output, demonstrating correctness. d. Provide a timing comparison between the four versions. Be sure to note how many cores your parallel `OpenMP` version is using. e. Now, using an R with accelerated linear algebra (Intel MKL, OSX Accelerate, or Microsoft R/MRAN), re-perform your comparison from d. and report the results. (If your timings are not *all* much faster then you're doing it wrong.) ## Problem 2: Parallel bootstrapped linear regression (25 pts) Parallelize `bootreg` with `OpenMP` and report timings comparing your new parallel version to the R and `C` versions provided in class. - Your comparison should include both ordinary R and a version with accelerated linear algebra. Here are some helpful suggestions/observations. - I recommend using "`#pragma omp parallel`" rather than "`#pragma omp parallel for`". - Be careful to ensure that the parallelized instances (threads) are actually algorithmically independent, which may require introducing some auxiliary storage. - R's random number generator (RNG) is not "thread safe", which means that if two threads ask for a random deviate at the same time, they might get the same number (i.e., not very random). How can you work around that without compromising efficiency? ## Problem 3: `Rcpp` for `swap` (15 points) Revisit the swap function from problem 2 of [homework 2](hwk2.html) and write a version in `Rcpp` which, when called as `swap.Rcpp(v, i, j)` swaps entry `i` and `j` without copying the vector `v`. Demonstrate correctness, and compare timings with our fast `swap.eval`. ## Problem 4: `Rcpp` for `mvnllik` (25 pts) Write an `Rcpp` version of `logliks`, called `logliks.Rcpp` which takes the same arguments. You may do this one of two ways, both for extra credit: - as a wrapper around the `C` function `logliks`; - as a `C++` version of the R function `logliks2`. Demonstrate correctness (i.e., show that your new version gives the same answers as the old ones) and compare timings with the other methods.