Department of Statistics, Virginia Tech

This homework is **due on Thursday, November 16th at 12:30pm** (the start of class). Please turn in all your work. The primary purpose of this homework is explore `C`

and `OpenMP`

in R. This description may change at any time, however notices about substantial changes (requiring more/less work) will be additionally noted on the class web page. Note that there are two prongs to submission, via Canvas and Bitbucket (in `asc-repo/hwk/hw5`

). You don’t need to use `Rmarkdown`

but your work should be just as pretty if you want full marks.

Consider a multivariate normal generating mechanism defined in terms of a so-called lengthscale parameter \(\theta\)

\[ Y \sim \mathcal{N}_n (0, \Sigma) \quad \mbox{where} \quad \Sigma_{ij} = \exp \left\{- D_{ij}/\theta \right\} \]

and the \(n \times n\) matrix \(D\) contains pairwise (Euclidean) distances between \(Y_i\) and \(Y_j\) values in a predictor, or \(X\)-space.

- Treat \(\theta\) as a positive scalar quantity.
- Such a mechanism comprises the essence of Gaussian process regression.

Our goal is to infer the unknown \(\theta\) parameter from data by evaluating the likelihood for a grid of \(\theta\) values

`thetas <- seq(0.1, 3, length=100)`

using data generated as

```
X <- runif(300, -1, 1)
D <- as.matrix(dist(matrix(X, ncol=1), diag=TRUE, upper=TRUE))
Sigma <- exp(-D) + diag(sqrt(.Machine$double.eps), nrow(D))
library(mvtnorm)
Y <- rmvnorm(10000, sigma=Sigma)
```

and report on the MLE.

- The functions in mvnllik.R provide two versions (coded in R with and without the help of external libraries) of evaluating the likelihood via the MVN density function, and it also contains a wrapper function iterating over a vector of potential \(\theta\) values.
- The functions is mvnllik.c provide analogues in C.

Your task is to

- Develop a C/R interface in order to utilize the C version from R, defining the function
`logliks`

in R, which takes the same arguments as`logliks.R`

excepting`ll`

. - Extend the C version to utilize
`OpenMP`

parallelization. There are at least two opportunities to do this, but you only need to do one. - Show that all four versions give the same likelihood curve, and maximizing value, over the \(\theta\) grid. I.e., show that they give exactly the same output, demonstrating correctness.
- Provide a timing comparison between the four versions. Be sure to note how many cores your parallel
`OpenMP`

version is using. - Now, using an R with accelerated linear algebra (Intel MKL, OSX Accelerate, or Microsoft R/MRAN), re-perform your comparison from d. and report the results. (If your timings are not
*all*much faster then you’re doing it wrong.)

Parallelize `bootreg`

with `OpenMP`

and report timings comparing your new parallel version to the R and `C`

versions provided in class.

- Your comparison should include both ordinary R and a version with accelerated linear algebra.

Here are some helpful suggestions/observations.

- I recommend using “
`#pragma omp parallel`

” rather than “`#pragma omp parallel for`

”. - Be careful to ensure that the parallelized instances (threads) are actually algorithmically independent, which may require introducing some auxiliary storage.
- R’s random number generator (RNG) is not “thread safe”, which means that if two threads ask for a random deviate at the same time, they might get the same number (i.e., not very random). How can you work around that without compromising efficiency?

`Rcpp`

for `swap`

(15 points)Revisit the swap function from problem 2 of homework 2 and write a version in `Rcpp`

which, when called as `swap.Rcpp(v, i, j)`

swaps entry `i`

and `j`

without copying the vector `v`

. Demonstrate correctness, and compare timings with our fast `swap.eval`

.

`Rcpp`

for `mvnllik`

(25 pts)Write an `Rcpp`

version of `logliks`

, called `logliks.Rcpp`

which takes the same arguments. You may do this one of two ways, both for extra credit:

- as a wrapper around the
`C`

function`logliks`

; - as a
`C++`

version of the R function`logliks2`

.

Demonstrate correctness (i.e., show that your new version gives the same answers as the old ones) and compare timings with the other methods.