November 7, 2017

Motivation

Rcpp

Rcpp is one of the most popular package on CRAN.

It is dedicated to simplify the connection between R and C++.

Useful resources:

  • Advanced R by Hadley Wickham

  • Rcpp gallery

  • Eddelbuettel, Dirk (2013) Seamless R and C++ Integration with Rcpp. Springer, New York. ISBN 978-1-4614-6867-7.

Installing Rcpp

Prerequistes:

  • Windows: Rtools

  • OSX: Xcode command line tools

  • Linux (e.g., Debian/Ubuntu): core software developement utilities

sudo apt-get install r-base-dev

Let's check that it works:

library(Rcpp)
evalCpp("1+1")
## [1] 2

R programming

  • Vectorization is primordial for performant R code (related functions are loops in C…)

  • But you need a lot of R vocabulary …

  • … and the code is somewhat harder to read

  • in addition, not always possible

Why C++

  • loops are ok!

  • many existing and optimized built-in libraries (e.g., STL)

  • recursion

  • much easier than C or Fortran (with Rcpp)

Still:

  • development time is longer, harder debugging
  • need to learn another language

Rcpp is here to make C++ code similar to R code

  • Well integrated in RStudio

  • but the resulting code is less generic

Microbenchmark

A useful package for comparing pieces of code: microbenchmark.

library(microbenchmark)

V <- runif(1e7)

microbenchmark(V*0.1, V/10)

Basics

Rcpp types

First example

Let's create a .cpp file, usually in a src folder

#include <Rcpp.h> 
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
  return x * 2;
}

and source it (e.g., with sourceCpp(file = ) or from RStudio).

Can also be used directly in 'R':

cppFunction("NumericVector timesTwo(NumericVector x) {
  return x * 2;
}")

Second example: column-wise multiplication

We aim at rescaling columns. R versions:

## Don't do this
naive_mat <- function(m, v){
  for(i in 1:nrow(m)){
    for(j in 1:ncol(m)){m[i, j] <- m[i, j]/v[j]}
  }
  return(m)
}

## Better
naive_mat2 <- function(m, v){
  for(j in 1:ncol(m)){m[, j] <- m[, j]/v[j]}
  return(m)
}

Second example (cont'd)

By Googling, you can find more efficient versions:

library(microbenchmark)
m <- matrix(rnorm(1200000), ncol=600)
v <- rep(c(1.5, 3.5, 4.5, 5.5, 6.5, 7.5), length = ncol(m))


microbenchmark(# naive_mat(m,v), ## way too slow
  naive_mat2(m, v),
  t(t(m) * v), 
  m %*% diag(v),
  m * rep(v, rep.int(nrow(m),length(v))), 
  m * rep(v, rep(nrow(m),length(v))), 
  m * rep(v, each = nrow(m)))

Second example (Rcpp version)

## Create a new matrix and return it
cppFunction("
NumericMatrix naive_mat_cpp0(NumericMatrix m, NumericVector v){
  NumericMatrix m2(m.nrow(), m.ncol());
  for(int i = 0; i < m.nrow(); i++){
    for(int j = 0; j < m.ncol(); j++){
      m2(i, j) = m(i, j) * v(j);
    }
  }
  return(m2);
}"
)

microbenchmark(m * rep(v, rep.int(nrow(m),length(v))),
               naive_mat_cpp0(m, v))

Second example (alternative Rcpp version)

## Modifies m directly
cppFunction("
void naive_mat_cpp1(NumericMatrix m, NumericVector v){
  for(int i = 0; i < m.nrow(); i++){
    for(int j = 0; j < m.ncol(); j++){
      m(i, j) *= v(j); // equivalent to m(i, j) = m(i, j) * v(j)
    }
  }
}
")

print(m[1:3,1:3]); naive_mat_cpp1(m, v); print(m[1:3,1:3])

microbenchmark(m * rep(v, rep.int(nrow(m),length(v))),
               naive_mat_cpp1(m, v))

Second example (cont'd)

What exactly is slow? Accessing the elements (e.g., m[i,j] or m(i,j))

Instead, let's exploit that matrices are vectors (stored by columns):

## Modifies m directly, works with pointers
cppFunction("void naive_mat_cpp_p(NumericMatrix m, NumericVector v){
  double* ptrm = &m(0,0); // pointer to m
  double* ptrv = &v(0); // pointer to v
  for(int j = 0; j < m.ncol(); j++, ptrv++){
    for(int i = 0; i < m.nrow(); i++, ptrm++){
      *ptrm *= *ptrv;
    }
  }
}")

microbenchmark(m * rep(v, rep.int(nrow(m),length(v))),
               naive_mat_cpp_p(m, v))

Main differences

  • indices starts at 0

  • explicit return with predefined type

  • semicolons at the end of statements

  • use parenthesis instead of square brackets to access elements

  • C++ bool is true or false, R logical is (TRUE, FALSE, NA)

  • careful with types, e.g., division by integer:

evalCpp("5/3"); evalCpp("5/3.")
## [1] 1
## [1] 1.666667

Third example: STL library

The Standard Template Library contains useful data structures.

cppFunction("
  IntegerVector which_cpp(NumericVector v, double d){
    std::vector<int> res; // empty vector
    for(int i = 0; i < v.length(); i++){
      if(v[i] >= d) res.push_back(i+1);
    }
    return(wrap(res));
}")

w <- runif(1e7)

microbenchmark(which(w >= 0.99999),
               which_cpp(w, 0.99999))

Sugar

Rcpp sugar

Make C++ functions look like their R equivalent.

No need to reinvent the wheel.

Available functions include:

  • logical operators (<, <=, >, >=, ==, !=, !)

  • arithmetic operators (+, -, *, /)

  • logical summary function (any(), all())

  • stats functions (p/d/r/q)

  • other functions (abs, pow, diag, pmin, sapply))

Sugar example

Ref: http://adv-r.had.co.nz/Rcpp.html

any_naR <- function(x) any(is.na(x))

cppFunction("bool any_naC(NumericVector x) {
  return is_true(any(is_na(x)));
}")

x0 <- runif(1e5); x1 <- c(x0, NA); x2 <- c(NA, x0)

microbenchmark(
  any_naR(x0), any_naC(x0),
  any_naR(x1), any_naC(x1),
  any_naR(x2), any_naC(x2)
)

Functions

R functions can be called from C++

cppFunction("
 NumericMatrix chol_from_cpp(NumericMatrix M, Function f){
    return(f(M));
}")
xgrid <- seq(0,1, length.out = 101)

M <- exp(-(as.matrix(dist(xgrid)^2))) + diag(1e-6, 101)

chol_M <- chol_from_cpp(M, chol)

Robject is more generic to collect the output of an R function.

Miscellanous

Linear algebra

Popular Rcpp package include RcppArmadillo and RcppEigen relying on the corresponding C++ template libraries.

OpenMP

Rcpp works with OpenMP (maybe not on MacOS) by adding

  • // [[Rcpp::plugins(openmp)]]

to the .cpp file.

Modules

Rcpp Modules provide a way to expose more complex C++ classes and the related functions to R.

One example is given at the end of the Quick reference guide

Creating packages

Creating a Rcpp package from RStudio

It is well automated.

Here: R 3.4.2 and Rcpp 0.12.13, devtools 1.11.1

Also, roxygen2 makes documenting functions easier.