Instructions

This homework is due on Tuesday, October 31st at 12:30pm (the start of class). Please turn in all your work. The primary purpose of this homework is to hone debugging and profiling skills and to explore parallelization with Monte Carlo experiments. This description may change at any time; however, substantial changes (requiring more or less work) will also be announced on the class web page. Note that there are two prongs to submission: via Canvas and via Bitbucket (in asc-repo/hwk/hw4). You don’t need to use Rmarkdown, but your work should be just as pretty if you want full marks.

Problem 1: Profiling (15 pts)

Below are two very compact versions of functions for calculating a matrix of powers of a vector x, where column j holds x^j for j = 1, ..., dg.

powers3 <- function(x, dg) outer(x, 1:dg,"^")
powers4 <- function(x, dg) 
 {
   repx <- matrix(rep(x, dg), nrow=length(x))
   return(t(apply(repx, 1, cumprod)))
 }
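To see what the two functions return, here is a tiny illustrative input (x = c(2, 3) and dg = 3 are chosen only for readability). Both produce a length(x) by dg matrix whose (i, j) entry is x[i]^j: powers3 evaluates every power directly with outer(), while powers4 replicates x into dg columns and takes row-wise cumulative products.

```r
# The two compact versions, copied from above
powers3 <- function(x, dg) outer(x, 1:dg, "^")
powers4 <- function(x, dg) {
  repx <- matrix(rep(x, dg), nrow = length(x))  # each row repeats x[i] dg times
  t(apply(repx, 1, cumprod))                    # row-wise cumprod, transposed back
}

x <- c(2, 3)
powers3(x, 3)
#      [,1] [,2] [,3]
# [1,]    2    4    8
# [2,]    3    9   27
all.equal(powers3(x, 3), powers4(x, 3))
# [1] TRUE
```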
  1. First, briefly explain the thought process behind these two new methods.

  2. Then provide a summary of the computation time for all four versions (two from lecture and the two above). Use the same x from lecture.

x <- runif(10000000)
  3. Profile the code to explain why the two new versions disappoint relative to the original two. Cite particular subroutines that cause the slowdowns, with reference to the profile summaries. Are these subroutines creating memory bottlenecks, computational bottlenecks, or both?
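A possible starting point for the timing and profiling, using only base R (system.time, Rprof, summaryRprof). This is a sketch under assumptions: the two lecture versions are not reproduced here (they would be timed the same way), dg = 16 is an arbitrary choice, and x is shorter than the lecture's 10^7 draws so the example finishes quickly.

```r
powers3 <- function(x, dg) outer(x, 1:dg, "^")
powers4 <- function(x, dg) {
  repx <- matrix(rep(x, dg), nrow = length(x))
  t(apply(repx, 1, cumprod))
}

set.seed(1)
x <- runif(5e5)   # assumption: smaller than lecture's 1e7 to keep this quick
dg <- 16          # assumption: degree chosen only for illustration

# Wall-clock and CPU time for each version
print(system.time(p3 <- powers3(x, dg)))
print(system.time(p4 <- powers4(x, dg)))

# Sampling profiler with memory information; inspect which calls
# (e.g. apply, cumprod, t, matrix) dominate time and allocation
prof <- tempfile()
Rprof(prof, memory.profiling = TRUE)
p4 <- powers4(x, dg)
Rprof(NULL)
print(summaryRprof(prof, memory = "both")$by.self)
```

The by.self table attributes time and memory to the function actually running when each sample was taken, which is what the question asks you to cite.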

Problem 2: Annie & Sam (10 pts)

How would you adjust the code for the Annie & Sam example(s) to accommodate other distributions? E.g., what if \(S \sim \mathcal{N}(10.5, 1)\) and \(A\sim\mathcal{N}(11, 1.5)\)?
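One way to make the simulation distribution-agnostic is to pass the samplers in as functions. The sketch below assumes the lecture example estimates quantities such as P(A < S) and E[A - S] by simulation, and reads the second normal parameter as a standard deviation; annie_sam_mc is a hypothetical name, not the lecture's code.

```r
# Generic MC estimator: rS and rA are samplers with signature function(n)
annie_sam_mc <- function(n, rS, rA) {
  S <- rS(n)
  A <- rA(n)
  c(p_A_first = mean(A < S),   # estimate of P(A < S)
    mean_diff = mean(A - S))   # estimate of E[A - S]
}

set.seed(1)
est <- annie_sam_mc(1e6,
  rS = function(n) rnorm(n, mean = 10.5, sd = 1),   # assumes 1 is an sd
  rA = function(n) rnorm(n, mean = 11, sd = 1.5))   # assumes 1.5 is an sd
print(est)

# Closed-form check for the normal case: A - S ~ N(0.5, 1^2 + 1.5^2)
p_exact <- pnorm(0, mean = 0.5, sd = sqrt(1 + 1.5^2))
print(p_exact)
```

Switching back to the original setup then only requires passing different samplers (e.g. runif with the lecture's limits); the estimator itself does not change.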

Problem 3: Bootstrap with boot (15 pts)

Re-write the least-squares regression bootstrap from lecture using the boot library.
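The boot package expects a statistic of the form function(data, indices), which it calls once per resample. A minimal sketch, assuming a case-resampling (pairs) bootstrap of the coefficients; the data below are simulated stand-ins, not the lecture data.

```r
library(boot)

# Hypothetical data standing in for the lecture example
set.seed(1)
n <- 100
X <- runif(n)
Y <- 2 + 3 * X + rnorm(n, sd = 0.5)
dat <- data.frame(X = X, Y = Y)

# Statistic: least-squares coefficients on the resampled rows
ls_coef <- function(data, idx) coef(lm(Y ~ X, data = data[idx, ]))

b <- boot(dat, ls_coef, R = 1000)
print(b)                                    # bias and std. error per coefficient
print(boot.ci(b, type = "perc", index = 2)) # percentile CI for the slope
```

Here b$t holds one row of coefficients per bootstrap replicate, replacing the hand-written loop and storage matrix.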

Problem 4: Bootstrapped splines (15 pts)

Design a bootstrap to assess the predictive uncertainty in our fits from slides 49–53 of stats.pdf.
Rather than specifying df = 11, use the CV option to choose the degrees of freedom. You may code the routine by hand or use the boot library. Provide a visualization of the bootstrapped average predictive mean and the central 90% quantiles.
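A hand-coded sketch of the idea, assuming a case-resampling bootstrap and smooth.spline with cv = TRUE (leave-one-out CV picks the smoothness in each replicate). The sine-curve data, grid, and B = 200 are illustrative assumptions, not the slides' data. Resampled x values contain ties, for which smooth.spline warns that CV is doubtful; the warning is suppressed here, and a residual bootstrap with fixed x would avoid the issue.

```r
# Hypothetical data standing in for the slides' example
set.seed(2)
n <- 200
x <- sort(runif(n, 0, 10))
y <- sin(x) + rnorm(n, sd = 0.3)

xx <- seq(0, 10, length.out = 100)   # prediction grid
B <- 200                             # number of bootstrap replicates
P <- matrix(NA_real_, nrow = B, ncol = length(xx))

for (b in 1:B) {
  i <- sample(n, n, replace = TRUE)  # resample (x, y) pairs
  # cv = TRUE: ordinary leave-one-out CV chooses df in each replicate
  fit <- suppressWarnings(smooth.spline(x[i], y[i], cv = TRUE))
  P[b, ] <- predict(fit, xx)$y
}

m <- colMeans(P)                                      # bootstrapped mean
q <- apply(P, 2, quantile, probs = c(0.05, 0.95))     # central 90% band

plot(x, y, col = "gray")
lines(xx, m, lwd = 2)
lines(xx, q[1, ], lty = 2)
lines(xx, q[2, ], lty = 2)
```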

Problem 5: Spam MC shell script (15 pts)

Design a shell script spam_mc.sh that automates a crop of batch Monte Carlo instances for our spam “bakeoff”, to be run in parallel. It must take an integer argument specifying how many parallel instances to create. For example, make it so that one can execute

./spam_mc.sh 16

to get 16 batches in parallel. Some warnings and specifications:

Be careful to ensure that any temporary files used by one instance do not trample on those of the others.
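The shell script is the deliverable, but each instance it launches needs its own seed and its own output file. A minimal sketch of an R-side worker, assuming spam_mc.sh starts instances in the background with something like `Rscript spam_worker.R <id> <reps> &` and then calls `wait`. The file name spam_worker.R, the function run_instance, and the placeholder repetition are all hypothetical; the real spam bakeoff code would replace one_rep_placeholder.

```r
## spam_worker.R (hypothetical): one batch instance
run_instance <- function(id, reps, one_rep, outdir = ".") {
  set.seed(id)  # distinct seed per instance (simple, not a formal parallel RNG)
  res <- t(replicate(reps, one_rep()))  # one row per MC repetition
  # the instance id in the file name keeps instances from clobbering each other
  outfile <- file.path(outdir, sprintf("spam_mc_%03d.rds", id))
  saveRDS(res, outfile)
  invisible(outfile)
}

# Placeholder for one bakeoff repetition: returns a named accuracy vector
one_rep_placeholder <- function() c(method1 = runif(1), method2 = runif(1))

args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1) {
  id <- as.integer(args[1])
  reps <- if (length(args) >= 2) as.integer(args[2]) else 5L
  run_instance(id, reps, one_rep_placeholder)
}
```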

Problem 6: Spam MC “bakeoff” in parallel (30 pts)

Re-write the spam MC “bakeoff” from lecture with sockets (i.e., via the parallel package) rather than via “batch mode”.
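With the parallel package, the same repetitions can be farmed out over a socket (PSOCK) cluster from a single R session. A minimal sketch using makeCluster, clusterSetRNGStream (independent L'Ecuyer-CMRG streams per worker), and parLapply; one_rep is a hypothetical placeholder for one bakeoff repetition, and the worker count of 2 is arbitrary.

```r
library(parallel)

# Placeholder for one spam bakeoff repetition (replace with the real code)
one_rep <- function(i) c(rep = i, u = runif(1))

cl <- makeCluster(2, type = "PSOCK")       # socket cluster with 2 workers
clusterSetRNGStream(cl, iseed = 42)        # reproducible, independent streams
# For the real bakeoff, ship data and packages to the workers first, e.g.
# clusterExport(cl, "spam") and clusterEvalQ(cl, library(...))
res <- parLapply(cl, 1:8, one_rep)         # 8 repetitions spread over workers
stopCluster(cl)

res <- do.call(rbind, res)                 # one row per repetition
print(res)
```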

You must also provide (via Bitbucket):

Problem \(\star\): Spam summary

This isn’t a real problem. It is just here as a placeholder to say that, for Problems 5/6, you must demonstrate in your PDF on Canvas that you have collected a substantial number of MC repetitions in order to re-visualize the results from lecture (which are based on only 5 repetitions). I expect at least thirty repetitions, which many would regard as the minimum needed to “trust” the resulting accuracy distributions.
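If the repetitions are spread over several files, they need to be pooled before plotting. A minimal sketch, assuming each instance saved a matrix of accuracies (one row per repetition, one column per method) to files named spam_mc_*.rds; that naming and both function names are hypothetical.

```r
# Pool per-instance result matrices into one matrix of accuracies
collect_mc <- function(dir = ".", pattern = "^spam_mc_.*\\.rds$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("no result files found in ", dir)
  do.call(rbind, lapply(files, readRDS))
}

# Boxplot of accuracy by method, warning if too few repetitions were collected
plot_mc <- function(acc, min_reps = 30) {
  if (nrow(acc) < min_reps)
    warning("only ", nrow(acc), " repetitions; at least ", min_reps, " expected")
  boxplot(as.data.frame(acc), ylab = "accuracy",
          main = paste(nrow(acc), "MC repetitions"))
}
```

Usage would then be plot_mc(collect_mc()) from the directory holding the result files.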