## Instructions

This homework is due on Tuesday, October 31st at 12:30pm (the start of class). Please turn in all your work. The primary purpose of this homework is to hone debugging and profiling skills and to explore parallelization with Monte Carlo experiments. This description may change at any time; however, notices about substantial changes (requiring more/less work) will additionally be noted on the class web page. Note that there are two prongs to submission, via Canvas and Bitbucket (in asc-repo/hwk/hw4). You don’t need to use Rmarkdown, but your work should be just as pretty if you want full marks.

## Problem 1: Profiling (15 pts)

Below are two very compact versions of functions for calculating powers of a matrix.

```r
powers3 <- function(x, dg) outer(x, 1:dg, "^")

powers4 <- function(x, dg)
{
  repx <- matrix(rep(x, dg), nrow=length(x))
  return(t(apply(repx, 1, cumprod)))
}
```
1. First, briefly explain the thought process behind these two new methods.

2. Then provide a summary of the computation time for all four versions (two from lecture and the two above). Use the same x from lecture.

```r
x <- runif(10000000)
```
3. Profile the code to explain why the two new versions disappoint relative to the original two. Cite particular subroutines which cause the slowdowns, with reference to the profile summaries. Are these subroutines creating memory bottlenecks, computational bottlenecks, or both?
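One way to gather the requested profiles is with `Rprof`. The sketch below (a minimal example, assuming `powers4` above is in scope, and using a smaller `x` so it runs quickly) turns on memory profiling so you can speak to both kinds of bottleneck:

```r
## Profile powers4; Rprof samples the call stack at regular
## intervals, so subroutines that dominate run time (e.g. apply,
## cumprod, t) will top the summary.
x <- runif(1000000)
Rprof("powers4.out", memory.profiling=TRUE)
p4 <- powers4(x, 16)
Rprof(NULL)                       # stop profiling
summaryRprof("powers4.out", memory="both")
```

Repeat for each of the four versions and compare the `by.self` tables.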

## Problem 2: Annie & Sam (10 pts)

How would you adjust the code for the Annie & Sam example(s) to accommodate other distributions? E.g., if $$S \sim \mathcal{N}(10.5, 1)$$ and $$A\sim\mathcal{N}(11, 1.5)$$?
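As a hedged sketch of the kind of change intended (assuming the lecture code estimates the probability that Sam arrives before Annie by Monte Carlo, and taking the second parameter of each normal as the standard deviation), only the random draws need to change:

```r
## Monte Carlo estimate of P(S < A), with the lecture's uniform
## arrival-time draws swapped for the stated normals.
n <- 100000
S <- rnorm(n, mean=10.5, sd=1)    # Sam ~ N(10.5, 1)
A <- rnorm(n, mean=11, sd=1.5)    # Annie ~ N(11, 1.5)
mean(S < A)                       # estimated probability
```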

## Problem 3: Bootstrap with boot (15 pts)

Re-write the least-squares regression bootstrap from lecture using the boot library.

• Briefly compare and contrast to the results we obtained in lecture.
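A minimal sketch of the `boot`-library pattern, using hypothetical data in place of the lecture's (the real exercise should resample the regression data from lecture):

```r
library(boot)

## Hypothetical stand-in data; replace with the lecture's data.
n <- 200
D <- data.frame(x=runif(n))
D$y <- 2 + 3*D$x + rnorm(n)

## Statistic function: boot() supplies the resampling indices,
## and we refit least squares on the resampled rows.
beta.hat <- function(data, idx) coef(lm(y ~ x, data=data[idx,]))

bo <- boot(D, beta.hat, R=1000)
bo                                  # bootstrap bias and std. errors
boot.ci(bo, index=2, type="perc")   # percentile CI for the slope
```

These standard errors and intervals are what you would compare against the by-hand bootstrap from lecture.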

## Problem 4: Bootstrapped splines (15 pts)

Design a bootstrap to assess the predictive uncertainty in our fits from slides 49–53 of stats.pdf.
Rather than specifying df = 11, use the CV option to fit the degrees of freedom. You may code the routine by hand, or within the boot library. Provide a visualization of the bootstrapped average predictive mean and central 90% quantiles.
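A by-hand sketch of the intended structure, using hypothetical data in place of the slides' example (swap in the real data; note `smooth.spline` may warn about non-unique `x` values under resampling):

```r
## Hypothetical data standing in for slides 49-53.
n <- 300
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd=0.3)
xx <- seq(0, 10, length=100)       # predictive grid

B <- 199
pred <- matrix(NA, nrow=B, ncol=length(xx))
for(b in 1:B) {
  idx <- sample(1:n, n, replace=TRUE)
  fit <- smooth.spline(x[idx], y[idx], cv=TRUE)  # df chosen by CV
  pred[b,] <- predict(fit, xx)$y
}

m <- colMeans(pred)                              # average predictive mean
q <- apply(pred, 2, quantile, probs=c(0.05, 0.95))  # central 90%
plot(x, y, col="grey")
lines(xx, m)
lines(xx, q[1,], lty=2); lines(xx, q[2,], lty=2)
```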

## Problem 5: Spam MC shell script (15 pts)

Design a shell script spam_mc.sh which automates a crop of batch Monte Carlo instances for our spam “bakeoff”, which are to be run in parallel. It must take an integer argument specifying how many parallel instances to create. For example, make it so one can execute

```shell
./spam_mc.sh 16
```

to get 16 batches in parallel. Some warnings and specifications:

• You will need to do some research (i.e., Googling) to get help with Bash scripting here.
• The code must work appropriately for any (positive integer) argument provided, with a sensible default when no argument is provided. Additionally, it must provide a warning when the argument is greater than the number of cores on the machine. For some help on that (and as an example of Googling for help), see: https://stackoverflow.com/questions/6481005/how-to-obtain-the-number-of-cpus-cores-in-linux-from-the-command-line.
• If you choose to work in OSX on your Mac, make sure to test your implementation on your Ubuntu virtual machine.

Be careful to ensure that any temporary files used by each instance do not trample on others.
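One possible skeleton covering the specifications above (a sketch, not the answer: it assumes a hypothetical worker script spam_mc.R that reads a batch number from the command line):

```shell
#!/bin/bash
## spam_mc.sh: launch N batch Monte Carlo instances in parallel.

N=${1:-4}           # sensible default when no argument is given
CORES=$(nproc)      # number of cores (Linux; see the SO link above)
if [ "$N" -gt "$CORES" ]; then
  echo "warning: $N instances requested but only $CORES cores" >&2
fi

for i in $(seq 1 "$N"); do
  ## unique seed and output file per instance so temporary files
  ## don't trample on one another
  R CMD BATCH --no-save "--args seed=$i" spam_mc.R "spam_mc_$i.Rout" &
done
wait                # block until all background batches finish
```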

## Problem 6: Spam MC “bakeoff” in parallel (30 pts)

Re-write the spam MC “bakeoff” from lecture with sockets (i.e., via the parallel package) rather than via “batch mode”.

You must also provide (via Bitbucket):

• An R script called spam_snow.R that runs the entire “bakeoff” using four parallel instances (sockets), by default.
• A shell script called spam_snow.sh that takes an integer command-line argument specifying the number of parallel instances (sockets) to create. Alternatively, you can make spam_snow.R directly executable (with the same command-line argument). Please indicate which in your solution and/or in a README.md file on Bitbucket. Whatever you choose, make sure to have a warning if the argument provided implies more instances than cores, as in Problem 5.
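The socket mechanics can be sketched as follows (the body of the worker function is a placeholder; the real `spam_snow.R` would run one full bakeoff batch per call and export the spam data to the workers first):

```r
library(parallel)

## One batch of the bakeoff; this stub stands in for the real
## lecture code and just illustrates the socket pattern.
spam.batch <- function(i) {
  set.seed(i)                     # distinct stream per instance
  c(rep=i, acc=runif(1))          # stand-in for real accuracies
}

nc <- 4                           # four sockets by default
if(nc > detectCores())
  warning("more instances requested than cores available")
cl <- makeCluster(nc, type="PSOCK")
## in the real script, ship data/packages to the workers here
## via clusterExport()/clusterEvalQ() before calling parLapply
res <- parLapply(cl, 1:32, spam.batch)
stopCluster(cl)
```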

## Problem $$\star$$: Spam summary

This isn’t a real problem. It is just here as a place-holder to say that for Problems 5/6 you must demonstrate, in your PDF on Canvas, that you have been able to collect a substantial number of MC repetitions in order to re-visualize the results from lecture (which are only based on 5 repetitions). I expect at least thirty repetitions, which many would regard as a minimum number in order to “trust” the resulting accuracy distributions.

• Even with many instances in parallel, this may take tens of hours, so don’t leave it until the last minute.
• If you need more computing power, please reach out to Steve to get access to our Linux servers.
• You don’t need to do this fully for both 5 & 6, just one of them. Windows users may find 6 easier than 5, because working in Bash requires the virtual machine, which is much slower. Mac users should be fine either way.
• Watched MCs don’t work better than un-watched ones. (Like boiling water.) Work on one of the other, less computationally intensive, problems while these are running.