---
title: "Homework 6"
subtitle: "Advanced Statistical Computing (STAT 6984)"
author: "Robert B. Gramacy, Department of Statistics, Virginia Tech"
output: html_document
---

## Instructions

This homework is **due on Tuesday, December 5th at 12:30pm** (the start of class). Please turn in all your work. The primary purpose of this homework is to get familiar with ARC. This description may change at any time. However, notices about substantial changes (requiring more or less work) will also be posted on the class web page. Note that there are two prongs to submission: via Canvas and via Bitbucket (in `asc-repo/hwk/hw6`). You don't need to use `Rmarkdown`, but your work should be just as pretty if you want full marks.

## Problem 1: Duplicating the examples (33 pts)

Get the MCPI and MH (snow) examples from the [ARC R User Guide](https://secure.hosting.vt.edu/www.arc.vt.edu/userguide/r/#resources) to work on `dragonstooth`. You may need to modify the `qsub` scripts. Provide evidence that the code worked. This may include a summary of the output, a `jobload` summary, the output of `gstatement -h -a ascclass`, etc.

## Problem 2: Predicting satellite drag (33 pts)

Run the satellite drag bakeoff, provided on the class web page, on `dragonstooth`.

- Allocate five nodes (so that five-fold CV is used). You may need to reserve up to 12 hours.
- Use `jobload` to provide evidence that the code is fully utilizing the resources you have allocated. When it is done, show the allocation expenditure via `gstatement -h -a ascclass`.
- Provide a boxplot of the resulting RMSPEs.

## Problem 3: Spam Bakeoff (34 pts)

Revisit the spam bakeoff from [homework 4](hwk4.html):

a. first with GNU `parallel` on `cascades`. You may wish to consult [`spam_mc.qsub`](spam_mc.qsub), but you will need to make some modifications so that GNU `parallel` distributes runs to multiple nodes;
b. then with `parallel`/`Rmpi` on `dragonstooth`. You may wish to consult [`spam_snow.R`](spam_snow.R).

In both cases you must set things up so that all cores of five nodes are fully utilized simultaneously.

- Perform at least thirty reps of 10-fold CV.
- Note that each node on `cascades` has 32 cores, but each node on `dragonstooth` has only 24.
- Also note that nothing here is OMP-parallelized (although MKL is used). As a result, you don't need the special `mpirun` setup that we used for satellite drag. (Using those arguments will slow things down.)
- Use `jobload` to provide evidence that the code is fully utilizing the resources you have allocated. When it is done, show the allocation expenditure via `gstatement -h -a ascclass`.
- Provide a boxplot of the resulting hit rates, with parts a. and b. side by side.
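As a rough check on what "all cores of five nodes fully utilized simultaneously" entails, the arithmetic below counts the cores on five nodes of each cluster. It then counts how many back-to-back "waves" thirty reps of 10-fold CV would occupy. This is a minimal sketch that assumes one fold fit per core. That decomposition is hypothetical and yours may differ. The variable names are illustrative only.

```shell
#!/bin/bash
# Back-of-the-envelope utilization check.
# Assumption (hypothetical): each fold fit runs as one process on one core.
nodes=5
reps=30
folds=10
for cluster in cascades:32 dragonstooth:24; do
  name=${cluster%%:*}              # cluster name before the colon
  cores_per_node=${cluster##*:}    # cores per node after the colon
  total_cores=$(( nodes * cores_per_node ))
  units=$(( reps * folds ))        # total number of fold fits
  waves=$(( (units + total_cores - 1) / total_cores ))  # ceiling division
  idle=$(( waves * total_cores - units ))               # idle cores in the final wave
  echo "$name: $total_cores cores, $units fold fits, $waves waves, $idle idle in last wave"
done
```

Under that assumption, the 300 fold fits fill `cascades` (160 cores) in two waves, with 20 cores idle in the second. On `dragonstooth` (120 cores) they take three waves, and half the cores sit idle in the last one. This is worth keeping in mind when reading `jobload` output near the end of a run.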