Robert B. Gramacy Professor of Statistics

Advanced Statistical Computing

STAT 6984

This is a second course on statistical computing. Although basics will be revisited, the pace will be swift so we can get to advanced computing and data management topics as quickly as possible. The main programming language will be R, but by the end it will primarily act as the "glue" binding together other languages, databases, computing architectures and interfaces, as appropriate for the task(s) at hand. We will learn how statisticians can best leverage modern desktop computing (multiple cores), cluster computing (multiple nodes), distributed computing (Hadoop/Amazon EC2), and the coming wave of exascale computing (GPU/TPU/Xeon Phi). The goal is to make students marketable as postdocs at national labs and similar research facilities, where statisticians are expected to have the same computing skills as other applied scientists. A high bar of computing experience is required for graduating Ph.D.s to be competitive applicants for those positions, and likewise at investment banks/hedge funds, semiconductor companies, industrial engineering giants (Boeing, GE), etc. An aspect of that preparation will be going "back to basics": navigating the Unix shell, manipulating data therein, compiling libraries with make, version control (e.g., Git), and good habits/best practices for code development and data management.

Notices

  • Homework 2 due date changed to 26 Sept.

Lecture materials

Homework

Due at the start of lecture.

  • Homework 2 (Rmd): R functions, OOP and scripts, due 26 Sept 2017

Computing

Tim Warburton teaches a similar class to CMDA undergraduates, and this slide offers a nice snapshot of the toolchain that computational modelers (and statisticians) need to be effective researchers and collaborators. If you find it helpful, think of our class as catching you up with what undergraduates in other quantitative fields know about scientific computing, with a slight emphasis on statistics and data analytics.

The "home base" language for this course is R, which can be obtained from CRAN. RStudio is an excellent multi-platform graphical interface to R, which you will likely prefer to the default Windows/OSX GUI(s).

Throughout the course we will encounter several other helpful tools, platforms and languages. The (incomplete) list of resources below, blending tutorials and best-practice guides, may be helpful.