Robert B. Gramacy Professor of Statistics

Advanced Statistical Computing

STAT 6984

This is a second course on statistical computing. Although basics will be revisited, the pace will be swift so we can get to advanced computing and data management topics as quickly as possible. The main programming language will be R, but by the end it will primarily act as the "glue" binding together other languages, databases, computing architectures and interfaces, as appropriate for the task(s) at hand. We will learn how statisticians can best leverage modern desktop computing (multiple cores), cluster computing (multiple nodes), distributed computing (Hadoop/Amazon EC2), and the coming wave of exascale computing (GPU/TPU/Xeon Phi). The goal is to make students marketable as postdocs at national labs and similar research facilities, where statisticians are expected to have the same computing skills as other applied scientists. A high bar of computing experience is required for graduating Ph.D.s to be competitive applicants for those positions, and likewise at investment banks/hedge funds, semiconductor companies, industrial engineering giants (Boeing, GE), etc. An aspect of that preparation will be “back to basics”: navigating the Unix shell, manipulating data therein, compiling libraries with make, version control (e.g., Git), and good habits/best practices for code development and data management.

Notices

  • None at this time.

Lecture materials

Homework

Due at the start of lecture.

Computing

The recommended language for this course is R, which can be obtained from CRAN. RStudio is an excellent multi-platform graphical interface to R, which you will likely prefer to the default Windows/OSX GUI(s).

Throughout the course we will encounter several other helpful tools, platforms and languages. The (incomplete) list of resources below, blending tutorials and best-practice guides, may be helpful.

  • A guide to the bash shell.