Advanced Statistical Computing
STAT 6984
This is a second course on statistical computing. Although basics will be revisited, the pace will be
swift so we can get to advanced computing and data management topics as quickly as possible. The main programming
language will be R, but by the end it will primarily act as
the "glue" binding together other languages, databases, computing architectures and interfaces, as appropriate for
the task(s) at hand. We will learn how statisticians can best leverage modern desktop computing (multiple cores),
cluster computing (multiple nodes), distributed computing (Hadoop/Amazon EC2), and the coming wave of exascale
computing (GPU/TPU/Xeon Phi). The goal is to make students marketable as postdocs at National Labs and similar
research facilities where statisticians are expected to have the same computing skills as other applied
scientists. A high bar of computing experience is required for graduating Ph.D.s to be competitive applicants for
those positions, and likewise at investment banks/hedge funds, semiconductor companies, industrial engineering
giants (Boeing, GE), etc. An aspect of that preparation will be “back to basics” with navigating the Unix shell,
manipulating data therein, compiling libraries with make, version control (e.g., Git), and good habits/best
practice with code development and data management.
- None at this time.
Homework
Due at the start of lecture.
- Homework 1: coming soon.
The recommended language for this course is R, which can be obtained from CRAN.
RStudio is an excellent multi-platform graphical interface to R, which you will likely prefer to the default.
Throughout the course we will encounter several other helpful tools, platforms and languages. The (incomplete) list of resources below, blending tutorials and best-practice guides, may be helpful.
- A guide to the bash shell.