Robert B. Gramacy, Professor of Statistics
Intermediate Data Analytics and Machine Learning
CMDA/CS/STAT 4654 is a technical analytics course that teaches supervised and unsupervised learning strategies,
including regression, generalized linear models, regularization, dimension reduction methods, tree-based methods for
classification, and clustering. Upper-level analytical methods, e.g., neural networks and Gaussian processes,
are shown in practice. It is targeted towards students who have completed (and remember the concepts
from) a course in introductory statistics and mathematical modeling.
We will make extensive use of calculus, linear algebra, and probability.
Computational tools, such as the R language for statistical computing, will be used for illustration in class
and will be essential for completing homework problems.
Notices
 Class is canceled on Wednesday Feb 7. Office hours on Tuesday Feb 6 are canceled. Monday and Wednesday (TA) office hours are still on. Homework 1 is still due Feb 7.
 The TA will hold office hours in the Old Security Building, Wed 12pm and Thu 11am-12pm.
 Lectures will primarily be slides-based, supplemented by board calculations and computing demonstrations in R. For complete notes you must come to class!
Lectures

Part 1: Introduction & Overview (doc format)

Part 2: Least Squares (doc format)
Supplementary code: correlations, and conditional distributions
Data files: wages, and mutual funds
Supplemental lecture: maximum likelihood (doc format) 
Part 3: Linear Model (doc format)
Supplementary code: lmmc.R, and for MC sampling under the linear model 
Part 4: Diagnostics & Transformations (doc format)
Data files: Anscombe, rent, pickups, telemarketing, imports, and food sales
Homework
Due at the start of lecture.
 Homework 0: prerequisites, due 24 Jan 2018
Solutions (Rmd)
 Homework 1 (Rmd): least squares and linear models, due 7 Feb 2018
Data files: tractors
Solutions (Rmd)
 Homework 2 (Rmd): diagnostics and transformations, due 21 Feb 2018
Data files: tractors, transforms, and cheese
Computing
The recommended language for this course is R, which can be obtained from CRAN.
Other languages such as MATLAB are allowed but are not recommended.
Examples in lecture, and help in office hours, etc., will be exclusively in R.
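As a quick check that an R installation works, the following is a minimal sketch of a least squares fit on simulated data. It is not taken from the course materials: the data, the coefficients 1 and 2, the noise level, and all variable names are assumptions chosen for illustration. It fits the model with `lm()` and compares the result with the normal-equations solution.

```r
# Simulated data; the true coefficients (1, 2) and noise sd (0.5) are arbitrary.
set.seed(1)
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Least squares fit using R's built-in lm().
fit <- lm(y ~ x)
beta_lm <- coef(fit)

# The same estimate from the normal equations: solve (X'X) b = X'y.
X <- cbind(1, x)
beta_ne <- solve(crossprod(X), crossprod(X, y))

# The two estimates should agree up to numerical error.
print(beta_lm)
print(all.equal(as.numeric(beta_lm), as.numeric(beta_ne)))
```

Solving the normal equations directly is shown only to connect the fit to the underlying linear algebra; `lm()` itself uses a QR decomposition, which is numerically more stable.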
Below are some helpful R resources:
 A quick R tutorial and accompanying code file
 Some helpful video tutorials and step-by-step guides
 RStudio is an excellent multi-platform graphical interface to R, which you will likely prefer to the default Windows/OSX GUI(s).
 If you must, MATLAB code supporting the book can be downloaded here.