Robert B. Gramacy Professor of Statistics

Intermediate Data Analytics and Machine Learning

CMDA/CS/STAT 4654 is a technical analytics course that will teach supervised and unsupervised learning strategies, including regression, generalized linear models, regularization, dimension reduction methods, tree-based methods for classification, and clustering. Upper-level analytical methods are shown in practice: e.g., advanced naïve Bayes, neural networks, and Gaussian processes. It is targeted towards students who have completed (and remember the concepts from) a course in introductory statistics and mathematical modeling. We will make extensive use of calculus, linear algebra, and probability. Computational tools, such as the R language for statistical computing, will be used for illustration in class and will be essential for completing homework problems.


  • The final exam (Rmd) is due Wednesday May 10, no later than 11:58pm. Data files: elec.csv, wspts.csv, and spam.csv.
  • Our second exam was on Monday April 17; solutions.
  • Our first exam was on Monday February 27; solutions. As a means of applying a curve, the point maximum was reduced from 100 to 85.


Homework Due at the start of lecture


The recommended language for this course is R, which can be obtained from CRAN. Other languages such as MATLAB are allowed but are not recommended. Examples in lecture, and help in office hours, etc., will be exclusively in R. Below are some helpful R resources: