Intermediate Data Analytics and Machine Learning
CMDA/CS/STAT 4654 is a technical analytics course that will teach supervised and unsupervised learning strategies,
including regression, generalized linear models, regularization, dimension reduction methods, tree-based methods for
classification, and clustering. Upper-level analytical methods are shown in practice: e.g., advanced naïve Bayes,
neural networks and Gaussian processes. It is targeted towards students who have completed (and remember the concepts
from) a course in introductory statistics and mathematical modeling.
We will make extensive use of calculus, linear algebra, and probability.
Computational tools, such as the R language for statistical computing, will be used for illustration in class and will be essential for completing homework problems.
Homework
Due at the start of lecture.
- Homework 0 (Rmd): prerequisites, due 25 Jan 2017
- Homework 1 (Rmd): least squares and linear models, due 8 Feb 2017
Data files: tractors
- Homework 2 (Rmd): diagnostics and transformations, due 22 Feb 2017
Data files: tractors, transforms, and cheese
- Homework 3 (Rmd): multiple linear and stepwise regression, due 15 March 2017
Data files: nutrition, beef, and pollution
- Homework 4 (Rmd): model selection (CV & bootstrap), due 31 March 2017
Data files: pollution
Supplementary file: fit_mse.R
- Homework 5 (Rmd): time series and GLMs, due 12 Apr 2017
Data files: UK Gas, seatbelts, and adult income
- Homework 6 (Rmd): EM and clustering, due 28 April 2017
The recommended language for this course is R, which can be obtained from CRAN.
Other languages such as
MATLAB are allowed but are not recommended.
Examples in lecture, and help in office hours, etc., will be exclusively in R.
Below are some helpful