Robert B. Gramacy, Professor of Statistics
Applied Regression Analysis
BUS 41100 is a course about regression, a powerful and widely used data
analysis technique. Students will learn how to use regression to analyze
a variety of complex real-world problems. Heavy emphasis will be placed
on the analysis of actual datasets and on implementation in the R
language for statistical computing. Topics covered include simple linear
regression, multiple regression, prediction, variable selection,
residual diagnostics, time series (autoregression), and classification
(logistic regression).
Notices
 The take-home final is due Thursday, June 9. It requires data on electricity demand, racial profiling, and spam. Note that the times for Gleacher drop-off have changed to reflect Gleacher's hours during finals week.
 The midterm is on May 4/7 (see solutions). To practice, you may attempt the Fall 2011 exam; its solutions will be discussed in class on April 27/30.
 The midterm project (data on TVs) was posted on April 27/30 and is due on the date of the midterm exam, May 4/7 (see solutions). To practice, you may attempt the Fall 2011 project (data on house tax); its solutions will be discussed in class on April 27/30.
 Please take note of the class remarking policy before requesting a regrade of homework, quizzes, or exams.
Lectures & Demos

Part 1: Introduction to Correlated Data
Demos: main R code (data on pickups and wages); extra stratification examples and correlation examples

Part 2: Simple Linear Regression
Demos: main R code (data on mutual funds and the stock market)

Part 3: Inference and Estimation for SLR
Demos: main R code (data on mutual funds)
The demo on sampling distributions for linear models requires the two files linked here.

Part 4: Diagnostics and Transformations
Demos: main R code (data from Anscombe, and on rents, pickups, telemarketing, imports, and Consolidated Foods, Inc.)

Part 5: Multiple Linear Regression
Demos: main R code (data on pickups and sales)

Part 6: More Topics in MLR
Demos: main R code (data on the 2000 census, supervisors, and grades)

Part 7: Model Choice and Data Mining
Demos: main R code (data on the 2000 census, crime, and wine)

Part 8: An Introduction to Time Series
Demos: main R code (data on airline passengers, beer production, the Dow Jones IA, and weather)

Part 9: Binary Data and Classification
Demos: main R code (data on NBA point spreads and German credit)
Homework (due at the start of lecture)
Homework 1 for Part 1, due 6 & 9 April 2016
Data: teacher's pay
Solutions: warm-up and graded

Homework 2 for Part 2, due 13 & 16 April 2016
Data: scatter plots, tractors, and the stock market
Solutions: warm-up and graded

Homework 3 for Part 3, due 20 & 23 April 2016
Data: newspapers and crime
Solutions: warm-up (with question 1.2) and graded

Homework 4 for Part 4, due 27 & 30 April 2016
Data: transforms, cheese, and newspapers
Solutions: warm-up and graded

Homework 5 for Parts 5 & 6, due 18 & 21 May 2016
Data: nutrition and beef
Solutions: graded

Homework 6 for Part 7, due 25 & 28 May 2016
Data: mortality and pollution
Solutions: graded

Homework 7 for Part 8, due 1 & 4 June 2016
Data: gas
Solutions: graded
Computing
The recommended language for this course is R, which can be obtained from CRAN.
Other languages, such as MATLAB, STATA, SAS, MINITAB, etc., are allowed but not recommended.
Examples in lecture, help in office hours, etc., will be exclusively in R.
Below are some helpful R resources:
 A quick R tutorial and accompanying code file
 The University offers R tutoring in the Regenstein Library
 Some helpful video tutorials and step-by-step guides
 RStudio is an excellent cross-platform graphical interface to R, which you will likely prefer to the default Windows/OS X GUIs.
 Instructions for changing the default working directory for R on Windows