Department of Statistics, Virginia Tech

This homework is due on **Tuesday, December 5th at 2pm** (the start of class). Please turn in all your work. This homework primarily covers rank-based correlation and linear regression.

- **Calculations by hand**: Throughout this homework, and beyond, “by hand” means either (1) you use quantile/distribution tables and/or Gaussian approximations, as appropriate, and otherwise do all of your calculations with pen and paper (and a calculator); or (2) you write code, say in R, building up all of the steps yourself, i.e., not using a library function that automates the entire procedure (see next bullet).
- **Using a software library**: Throughout this homework, and beyond, “using a software library” means you can feed your data into a built-in function, like `t.test` and `binom.test` in R, and interpret the output as appropriate. Be sure to provide details on the library you used, how you used it, what the output was, and what it means.
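
To make the distinction concrete, here is a minimal sketch contrasting the two routes for a one-sample t-test. The data vector `x` and the null value `mu0` are made up for illustration and do not come from any problem in this homework.

```r
# Hypothetical data, for illustration only (not from any homework problem).
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7)
mu0 <- 5  # hypothesized mean under the null (illustrative value)

# "By hand" (option 2): build up each step of the test yourself.
n <- length(x)
tstat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
pval <- 2 * pt(-abs(tstat), df = n - 1)  # two-tailed p-value

# "Using a software library": one call automates the whole procedure.
fit <- t.test(x, mu = mu0)

# The two routes should agree up to floating-point error.
c(by_hand = tstat, library = unname(fit$statistic))
c(by_hand = pval, library = fit$p.value)
```

Either route is acceptable where a problem allows it; what changes is how much of the procedure you must show and explain.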

A husband and wife went bowling together and kept their scores for 10 games to see whether there was a relationship between their scores. The scores were recorded in order as follows.

```
bowling <- data.frame(husband=c(147, 158, 131, 142, 183, 151, 196, 129, 155, 158),
wife=c(122, 128, 125, 123, 115, 120, 108, 143, 124, 123))
```

- (8 pts) Compute Kendall’s \(\tau\) “by hand”.
- (7 pts) Test the hypothesis of independence using a two-tailed test based on the statistic from part a., “by hand”.
- (5 pts) Repeat parts a. and b. “using a software library”.
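
As a rough sketch of what a coded “by hand” calculation might look like, the block below counts concordant and discordant pairs and applies a Gaussian approximation for the two-tailed test. The data are hypothetical and tie-free, and `kendall_by_hand` is an illustrative helper name. Note that the bowling scores contain ties (for example, 158 appears twice for the husband), which this simple version does not handle, so a tie correction would be needed there.

```r
# Hypothetical, tie-free data for illustration only.
x <- c(12, 15, 9, 20, 17, 11)
y <- c(30, 28, 35, 22, 31, 29)

# Illustrative helper: Kendall's tau from concordant/discordant pair counts.
# Assumes no ties in x or y.
kendall_by_hand <- function(x, y) {
  n <- length(x)
  idx <- combn(n, 2)  # all pairs (i, j) with i < j, one pair per column
  s <- sign(x[idx[2, ]] - x[idx[1, ]]) * sign(y[idx[2, ]] - y[idx[1, ]])
  nc <- sum(s > 0)    # concordant pairs
  nd <- sum(s < 0)    # discordant pairs
  list(nc = nc, nd = nd, tau = (nc - nd) / choose(n, 2))
}

res <- kendall_by_hand(x, y)

# Gaussian approximation for the two-tailed test of independence (no ties):
# under the null, nc - nd has mean 0 and variance n(n-1)(2n+5)/18.
n <- length(x)
z <- (res$nc - res$nd) / sqrt(n * (n - 1) * (2 * n + 5) / 18)
pval <- 2 * pnorm(-abs(z))

# Library versions, for comparison.
tau_lib <- cor(x, y, method = "kendall")
test_lib <- cor.test(x, y, method = "kendall", exact = FALSE)

c(by_hand = res$tau, library = tau_lib)
c(by_hand = pval, library = test_lib$p.value)
```

With samples this small, an exact table (or `exact = TRUE`) is usually preferable to the Gaussian approximation; the approximation is shown only because it is easy to reproduce step by step.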

A new worker is assigned to a machine that manufactures bolts. Each day a sample of bolts is examined and the percent defective is recorded. Do the following data indicate a significant improvement over time for that worker?

`bolts <- data.frame(day=1:13, def=c(6.1, 7.5, 7.7, 5.9, 5.2, 6.1, 5.3, 4.5, 4.9, 4.6, 3.0, 4.0, 3.7))`

You should perform the following tasks “by hand” or “using a software library”, as you please.

- (5 pts) Calculate numerical summaries of your data, including at least the mean, the median, and the standard deviation of the `def` variable.
- (5 pts) Construct a helpful plot to investigate relationships in your data, perhaps starting with a scatterplot. Comment on what you observe; do the data appear to be related?
- (5 pts) Perform an appropriate test of association using Spearman’s \(\rho\).
- (5 pts) Perform an appropriate test of association using Kendall’s \(\tau\).
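
For orientation, the sketch below shows that Spearman's \(\rho\) is the Pearson correlation of the ranks, and how a one-sided alternative can be requested from `cor.test` when the question is about a downward trend. The vectors `day` and `pct` are hypothetical and tie-free; the bolts data contain a tie (6.1 appears twice), so exact p-values may not be available there and R may warn accordingly.

```r
# Hypothetical, tie-free data for illustration only.
day <- 1:8
pct <- c(7.2, 6.8, 7.5, 6.1, 5.9, 6.3, 5.0, 4.4)

# Spearman's rho three ways (these agree when there are no ties).
n <- length(day)
d <- rank(day) - rank(pct)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))  # classic shortcut formula
rho_ranks <- cor(rank(day), rank(pct))             # Pearson on the ranks
rho_lib <- cor(day, pct, method = "spearman")      # library call

# One-sided tests: "less" corresponds to a negative association,
# i.e., the percentage tends to decrease as the day index increases.
sp <- cor.test(day, pct, method = "spearman", alternative = "less")
kd <- cor.test(day, pct, method = "kendall", alternative = "less")

c(formula = rho_formula, ranks = rho_ranks, library = rho_lib)
c(spearman_p = sp$p.value, kendall_p = kd$p.value)
```

Whether a one-sided or two-sided alternative is appropriate depends on how you read the question; state your choice and justify it.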

A driver kept track of the number of miles she traveled and the number of gallons put in the tank each time she bought gasoline.

```
mpg <- data.frame(miles=c(142, 116, 194, 250, 88, 157, 225, 159, 43, 208),
gallons=c(11.1, 5.7, 14.2, 15.8, 7.5, 12.5, 17.9, 8.8, 3.4, 15.2))
```

- (5 pts) Draw a diagram (e.g., a scatterplot) showing these points, using `gallons` as the \(x\)-axis. Ideally this should be performed using software.
- (7 pts) Estimate \(a\) and \(b\) in a linear model, \(y=a+bx\), using the method of least squares.
  - (4 pts) “by hand”
  - (3 pts) “using a software library”

- (3 pts) Plot the least squares regression line on the diagram of part a., ideally using software.
- (7 pts) Suppose the EPA estimated this car’s mileage at 18 miles per gallon. Test the null hypothesis that this figure applies to this particular car and driver. Use the test for the slope.
  - (4 pts) “by hand”
  - (3 pts) “using a software library”

- (5 pts) Calculate a 95% confidence interval for the slope.
- (3 pts) Provide the estimate for the slope, obtained from the median of the slopes from part e. Solve this “by hand”.
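
The steps in this problem can be sketched in code as follows, on hypothetical data. This sketch assumes the classical normal-error t-test and t-based interval for the slope, which may not be the procedure intended in class; a rank-based alternative for the slope test is included at the end. The vectors `x` and `y` and the null slope `b0` are illustrative only.

```r
# Hypothetical data for illustration only.
x <- c(2.0, 3.5, 5.1, 6.4, 8.0, 9.7, 11.2)
y <- c(41, 60, 97, 110, 150, 170, 205)
b0 <- 18  # hypothesized slope under the null (illustrative value)

# Least squares "by hand".
n <- length(x)
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b <- Sxy / Sxx               # slope
a <- mean(y) - b * mean(x)   # intercept

# t-test of H0: slope = b0, and a 95% interval (assumes normal errors).
res <- y - (a + b * x)
se_b <- sqrt(sum(res^2) / (n - 2) / Sxx)
tstat <- (b - b0) / se_b
pval <- 2 * pt(-abs(tstat), df = n - 2)
ci <- b + c(-1, 1) * qt(0.975, df = n - 2) * se_b

# Library versions, for comparison.
fit <- lm(y ~ x)
coef(fit)
confint(fit, "x", level = 0.95)

# Rank-based alternative for the slope test: under H0, y - b0 * x should
# show no association with x, which can be checked with Spearman's rho.
rank_test <- cor.test(x, y - b0 * x, method = "spearman")

# Median of all pairwise slopes as a robust slope estimate.
idx <- combn(n, 2)
slopes <- (y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]])
median(slopes)
```

Note that `summary(fit)` reports a test of slope equal to zero, so testing a nonzero null value requires recomputing the statistic from the estimate and its standard error, as done above.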

A random sample of American colleges and universities resulted in the following numbers of students and faculty (Spring 1973).

```
univ <- data.frame(name=c("American International", "Bethany Nazarene", "Carlow", "David Lipscomb",
"Florida International", "Heidelberg", "Lake Erie", "Mary Harin Baylor", "Newburry", "St. Ambrose",
"Smith", "Texas Women's", "Wofford"),
students=c(2546, 1355, 87, 1858, 4500, 1141, 784, 1063, 753, 1189, 2755, 5602, 988),
faculty=c(129, 75, 87, 99, 300, 109, 77, 64, 61, 90, 240, 300, 73))
```

- (5 pts) Draw a diagram (e.g., a scatterplot) showing these points, using `faculty` as the \(x\)-axis. Ideally this should be performed using software.
- (7 pts) Estimate \(a\) and \(b\) in a linear model, \(y=a+bx\), using the method of least squares.
  - (4 pts) “by hand”
  - (3 pts) “using a software library”

- (3 pts) Plot the least squares regression line on the diagram of part a, ideally using software.
- (7 pts) Test the hypothesis that an increase of one faculty member is accompanied by an average increase of 15 students.
  - (4 pts) “by hand”
  - (3 pts) “using software”

- (5 pts) Calculate a 95% confidence interval for the slope.
- (3 pts) Provide the estimate for the slope, obtained from the median of the slopes from part e. Solve this “by hand”.