Department of Statistics, Virginia Tech

This homework is due on **Tuesday, Nov 7th at 2pm** (the start of class). Please turn in all your work. This homework primarily covers statistical tests based on ranks (via means).

- **Calculations by hand**: Throughout this homework, and beyond, “by hand” means either (1) you utilize quantile/distribution tables, and/or Gaussian approximations, as appropriate, and otherwise do all of your calculations with pen and paper (and a calculator); or (2) you write code, say in R, building up all of the steps yourself, i.e., not using a library function that automates the entire procedure (see next bullet).
- **Using a software library**: Throughout this homework, and beyond, “using a software library” means you can feed your data into a built-in function, like `t.test` and `binom.test` in R, and interpret the output as appropriate. Be sure to provide details on the library you used, how you used it, what the output was, and what it means.
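
To make the distinction concrete, here is a minimal sketch of the two modes for a two-sample rank-sum test. The vectors `x` and `y` are hypothetical toy data (they do not come from any problem below), and the one-sided alternative is chosen only for illustration. The "by hand" part builds the statistic from ranks and looks up the exact null distribution with `pwilcox`, which plays the role of a table; the "library" part hands the data to `wilcox.test`.

```r
# Hypothetical toy data, NOT taken from any problem in this homework
x <- c(1.2, 3.4, 2.2, 5.1)
y <- c(4.0, 6.3, 5.8, 7.7, 6.9)
n <- length(x)
m <- length(y)

# "By hand": rank the pooled sample, then sum the ranks of the first sample
r   <- rank(c(x, y))
T.x <- sum(r[seq_len(n)])

# Mann-Whitney form of the statistic (this is what R reports as W)
W <- T.x - n * (n + 1) / 2

# Exact lower-tail p-value; valid here because there are no ties
p.lower <- pwilcox(W, n, m)

# "Using a software library": one call automates the whole procedure
out <- wilcox.test(x, y, alternative = "less")

print(c(rank.sum = T.x, W = W, p.by.hand = p.lower, p.library = out$p.value))
```

With ties or larger samples, the by-hand route would typically switch to a Gaussian approximation instead of the exact distribution.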

Test the following data to see if the mean high temperature in Des Moines is higher than the mean high temperature in Spokane, for randomly sampled days in the summer.

```
desmoines <- c(83, 91, 94, 89, 89, 96, 91, 92, 90)
spokane <- c(78, 82, 81, 77, 79, 81, 80, 81)
```

- (10 pts) Perform the test “by hand”: show the ranking, clearly state the hypotheses, test statistic and conclusion.
- (5 pts) Perform the test “using a software library”.

In a controlled environment laboratory, 10 men and 10 women were tested to determine the room temperature they found to be the most comfortable. Their results were:

```
men <- c(74, 72, 77, 76, 76, 73, 75, 73, 74, 75)
women <- c(75, 77, 78, 79, 77, 73, 78, 79, 78, 80)
```

Assuming that these temperatures resemble a random sample from their respective populations, is the temperature that men and women feel comfortable at about the same?

- (10 pts) Perform the test “by hand”: show the ranking, clearly state the hypotheses, test statistic and conclusion.
- (5 pts) Perform the test “using a software library”.

Fusible interlinings are being used with increasing frequency to support outer fabrics and improve the shape and drape of various pieces of clothing. The following data represent extensibility (%) at 100gm/cm for both high-quality fabric (H) and poor-quality fabric (P) specimens.

```
H <- c(1.2, 0.9, 0.7, 1.0, 1.7, 1.7, 1.1, 0.9, 1.7, 1.9, 1.3, 2.1, 1.6, 1.8, 1.4, 1.3, 1.9, 1.6,
0.8, 2.0, 1.7, 1.6, 2.3, 2.0)
P <- c(1.6, 1.5, 1.1, 2.1, 1.5, 1.3, 1.0, 2.6)
```

Answer the following.

- (5 pts) Calculate numerical summaries of your data, including at least the mean, the median and the standard deviation for each type of fabric.
- (5 pts) Construct a helpful plot to compare your data; I suggest a boxplot. *(Need help, ask Google!)* If you can create a single boxplot to summarize both H and P together, even better. Comment: do they appear to be equal?
- (10 pts) Perform the test “using a software library”. You should provide the hypotheses, and values for the test statistic and \(p\)-value.
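
As a hint for the summary and plotting steps, the sketch below shows one possible way to summarize and plot two samples of unequal length together. The list `samples` holds hypothetical toy values (not the H and P data above), and the axis label and title are placeholders.

```r
# Hypothetical toy samples of unequal length, NOT the H and P data above
samples <- list(g1 = c(2.1, 3.4, 2.8, 3.9, 3.0, 2.5),
                g2 = c(3.2, 4.1, 3.7))

# Mean, median and standard deviation for each group, one column per group
summ <- sapply(samples, function(v) c(mean = mean(v), median = median(v), sd = sd(v)))
print(round(summ, 3))

# A named list gives one box per group on a shared axis;
# boxplot() also (invisibly) returns the statistics it plotted
bp <- boxplot(samples, ylab = "measurement", main = "Toy comparison")
```

Keeping the samples in a named list avoids padding the shorter vector, and the same list can be reused for both the summaries and the plot.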

Random samples from each of three different types of light bulbs were tested to see how long the light bulbs lasted, with the following results:

```
bulbs <- list(
A=c(73, 64, 67, 62, 70),
B=c(84, 80, 81, 77),
C=c(82, 79, 71, 75))
```

Do these results indicate a significant difference between brands? If so, which brands differ?

- (15 pts) Perform the test “by hand”: show the ranking, clearly state the hypotheses, test statistic and conclusion.
- (5 pts) Perform the test “using a software library”.
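
For several independent samples, a "by hand" calculation in code might look like the sketch below. It assumes the Kruskal-Wallis statistic without a tie correction, uses a chi-squared approximation for the \(p\)-value, and runs on the hypothetical list `toy` (not the bulb data above). The final line compares the result against `kruskal.test` purely as a sanity check.

```r
# Hypothetical toy data with three groups and no ties, NOT the bulb data above
toy <- list(a = c(1.1, 1.9, 2.4, 1.5),
            b = c(3.8, 4.4, 5.1, 4.7),
            c = c(2.0, 2.9, 3.3, 2.6))

vals <- unlist(toy)
grp  <- factor(rep(names(toy), lengths(toy)), levels = names(toy))
n.i  <- lengths(toy)   # group sizes
N    <- length(vals)   # total sample size

# Rank the pooled sample and sum the ranks within each group
r     <- rank(vals)
R.sum <- as.numeric(tapply(r, grp, sum))

# Kruskal-Wallis statistic (no tie correction, since the toy data have no ties)
H <- 12 / (N * (N + 1)) * sum(R.sum^2 / n.i) - 3 * (N + 1)

# Chi-squared approximation with k - 1 degrees of freedom
p.approx <- pchisq(H, df = length(toy) - 1, lower.tail = FALSE)

print(c(H = H, p = p.approx, H.library = unname(kruskal.test(toy)$statistic)))
```

With ties, the statistic would need the usual tie adjustment, and for very small samples an exact table may be preferable to the chi-squared approximation.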

Four job training programs were tried on 20 new employees, where 5 employees were randomly assigned to each training program. The 20 employees were then placed under the same supervisor and, at the end of a certain period, the supervisor ranked the employees according to job ability, with the lowest ranks being assigned to the employees with the lowest job ability.

```
program.rank <- list(
one=c(4, 6, 7, 2, 10),
two=c(1, 8, 12, 3, 11),
three=c(20, 19, 16, 14, 5),
four=c(18, 15, 17, 13, 9))
```

Do these data indicate a difference in the effectiveness of the various training programs? Perform the test “by hand”. State the hypotheses, the test statistic and obtained \(p\)-value.

The amount of iron present in the livers of white rats was measured after the animals had been fed one of five diets for a prescribed length of time. There were 10 animals randomly assigned to each of the five diets.

```
diet <- list(
A=c(2.23, 1.14, 2.63, 1.00, 1.35, 2.01, 1.64, 1.13, 1.01, 1.70),
B=c(5.59, 0.96, 6.96, 1.23, 1.61, 2.94, 1.96, 3.68, 1.54, 2.59),
C=c(4.50, 3.92, 10.33, 8.23, 2.07, 4.90, 6.98, 6.42, 3.72, 6.00),
D=c(1.35, 2.06, 0.74, 0.96, 1.16, 2.08, 0.69, 0.68, 0.84, 1.34),
E=c(1.40, 2.51, 2.49, 1.74, 1.59, 1.36, 3.00, 4.81, 5.21, 5.12))
```

Answer the following.

- (5 pts) Calculate numerical summaries of your data, including at least the mean, the median and the standard deviation for each diet.
- (5 pts) Construct a helpful plot to compare your data; I suggest a boxplot. *(Need help, ask Google!)* If you can create a single boxplot to summarize all diets together, even better. Comment on this plot: which ones appear to be equal?
- (10 pts) Perform the test “using a software library”. You should provide the hypotheses, and values for the test statistic and \(p\)-value. Conduct pairwise comparisons if appropriate; note that you can use the Wilcoxon test to do so.
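
One possible workflow for the library-based test and the follow-up comparisons is sketched below on the hypothetical list `toy` (not the diet data above). It reshapes the list with `stack`, runs `kruskal.test`, and then uses `pairwise.wilcox.test` for all pairwise rank-sum tests. The Holm adjustment for multiple comparisons is an assumption made for this sketch; the assignment does not prescribe a particular adjustment.

```r
# Hypothetical toy data with three groups and no ties, NOT the diet data above
toy <- list(a = c(1.1, 1.9, 2.4, 1.5),
            b = c(3.8, 4.4, 5.1, 4.7),
            c = c(2.0, 2.9, 3.3, 2.6))

# Long format: one row per observation, columns `values` and `ind` (group label)
long <- stack(toy)

# Overall test for any difference among the groups
kw <- kruskal.test(values ~ ind, data = long)
print(kw)

# All pairwise Wilcoxon rank-sum tests, with Holm-adjusted p-values (assumed choice)
pw <- pairwise.wilcox.test(long$values, long$ind, p.adjust.method = "holm")
print(pw)
```

The pairwise step is usually only interpreted when the overall test rejects, and the adjusted \(p\)-values in `pw$p.value` are the ones to report.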