Instructions

This homework is due on Tuesday, Nov 7th at 2pm (the start of class). Please turn in all your work. This homework primarily covers on statistical tests based on ranks (via means).

Problem 1: Summer temperatures (15 pts)

Test the following data to see if the mean high temperature in Des Moines is higher than the mean high temperature in Spokane, for randomly sampled days in the summer.

desmoines <- c(83, 91, 94, 89, 89, 96, 91, 92, 90)
spokane <- c(78, 82, 81, 77, 79, 81, 80, 81)
  1. (10 pts) Perform the test “by hand”: show the ranking, clearly state the hypotheses, test statistic and conclusion.

We consider the following hypotheses.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mu_{\mathrm{DM}} \leq \mu_{\mathrm{Sp}} \\ \mathcal{H}_1 &: \; \mu_{\mathrm{DM}} > \mu_{\mathrm{Sp}} \end{aligned} \]

Equivalently, we could frame the hypotheses in terms of the distribution functions as in the class text.

So we are preforming an upper-tailed test. First, lets extract the data sizes, treating Des Moines as \(X\) and Spokane as \(Y\).

nx <- length(desmoines)
ny <- length(spokane)

The ranks for the combined sample may be computed as follows.

r <- rank(c(desmoines, spokane))
r
##  [1]  9.0 13.5 16.0 10.5 10.5 17.0 13.5 15.0 12.0  2.0  8.0  6.0  1.0  3.0
## [15]  6.0  4.0  6.0

To get ranks from the \(X\) population we can use the seq_along function.

rx <- r[seq_along(desmoines)]
rx
## [1]  9.0 13.5 16.0 10.5 10.5 17.0 13.5 15.0 12.0

Then, \(t\) is obtained as the sum of those numbers.

t <- sum(rx) 
t
## [1] 117

Lets see how many ties there are

ties <- length(r) - length(unique(r))
ties
## [1] 4

That’s a substantial number relative to the total amount of data, so we will proceed with the CLT approximation and calculate \(T_1\). First, some auxiliary calculations.

N <- nx + ny
r2sum <- sum(r^2)
r2sum
## [1] 1782

Then finishing off the \(T_1\) calculation.

t1 <- t - nx*(N+1)/2
t1 <- t1/sqrt(nx*ny/((N*(N-1)))*r2sum - nx*ny*(N+1)^2/(4*(N-1)))
t1
## [1] 3.476908

And then the \(p\)-value calculation from a standard normal for the upper-tail test.

pnorm(t1, lower.tail=FALSE)
## [1] 0.000253616
  • Reject the null hypothesis and conclude that the mean high temperature in DesMoines is higher than the mean high temperature in Spokane.
  1. (5 pts) Perform the test “using a software library”.
wilcox.test(desmoines, spokane, alternative="greater", exact=FALSE, correct=FALSE)
## 
##  Wilcoxon rank sum test
## 
## data:  desmoines and spokane
## W = 72, p-value = 0.0002536
## alternative hypothesis: true location shift is greater than 0
  • Identical result.

Problem 2: Room temperature (15 pts)

In a controlled environment laboratory, 10 men and 10 women were tested to determine the room temperature they found to be the most comfortable. There results were:

men <- c(74, 72, 77, 76, 76, 73, 75, 73, 74, 75)
women <- c(75, 77, 78, 79, 77, 73, 78, 79, 78, 80)

Assuming that these temperatures resemble a random sample from their respective populations, is the temperature that mean and women feel comfortable at about the same?

  1. (10 pts) Perform the test “by hand”: show the ranking, clearly state the hypotheses, test statistic and conclusion.

Here we will perform a two-tailed test by entertaining the following hypotheses.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mu_{\mathrm{men}} = \mu_{\mathrm{women}} \\ \mathcal{H}_1 &: \; \mu_{\mathrm{men}} \ne \mu_{\mathrm{women}} \end{aligned} \]

We will treat men as the \(X\)s and women as the \(Y\)s. First, lets extract the data sizes.

nx <- length(men)
ny <- length(women)

The ranks for the combined sample may be computed as follows.

r <- rank(c(men, women))
r
##  [1]  5.5  1.0 13.0 10.5 10.5  3.0  8.0  3.0  5.5  8.0  8.0 13.0 16.0 18.5
## [15] 13.0  3.0 16.0 18.5 16.0 20.0

Then extract the ranks for the \(X\) population.

rx <- r[seq_along(men)]

Then, \(t\) is obtained as the sum of those numbers.

t <- sum(rx) 
t
## [1] 68

Lets see how many ties there are

ties <- length(r) - length(unique(r))
ties
## [1] 11

Again substantial, so we will use the Gaussian approximation.

First, some auxiliary calculations.

N <- nx + ny
r2sum <- sum(r^2)
r2sum
## [1] 2860.5

Then finishing off the \(T_1\) calculation.

t1 <- t - nx*(N+1)/2
t1 <- t1/sqrt(nx*ny/((N*(N-1)))*r2sum - nx*ny*(N+1)^2/(4*(N-1)))
t1
## [1] -2.817132

And then the \(p\)-value calculation from a standard normal.

2*min(pnorm(t1), pnorm(t1, lower.tail=FALSE))
## [1] 0.004845463
  • Reject the null hypothesis and conclude that the mean comfortable temperature is different for men and women.
  1. (5 pts) Perform the test “using a software library”.
wilcox.test(men, women, exact=FALSE, correct=FALSE)
## 
##  Wilcoxon rank sum test
## 
## data:  men and women
## W = 13, p-value = 0.004845
## alternative hypothesis: true location shift is not equal to 0
  • Identical result.

Problem 3: Materials science (20 pts)

Fusible interlinings are being used with increasing frequency to support outer fabrics and improve the shape and drape of various pieces of clothing. The following data represents extensibility (%) at 100gm/cm for both high-quality fabric (H) a d poor-quality fabric (P) specimens.

H <- c(1.2, 0.9, 0.7, 1.0, 1.7, 1.7, 1.1, 0.9, 1.7, 1.9, 1.3, 2.1, 1.6, 1.8, 1.4, 1.3, 1.9, 1.6,
    0.8, 2.0, 1.7, 1.6, 2.3, 2.0)
P <- c(1.6, 1.5, 1.1, 2.1, 1.5, 1.3, 1.0, 2.6)

Answer the following.

  1. (5 pts) Calculate numerical summaries of your data, this includes at least finding the mean, the median and standard deviation for each type of fabric.

The mean, median and standard deviation for both fabrics may be computed as follows.

rbind(H=c(mean=mean(H), median=median(H), sd=sd(H)),
    P=c(mean=mean(P), median=median(P), sd=sd(P)))
##       mean median        sd
## H 1.508333    1.6 0.4442059
## P 1.587500    1.5 0.5303301

A quick way to get similar information is via the summary command. Rather than standard deviation we get the inter-quartile range.

rbind(H=summary(H), P=summary(P))
##   Min. 1st Qu. Median     Mean 3rd Qu. Max.
## H  0.7   1.175    1.6 1.508333   1.825  2.3
## P  1.0   1.250    1.5 1.587500   1.725  2.6
  • Based on these results it would seem unlikely that we could conclude that the two populations were different.
  1. (5 pts) Construct a helpful plot to compare your data, I suggest a boxplot. If you can create a single boxplot to summarize both H and P together, even better. Comment, do they appear to be equal?
boxplot(H, P, ylab="extensibility", xlab="fabric", main="High versus Poor quality")

  • Again, these look very similar.
  1. (10 pts) Perform the test “using a software library”. You should provide the hypotheses, and values for the test statistic and \(p\)-value.

We shall test the following two-tailed hypotheses.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mu_{\mathrm{H}} = \mu_{\mathrm{P}} \\ \mathcal{H}_1 &: \; \mu_{\mathrm{H}} \ne \mu_{\mathrm{P}} \end{aligned} \]

There are ties in the ranks, so we will ultimately determine that an CLT approximation is necessary.

wilcox.test(H, P, exact=FALSE, correct=FALSE)
## 
##  Wilcoxon rank sum test
## 
## data:  H and P
## W = 96, p-value = 1
## alternative hypothesis: true location shift is not equal to 0

The library says this is an easy accept of the null hypothesis. The \(p\)-value is 1. The test statistic it provides is \(T'\), from class, but this is not the one that is directly used via the CLT. Lets check that.

nx <- length(H)
ny <- length(P)
r <- rank(c(H, P))
rx <- r[seq_along(H)]
t <- sum(rx) 
tprime <- t - nx*(nx+1)/2
tprime
## [1] 96
  • Indeed.

Problem 4: Light bulbs (20 pts)

Random samples from each of three different types of light bulbs were tested to see how long the light bulbs lasted, with the following results:

bulbs <- list(
    A=c(73, 64, 67, 62, 70), 
    B=c(84, 80, 81, 77),
    C=c(82, 79, 71, 75))

Do these results indicate a significant difference between brands? If so which brands differ?

  1. (15 pts) Perform the test “by hand”: show the ranking, clearly state the hypotheses, test statistic and conclusion.

The hypothesis involved in this first stage, following the Kruskal–Wallis testing setup, are as follows.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{All three population distributions functions are identical.} \\ \mathcal{H}_1 &: \; \mbox{At least one of the populations tends to yield larger observations} \\ & \;\;\;\; \mbox{than at least on of the other populations.} \end{aligned} \]

First lets calculate the \(n_i\) and \(N\).

n <- sapply(bulbs, length)
N <- sum(n)

Then calculate the global ranks.

r <- rank(unlist(bulbs))
r
## A1 A2 A3 A4 A5 B1 B2 B3 B4 C1 C2 C3 C4 
##  6  2  3  1  4 13 10 11  8 12  9  5  7

Then get the squares of the ranks for each sample.

k <- length(bulbs)
g <- factor(rep.int(seq_len(k), n))
rs <- tapply(r, g, sum)
rs
##  1  2  3 
## 16 42 33

Now we can evaluate the test statistic.

s2 <- (sum(r^2) - N*(N+1)^2/4)/(N-1)
t <- (sum(rs^2/n) - N*(N+1)^2/4)/s2
t
## [1] 8.403297

And then we can calculate the \(p\)-value.

pchisq(t, k-1, lower.tail=FALSE)
## [1] 0.01497088
  • Reject the null hypothesis at the typical 5% level and conclude that not all types have equal means.

Since we have rejected \(\mathcal{H}_0\), we can embark on a pairwise comparison with two-tailed Wilcox tests. For that I am going to rely on the shortcut of the library code that we used above.

p.values <- c(AB=wilcox.test(bulbs$A, bulbs$B)$p.value,
    AC=wilcox.test(bulbs$A, bulbs$C)$p.value,
    BC=wilcox.test(bulbs$B, bulbs$C)$p.value)
p.values
##         AB         AC         BC 
## 0.01587302 0.03174603 0.34285714
  • From this we conclude that bulbs A and B differ, and A and C differ, but not B and C.
  1. (5 pts) Perform the test “using a software library”.
kruskal.test(bulbs)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  bulbs
## Kruskal-Wallis chi-squared = 8.4033, df = 2, p-value = 0.01497
  • Same result as above.
  • No need to repeat the pairwise tests.

Problem 5: Job training (10 pts)

Four job training programs were tried on 20 new employees, where 5 employees were randomly assigned to each training program. The 20 employees were then placed under the same supervisor and, at the end of certain period, the supervisor ranked the employees according to job ability, with the lowest ranks being assigned to the employees with the lowest job ability.

program.rank <- list(
    one=c(4, 6, 7, 2, 10),
    two=c(1, 8, 12, 3, 11),
    three=c(20, 19, 16, 14, 5),
    four=c(18, 15, 17, 13, 9))

Do these data indicate a difference in the effectiveness of the various training programs? Perform the test “by hand”. State the hypotheses, the test statistic and obtained \(p\)-value.

For this Kruskal–Wallis test the hypotheses are as follows.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{All four population distributions functions are identical.} \\ \mathcal{H}_1 &: \; \mbox{At least one of the populations tends to yield larger observations} \\ & \;\;\;\; \mbox{than at least on of the other populations.} \end{aligned} \]

First lets calculate the \(n_i\) and \(N\).

n <- sapply(program.rank, length)
N <- sum(n)

Then calculate the global ranks.

r <- rank(unlist(program.rank))

Then get the squares of the ranks for each sample.

k <- length(program.rank)
g <- factor(rep.int(seq_len(k), n))
rs <- tapply(r, g, sum)
rs
##  1  2  3  4 
## 29 35 74 72

Now we can evaluate the test statistic.

s2 <- (sum(r^2) - N*(N+1)^2/4)/(N-1)
t <- (sum(rs^2/n) - N*(N+1)^2/4)/s2
t
## [1] 9.72

And then we can calculate the \(p\)-value.

pchisq(t, k-1, lower.tail=FALSE)
## [1] 0.02110251
  • Reject the null hypothesis at the usual 5% level and conclude that there is a difference in the effectiveness of the training programs.

Problem 6: Iron diets (20 pts)

The amount of iron present in the livers of white rats is measured after the animals had been fed on of five diets for a prescribed length of time. There were 10 animals randomly assigned to each of the five diets.

diet <- list(
    A=c(2.23, 1.14, 2.63, 1.00, 1.35, 2.01, 1.64, 1.13, 1.01, 1.70), 
    B=c(5.59, 0.96, 6.96, 1.23, 1.61, 2.94, 1.96, 3.68, 1.54, 2.59),
    C=c(4.50, 3.92, 10.33, 8.23, 2.07, 4.90, 6.98, 6.42, 3.72, 6.00),
    D=c(1.35, 2.06, 0.74, 0.96, 1.16, 2.08, 0.69, 0.68, 0.84, 1.34),
    E=c(1.40, 2.51, 2.49, 1.74, 1.59, 1.36, 3.00, 4.81, 5.21, 5.12))

Answer the following.

  1. (5 pts) Calculate numerical summaries of your data, this includes at least finding the mean, the median and standard deviation for each diet.

Taking advantage of the list structure, we may calculate the mean, median, and standard deviation in one fell swoop as follows.

rbind(mean=sapply(diet, mean),
    median=sapply(diet, median),
    sd=sapply(diet, sd))
##                A       B        C         D        E
## mean   1.5840000 2.90600 5.707000 1.1900000 2.923000
## median 1.4950000 2.27500 5.450000 1.0600000 2.500000
## sd     0.5615494 1.98257 2.413614 0.5255685 1.560655

Alternately, we can shove it all into summary.

lapply(diet, summary)
## $A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.133   1.495   1.584   1.933   2.630 
## 
## $B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.960   1.558   2.275   2.906   3.495   6.960 
## 
## $C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.070   4.065   5.450   5.707   6.840  10.330 
## 
## $D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.680   0.765   1.060   1.190   1.347   2.080 
## 
## $E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.360   1.627   2.500   2.923   4.357   5.210
  • Both indicate substantial changes in mean (and median). Populations A and D are similar, whilst so are B and E, with C in a league of its own.
  1. (5 pts) Construct a helpful plot to compare your data, I suggest a boxplot. If you can create a single boxplot to summarize all diets together, even better. Comment on this plot, which ones appear to be equal?
boxplot(diet, ylab="iron", xlab="diet", main="Iron Diets")

  • This confirms what we observed above: Populations A and D are similar, whilst so are B and E, with C in a league of its own.
  1. (10 pts) Perform the test “using a software library”. You should provide the hypotheses, and values for the test statistic and \(p\)-value. Conduct Pairwise comparisons if appropriate, note you can use the Wilcoxon test to do so.

We start with a Kruskal–Wallis test with the following hypotheses.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{All five population distributions functions are identical.} \\ \mathcal{H}_1 &: \; \mbox{At least one of the populations tends to yield larger observations} \\ & \;\;\;\; \mbox{than at least on of the other populations.} \end{aligned} \]

kruskal.test(diet)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  diet
## Kruskal-Wallis chi-squared = 26.672, df = 4, p-value = 2.315e-05
  • An easy reject of the null: there are differences between (some of) the populations.

Consider the following pairwise Wilcox tests, with CLT approximations due to ties where appropriate.

p.values <- c(AB=wilcox.test(diet$A, diet$B)$p.value,
    AC=wilcox.test(diet$A, diet$C)$p.value,
    AD=wilcox.test(diet$A, diet$D, exact=FALSE, correct=FALSE)$p.value,
    AE=wilcox.test(diet$A, diet$E)$p.value,
    BC=wilcox.test(diet$B, diet$C)$p.value,
    BD=wilcox.test(diet$B, diet$D, exact=FALSE, correct=FALSE)$p.value,
    BE=wilcox.test(diet$B, diet$E)$p.value,
    CD=wilcox.test(diet$C, diet$D)$p.value,
    CE=wilcox.test(diet$C, diet$E)$p.value,
    DE=wilcox.test(diet$D, diet$E)$p.value)
p.values[p.values >= 0.05]
##        AB        AD        BE 
## 0.1431401 0.1039797 0.9117972
  • The conclusion is that A is similar to B and D, but those latter two are not similar to one another. B and E are similar to each other. All other pairs involve rejections at the 5% level.
  • However, notice that we have performed ten tests — we can expect both type I and type II errors to be substantial in this multiple comparison context.