Department of Statistics, Virginia Tech

In this segment we consider **several “related” samples**, enhancing the signed ranks test for paired observations in a manner similar to how the Kruskal-Wallis rank test enhances the Mann-Whitney test. These tests are designed to detect differences among \(k\) possibly different treatments, \(k \geq 2\). The observations are arranged into \(b > 1\) blocks, which are groups of \(k\) experimental units similar to each other in some important respects.

- The \(k\) experimental units within each block are matched randomly with the \(k\) treatments being scrutinized,
- so that each treatment is administered once and only once within each block.

The experimental arrangement described here is usually called a **randomized complete block design**, in contrast to an **incomplete** block design in which the blocks do not contain enough experimental units to enable all the treatments to be applied in all blocks, so each treatment appears in some blocks but not others.

The **data** consist of \(b\) mutually independent \(k\)-variate random variables \((X_{i1}, X_{i2}, \dots, X_{ik})\), called \(b\) blocks, \(i=1, \dots, b\). The random variable \(X_{ij}\) is in block \(i\) and is associated with treatment \(j\). The \(b\) blocks are arranged as follows.

Block \ Treatment | 1 | 2 | \(\cdots\) | \(k\) |
---|---|---|---|---|
1 | \(X_{11}\) | \(X_{12}\) | \(\cdots\) | \(X_{1k}\) |
2 | \(X_{21}\) | \(X_{22}\) | \(\cdots\) | \(X_{2k}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
\(b\) | \(X_{b1}\) | \(X_{b2}\) | \(\cdots\) | \(X_{bk}\) |

The Friedman test, an extension of the sign test to detect heterogeneity among the treatments, proceeds as follows. It assumes that the \(b\) \(k\)-variate random variables are mutually independent (i.e., that the results within one block do not influence the results within the other blocks).

- Rank each row (block) separately and let \(R(X_{ij})\) be the rank, from 1 to \(k\), assigned to \(X_{ij}\), with average ranks in the case of ties as usual.

- Then sum the ranks for each treatment (column) to obtain \(R_j\) where:

\[ R_j = \sum_{i=1}^b R(X_{ij}), \quad \mbox{ for } \quad j=1, 2, \dots, k. \]

The **test statistic** in the case of no ties in the ranks is the following.

\[ T_1 = \frac{12}{bk(k+1)} \sum_{j=1}^k \left(R_j - \frac{b(k+1)}{2} \right)^2 \]

If ties are present an adjustment needs to be made:

\[ T_1' = \frac{(k-1)\sum_{j=1}^k \left(R_j - \frac{b(k+1)}{2} \right)^2 }{A_1 - C_1} \] where \[ A_1 = \sum_{i=1}^b \sum_{j=1}^k [R(X_{ij})]^2 \quad \mbox{ and } \quad C_1 = bk(k+1)^2/4. \]
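When there are no ties the adjustment is vacuous, so \(T_1'\) reduces to \(T_1\). A quick numerical check of this (on randomly generated untied ranks, not part of the worked example below):

```
# Check (not from the text): without ties each block's ranks are a
# permutation of 1..k, and T1' reduces to T1.
set.seed(42)
b <- 4; k <- 3
Rtab <- t(replicate(b, sample(k)))     # untied within-block ranks
R <- colSums(Rtab)                     # treatment rank sums R_j
T1 <- 12/(b*k*(k + 1)) * sum((R - b*(k + 1)/2)^2)
A1 <- sum(Rtab^2)
C1 <- b*k*(k + 1)^2/4
T1prime <- (k - 1)*sum((R - b*(k + 1)/2)^2)/(A1 - C1)
all.equal(T1, T1prime)                 # TRUE
```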

Although this version of the statistic is the most common in software packages (e.g., in `friedman.test` in R), a “more accurate” version (involving a further adjustment) is also in use.

\[ T_2 = \frac{(b-1) T_1'}{b(k-1) - T_1'} \]

The exact **null distribution** is calculated in a similar spirit to other methods based on ranks; in practice, however, approximations are the norm, even in software. Approximately,

- \(T_1' \sim \chi^2_{k-1}\), and similarly \(T_1 \sim \chi^2_{k-1}\), and
- \(T_2 \sim F_{k-1,(b-1)(k-1)}\).

As with other multi-sample setups, the **hypotheses** are only “two-tailed”.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{Each ranking of the random variables within a block is equally likely} \\ & \;\;\;\; \mbox{(i.e., the treatments have identical effects)} \\ \mathcal{H}_1 &: \; \mbox{At least one of the treatments tends to yield larger observed values} \\ & \;\;\;\; \mbox{than at least one other treatment} \\ \end{aligned} \]

- The \(p\)-values are simple \(\chi^2\) and \(F\) distribution calculations.

\[ \varphi = \left\{ \begin{array}{c} \mathbb{P}(T_1 > t_1) \\ \mathbb{P}(T_1' > t_1') \\ \mathbb{P}(T_2 > t_2) \\ \end{array} \right. \]

When the null hypothesis is rejected, we may embark on a pairwise test of treatments in a **multiple comparisons** format in the following way.

Treatments \(i\) and \(j\) are considered different if the following inequality is satisfied \[ |R_j - R_i| > t^{1-\alpha/2}_{(b-1)(k-1)} \left[\frac{2(bA_1 - \sum R_j^2)}{(b-1)(k-1)} \right]^\frac{1}{2}, \] where \(t^{1-\alpha/2}_{(b-1)(k-1)}\) is the \(1-\alpha/2\) quantile of a Student-\(t\) distribution with \((b-1)(k-1)\) degrees of freedom.

Alternatively, the criterion above can be expressed as a function of \(T_1'\) as

\[ |R_j - R_i| > t^{1-\alpha/2}_{(b-1)(k-1)} \left[\frac{(A_1 - C_1)2b}{(b-1)(k-1)} \left(1 - \frac{T_1'}{b(k-1)}\right) \right]^\frac{1}{2}. \]
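For illustration, the pairwise criterion can be wrapped in a small helper function (`friedman.pairwise` is a hypothetical name, not a library function):

```
# Hypothetical helper (not a library function): flag treatment pairs
# whose rank sums differ by more than the critical difference.
friedman.pairwise <- function(R, A1, b, alpha = 0.05) {
  k <- length(R)                        # number of treatments
  df <- (b - 1)*(k - 1)
  crit <- qt(1 - alpha/2, df) * sqrt(2*(b*A1 - sum(R^2))/df)
  abs(outer(R, R, "-")) > crit          # TRUE = judged different
}
```

With the rank sums from the tube example below (\(R = (11, 13.5, 9, 6.5)\), \(A_1 = 119.5\), \(b = 4\)), no pair exceeds the critical difference, consistent with the overall non-rejection there.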

If there are no ties in the ranks, then we obtain the following simplifications to the auxiliary quantities

\[ A_1 = \frac{bk(k+1)(2k+1)}{6} \quad \mbox{ and } \quad A_1 - C_1 = \frac{bk(k+1)(k-1)}{12}. \]
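These closed forms follow because, without ties, each block’s ranks are a permutation of \(1, \dots, k\). A quick numerical sanity check:

```
# Quick check of the no-ties closed forms for A1 and A1 - C1.
b <- 5; k <- 4
Rtab <- t(replicate(b, sample(k)))     # untied ranks in each block
A1 <- sum(Rtab^2)
C1 <- b*k*(k + 1)^2/4
c(A1 == b*k*(k + 1)*(2*k + 1)/6,
  A1 - C1 == b*k*(k + 1)*(k - 1)/12)   # both TRUE
```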

Consider an experiment involving the manufacture of composite tubes under varying amounts of pressure. The table below reports the percentage of tubes in four batches that did not contain any defects, as varied by the treatment variable of pressure.

Batch \ Pressure | 8500 | 8700 | 8900 | 9100 |
---|---|---|---|---|
1 | 90.3 | 92.5 | 85.5 | 82.5 |
2 | 89.2 | 89.5 | 90.8 | 89.5 |
3 | 98.2 | 90.6 | 89.6 | 85.6 |
4 | 93.9 | 94.7 | 86.2 | 87.4 |

We will test the following hypotheses.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The different pressures have identical effects on the percentage of non-defectives} \\ \mathcal{H}_1 &: \; \mbox{There is an effect due to pressure} \\ \end{aligned} \]

First, let’s port the data into R.

```
tubes <- cbind(c(90.3, 89.2, 98.2, 93.9),
               c(92.5, 89.5, 90.6, 94.7),
               c(85.5, 90.8, 89.6, 86.2),
               c(82.5, 89.5, 85.6, 87.4))
colnames(tubes) <- c("8500", "8700", "8900", "9100")
rownames(tubes) <- 1:4
```

Here is a quick way to rank each row (block) and put the result back in table form.

```
Rtab <- t(apply(tubes, 1, rank))
Rtab
```

```
## 8500 8700 8900 9100
## 1 3 4.0 2 1.0
## 2 1 2.5 4 2.5
## 3 4 3.0 2 1.0
## 4 3 4.0 1 2.0
```

- Note that there are ties in the ranks.

Then we can sum the columns of that table to get the \(R_j\)-values.

```
R <- colSums(Rtab)
R
```

```
## 8500 8700 8900 9100
## 11.0 13.5 9.0 6.5
```

Since there are ties we will have to calculate \(A_1\) and \(C_1\).

```
A1 <- sum(Rtab^2)
b <- nrow(tubes)
k <- ncol(tubes)
C1 <- b*k*(k+1)^2/4
c(A1, C1)
```

`## [1] 119.5 100.0`

Then we can calculate the adjusted version \(T_1'\).

```
T1prime <- (k-1)*sum((R - b*(k+1)/2)^2)/(A1 - C1)
T1prime
```

`## [1] 4.076923`

Using the \(\chi^2\) version of the test we obtain the following \(p\)-value.

`pchisq(T1prime, k-1, lower.tail=FALSE)`

`## [1] 0.2532767`

The alternate \(T_2\) calculation works out as follows.

```
T2 <- (b-1)*T1prime / (b*(k-1) - T1prime)
T2
```

`## [1] 1.543689`

Using the \(F\)-distribution function we obtain the following \(p\)-value.

`pf(T2, k-1, (b-1)*(k-1), lower.tail=FALSE)`

`## [1] 0.2694005`

- Either way, we fail to reject the null hypothesis: there is not enough evidence of a difference among the pressures.

Here is what a library version says.

`friedman.test(tubes)`

```
##
## Friedman rank sum test
##
## data: tubes
## Friedman chi-squared = 4.0769, df = 3, p-value = 0.2533
```

- Identical to our \(\chi^2\) calculation above.

The **Quade test** involves the same setup as the Friedman test, but it has more power than the Friedman test when there are few (3–4) treatments. With 4–5 treatments the two tests perform about the same, and Friedman is preferred when the number of treatments is large (6+). Whereas the Friedman test is an extension of the sign test, the Quade test is more of an extension of the Wilcoxon signed ranks test.

The Quade test starts out with the same block-wise ranks \(R(X_{ij})\) as above.

- Then calculate the range (or span \(\Delta_i\)) of values in each block:

\[ \Delta_i = \max_{j}\{X_{ij}\} - \min_j\{X_{ij}\}, \quad \mbox{ for } i=1,\dots,b. \]

And then rank the \(b\) ranges and call the ranks \(Q_1, \dots, Q_b\), where \(Q_i = Q(\Delta_i)\), using average ranks in the case of ties as usual.

Finally, the block rank \(Q_i\) is multiplied by the difference between the rank within block \(i\), \(R(X_{ij})\), and the average ranks within blocks, \((k+1)/2\), to get the product \(S_{ij}\), where \[ S_{ij} = Q_i \left[R(X_{ij}) - \frac{k+1}{2} \right] \] is a “score” statistic that represents the relative size of each observation within the block, adjusted to reflect the relative significance of the block in which it appears.

The **test statistic** is \[
T_3 = \frac{(b-1) B}{A_2 - B}
\] where \(B\) is the so-called “treatment sum of squares” \[
B = \frac{1}{b} \sum_{j=1}^k S_j^2 \quad \mbox{ via } \quad S_j = \sum_{i=1}^b S_{ij}
\] and \[
A_2 = \sum_{i=1}^b \sum_{j=1}^k S_{ij}^2.
\] If there are no ties, \(A_2\) simplifies to \[
A_2 = b(b+1)(2b+1)k(k+1)(k-1)/72.
\]

- Note that \(T_3\) is the two-way ANOVA test statistic calculated on the scores \(S_{ij}\).
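This equivalence is easy to confirm numerically: a two-way ANOVA on the scores, with blocks and treatments as factors, reproduces \(T_3\) as the treatment \(F\) statistic. A sketch on small made-up data:

```
# Sketch (made-up data): T3 equals the two-way ANOVA F statistic
# for treatments, computed on the scores S_ij.
set.seed(1)
X <- matrix(rnorm(12), nrow = 3)       # b = 3 blocks, k = 4 treatments
b <- nrow(X); k <- ncol(X)
Rtab <- t(apply(X, 1, rank))           # within-block ranks
Q <- rank(apply(X, 1, function(x) diff(range(x))))  # block-range ranks
Stab <- Q * (Rtab - (k + 1)/2)         # scores S_ij
S <- colSums(Stab)
A2 <- sum(Stab^2); B <- sum(S^2)/b
T3 <- (b - 1)*B/(A2 - B)
d <- data.frame(score = as.vector(Stab),
                trt = factor(rep(1:k, each = b)),
                blk = factor(rep(1:b, times = k)))
Fstat <- anova(lm(score ~ blk + trt, data = d))["trt", "F value"]
all.equal(T3, Fstat)                   # TRUE
```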

Again, the exact null distribution of \(T_3\) can be plowed through as in ranks.R, but a convenient approximation comes via an \(F\) distribution with parameters \(k-1\) and \((b-1)(k-1)\), as in the Friedman test.

The **hypotheses** are the same as for the Friedman test, and the \(p\)-value is just like that of \(T_2\).

When the null hypothesis is rejected, **multiple comparisons** pairwise tests proceed in a manner similar to the Friedman test, but with the sum of scores instead of the sum of ranks.

Treatments \(i\) and \(j\) are considered different if the following inequality is satisfied \[ |S_j - S_i| > t^{1-\alpha/2}_{(b-1)(k-1)} \left[\frac{2b(A_2 - B)}{(b-1)(k-1)} \right]^\frac{1}{2}, \] where again \(t^{1-\alpha/2}_{(b-1)(k-1)}\) is the \(1-\alpha/2\) quantile of a Student-\(t\) distribution with \((b-1)(k-1)\) degrees of freedom.
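As a sketch, this criterion can also be wrapped in a helper function (`quade.pairwise` is a hypothetical name, not a library function), taking the column score sums \(S_j\) along with \(A_2\), \(B\), and \(b\):

```
# Hypothetical helper (not a library function): flag treatment pairs
# whose score sums differ by more than the critical difference.
quade.pairwise <- function(S, A2, B, b, alpha = 0.05) {
  k <- length(S)                        # number of treatments
  df <- (b - 1)*(k - 1)
  crit <- qt(1 - alpha/2, df) * sqrt(2*b*(A2 - B)/df)
  abs(outer(S, S, "-")) > crit          # TRUE = judged different
}
```

Of course, these comparisons are only warranted after the overall null hypothesis has been rejected.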

Let’s revisit the example above; we can re-use the ranking calculations.

The first new thing we need is to calculate the ranges.

```
Delta <- apply(tubes, 1, range)
Delta <- Delta[2,] - Delta[1,]
Delta
```

```
## 1 2 3 4
## 10.0 1.6 12.6 8.5
```

Then rank the \(\Delta\)s.

```
Q <- rank(Delta)
Q
```

```
## 1 2 3 4
## 3 1 4 2
```

Next calculate the scores.

```
Stab <- Q * (Rtab - (k+1)/2)
Stab
```

```
## 8500 8700 8900 9100
## 1 1.5 4.5 -1.5 -4.5
## 2 -1.5 0.0 1.5 0.0
## 3 6.0 2.0 -2.0 -6.0
## 4 1.0 3.0 -3.0 -1.0
```

The sums of the columns of scores will be stored in `S`.

```
S <- colSums(Stab)
S
```

```
## 8500 8700 8900 9100
## 7.0 9.5 -5.0 -11.5
```

Now we can calculate the test statistic \(T_3\) via \(B\) and \(A_2\).

```
A2 <- sum(Stab^2)
B <- sum(S^2)/b
T3 <- (b-1)*B/(A2-B)
c(A2, B, T3)
```

`## [1] 149.500000 74.125000 2.950249`

The \(p\)-value comes from the distribution of \(F\).

`pf(T3, k-1, (b-1)*(k-1), lower.tail=FALSE)`

`## [1] 0.0907929`

- A much lower \(p\)-value than we got from the Friedman test.
- But still there is not enough evidence to reject \(\mathcal{H}_0\) at the 5% level, so we cannot conclude that there is a difference among pressures.

Here is the outcome of the library function `quade.test`.

`quade.test(tubes)`

```
##
## Quade test
##
## data: tubes
## Quade F = 2.9502, num df = 3, denom df = 9, p-value = 0.09079
```

- Identical.