Department of Statistics, Virginia Tech

The data consists of \(n'\) **matched pairs**, \((X_1, Y_1), (X_2, Y_2), \dots, (X_{n'}, Y_{n'})\), similar to the correlation or regression context, but this time the goal is to check if \(\mathbb{E}\{X\} = \mathbb{E}\{Y\}\). This is examined through the differences between the \(X_i\) and \(Y_i\), checking those against zero.

Toward that end, let \(D_i = Y_i - X_i\) for \(i=1,\dots, n'\).

- Discard all differences where \(D_i = 0\), i.e. where \(X_i = Y_i\), and let \(n \leq n'\) be the number of differences remaining.
- Assign ranks \(R(|D_i|)\) from 1 to \(n\) to the collection of absolute differences \(|D_i|\), using average ranks in the case of ties in the usual way.

The **Wilcoxon signed ranks test** presumes that the distribution of the \(D_i\) is symmetric, which means (among other things) that the mean of the \(D_i\) is the same as the median, and that all of the \(D_i\)s have the same mean. As usual, we assume mutual independence.

The **test statistic** is defined in terms of so-called *signed ranks*. The signed rank \(R_i\), for each pair \((X_i, Y_i)\), where \(X_i \ne Y_i\), is defined as follows

\[ R_i = \left\{ \begin{array}{rl} R(|D_i|) & \mbox{if } D_i > 0 \\ -R(|D_i|) & \mbox{otherwise.} \end{array} \right. \]

The test statistic, \(T^+\) is the sum of the positive signed ranks:

\[ T^+ = \sum_{i=1}^n R_i \mathbb{I}_{\{D_i > 0\}} \]

The **null distribution** is calculated in a way similar to the Mann-Whitney/Wilcox rank sum test or the squared ranks test: basically count how many (random) positive ranks in a partition of positive and negative ranks sum to \(T^+\), divided by the total number of such partitions, extending the work we did in ranks.R.

- For CDF calculations use the
`psignrank`

function in R.

If there are many ties, or if the table in A12 is failing us because \(n > 50\), a normal approximation based on the CLT can be used. The test statistic uses the sum of all of the signed ranks, with their \(+\) or \(-\) signs.

\[ Z = \frac{\sum_{i=1}^n R_i}{\sqrt{\sum_{i=1}^n R_i^2}}. \]

- If there are no ties, this simplifies to

\[ Z = \frac{\sum_{i=1}^n R_i}{\sqrt{n(n+1)(2n+1)/6}}. \]

The hypotheses follow our usual three cases. A **two-tailed test** is specified as follows.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} = 0 \quad \mbox{ (i.e., $\mathbb{E}\{Y_i\} = \mathbb{E}\{X_i\}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} \ne 0 \end{aligned} \]

- If the pairs \((X_i, Y_i)\) have the same bivariate distribution, then \(\mathcal{H}_1\) can be written as \(\mathbb{E}\{X_i\} \ne \mathbb{E}\{Y_i\}\).

We calculate the \(p\)-value in the same manner as the Mann-Whitney/Wilcox rank sum test. When using \(T^+\), via the observed value \(t^+\) when there are few ties, the \(p\)-value may be calculated as

\[ \varphi = 2 \left\{ \begin{array}{ll} \mathbb{P}(T^+ \geq t^+-1) & \mbox{ if } t^+ > \frac{n(n+1)}{4}, \mbox{ i.e., above the null mean} \\ \mathbb{P}(T^+ \leq t^+) & \mbox{otherwise} \end{array} \right.. \]

In the case of the normal approximation, the two-tailed \(p\)-value calculation usually deploys a continuity correction.

\[ \varphi \approx 2 \min \left\{ \mathbb{P}\left(Z \leq \frac{\sum_{i=1}^n R_i+1}{\sqrt{\sum_{i=1}^n R_i^2}} \right), \mathbb{P}\left(Z \geq \frac{\sum_{i=1}^n R_i-1}{\sqrt{\sum_{i=1}^n R_i^2}} \right) \right\}. \]

A **lower-tailed** test involves

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \geq 0 \quad \mbox{ (i.e., $\mathbb{E}\{Y_i\} \geq \mathbb{E}\{X_i\}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} < 0 \end{aligned} \]

- If the pairs \((X_i, Y_i)\) have the same bivariate distribution, then \(\mathcal{H}_1\) can be written as \(\mathbb{E}\{X_i\} < \mathbb{E}\{Y_i\}\).

The \(p\)-value may be calculated as \[ \varphi = \mathbb{P}(T^+ \leq t^+) \quad \mbox{ or } \quad \varphi \approx \mathbb{P}\left(Z \leq \frac{\sum_{i=1}^n R_i+1}{\sqrt{\sum_{i=1}^n R_i^2}} \right) \] depending on whether \(T^+\) or the CLT \(Z\) statistic is being used.

An **upper-tailed** test involves

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \leq 0 \quad \mbox{ (i.e., $\mathbb{E}\{Y_i\} \leq \mathbb{E}\{X_i\}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} > 0 \end{aligned} \]

- If the pairs \((X_i, Y_i)\) have the same bivariate distribution, then \(\mathcal{H}_1\) can be written as \(\mathbb{E}\{X_i\} > \mathbb{E}\{Y_i\}\).

The \(p\)-value may be calculated as \[ \varphi = \mathbb{P}(T^+ \geq t^+ -1) \quad \mbox{ or } \quad \varphi \approx \mathbb{P}\left(Z \geq \frac{\sum_{i=1}^n R_i-1}{\sqrt{\sum_{i=1}^n R_i^2}} \right) \] depending on whether \(T^+\) or the CLT \(Z\) statistic is being used.

Six deer were given a drug to improve their blood circulation. The circulation was measured at the time the drug was administered, and then again thirty minutes later. Is there any difference between the circulation before and after the treatment?

```
circ <- data.frame(before=c(2.76, 5.18, 2.68, 7.30, 4.10, 7.05),
after=c(7.02, 3.10, 5.44, 4.85, 5.21, 10.26))
```

We shall treat the `before`

column as \(X\) and `after`

as \(Y\). The hypotheses are:

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} = 0 \quad \mbox{ (i.e., $\mu_\mathrm{before} = \mu_\mathrm{after}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} \ne 0. \end{aligned} \]

The first step is to calculate the differences.

```
d <- circ$after - circ$before
d
```

`## [1] 4.26 -2.08 2.76 -2.45 1.11 3.21`

There are no zero differences, so nothing to remove: \(n = n'\).

```
n <- length(d)
n
```

`## [1] 6`

Then rank the absolute differences.

```
r <- rank(abs(d))
r
```

`## [1] 6 2 4 3 1 5`

And then calculate the signed ranks.

```
sr <- r * sign(d)
sr
```

`## [1] 6 -2 4 -3 1 5`

There are no ties in the ranks, so we shall calculate \(t^+\), the observed value of \(T^+\).

```
tplus <- sum(sr[sr > 0])
tplus
```

`## [1] 16`

The observed value of the test statistic is above the null mean

`n*(n+1)/4`

`## [1] 10.5`

so the \(p\)-value of this two-tailed test may be calculated as follows.

```
phi <- 2*psignrank(tplus-1, n, lower.tail=FALSE)
phi
```

`## [1] 0.3125`

- We fail to reject the null hypothesis at the 5% level and conclude that there is not enough evidence to say there is a difference in blood circulation after the treatment.

We can use the `wilcox.test`

library function with `piared=TRUE`

as follows.

`wilcox.test(circ$before, circ$after, paired=TRUE)`

```
##
## Wilcoxon signed rank test
##
## data: circ$before and circ$after
## V = 5, p-value = 0.3125
## alternative hypothesis: true location shift is not equal to 0
```

- Identical result.

A study measured the time it takes volunteers to perform a certain task before and after a rigorous exercise regimen; i.e., pre- and post-fatigue. The times in both scenarios are listed below. Is there enough evidence to conclude that the time post-fatigue is higher than the time pre-fatigue.

```
time <- data.frame(pre=c(158, 92, 65, 98, 33, 89, 148),
post=c(91, 148, 215, 226, 223, 91, 92))
```

We shall treat the `pre`

column as \(X\) and `post`

as \(Y\). The hypotheses are:

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \leq 0 \quad \mbox{ (i.e., $\mu_\mathrm{post} \leq \mu_\mathrm{pre}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} > 0. \end{aligned} \]

```
d <- time$post - time$pre
d
```

`## [1] -67 56 150 128 190 2 -56`

There are no zero differences, so nothing to remove: \(n = n'\).

```
n <- length(d)
n
```

`## [1] 7`

Then rank the absolute differences.

```
r <- rank(abs(d))
r
```

`## [1] 4.0 2.5 6.0 5.0 7.0 1.0 2.5`

- Observe that there are ties in the ranks.

The signed ranks are calculated as follows.

```
sr <- r * sign(d)
sr
```

`## [1] -4.0 2.5 6.0 5.0 7.0 1.0 -2.5`

Since there are ties, we will use the CLT approximation, with the continuity correction. The statistic is …

```
z.cc <- (sum(sr)-1)/sqrt(sum(sr^2))
z.cc
```

`## [1] 1.185335`

… and the \(p\)-value of this upper-tailed test is:

`pnorm(z.cc, lower.tail=FALSE)`

`## [1] 0.1179426`

- Fail to reject \(\mathcal{H}_0\) at the 5% level and conclude that there is not enough evidence to say that the post-fatigue time is higher on average.

We can invoke the library routine as follows. However, notice that the library routine reverses the roles of \(X\) and \(Y\) compared to the way it is presented above.

`wilcox.test(time$post, time$pre, paired=TRUE, alternative="greater", exact=FALSE)`

```
##
## Wilcoxon signed rank test with continuity correction
##
## data: time$post and time$pre
## V = 21.5, p-value = 0.1179
## alternative hypothesis: true location shift is greater than 0
```

- Perfect.

Lets revisit the nucleus study from the first week of class. Recall that the claim was that SAP’s ROE was less than the industry at large. We can test that claim with the signed ranks test.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \geq 0 \quad \mbox{ (i.e., $\mu_\mathrm{SAP} \geq \mu_\mathrm{Ind}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} < 0. \end{aligned} \]

Lets read in the data and extract the relevant columns, treating SAP ROE as \(Y\) and Industry ROE as \(X\).

```
nucleus <- read.csv("../data/nucleus.csv")
y <- nucleus$ROE
x <- nucleus$IndustryROE
```

First, lets calculate the differences.

```
d <- y - x
d
```

```
## [1] 12.4 -3.7 -18.3 39.2 -4.4 4.2 56.3 1.7 -7.3 -17.3 -26.3
## [12] 9.3 1.3 -0.4 13.6 1.4 -49.9 17.7 69.3 -2.1 -8.4 -9.7
## [23] 13.2 -3.7 -2.4 -67.0 13.6 -13.8 18.1 -4.0 -0.1 -4.6 4.0
## [34] 2.7 -25.2 -4.2 -4.9 3.9 -7.8 0.5 -1.7 -1.1 2.2 -7.6
## [45] 8.9 -95.5 -1.3 0.0 5.6 10.9 -27.2 0.4 -24.5 -75.9 -1.5
## [56] 15.4 -17.7 -2.8 7.5 4.9 15.4 -5.3 -21.9 -26.8 -50.4 -5.9
## [67] -5.7 -0.3 -3.6 -1.1 -1.1 -9.9 5.1 -8.9 31.0 8.1 2.1
## [78] -9.1 32.2 11.8 0.4
```

There are zero differences, so lets remove them.

```
d <- d[d != 0]
n <- length(d)
n
```

`## [1] 80`

Then rank the absolute differences.

```
r <- rank(abs(d))
r
```

```
## [1] 53.0 23.0 64.0 73.0 30.0 28.0 76.0 14.0 39.0 60.0 68.0 48.0 10.0 4.5
## [15] 55.5 12.0 74.0 61.5 78.0 16.0 44.0 49.0 54.0 24.0 19.0 77.0 55.5 57.0
## [29] 63.0 26.5 1.0 31.0 26.5 20.0 67.0 29.0 32.0 25.0 42.0 6.0 15.0 8.0
## [43] 18.0 41.0 46.0 80.0 11.0 36.0 51.0 70.0 3.0 66.0 79.0 13.0 58.0 61.5
## [57] 21.0 40.0 33.0 59.0 35.0 65.0 69.0 75.0 38.0 37.0 2.0 22.0 8.0 8.0
## [71] 50.0 34.0 45.0 71.0 43.0 17.0 47.0 72.0 52.0 4.5
```

- Observe that there are ties in the ranks.

The signed ranks are calculated as follows.

```
sr <- r * sign(d)
sr
```

```
## [1] 53.0 -23.0 -64.0 73.0 -30.0 28.0 76.0 14.0 -39.0 -60.0 -68.0
## [12] 48.0 10.0 -4.5 55.5 12.0 -74.0 61.5 78.0 -16.0 -44.0 -49.0
## [23] 54.0 -24.0 -19.0 -77.0 55.5 -57.0 63.0 -26.5 -1.0 -31.0 26.5
## [34] 20.0 -67.0 -29.0 -32.0 25.0 -42.0 6.0 -15.0 -8.0 18.0 -41.0
## [45] 46.0 -80.0 -11.0 36.0 51.0 -70.0 3.0 -66.0 -79.0 -13.0 58.0
## [56] -61.5 -21.0 40.0 33.0 59.0 -35.0 -65.0 -69.0 -75.0 -38.0 -37.0
## [67] -2.0 -22.0 -8.0 -8.0 -50.0 34.0 -45.0 71.0 43.0 17.0 -47.0
## [78] 72.0 52.0 4.5
```

Since there are ties, we will use the CLT approximation, with the continuity correction. The statistic is …

```
z.cc <- (sum(sr)+1)/sqrt(sum(sr^2))
z.cc
```

`## [1] -1.069584`

… and the \(p\)-value of this lower-tailed test is:

`pnorm(z.cc)`

`## [1] 0.1424032`

- Not enough evidence to reject \(\mathcal{H}_0\). We conclude that SAP ROE could well be larger than Industry ROE.

What does the library routine say?

`wilcox.test(y, x, paired=TRUE, alternative="less", exact=FALSE)`

```
##
## Wilcoxon signed rank test with continuity correction
##
## data: y and x
## V = 1396.5, p-value = 0.1424
## alternative hypothesis: true location shift is less than 0
```

- Identical.

The Wilcoxon signed ranks test is equally appropriate a median test, where the data consist of a single random sample of size \(n'\), \(Y_1, Y_2, \dots, Y_{n'}\), simply by pairing the \(Y_i\) with an \(X_i = m\) for some median value \(m\) specified in the null hypothesis.

- Here, “median” can be replaced by “mean” since we’re assuming symmetry (the two are the same).

A **two-tailed test** involves

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ equals m} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is not m}. \end{aligned} \]

A **lower-tailed test** is described by

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ is $\geq$ m} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is $<$ m}. \end{aligned} \]

Finally, an **upper-tailed test** is described by

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ is $\leq$ m} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is $>$ m}. \end{aligned} \]

As mentioned above, simply construct pairs \((m, Y_1), (m, Y_2), \dots, (m, Y_{n'})\) and carry on with the analogous signed ranks test described above.

Consider the following measurements of radon, a gas hazardous to human health. Does this data suggest that the population median of radon readings differs from 100.

`radon <- c(105.6, 90.9, 91.2, 96.9, 100.1, 105.0, 99.6, 107.7)`

The hypotheses being entertained are

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ equals 100} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is not 100}. \end{aligned} \]

First calculate the differences.

```
d <- 100 - radon
d
```

`## [1] -5.6 9.1 8.8 3.1 -0.1 -5.0 0.4 -7.7`

There are no zero differences, so nothing to remove: \(n = n'\).

```
n <- length(d)
n
```

`## [1] 8`

Then rank the absolute differences.

```
r <- rank(abs(d))
r
```

`## [1] 5 8 7 3 1 4 2 6`

And then calculate the signed ranks.

```
sr <- r * sign(d)
sr
```

`## [1] -5 8 7 3 -1 -4 2 -6`

There are no ties in the ranks, so we shall calculate \(t^+\), the observed value of \(T^+\).

```
tplus <- sum(sr[sr > 0])
tplus
```

`## [1] 20`

The observed value of the test statistic is above the null mean

`n*(n+1)/4`

`## [1] 18`

so the \(p\)-value of this two-tailed test may be calculated as follows.

```
phi <- 2*psignrank(tplus-1, n, lower.tail=FALSE)
phi
```

`## [1] 0.84375`

- We fail to reject the null hypothesis at the 5% level and conclude there is not enough evidence to suggest the median radon level is not 100.

We can use the `wilcox.test`

library function with `piared=TRUE`

as follows.

`wilcox.test(radon, rep(100, length(radon)), paired=TRUE)`

```
##
## Wilcoxon signed rank test
##
## data: radon and rep(100, length(radon))
## V = 16, p-value = 0.8438
## alternative hypothesis: true location shift is not equal to 0
```

- Almost identical result.

Another way to get the same thing, with a little less typing, is to specify the `mu`

argument in `wilcox.test`

.

`wilcox.test(radon, mu=100)`

```
##
## Wilcoxon signed rank test
##
## data: radon
## V = 16, p-value = 0.8438
## alternative hypothesis: true location is not equal to 100
```