Signed ranks test

The data consists of \(n'\) matched pairs, \((X_1, Y_1), (X_2, Y_2), \dots, (X_{n'}, Y_{n'})\), similar to the correlation or regression context, but this time the goal is to check if \(\mathbb{E}\{X\} = \mathbb{E}\{Y\}\). This is examined through the differences between the \(X_i\) and \(Y_i\), checking those against zero.

Toward that end, let \(D_i = Y_i - X_i\) for \(i=1,\dots, n'\).

The Wilcoxon signed ranks test presumes that the distribution of the \(D_i\) is symmetric, which means (among other things) that the mean of the \(D_i\) is the same as the median, and that all of the \(D_i\)s have the same mean. As usual, we assume mutual independence.

The test statistic is defined in terms of so-called signed ranks. The signed rank \(R_i\), for each pair \((X_i, Y_i)\), where \(X_i \ne Y_i\), is defined as follows

\[ R_i = \left\{ \begin{array}{rl} R(|D_i|) & \mbox{if } D_i > 0 \\ -R(|D_i|) & \mbox{otherwise.} \end{array} \right. \]

The test statistic, \(T^+\) is the sum of the positive signed ranks:

\[ T^+ = \sum_{i=1}^n R_i \mathbb{I}_{\{D_i > 0\}} \]

The null distribution is calculated in a way similar to the Mann-Whitney/Wilcox rank sum test or the squared ranks test: basically count how many (random) positive ranks in a partition of positive and negative ranks sum to \(T^+\), divided by the total number of such partitions, extending the work we did in ranks.R.

If there are many ties, or if the table in A12 is failing us because \(n > 50\), a normal approximation based on the CLT can be used. The test statistic uses the sum of all of the signed ranks, with their \(+\) or \(-\) signs.

\[ Z = \frac{\sum_{i=1}^n R_i}{\sqrt{\sum_{i=1}^n R_i^2}}. \]

\[ Z = \frac{\sum_{i=1}^n R_i}{\sqrt{n(n+1)(2n+1)/6}}. \]

The hypotheses follow our usual three cases. A two-tailed test is specified as follows.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} = 0 \quad \mbox{ (i.e., $\mathbb{E}\{Y_i\} = \mathbb{E}\{X_i\}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} \ne 0 \end{aligned} \]

We calculate the \(p\)-value in the same manner as the Mann-Whitney/Wilcox rank sum test. When using \(T^+\), via the observed value \(t^+\) when there are few ties, the \(p\)-value may be calculated as

\[ \varphi = 2 \left\{ \begin{array}{ll} \mathbb{P}(T^+ \geq t^+-1) & \mbox{ if } t^+ > \frac{n(n+1)}{4}, \mbox{ i.e., above the null mean} \\ \mathbb{P}(T^+ \leq t^+) & \mbox{otherwise} \end{array} \right.. \]

In the case of the normal approximation, the two-tailed \(p\)-value calculation usually deploys a continuity correction.

\[ \varphi \approx 2 \min \left\{ \mathbb{P}\left(Z \leq \frac{\sum_{i=1}^n R_i+1}{\sqrt{\sum_{i=1}^n R_i^2}} \right), \mathbb{P}\left(Z \geq \frac{\sum_{i=1}^n R_i-1}{\sqrt{\sum_{i=1}^n R_i^2}} \right) \right\}. \]

A lower-tailed test involves

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \geq 0 \quad \mbox{ (i.e., $\mathbb{E}\{Y_i\} \geq \mathbb{E}\{X_i\}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} < 0 \end{aligned} \]

The \(p\)-value may be calculated as \[ \varphi = \mathbb{P}(T^+ \leq t^+) \quad \mbox{ or } \quad \varphi \approx \mathbb{P}\left(Z \leq \frac{\sum_{i=1}^n R_i+1}{\sqrt{\sum_{i=1}^n R_i^2}} \right) \] depending on whether \(T^+\) or the CLT \(Z\) statistic is being used.

An upper-tailed test involves

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \leq 0 \quad \mbox{ (i.e., $\mathbb{E}\{Y_i\} \leq \mathbb{E}\{X_i\}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} > 0 \end{aligned} \]

The \(p\)-value may be calculated as \[ \varphi = \mathbb{P}(T^+ \geq t^+ -1) \quad \mbox{ or } \quad \varphi \approx \mathbb{P}\left(Z \geq \frac{\sum_{i=1}^n R_i-1}{\sqrt{\sum_{i=1}^n R_i^2}} \right) \] depending on whether \(T^+\) or the CLT \(Z\) statistic is being used.

Blood circulation

Six deer were given a drug to improve their blood circulation. The circulation was measured at the time the drug was administered, and then again thirty minutes later. Is there any difference between the circulation before and after the treatment?

circ <- data.frame(before=c(2.76, 5.18, 2.68, 7.30, 4.10, 7.05),
    after=c(7.02, 3.10, 5.44, 4.85, 5.21, 10.26))

We shall treat the before column as \(X\) and after as \(Y\). The hypotheses are:

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} = 0 \quad \mbox{ (i.e., $\mu_\mathrm{before} = \mu_\mathrm{after}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} \ne 0. \end{aligned} \]

The first step is to calculate the differences.

d <- circ$after - circ$before
d
## [1]  4.26 -2.08  2.76 -2.45  1.11  3.21

There are no zero differences, so nothing to remove: \(n = n'\).

n <- length(d)
n
## [1] 6

Then rank the absolute differences.

r <- rank(abs(d))
r
## [1] 6 2 4 3 1 5

And then calculate the signed ranks.

sr <- r * sign(d)
sr
## [1]  6 -2  4 -3  1  5

There are no ties in the ranks, so we shall calculate \(t^+\), the observed value of \(T^+\).

tplus <- sum(sr[sr > 0])
tplus
## [1] 16

The observed value of the test statistic is above the null mean

n*(n+1)/4
## [1] 10.5

so the \(p\)-value of this two-tailed test may be calculated as follows.

phi <- 2*psignrank(tplus-1, n, lower.tail=FALSE)
phi
## [1] 0.3125
  • We fail to reject the null hypothesis at the 5% level and conclude that there is not enough evidence to say there is a difference in blood circulation after the treatment.

We can use the wilcox.test library function with piared=TRUE as follows.

wilcox.test(circ$before, circ$after, paired=TRUE)
## 
##  Wilcoxon signed rank test
## 
## data:  circ$before and circ$after
## V = 5, p-value = 0.3125
## alternative hypothesis: true location shift is not equal to 0
  • Identical result.

Performance under fatigue

A study measured the time it takes volunteers to perform a certain task before and after a rigorous exercise regimen; i.e., pre- and post-fatigue. The times in both scenarios are listed below. Is there enough evidence to conclude that the time post-fatigue is higher than the time pre-fatigue.

time <- data.frame(pre=c(158, 92, 65, 98, 33, 89, 148),
    post=c(91, 148, 215, 226, 223, 91, 92))

We shall treat the pre column as \(X\) and post as \(Y\). The hypotheses are:

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \leq 0 \quad \mbox{ (i.e., $\mu_\mathrm{post} \leq \mu_\mathrm{pre}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} > 0. \end{aligned} \]

d <- time$post - time$pre
d
## [1] -67  56 150 128 190   2 -56

There are no zero differences, so nothing to remove: \(n = n'\).

n <- length(d)
n
## [1] 7

Then rank the absolute differences.

r <- rank(abs(d))
r
## [1] 4.0 2.5 6.0 5.0 7.0 1.0 2.5
  • Observe that there are ties in the ranks.

The signed ranks are calculated as follows.

sr <- r * sign(d)
sr
## [1] -4.0  2.5  6.0  5.0  7.0  1.0 -2.5

Since there are ties, we will use the CLT approximation, with the continuity correction. The statistic is …

z.cc <- (sum(sr)-1)/sqrt(sum(sr^2))
z.cc
## [1] 1.185335

… and the \(p\)-value of this upper-tailed test is:

pnorm(z.cc, lower.tail=FALSE)
## [1] 0.1179426
  • Fail to reject \(\mathcal{H}_0\) at the 5% level and conclude that there is not enough evidence to say that the post-fatigue time is higher on average.

We can invoke the library routine as follows. However, notice that the library routine reverses the roles of \(X\) and \(Y\) compared to the way it is presented above.

wilcox.test(time$post, time$pre, paired=TRUE, alternative="greater", exact=FALSE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  time$post and time$pre
## V = 21.5, p-value = 0.1179
## alternative hypothesis: true location shift is greater than 0
  • Perfect.

Nucleus study

Lets revisit the nucleus study from the first week of class. Recall that the claim was that SAP’s ROE was less than the industry at large. We can test that claim with the signed ranks test.

\[ \begin{aligned} \mathcal{H}_0 &: \; \mathbb{E}\{D\} \geq 0 \quad \mbox{ (i.e., $\mu_\mathrm{SAP} \geq \mu_\mathrm{Ind}$)} \\ \mathcal{H}_1 &: \; \mathbb{E}\{D\} < 0. \end{aligned} \]

Lets read in the data and extract the relevant columns, treating SAP ROE as \(Y\) and Industry ROE as \(X\).

nucleus <- read.csv("../data/nucleus.csv")
y <- nucleus$ROE
x <- nucleus$IndustryROE

First, lets calculate the differences.

d <- y - x
d
##  [1]  12.4  -3.7 -18.3  39.2  -4.4   4.2  56.3   1.7  -7.3 -17.3 -26.3
## [12]   9.3   1.3  -0.4  13.6   1.4 -49.9  17.7  69.3  -2.1  -8.4  -9.7
## [23]  13.2  -3.7  -2.4 -67.0  13.6 -13.8  18.1  -4.0  -0.1  -4.6   4.0
## [34]   2.7 -25.2  -4.2  -4.9   3.9  -7.8   0.5  -1.7  -1.1   2.2  -7.6
## [45]   8.9 -95.5  -1.3   0.0   5.6  10.9 -27.2   0.4 -24.5 -75.9  -1.5
## [56]  15.4 -17.7  -2.8   7.5   4.9  15.4  -5.3 -21.9 -26.8 -50.4  -5.9
## [67]  -5.7  -0.3  -3.6  -1.1  -1.1  -9.9   5.1  -8.9  31.0   8.1   2.1
## [78]  -9.1  32.2  11.8   0.4

There are zero differences, so lets remove them.

d <- d[d != 0]
n <- length(d)
n
## [1] 80

Then rank the absolute differences.

r <- rank(abs(d))
r
##  [1] 53.0 23.0 64.0 73.0 30.0 28.0 76.0 14.0 39.0 60.0 68.0 48.0 10.0  4.5
## [15] 55.5 12.0 74.0 61.5 78.0 16.0 44.0 49.0 54.0 24.0 19.0 77.0 55.5 57.0
## [29] 63.0 26.5  1.0 31.0 26.5 20.0 67.0 29.0 32.0 25.0 42.0  6.0 15.0  8.0
## [43] 18.0 41.0 46.0 80.0 11.0 36.0 51.0 70.0  3.0 66.0 79.0 13.0 58.0 61.5
## [57] 21.0 40.0 33.0 59.0 35.0 65.0 69.0 75.0 38.0 37.0  2.0 22.0  8.0  8.0
## [71] 50.0 34.0 45.0 71.0 43.0 17.0 47.0 72.0 52.0  4.5
  • Observe that there are ties in the ranks.

The signed ranks are calculated as follows.

sr <- r * sign(d)
sr
##  [1]  53.0 -23.0 -64.0  73.0 -30.0  28.0  76.0  14.0 -39.0 -60.0 -68.0
## [12]  48.0  10.0  -4.5  55.5  12.0 -74.0  61.5  78.0 -16.0 -44.0 -49.0
## [23]  54.0 -24.0 -19.0 -77.0  55.5 -57.0  63.0 -26.5  -1.0 -31.0  26.5
## [34]  20.0 -67.0 -29.0 -32.0  25.0 -42.0   6.0 -15.0  -8.0  18.0 -41.0
## [45]  46.0 -80.0 -11.0  36.0  51.0 -70.0   3.0 -66.0 -79.0 -13.0  58.0
## [56] -61.5 -21.0  40.0  33.0  59.0 -35.0 -65.0 -69.0 -75.0 -38.0 -37.0
## [67]  -2.0 -22.0  -8.0  -8.0 -50.0  34.0 -45.0  71.0  43.0  17.0 -47.0
## [78]  72.0  52.0   4.5

Since there are ties, we will use the CLT approximation, with the continuity correction. The statistic is …

z.cc <- (sum(sr)+1)/sqrt(sum(sr^2))
z.cc
## [1] -1.069584

… and the \(p\)-value of this lower-tailed test is:

pnorm(z.cc)
## [1] 0.1424032
  • Not enough evidence to reject \(\mathcal{H}_0\). We conclude that SAP ROE could well be larger than Industry ROE.

What does the library routine say?

wilcox.test(y, x, paired=TRUE, alternative="less", exact=FALSE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  y and x
## V = 1396.5, p-value = 0.1424
## alternative hypothesis: true location shift is less than 0
  • Identical.

As a median test

The Wilcoxon signed ranks test is equally appropriate a median test, where the data consist of a single random sample of size \(n'\), \(Y_1, Y_2, \dots, Y_{n'}\), simply by pairing the \(Y_i\) with an \(X_i = m\) for some median value \(m\) specified in the null hypothesis.

A two-tailed test involves

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ equals m} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is not m}. \end{aligned} \]

A lower-tailed test is described by

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ is $\geq$ m} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is $<$ m}. \end{aligned} \]

Finally, an upper-tailed test is described by

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ is $\leq$ m} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is $>$ m}. \end{aligned} \]

As mentioned above, simply construct pairs \((m, Y_1), (m, Y_2), \dots, (m, Y_{n'})\) and carry on with the analogous signed ranks test described above.

Radon readings

Consider the following measurements of radon, a gas hazardous to human health. Does this data suggest that the population median of radon readings differs from 100.

radon <- c(105.6, 90.9, 91.2, 96.9, 100.1, 105.0, 99.6, 107.7)

The hypotheses being entertained are

\[ \begin{aligned} \mathcal{H}_0 &: \; \mbox{The median of $Y$ equals 100} \\ \mathcal{H}_1 &: \; \mbox{The median of $Y$ is not 100}. \end{aligned} \]

First calculate the differences.

d <- 100 - radon
d
## [1] -5.6  9.1  8.8  3.1 -0.1 -5.0  0.4 -7.7

There are no zero differences, so nothing to remove: \(n = n'\).

n <- length(d)
n
## [1] 8

Then rank the absolute differences.

r <- rank(abs(d))
r
## [1] 5 8 7 3 1 4 2 6

And then calculate the signed ranks.

sr <- r * sign(d)
sr
## [1] -5  8  7  3 -1 -4  2 -6

There are no ties in the ranks, so we shall calculate \(t^+\), the observed value of \(T^+\).

tplus <- sum(sr[sr > 0])
tplus
## [1] 20

The observed value of the test statistic is above the null mean

n*(n+1)/4
## [1] 18

so the \(p\)-value of this two-tailed test may be calculated as follows.

phi <- 2*psignrank(tplus-1, n, lower.tail=FALSE)
phi
## [1] 0.84375
  • We fail to reject the null hypothesis at the 5% level and conclude there is not enough evidence to suggest the median radon level is not 100.

We can use the wilcox.test library function with piared=TRUE as follows.

wilcox.test(radon, rep(100, length(radon)), paired=TRUE)
## 
##  Wilcoxon signed rank test
## 
## data:  radon and rep(100, length(radon))
## V = 16, p-value = 0.8438
## alternative hypothesis: true location shift is not equal to 0
  • Almost identical result.

Another way to get the same thing, with a little less typing, is to specify the mu argument in wilcox.test.

wilcox.test(radon, mu=100)
## 
##  Wilcoxon signed rank test
## 
## data:  radon
## V = 16, p-value = 0.8438
## alternative hypothesis: true location is not equal to 100