McNemar Test for Signifigance of Changes

Consider mutually independent bivariate random variables \((X_1, Y_1), \dots, (X_{n'}, Y_{n'})\) on an ordinal measurement scale, i.e., \((X_i, Y_i) \in \{0,1\}^2\). The data can be summarized in a \(2 \times 2\) contingency table.

  \(Y_i = 0\) \(Y_i = 1\)
\(X_i = 0\) a b
\(X_i = 1\) c d
  1. the number of pairs where \(X_i = 0\) and \(Y_i = 0\)
  2. the number of pairs where \(X_i = 0\) and \(Y_i = 1\)
  3. the number of pairs where \(X_i = 1\) and \(Y_i = 0\)
  4. the number of pairs where \(X_i = 1\) and \(Y_i = 1\)

Two test statistics, which have the following null distributions, are common.

\[ \begin{aligned} T_1 &= \frac{(b-c)^2}{b+c} && \mbox{when} & n &= b+c > 20, && \mbox{in which case} & T_1 &\sim \chi^2_1 \\ T_2 &= b && \mbox{when} & n&=b+c \leq 20, && \mbox{in which case} & T_2 &\sim \mathrm{Bin}(n, 1/2) \end{aligned} \]

Metro complaints

There were many complaints from a Metro survey stating that commuting on Metro trains during the night at non-peak hours takes longer than commuting during the day at non-peak hours.

At both hours, trains run every 15 minutes. Officials use a sample of ten passengers traveling at both times of the day. Each passenger was asked to record the amount of time it takes for the next train to arrive after waiting for one to pass, and these numbers are recorded in the data frame below.

time <- data.frame(
    X=c(17, 22, 12, 14, 15, 16, 12, 10, 9, 14),
    Y=c(15, 24, 12, 19, 21, 17, 13, 16, 17, 17))

At \(\alpha = 0.05\), is there is enough evidence to suggest that the chance of experiencing a late train is different for these two non-peak periods?

Here we will label as 0 the trains which are early or on-time, i.e., if their time was \(\leq 15\) minutes. Otherwise we will label them as 1 for late.

ontime <- as.data.frame(!(time <= 15))
ontime
##        X     Y
## 1   TRUE FALSE
## 2   TRUE  TRUE
## 3  FALSE FALSE
## 4  FALSE  TRUE
## 5  FALSE  TRUE
## 6   TRUE  TRUE
## 7  FALSE FALSE
## 8  FALSE  TRUE
## 9  FALSE  TRUE
## 10 FALSE  TRUE
  • Careful! Our labeling is the opposite of the logical labeling.

R can put that into contingency table format with the table function.

ct <- table(ontime$X, ontime$Y)
ct
##        
##         FALSE TRUE
##   FALSE     2    5
##   TRUE      1    2

From this we can extract out \(b\) and \(c\) as required.

b <- ct[1,2]
c <- ct[2,1]
n <- b + c
c(b, c, n)
## [1] 5 1 6

Since \(n < 20\) we will calculate \(t_2\) and leverage that \(T_2 \sim \mathrm{Bin}(6, 1/2)\) to perform a two-tailed test.

pstar <- 1/2
t2 <- b
c(t2, n*pstar)
## [1] 5 3

Since \(t_2\) is greater than the null mean we have the following, which is cut-and-paste from our sign test examples.

lhs <- seq(0, floor(pstar*n))
eps <- sqrt(.Machine$double.eps) ## numerical wiggle room
tL <- sum(dbinom(lhs, n, pstar) <= dbinom(t2, n, pstar) * (1 + eps))  
pbinom(tL - 1, n, pstar) + pbinom(t2 - 1, n, pstar, lower.tail=FALSE)
## [1] 0.21875

Or, we can double check with the library.

binom.test(t2, n, pstar)
## 
##  Exact binomial test
## 
## data:  t2 and n
## number of successes = 5, number of trials = 6, p-value = 0.2188
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3587654 0.9957893
## sample estimates:
## probability of success 
##              0.8333333
  • Either way, we fail to reject \(\mathcal{H}_0\) at the \(\alpha=0.05\) level and conclude that the chance of experiencing a late train is equal for these two non-peak periods.

Metro security

New security measures have been adopted by Metro officials after survey responses stated that they have seen questionable behavior in the stations.

Metro used two companies to train their station managers on these security measures. Below is a table showing how many managers passed the certification exam that measured their knowledge learned.

A 0 = Fail 1 = Pass
0 = Fail 113 92
1 = Pass 57 17

Does the table show evidence that the proportion of managers that pass the exam is significantly differentiated by the company that taught them?

b <- 92 
c <- 57
n <- b + c
c(b, c, n)
## [1]  92  57 149

Here we have \(n = 149 \gg 20\) so we calculate the observed value \(t_1\) of \(T_1 \sim \chi^2_1\).

t1 <- (b-c)^2/(b+c)
t1
## [1] 8.221477

The \(p\)-value may be calculated as \(P(T_1 > t_1)\).

pchisq(t1, 1, lower.tail=FALSE)
## [1] 0.00413975

This is an easy reject of \(\mathcal{H}_0\) at the 5% level. We conclude that the proportion of station managers that pass the exam is significantly differentiated by the company that taught them.

Since \(n \gg 20\) we can use a library function too, which gives identical results when the continuity correction is turned off.

mcnemar.test(matrix(c(113,92,57,17), nrow=2), correct=FALSE)
## 
##  McNemar's Chi-squared test
## 
## data:  matrix(c(113, 92, 57, 17), nrow = 2)
## McNemar's chi-squared = 8.2215, df = 1, p-value = 0.00414
  • Note that this library function does not apply on the previous question because \(n \ll 20\). However, if you supply correct = TRUE you will get an answer that is pretty close to the one we calculated.

Cox and Stuart Test for Trend

The Cox and Stuart test is just the sign test with pairs \((X_1, X_{1+c}), (X_2, X_{2+c}), \dots, (X_{n'-c}, X_{n'})\) where \(c=n'/2\) if \(n'\) is even and \(c=(n'+1)/2\) if \(n'\) is odd, in which case we drop the middle value \(X_c\). As with the sign test, \(n\) is the number of untied pairs, \(t\) is the observed number of “+”’s and \(T \sim \mathrm{Bin}(n,1/2)\), in which case we simply have a Binomial test.

Crowded metro

In a Metro survey, passengers complained about the crowdedness of the Richmond Express bus in the morning, which has fifteen stops.

Metro officials are considering adding more buses if they see enough evidence that an increasing trend exists with the number of passengers on the bus by the end of the line. Below is sample data collected at the fifteen bus stops.

passengers <- c(43, 38, 42, 49, 51, 57, 33, 36, 27, 47, 57, 61, 65, 71, 72)

At the five percent level of significance, does a increasing trend exist?

Before jumping into testing it is worth making a plot to get some intuition about potential trend.

plot(passengers, main="checking for trend", xlab="stop #", ylab="passenger count")

  • It does look plausible, although perhaps a more nuanced pattern is at play.
  • There is a “dead zone” from stop 7 to 10, roughly.

Now we perform an upper-tail test of increasing trend where we create pairs in the following way.

x <- passengers[1:floor(length(passengers)/2)]
y <- passengers[(ceiling(length(passengers)/2)+1):length(passengers)]
rbind(x,y)
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## x   43   38   42   49   51   57   33
## y   27   47   57   61   65   71   72

The observed value of the test statistic is the count of the number of pairs where \(x < y\); and \(n\) is the number of unequal pairs.

t <- sum(x < y)
n <- sum(x != y)
c(t, n)
## [1] 6 7

Under the null hypothesis, which is that the probability of plusses, \(p\) is at most \(1/2\), \(\mathcal{H}_0: p \leq 1/2\), the null distribution is \(T \sim \mathrm{Bin}(7, 1/2)\). The alternative hypothesis that \(\mathcal{H}_1: p > 1/2\) gives that the \(p\)-value may be calculated as.

pbinom(t-1, n, 1/2, lower.tail=FALSE)
## [1] 0.0625

With this result we fail to reject the null hypothesis at the 5% level, and conclude that the trend may not be increasing.

There is a routine called cox.stuart.test in the randtests library.

library(randtests)
cox.stuart.test(passengers, alternative="right.sided")
## 
##  Cox Stuart test
## 
## data:  passengers
## statistic = 6, n = 7, p-value = 0.0625
## alternative hypothesis: increasing trend
  • Identical result.

Detecting correlation

The test for trend can be used to detect correlation in a pair of samples, which basically means that detecting whether high values of one random variable tend to be paired with high values of another (positive correlation), or low values of one tend to be paired with high values of another (negative correlation).

The test involves arranging the pairs (the pairs remain intact) so that one member of the pair (usually the variable with fewer ties) is arranged in increasing order. If there is correlation, the other member of the pair will exhibit a trend, upward if the correlation is positive and downward if it is negative.

Consider the reactions of several patients with each of two drugs to see if there is a positive correlation between the two reactions for each patient.

reactions <- data.frame(
    drug1=c(0.7,-1.6,-0.2,-1.2,-0.1,3.4,3.7,0.8,0.0,2.0),
    drug2=c(1.9,0.8,1.1,0.1,-0.1,4.4,5.5,1.6,4.6,3.4))

Ordering the pairs according to the reaction from drug 1 gives the following result.

or <- reactions[order(reactions$drug1),]
or
##    drug1 drug2
## 2   -1.6   0.8
## 4   -1.2   0.1
## 3   -0.2   1.1
## 5   -0.1  -0.1
## 9    0.0   4.6
## 1    0.7   1.9
## 8    0.8   1.6
## 10   2.0   3.4
## 6    3.4   4.4
## 7    3.7   5.5

Now, we take the drug2 column and perform a test for upward trend which results in the overall null hypothesis that there is not positive correlation versus the alternative that there is positive correlation.

d2o <- or$drug2

The plot below gives a quick visual of potential for trend.

plot(d2o, main="checking for trend", xlab="order of drug 1", ylab="drug 2")

  • Indeed, a plausibly increasing trend.

Now we can perform the Cox & Stuart test with the following calculations.

x <- d2o[1:floor(length(d2o)/2)]
y <- d2o[(ceiling(length(d2o)/2)+1):length(d2o)]
t <- sum(x < y)
n <- sum(x != y)
c(t, n)
## [1] 5 5

The null distribution is \(T \sim \mathrm{Bin}(5, 1/2)\) and therefore the \(p\)-value may be calculated as follows.

pbinom(t-1, n, 1/2, lower.tail=FALSE)
## [1] 0.03125

Based on that result we may reject the null hypothesis and conclude that there is a positive correlation between reactions to the two drugs.

Using the library gives the same result.

cox.stuart.test(d2o, alternative="right.sided")
## 
##  Cox Stuart test
## 
## data:  d2o
## statistic = 5, n = 5, p-value = 0.03125
## alternative hypothesis: increasing trend

Testing for a particular pattern

The number if eggs laid by a group of insects in a laboratory is counted on an hourly basis during a 24 hour experiment, to test

\[ \begin{aligned} \mathcal{H}_0:& \mbox{ The 24 egg counts constitute observations on 24 IID random variables} \\ \mathcal{H}_1: & \mbox{ The number of eggs tends to be a minimum at 2:15pm,}\\ & \mbox{ increasing for a maximum at 2:15am, and decreasing again until 2:15pm} \end{aligned} \]

The hourly counts are as follows, where the numbers in the time column start at 9am and end at 8am.

eggs <- data.frame(
    time=c(9:24,1:8),
    counts=c(151,119,146,111,63,84,60,109,83,166,143,116,163,208,283,296,286,235,223,176,176,174,139,137))

If the alternative hypothesis is true, the egg counts nearest 2:15pm should tend to be the smallest and those nearest 2:15am should be the largest. Lets take a look at the raw data and see if that is true.

plot(eggs$time, eggs$counts, main="checking for trend")

  • Indeed, looks pretty compelling.

To test formally we will first need to re-order the observations in order of their distance from 2:15pm

o <- order(sin(((1:24)-0.25) * 2*pi/24), decreasing=TRUE)
eo <- eggs[o,]
eo
##    time counts
## 6    14     84
## 7    15     60
## 5    13     63
## 8    16    109
## 4    12    111
## 9    17     83
## 3    11    146
## 10   18    166
## 2    10    119
## 11   19    143
## 1     9    151
## 12   20    116
## 24    8    137
## 13   21    163
## 23    7    139
## 14   22    208
## 22    6    174
## 15   23    283
## 21    5    176
## 16   24    296
## 20    4    176
## 17    1    286
## 19    3    223
## 18    2    235

these number should exhibit an upward trend.

BTW, why did that “sine trick” work? I needed a periodic function with a period of 24 whose maximum value was at one-quarter (0.25) past the fourteenth hour. The plot below shows that that is indeed the case with the function I chose.

plot(eggs$time, sin(((1:24)-0.25) * 2*pi/24), main="order from 14:15",
    xlab="time", ylab="ranking")

  • Of course, you could have constructed the ordering by hand.

Again, it makes sense to visually check for trend in the re-ordered values.

plot(eo$counts, main="checking for trend",
    xlab="order of distance from 14:15pm", ylab="counts")

  • Again, retty much a smoking gun, but we’ll continue running through the formalities anyways.

Alright, now we to construct a test for trend. That means collecting \(x\) values from the first half and \(y\) values from the second.

x <- eo$counts[1:floor(nrow(eo)/2)]
y <- eo$counts[(ceiling(nrow(eo)/2)+1):nrow(eo)]
rbind(x,y)
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## x   84   60   63  109  111   83  146  166  119   143   151   116
## y  137  163  139  208  174  283  176  296  176   286   223   235

Next, we extract the relevant information from the pairs.

t <- sum(x < y)
n <- sum(x != y)
c(t, n)
## [1] 12 12

The null distribution is \(T \sim \mathrm{Bin}(12, 1/2)\) and therefore the \(p\)-value may be calculated as follows.

pbinom(t-1, n, 1/2, lower.tail=FALSE)
## [1] 0.0002441406

Based on that result this is an easy reject of \(\mathcal{H}_0\) at the 5% level, and we conclude that the predicted pattern does indeed seem to be present.

Finally, using the library …

cox.stuart.test(eo$counts, "right.sided")
## 
##  Cox Stuart test
## 
## data:  eo$counts
## statistic = 12, n = 12, p-value = 0.0002441
## alternative hypothesis: increasing trend
  • … gives the same result.