Pearson’s \(\chi^2\) test

Let \(n_i\) represent the number of observations sampled from the \(i^\mathrm{th}\) population, for \(i=1, \dots, r\) populations. Each observation in each sample may be classified into one of \(c\) different categories. Let \(O_{ij}\) be the number of observations from the \(i^\mathrm{th}\) sample that fall into category \(j\), for \(j=1,\dots,c\). So

\[ n_i = O_{i1} + O_{i2} + \cdots + O_{ic}, \quad \mbox{for all } i. \]

The data are arranged in the following \(r \times c\) contingency table.

                   Class 1       Class 2       \(\cdots\)   Class \(c\)   Total
Population 1       \(O_{11}\)    \(O_{12}\)    \(\cdots\)   \(O_{1c}\)    \(n_1\)
Population 2       \(O_{21}\)    \(O_{22}\)    \(\cdots\)   \(O_{2c}\)    \(n_2\)
\(\vdots\)         \(\vdots\)    \(\vdots\)    \(\vdots\)   \(\vdots\)    \(\vdots\)
Population \(r\)   \(O_{r1}\)    \(O_{r2}\)    \(\cdots\)   \(O_{rc}\)    \(n_r\)
Total              \(C_1\)       \(C_2\)       \(\cdots\)   \(C_c\)       \(N\)

The total number of observations from all samples is denoted by \(N\):

\[ N = n_1 + n_2 + \cdots + n_r. \]

The number of observations in the \(j^\mathrm{th}\) column is denoted by \(C_j\), tallying the total number of observations in the \(j^\mathrm{th}\) category, or class, from all samples combined.
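
In symbols, paralleling the row totals above,

\[ C_j = O_{1j} + O_{2j} + \cdots + O_{rj}, \quad \mbox{for all } j. \]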

The test statistic \(T\) is given by

\[ T = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \mbox{ where } \quad E_{ij} = \frac{n_i C_j}{N}. \]
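
Where do the \(E_{ij}\) come from? Under the null hypothesis stated below, every population shares a common probability of landing in class \(j\); pooling all the samples, that probability is estimated by \(C_j/N\), so among the \(n_i\) observations from population \(i\) we expect about

\[ E_{ij} = n_i \cdot \frac{C_j}{N} = \frac{n_i C_j}{N} \]

to fall in class \(j\).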

Note that in the \(2 \times 2\) case this \(T\) equals \(T_1^2\) from the two-sided proportion test, which you may recall has a \(\chi^2_1\) distribution. More generally, the null distribution of \(T\) is approximately \(\chi^2_{(r-1)(c-1)}\), which agrees with the specific \(2 \times 2\) case.
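
As a quick sanity check of that claim, here is a small numerical comparison on made-up \(2 \times 2\) counts (the numbers are arbitrary, chosen only for illustration): with the continuity correction turned off, the statistic reported by prop.test matches the one from chisq.test.

x <- c(30, 45)     # "successes" in each of two samples (arbitrary counts)
m <- c(100, 120)   # the two sample sizes
prop.test(x, m, correct=FALSE)$statistic
chisq.test(rbind(c(30, 70), c(45, 75)), correct=FALSE)$statistic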

We consider the following two-tailed hypothesis tests. Let the probability of a randomly selected value from the \(i^\mathrm{th}\) population being classified in the \(j^\mathrm{th}\) class be denoted by \(p_{ij}\), for \(i=1,\dots,r\) and \(j=1,\dots,c\).

\[ \begin{aligned} \mathcal{H}_0: & \; \mbox{All of the probabilities in the same column are equal to each other} \\ & \; \mbox{(i.e., $p_{1j} = p_{2j} = \cdots = p_{rj}$, for all $j$)} \\ \mathcal{H}_1: & \; \mbox{At least two of the probabilities in the same column are not equal} \\ & \; \mbox{(i.e., $p_{ij} \ne p_{kj}$ for some $j$ and for some pair $i$ and $k$)} \end{aligned} \]

The \(p\)-value is calculated using the upper tail of \(\chi^2_{(r-1)(c-1)}\), just as in the two-tailed proportion test: \(\varphi = \mathbb{P}(T \geq t)\).
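
Putting the pieces together, the whole recipe fits in a few lines of R. The function below is just a sketch (the name pearson_chisq is ours, not a built-in); the worked examples that follow do the same computations step by step.

pearson_chisq <- function(O) {
    n <- rowSums(O)                      # row totals n_i
    C <- colSums(O)                      # column totals C_j
    N <- sum(O)                          # grand total
    E <- outer(n, C) / N                 # expected counts E_ij = n_i C_j / N
    t <- sum((O - E)^2 / E)              # test statistic
    df <- (nrow(O) - 1) * (ncol(O) - 1)  # degrees of freedom
    list(statistic=t, df=df, p.value=pchisq(t, df, lower.tail=FALSE))
}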

Income and job satisfaction

The following table presents the variables income and job satisfaction, measured for a group of males in a national (U.S.) sample. We are interested in determining if the satisfaction level is the same for people in the different categories of income.

Income   Dissatisfied   Little Dissatisfied   Moderately Satisfied   Satisfied
< 15K               1                     3                     10           6
15-25K              2                     3                     10           7
25-40K              1                     6                     14          15
> 40K               0                     1                      9          11

Let's collect the data into R.

# observed counts: rows = income bracket, columns = satisfaction level
O <- rbind(
    c(1, 3, 10,  6),
    c(2, 3, 10,  7),
    c(1, 6, 14, 15),
    c(0, 1,  9, 11))

Now we can calculate the row and column sums as follows.

n <- rowSums(O)
C <- colSums(O)
N <- sum(n)

And if we really want to, we can visualize the completed table.

tab <- rbind(O, C)
tab <- cbind(tab, c(n, N))
colnames(tab) <- c("diss", "little diss", "mod sat", "sat", "tot") 
rownames(tab) <- c("<15", "15-25", "25-40", ">40", "tot")
tab
##       diss little diss mod sat sat tot
## <15      1           3      10   6  20
## 15-25    2           3      10   7  22
## 25-40    1           6      14  15  36
## >40      0           1       9  11  21
## tot      4          13      43  39  99

Now, we can calculate the test statistic through two steps.

  1. First calculate the expected counts under the null hypothesis for the sample from each population.
E <- outer(n, C/N)   # E[i,j] = n[i] * C[j] / N
E
##           [,1]     [,2]      [,3]      [,4]
## [1,] 0.8080808 2.626263  8.686869  7.878788
## [2,] 0.8888889 2.888889  9.555556  8.666667
## [3,] 1.4545455 4.727273 15.636364 14.181818
## [4,] 0.8484848 2.757576  9.121212  8.272727
  2. Then calculate the normalized (squared) discrepancy between observed and expected counts, and add them up.
t <- sum((O - E)^2/E)
t
## [1] 6.052191

Now to complete the test, calculate the \(p\)-value by referring to a \(\chi^2\) distribution with the following degrees of freedom.

df <- (length(n) - 1) * (length(C) - 1)
df
## [1] 9

Here we go: \(\varphi = \mathbb{P}(T \geq t)\).

pchisq(t, df, lower.tail=FALSE)
## [1] 0.7346836
  • Fail to reject the null hypothesis at the 5% level: there is no evidence that the satisfaction level differs across the income categories.

As usual, R has a built-in function that will do it all for you.

chisq.test(O)
## Warning in chisq.test(O): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  O
## X-squared = 6.0522, df = 9, p-value = 0.7347
  • Same result.
  • Note the warning that the \(\chi^2\) approximation may be poor; it is triggered when some expected counts are small (checked below), but with a \(p\)-value this far from 0.05 it is nothing to worry about here.
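
If you want to see what triggered the warning, the expected counts are stored in the object that chisq.test returns; they match the matrix E computed by hand above, and the entire first column is below the usual rule-of-thumb threshold of 5.

chisq.test(O)$expected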

\(\chi^2\) test for independence

Somewhat more generally than the multi-population situation above, suppose that a random sample of size \(N\) was obtained, and then classified according to two criteria, sorting the observations into \(r\) rows and \(c\) columns. Define \(O_{ij}\) as before, and create the following contingency table.

             Column 1      Column 2      \(\cdots\)   Column \(c\)   Total
Row 1        \(O_{11}\)    \(O_{12}\)    \(\cdots\)   \(O_{1c}\)     \(R_1\)
Row 2        \(O_{21}\)    \(O_{22}\)    \(\cdots\)   \(O_{2c}\)     \(R_2\)
\(\vdots\)   \(\vdots\)    \(\vdots\)    \(\vdots\)   \(\vdots\)     \(\vdots\)
Row \(r\)    \(O_{r1}\)    \(O_{r2}\)    \(\cdots\)   \(O_{rc}\)     \(R_r\)
Total        \(C_1\)       \(C_2\)       \(\cdots\)   \(C_c\)        \(N\)

The test statistic \(T\) is defined exactly as above (with \(R_i\) playing the role of \(n_i\)), and it has the same null distribution.

The hypotheses are framed a little differently in this context.

\[ \begin{aligned} \mathcal{H}_0: & \; \mbox{The event "an observation is in row $i$" is independent of} \\ & \; \mbox{the event "that same observation is in column $j$", for all $i$ and $j$} \\ \mathcal{H}_1: & \; \mbox{The event "an observation is in row $i$" depends on} \\ & \; \mbox{the event "that same observation is in column $j$", for at least one combination of $i$ and $j$} \end{aligned} \]

By definition of independence of events, the hypotheses may be restated (in symbols) as follows.

\[ \begin{aligned} \mathcal{H}_0 : & \mathbb{P}(\mbox{row } i, \mbox{column } j) = \mathbb{P}(\mbox{row } i) \cdot \mathbb{P}(\mbox{column } j) \; \mbox{ for all } i,j\\ \mathcal{H}_1 : & \mathbb{P}(\mbox{row } i, \mbox{column } j) \ne \mathbb{P}(\mbox{row } i) \cdot \mathbb{P}(\mbox{column } j) \; \mbox{ for some } i,j \end{aligned} \]
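
This also explains the expected counts in this setting: under \(\mathcal{H}_0\) the marginal probabilities are estimated by \(R_i/N\) and \(C_j/N\), so the expected count in cell \((i,j)\) is

\[ E_{ij} = N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} = \frac{R_i C_j}{N}, \]

the same formula as before, with \(R_i\) in place of \(n_i\).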

How often is sex fun?

The following table summarizes responses of 91 married couples in Arizona to a question about how often sex is fun. Are the wife’s rating and the husband’s rating independent of each other?

Husband \ Wife   Almost Never   Fairly Often   Very Often   Almost Always
Almost Never                7              7            2               3
Fairly Often                2              8            3               7
Very Often                  1              5            4               9
Almost Always               2              8            9              14

Let's collect the data into R.

# observed counts: rows = husband's rating, columns = wife's rating
O <- rbind(
    c(7, 7, 2,  3),
    c(2, 8, 3,  7),
    c(1, 5, 4,  9),
    c(2, 8, 9, 14))

Now we can calculate the row and column sums as follows.

R <- rowSums(O)
C <- colSums(O)
N <- sum(R)

And if we really want to, we can visualize the completed table.

tab <- rbind(O, C)
tab <- cbind(tab, c(R, N))
colnames(tab) <- c("never", "often", "very", "always", "tot") 
rownames(tab) <- c("never", "often", "very", "always", "tot")
tab
##        never often very always tot
## never      7     7    2      3  19
## often      2     8    3      7  20
## very       1     5    4      9  19
## always     2     8    9     14  33
## tot       12    28   18     33  91

Now, we can calculate the test statistic through two steps. (This is cut-and-paste from above, but be careful to replace n with R.)

  1. First calculate the expected counts for each cell under the null hypothesis of independence.
E <- outer(R, C/N)   # E[i,j] = R[i] * C[j] / N
E
##          [,1]      [,2]     [,3]      [,4]
## [1,] 2.505495  5.846154 3.758242  6.890110
## [2,] 2.637363  6.153846 3.956044  7.252747
## [3,] 2.505495  5.846154 3.758242  6.890110
## [4,] 4.351648 10.153846 6.527473 11.967033
  2. Then calculate the normalized (squared) discrepancy between observed and expected counts, and add them up.
t <- sum((O - E)^2/E)
t
## [1] 16.95524

Now to complete the test, calculate the \(p\)-value by referring to a \(\chi^2\) distribution with the following degrees of freedom.

df <- (length(R) - 1) * (length(C) - 1)
df
## [1] 9

Here we go: \(\varphi = \mathbb{P}(T \geq t)\).

pchisq(t, df, lower.tail=FALSE)
## [1] 0.0494215
  • Reject the null hypothesis (woah, just barely!) at the 5% level and conclude that the wife’s rating and the husband’s rating are NOT independent of each other.

Since everything else is basically the same, it makes sense that we would check against the same built-in function.

chisq.test(O)
## Warning in chisq.test(O): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  O
## X-squared = 16.955, df = 9, p-value = 0.04942
  • Same result; same warning.
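
If the warning is more of a concern here, since the \(p\)-value sits right at the 5% boundary, chisq.test can instead compute a Monte Carlo \(p\)-value that does not rely on the \(\chi^2\) approximation at all. The number of replicates B below is an arbitrary choice, and the resulting \(p\)-value will vary slightly from run to run.

set.seed(1)  # arbitrary seed, only to make the simulated p-value reproducible
chisq.test(O, simulate.p.value=TRUE, B=10000)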