Department of Statistics, Virginia Tech

Let \(n_i\) represent the number of observations sampled from the \(i^\mathrm{th}\) population, for \(i=1, \dots, r\) populations. Each observation in each sample may be classified into one of \(c\) different categories. Let \(O_{ij}\) be the number of observations from the \(i^\mathrm{th}\) sample that fall into category \(j\), for \(j=1,\dots,c\). So

\[ n_i = O_{i1} + O_{i2} + \cdots + O_{ic}, \quad \mbox{for all } i. \]

The data are arranged in the following \(r \times c\) contingency table.

 | Class 1 | Class 2 | \(\cdots\) | Class \(c\) | Total |
---|---|---|---|---|---|
Population 1 | \(O_{11}\) | \(O_{12}\) | \(\cdots\) | \(O_{1c}\) | \(n_1\) |
Population 2 | \(O_{21}\) | \(O_{22}\) | \(\cdots\) | \(O_{2c}\) | \(n_2\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
Population \(r\) | \(O_{r1}\) | \(O_{r2}\) | \(\cdots\) | \(O_{rc}\) | \(n_r\) |
Total | \(C_1\) | \(C_2\) | \(\cdots\) | \(C_c\) | \(N\) |

The total number of observations from all samples is denoted by \(N\):

\[ N = n_1 + n_2 + \cdots + n_r. \]

The number of observations in the \(j^\mathrm{th}\) column is denoted by \(C_j\), tallying the total number of observations in the \(j^\mathrm{th}\) category, or class, from all samples combined.

The test statistic \(T\) is given by

\[ T = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \mbox{ where } \quad E_{ij} = \frac{n_i C_j}{N}. \]

Note that in the \(2 \times 2\) case this \(T\) equals \(T_1^2\) from the two-sided proportion test, which you may recall has a \(\chi^2_1\) distribution. More generally, the null distribution of \(T\) is approximately \(\chi^2_{(r-1)(c-1)}\), which agrees with the specific \(2 \times 2\) case.
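To see the \(2 \times 2\) equivalence numerically, here is a quick check in R; the counts are made up purely for illustration. The chi-squared statistic \(T\) matches the square of the pooled two-sample proportion statistic.

```
## made-up 2x2 counts, purely for illustration
O2 <- rbind(c(10, 20), c(15, 5))
n <- rowSums(O2); C <- colSums(O2); N <- sum(O2)

## chi-squared statistic T
E <- outer(n, C/N)
tchisq <- sum((O2 - E)^2/E)

## two-sample proportion statistic T1, with pooled variance
p1 <- O2[1,1]/n[1]; p2 <- O2[2,1]/n[2]; p <- C[1]/N
t1 <- (p1 - p2)/sqrt(p*(1 - p)*(1/n[1] + 1/n[2]))

c(tchisq, t1^2)  ## both 8.333333
```

The two statistics agree to machine precision, as the algebra promises.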

We consider the following two-tailed hypothesis tests. Let the probability of a randomly selected value from the \(i^\mathrm{th}\) population being classified in the \(j^\mathrm{th}\) class be denoted by \(p_{ij}\), for \(i=1,\dots,r\) and \(j=1,\dots,c\).

\[ \begin{aligned} \mathcal{H}_0: & \; \mbox{All of the probabilities in the same column are equal to each other} \\ & \; \mbox{(i.e., $p_{1j} = p_{2j} = \cdots = p_{rj}$, for all $j$)} \\ \mathcal{H}_1: & \; \mbox{At least two of the probabilities in the same column are not equal} \\ & \; \mbox{(i.e., $p_{ij} \ne p_{kj}$ for some $j$ and for some pair $i$ and $k$)} \end{aligned} \]

- One-tailed tests don’t make sense when there are \(r > 2\) populations; so all tests are two-tailed.

The \(p\)-value is calculated using the upper tail of \(\chi^2_{(r-1)(c-1)}\), just as in the two-tailed proportion test: \(\varphi = \mathbb{P}(T \geq t)\).

The following table presents the variables income and job satisfaction, measured for a group of males in a national (U.S.) sample. We are interested in determining if the satisfaction level is the same for people in the different categories of income.

Income | Dissatisfied | Little Dissatisfied | Moderately Satisfied | Satisfied |
---|---|---|---|---|
< 15K | 1 | 3 | 10 | 6 |
15-25K | 2 | 3 | 10 | 7 |
25-40K | 1 | 6 | 14 | 15 |
> 40K | 0 | 1 | 9 | 11 |

Let's collect that data in R.

```
O <- rbind(
c(1, 3, 10, 6),
c(2, 3, 10, 7),
c(1, 6, 14, 15),
c(0, 1, 9, 11))
```

Now we can calculate the row and column sums as follows.

```
n <- rowSums(O)
C <- colSums(O)
N <- sum(n)
```

And if we really want to, we can visualize the completed table.

```
tab <- rbind(O, C)
tab <- cbind(tab, c(n, N))
colnames(tab) <- c("diss", "little diss", "mod sat", "sat", "tot")
rownames(tab) <- c("<15", "15-25", "25-40", ">40", "tot")
tab
```

```
## diss little diss mod sat sat tot
## <15 1 3 10 6 20
## 15-25 2 3 10 7 22
## 25-40 1 6 14 15 36
## >40 0 1 9 11 21
## tot 4 13 43 39 99
```

Now, we can calculate the test statistic through two steps.

- First calculate the expected counts under the null hypothesis for the sample from each population.

```
E <- outer(n, C/N)
E
```

```
## [,1] [,2] [,3] [,4]
## [1,] 0.8080808 2.626263 8.686869 7.878788
## [2,] 0.8888889 2.888889 9.555556 8.666667
## [3,] 1.4545455 4.727273 15.636364 14.181818
## [4,] 0.8484848 2.757576 9.121212 8.272727
```

- Then calculate the normalized (squared) discrepancy between observed and expected counts, and add them up.

```
t <- sum((O - E)^2/E)
t
```

`## [1] 6.052191`

Now to complete the test, calculate the \(p\)-value by referring to a \(\chi^2\) distribution with the following degrees of freedom.

```
df <- (length(n) - 1) * (length(C) - 1)
df
```

`## [1] 9`

Here we go: \(\varphi = \mathbb{P}(T \geq t)\).

`pchisq(t, df, lower.tail=FALSE)`

`## [1] 0.7346836`

- Fail to reject the null hypothesis at the 5% level: there is insufficient evidence that the satisfaction level differs across the income categories.

As usual, R has a built-in function that will do it all for you.

`chisq.test(O)`

`## Warning in chisq.test(O): Chi-squared approximation may be incorrect`

```
##
## Pearson's Chi-squared test
##
## data: O
## X-squared = 6.0522, df = 9, p-value = 0.7347
```

- Same result.
- Note the warning that the approximation may not be fantastic (it is triggered here because several expected counts are below 5), but don't let it bother you.
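If the warning does bother you, `chisq.test` can compute a Monte Carlo \(p\)-value instead, by simulating tables with the same margins rather than relying on the \(\chi^2\) approximation. (The value of `B`, the number of simulated tables, is an arbitrary choice here.)

```
## same observed counts as above
O <- rbind(
  c(1, 3, 10, 6),
  c(2, 3, 10, 7),
  c(1, 6, 14, 15),
  c(0, 1, 9, 11))
set.seed(42)  ## for reproducibility of the simulated p-value
chisq.test(O, simulate.p.value=TRUE, B=10000)
```

The simulated \(p\)-value lands near the asymptotic one, so the conclusion (fail to reject) is unchanged.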

Somewhat more generally than the multi-population situation above, suppose that a random sample of size \(N\) was obtained, and then classified according to two criteria, sorting the observations into \(r\) rows and \(c\) columns. Define \(O_{ij}\) as before, and create the following contingency table.

 | Column 1 | Column 2 | \(\cdots\) | Column \(c\) | Total |
---|---|---|---|---|---|
Row 1 | \(O_{11}\) | \(O_{12}\) | \(\cdots\) | \(O_{1c}\) | \(R_1\) |
Row 2 | \(O_{21}\) | \(O_{22}\) | \(\cdots\) | \(O_{2c}\) | \(R_2\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
Row \(r\) | \(O_{r1}\) | \(O_{r2}\) | \(\cdots\) | \(O_{rc}\) | \(R_r\) |
Total | \(C_1\) | \(C_2\) | \(\cdots\) | \(C_c\) | \(N\) |

- The only real difference is notational, with \(R_i\) in place of \(n_i\), to convey that the sorting into rows (as well as columns) is random.

The test statistic \(T\) is defined exactly as above, and it has the same null distribution.

The hypotheses are framed a little differently in this context.

\[ \begin{aligned} \mathcal{H}_0: & \; \mbox{The event "an observation is in row $i$" is independent of} \\ & \; \mbox{the event "that same observation is in column $j$", for all $i$ and $j$} \\ \mathcal{H}_1: & \; \mbox{The event "an observation is in row $i$" depends on} \\ & \; \mbox{the event "that same observation is in column $j$", for at least one combination of $i$ and $j$} \end{aligned} \]

By definition of independence of events, the hypotheses may be restated (in symbols) as follows.

\[ \begin{aligned} \mathcal{H}_0 : & \mathbb{P}(\mbox{row } i, \mbox{column } j) = \mathbb{P}(\mbox{row } i) \cdot \mathbb{P}(\mbox{column } j) \; \mbox{ for all } i,j\\ \mathcal{H}_1 : & \mathbb{P}(\mbox{row } i, \mbox{column } j) \ne \mathbb{P}(\mbox{row } i) \cdot \mathbb{P}(\mbox{column } j) \; \mbox{ for some } i,j \end{aligned} \]
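The expected counts take the same form as before: plugging in the natural estimates of the marginal probabilities, \(\hat{\mathbb{P}}(\mbox{row } i) = R_i/N\) and \(\hat{\mathbb{P}}(\mbox{column } j) = C_j/N\), independence under \(\mathcal{H}_0\) gives

\[ E_{ij} = N \cdot \hat{\mathbb{P}}(\mbox{row } i) \cdot \hat{\mathbb{P}}(\mbox{column } j) = N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} = \frac{R_i C_j}{N}, \]

which matches the multi-population formula with \(R_i\) in place of \(n_i\).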

- As above, the \(p\)-value is calculated using the upper tail of \(\chi^2_{(r-1)(c-1)}\), just as in the two-tailed proportion test: \(\varphi = \mathbb{P}(T \geq t)\).

The following table summarizes responses of 91 married couples in Arizona to a question about how often sex is fun. Are the wife’s rating and the husband’s rating independent of each other?

Husband \ Wife | Almost Never | Fairly Often | Very Often | Almost Always |
---|---|---|---|---|
Almost Never | 7 | 7 | 2 | 3 |
Fairly Often | 2 | 8 | 3 | 7 |
Very Often | 1 | 5 | 4 | 9 |
Almost Always | 2 | 8 | 9 | 14 |

Let's collect that data in R.

```
O <- rbind(
c(7, 7, 2, 3),
c(2, 8, 3, 7),
c(1, 5, 4, 9),
c(2, 8, 9, 14))
```

Now we can calculate the row and column sums as follows.

```
R <- rowSums(O)
C <- colSums(O)
N <- sum(R)
```

And if we really want to, we can visualize the completed table.

```
tab <- rbind(O, C)
tab <- cbind(tab, c(R, N))
colnames(tab) <- c("never", "often", "very", "always", "tot")
rownames(tab) <- c("never", "often", "very", "always", "tot")
tab
```

```
## never often very always tot
## never 7 7 2 3 19
## often 2 8 3 7 20
## very 1 5 4 9 19
## always 2 8 9 14 33
## tot 12 28 18 33 91
```

Now, we can calculate the test statistic through two steps. *(This is cut-and-paste from above, but be careful to replace n with R.)*

- First calculate the expected counts under the null hypothesis for the sample from each population.

```
E <- outer(R, C/N)
E
```

```
## [,1] [,2] [,3] [,4]
## [1,] 2.505495 5.846154 3.758242 6.890110
## [2,] 2.637363 6.153846 3.956044 7.252747
## [3,] 2.505495 5.846154 3.758242 6.890110
## [4,] 4.351648 10.153846 6.527473 11.967033
```

- Then calculate the normalized (squared) discrepancy between observed and expected counts, and add them up.

```
t <- sum((O - E)^2/E)
t
```

`## [1] 16.95524`

Now to complete the test, calculate the \(p\)-value by referring to a \(\chi^2\) distribution with the following degrees of freedom.

```
df <- (length(R) - 1) * (length(C) - 1)
df
```

`## [1] 9`

Here we go: \(\varphi = \mathbb{P}(T \geq t)\).

`pchisq(t, df, lower.tail=FALSE)`

`## [1] 0.0494215`

- Reject the null hypothesis (woah, just barely!) at the 5% level and conclude that the wife’s rating and the husband’s rating are NOT independent of each other.

Since everything else is basically the same, it makes sense to use the same built-in function.

`chisq.test(O)`

`## Warning in chisq.test(O): Chi-squared approximation may be incorrect`

```
##
## Pearson's Chi-squared test
##
## data: O
## X-squared = 16.955, df = 9, p-value = 0.04942
```

- Same result; same warning.
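To see which cells drive the dependence, we can inspect the standardized residuals that `chisq.test` returns in its `stdres` component. Absolute values larger than about 2 flag influential cells; here the Almost Never / Almost Never cell stands out.

```
## same observed counts as above
O <- rbind(
  c(7, 7, 2, 3),
  c(2, 8, 3, 7),
  c(1, 5, 4, 9),
  c(2, 8, 9, 14))
res <- chisq.test(O)
round(res$stdres, 2)  ## large |values| (roughly > 2) flag the influential cells
```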