When dealing with contingency tables, one option is to perform a test of independence (between the rows and columns). Another, perhaps more intuitive option is to simply report the level of dependence and let the “client” decide for themselves. This can be especially valuable when making relative comparisons between multiple studies that produce contingency tables but are otherwise disparate in nature. In that case, a collection of hypothesis tests does a poor job of relaying a corpus of evidence.

The test statistic \(T\) captures the discrepancy between the observed counts \(O_{ij}\) and the counts \(E_{ij}\) we expect under the null hypothesis of homogeneity, i.e., of no dependence between the \(r\) rows and \(c\) columns.

\[ T = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

Consequently, high row–column dependence is synonymous with large \(T\) values, which cause us to reject the null. This suggests using \(T\) itself as a measure of dependence, but unfortunately that overlooks the degrees-of-freedom aspect: “how large is large” is judged relative to a \(\chi^2\) distribution with \((r-1)(c-1)\) degrees of freedom in order to determine the final outcome of the test (via the \(p\)-value). The following contingency coefficients attempt to work around this in various ways.
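For reference, the conventional test alluded to above is a one-liner in base R via chisq.test; a quick sketch, using the voting table analyzed below:

```r
## The conventional chi-squared test of independence: compares T to a
## chi-squared distribution with (r-1)(c-1) degrees of freedom.
O <- rbind(c(200, 150, 50), c(250, 300, 50))  # the voting table used below
out <- chisq.test(O)
out$statistic  # the T statistic
out$p.value    # "how large is large", judged via the chi-squared distribution
```

The contingency coefficients below start from the same \(T\) but report a magnitude of dependence rather than a reject/don’t-reject decision.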

Cramér’s contingency coefficient

Cramér’s contingency coefficient involves normalizing \(T\) by the largest value it can take on in any contingency table with the same dimensions (\(r\) and \(c\)) and the same total sample size (\(N\)). That number is \(T_{\max} = N(\min\{r,c\}-1)\), yielding \[ R_1=\frac{T}{N(\min\{r,c\}-1)}, \quad \mbox{ such that } 0 \leq R_1 \leq 1. \] However, that quantity is on the scale of squared differences, and from an interpretation perspective we usually prefer ordinary (un-squared) differences. Therefore Cramér’s contingency coefficient is given by the following formula.

\[ \mbox{Cramér's contingency coefficient} = C_{\mathrm{cc}} = \sqrt{R_1} = \sqrt{\frac{T}{N(\min\{r,c\}-1)}} \]
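To make the recipe concrete, here is a minimal sketch of a helper that computes \(C_{\mathrm{cc}}\) directly from a matrix of observed counts (the function name cramers_cc is ours, not from any package; base R only):

```r
## Minimal sketch: Cramer's contingency coefficient from a table of counts.
## The function name is ours; base R only.
cramers_cc <- function(O) {
  N <- sum(O)                              # total sample size
  E <- outer(rowSums(O), colSums(O)) / N   # expected counts under the null
  Tstat <- sum((O - E)^2 / E)              # chi-squared statistic T
  q <- min(nrow(O), ncol(O))               # min{r, c}
  sqrt(Tstat / (N * (q - 1)))              # normalize by T_max, then un-square
}
```

Any matrix of non-negative counts works, and the result is guaranteed to land in \([0,1]\).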

Cramér’s coefficient, like all good measures of dependence, is “scale invariant”. That is, if the scale of the experiment becomes much larger, the measure of dependence doesn’t change as long as all the observations scale identically relative to each other.

Voting preferences

A public opinion poll surveyed a random sample of 1000 voters. Respondents were classified by gender (male or female) and voting preference (Republican, Democrat and Independent).

Gender \ Preference   Republican   Democrat   Independent
Male                      200          150            50
Female                    250          300            50

Let’s assess dependence using Cramér’s contingency coefficient.

First, extract the data in R.

O <- rbind(c(200, 150, 50), c(250, 300, 50))
colnames(O) <- c("R", "D", "I")
rownames(O) <- c("M", "F")

Then fill out the table.

r <- rowSums(O)
c <- colSums(O)
N <- sum(r)
tab <- rbind(O, c)
tab <- cbind(tab, c(r, N))
colnames(tab)[ncol(tab)] <- rownames(tab)[nrow(tab)] <- "tot"
##       R   D   I  tot
## M   200 150  50  400
## F   250 300  50  600
## tot 450 450 100 1000

Now we can calculate the expected counts, \(E_{ij} = r_i c_j / N\), via an outer product.

E <- outer(r, c/N)
##     R   D  I
## M 180 180 40
## F 270 270 60

Next, we can evaluate the test statistic, \(T\).

t <- sum((O-E)^2/E)
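Spelling the sum out term by term for this table:

\[ T = \frac{(200-180)^2}{180} + \frac{(150-180)^2}{180} + \frac{(50-40)^2}{40} + \frac{(250-270)^2}{270} + \frac{(300-270)^2}{270} + \frac{(50-60)^2}{60} \approx 2.22 + 5 + 2.5 + 1.48 + 3.33 + 1.67 \approx 16.20. \]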

The coefficient \(R_1\) normalizes \(T\) by the largest value that it can take on in a table of this form.

r1 <- t/(N*(min(length(r),length(c))-1))
## [1] 0.0162037

Finally, taking the square root determines Cramér’s contingency coefficient.

Ccc <- sqrt(r1)
## [1] 0.1272938
  • Quite low but non-negligible dependence between rows and columns; we conclude that the dependence between gender and voting preference is weak.

To see what it means that the statistic is scale invariant, consider a table that summarizes an experiment that was \(1/10\)th of the size, but otherwise had the same relative counts.

O2 <- O/10
r2 <- rowSums(O2)
c2 <- colSums(O2)
N2 <- sum(r2)
tab2 <- rbind(O2, c2)
tab2 <- cbind(tab2, c(r2, N2))
colnames(tab2)[ncol(tab2)] <- rownames(tab2)[nrow(tab2)] <- "tot"
##      R  D  I tot
## M   20 15  5  40
## F   25 30  5  60
## tot 45 45 10 100
E2 <- outer(r2, c2/N2)
t2 <- sum((O2-E2)^2/E2)
Ccc2 <- sqrt(t2/(N2*(min(length(r2),length(c2))-1)))
## [1] 0.1272938
  • Same result! The same would be true of a larger experiment.
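Algebraically, this is because scaling every cell by a factor \(k > 0\) scales the margins and the total by \(k\) as well, so the expected counts become \(kE_{ij}\) and

\[ T \to \sum_{i,j} \frac{(kO_{ij} - kE_{ij})^2}{kE_{ij}} = kT, \]

while \(N \to kN\); the ratio \(T/(N(\min\{r,c\}-1))\), and hence \(C_{\mathrm{cc}}\), is unchanged.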

Pearson’s contingency coefficients

There are two coefficients attributed to Karl Pearson, both of which can be viewed as simplifications of Cramér’s contingency coefficient.

\[ R_2 = \sqrt{\frac{T}{N+T}} \quad \mbox{and} \quad R_3 = \frac{T}{N}. \]

\(R_3\) sometimes goes by the name Pearson’s mean-square contingency coefficient, to distinguish it from \(R_2\) which just goes by “contingency coefficient”.

As we discussed above, the maximum of \(T\) for a particular table is \(N(\min\{r,c\}-1)\), so we have that

\[ 0 \leq R_2 \leq \sqrt{\frac{\min\{r,c\}-1}{\min\{r,c\}}} < 1 \quad \mbox{and} \quad 0 \leq R_3 \leq \min\{r,c\}-1. \]
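These bounds follow by substituting \(T_{\max} = N(q-1)\), with \(q = \min\{r,c\}\), into each definition (both coefficients are increasing in \(T\)):

\[ R_2 \leq \sqrt{\frac{N(q-1)}{N + N(q-1)}} = \sqrt{\frac{q-1}{q}} \quad \mbox{and} \quad R_3 \leq \frac{N(q-1)}{N} = q - 1. \]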

Voting preferences continued

So for \(R_2\) we have

R2 <- sqrt(t/(N+t))  ## named R2 to avoid clobbering r2 <- rowSums(O2) above
## [1] 0.1262748

with a maximum value of

q <- min(length(r), length(c))
r2max <- sqrt((q-1)/q)
## [1] 0.7071068

And for \(R_3\)

r3 <- t/N
## [1] 0.0162037

with a maximum value of

r3max <- q-1
## [1] 1
  • As with Cramér’s contingency coefficient, both are low relative to their maximum values.

Observe that both of these coefficients are also scale invariant. E.g., we get the same values in the case of an experiment \(1/10\)th of the size.

sqrt(t2/(N2+t2))
## [1] 0.1262748
t2/N2
## [1] 0.0162037

Phi coefficient

In the special case of \(2 \times 2\) tables, written as

          Column 1    Column 2    Totals
Row 1     \(a\)       \(b\)       \(r_1\)
Row 2     \(c\)       \(d\)       \(r_2\)
Totals    \(c_1\)     \(c_2\)     \(N\)

the \(T\) statistic takes on a simplified form.

\[ T = \frac{N(ad-bc)^2}{r_1 r_2 c_1 c_2} \]

Since in this case \(\min\{r,c\} = 2\), both Pearson’s mean square coefficient (\(R_3\)) and Cramér’s coefficient (before the square-root, \(R_1\)) reduce to

\[ R_1 = R_3 = \frac{(ad - bc)^2}{r_1 r_2 c_1 c_2} \]

Sometimes this is brought up because these formulas are easier to follow by hand, but nowadays good computing software makes that a non-issue. However, it can sometimes be meaningful to distinguish between a positive and negative association, and the form above shows how one might do that: by removing the square!

The resulting formula

\[ R_\phi = \frac{ad - bc}{\sqrt{r_1 r_2 c_1 c_2}} \]

is sometimes called the phi-coefficient. And if you look at the Wikipedia page, it is also sometimes called the mean-square contingency coefficient, which can be a source of confusion when comparing to the more general Pearson’s (\(r \times c\)) mean-square contingency coefficient (\(R_3\)), as defined above.

It is easy to see that \(-1 \leq R_\phi \leq 1\), showing both direction and strength of dependence.

Voting preferences concluded

Here, to obtain a \(2 \times 2\) table, we just restrict to the Republican and Democrat columns, ignoring Independent.

Ord <- O[,-3]
rrd <- rowSums(Ord)
crd <- colSums(Ord)
Nrd <- sum(rrd)
tabrd <- rbind(Ord, crd)
tabrd <- cbind(tabrd, c(rrd, Nrd))
colnames(tabrd)[3] <- rownames(tabrd)[3] <- "tot"
##       R   D tot
## M   200 150 350
## F   250 300 550
## tot 450 450 900

Now we can calculate \(R_\phi\).

a <- Ord[1,1]; b <- Ord[1,2]; c <- Ord[2,1]; d <- Ord[2,2]
rphi <- (a*d - b*c)/ sqrt(rrd[1]*rrd[2]*crd[1]*crd[2])
##         M 
## 0.1139606
  • Low positive dependence.
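As a quick sanity check, rebuilding the same restricted table from scratch: for any \(2 \times 2\) table the shortcut \(T = N R_\phi^2\) agrees with the general sum-of-squares formula for \(T\).

```r
## Sanity check on the Republican/Democrat 2x2 table:
## N * Rphi^2 should equal the general chi-squared statistic T.
Ord <- rbind(c(200, 150), c(250, 300))
rrd <- rowSums(Ord); crd <- colSums(Ord); Nrd <- sum(Ord)
E    <- outer(rrd, crd) / Nrd                     # expected counts
Tgen <- sum((Ord - E)^2 / E)                      # general formula
Rphi <- (Ord[1,1]*Ord[2,2] - Ord[1,2]*Ord[2,1]) /
  sqrt(prod(rrd) * prod(crd))                     # phi coefficient
all.equal(Tgen, Nrd * Rphi^2)                     # TRUE
```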


The calculation of Cramér’s coefficient and \(R_2\) (among other quantities) is automated by the assocstats function in the vcd library for R.

library(vcd, quietly=TRUE)
assocstats(O)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 16.266  2 0.00029373
## Pearson          16.204  2 0.00030298
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.126 
## Cramer's V        : 0.127
  • It used to be that this library also returned Pearson’s mean-square contingency coefficient, \(R_3\), but current versions do not.
  • If you give it a \(2 \times 2\) table instead, it will report a \(\phi\)-coefficient.

assocstats(Ord)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 11.719  1 0.00061862
## Pearson          11.688  1 0.00062894
## Phi-Coefficient   : 0.114 
## Contingency Coeff.: 0.113 
## Cramer's V        : 0.114
  • And this agrees with what we calculated.

Another option for calculating the \(\phi\) coefficient comes from the psych library.

library(psych)
phi(Ord, digits=3)
## [1] 0.114