Contingency tables are commonly used to summarize the frequencies of two or more categorical variables. There are many situations where you can use a contingency table. For example,
\(\quad\) | Cured | Not Cured | Total |
---|---|---|---|
Treatment | 830 | 170 | 1000 |
Control | 640 | 360 | 1000 |
Total | 1470 | 530 | 2000 |
\(\quad\) | Have Cancer | No Cancer | Total |
---|---|---|---|
Genotype AA | 690 | 436 | 1126 |
Genotype Aa/aa | 310 | 564 | 874 |
Total | 1000 | 1000 | 2000 |
\(\quad\) | Trisomy | No Trisomy | Total |
---|---|---|---|
Test Positive | 15 | 68 | 83 |
Test Negative | 12 | 879 | 891 |
Total | 27 | 947 | 974 |
In all of these examples, we may be interested in whether the condition (row) is associated with the outcome (column). However, note that these studies have different designs, and some designs do not allow certain quantities to be calculated. In particular, we are interested in two quantities: the Relative Risk and the Odds Ratio.
The relative risk is defined as the ratio of the probability of an outcome in an exposed group to the probability of the same outcome in an unexposed group. It is also called the risk ratio and is often abbreviated as RR.
\(\quad\) | Event | No Event | Total |
---|---|---|---|
Exposed | A | B | A+B |
Unexposed | C | D | C+D |
Total | A+C | B+D | A+B+C+D |
The RR can be calculated as
\[\frac{A / (A+B)}{C / (C+D)}\]
If RR is significantly different from 1 (RR = 1 means that the risks in the two groups are the same), then we can conclude that the exposure is a significant factor. It is very important to know that RR is only valid for a prospective study, meaning that the exposed and unexposed groups are defined first and the events are observed later. This is reasonable because otherwise \(A / (A+B)\) cannot be interpreted as the probability of the event in a given group.
Let’s use our previous artificial data as an example. The two risks are 830 / 1000 and 640 / 1000, making the RR 1.296875. We will use the `R` package `epitools`. Note that this is an example of the case where you do not have the original data, but only the summary frequency table. Also, be careful when specifying the table with the `matrix()` function: `R` fills the matrix column-wise by default.
library(epitools)
# we need to specify the data of the first column, then second column
freqtable = matrix(c(830, 640, 170, 360),nrow = 2, ncol = 2)
# this is for naming the items properly (not a necessary step)
# the R function will automatically assign them some name if you leave them empty.
rownames(freqtable) = c("Treatment", "Control")
colnames(freqtable) = c("Cured", "Not Cured")
freqtable
## Cured Not Cured
## Treatment 830 170
## Control 640 360
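As a quick check of the column-wise fill order, the sketch below builds the same table twice: once column by column (the default) and once row by row using the `byrow = TRUE` argument of `matrix()`. The object names `by_col` and `by_row` are made up for this illustration and are not used elsewhere.

```r
# default: values fill the first column, then the second column
by_col = matrix(c(830, 640, 170, 360), nrow = 2, ncol = 2)

# byrow = TRUE: values fill the first row, then the second row
by_row = matrix(c(830, 170, 640, 360), nrow = 2, ncol = 2, byrow = TRUE)

# both constructions describe the same 2 x 2 table
identical(by_col, by_row)
```

Entering the counts row by row can be easier to proofread against a printed table, since the numbers then appear in reading order.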
# use the risk ratio function
riskratio(freqtable)
## $data
## Cured Not Cured Total
## Treatment 830 170 1000
## Control 640 360 1000
## Total 1470 530 2000
##
## $measure
## NA
## risk ratio with 95% C.I. estimate lower upper
## Treatment 1.000000 NA NA
## Control 2.117647 1.804627 2.484962
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## Treatment NA NA NA
## Control 0 4.597469e-22 6.175089e-22
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
To understand the results:
- `data` simply restates the data and the totals for rows and columns.
- `measure` provides the risk ratio calculation. The `estimate` column is the estimated RR, with `lower` and `upper` as the limits of the 95% confidence interval. Note that the result uses the first row as the reference group. Hence, the second row `Control` represents the RR of control vs. treatment. The estimated RR is 2.118 with a confidence interval of (1.8046, 2.4850). Since this interval does not include 1, we know that the risks are significantly different. But notice that this is different from our own calculation, 1.296875. This is because the function requires the event to be specified in the second column of the data. Based on the construction of `freqtable`, the risks (of not being cured) are 360 / 1000 and 170 / 1000, making the risk ratio 2.118.
- `p.value` provides the significance from three different tests: `midp.exact` (mid-p), `fisher.exact` (Fisher's exact) and `chi.square` (chi-square). In our case, since the sample size is very large, they should provide similar results. When the sample size is very small, Fisher's exact test should be used.

Now we re-organize `freqtable` so that it is properly oriented to calculate the quantities we are interested in. The result is now the RR of the event (being cured), and it is the RR of treatment vs. control instead of the other way around. The conclusion of significance does not change, since it is essentially the same data.
# switch columns
freqtable = freqtable[, c(2,1)]
# switch rows
freqtable = freqtable[c(2,1),]
riskratio(freqtable)
## $data
## Not Cured Cured Total
## Control 360 640 1000
## Treatment 170 830 1000
## Total 530 1470 2000
##
## $measure
## NA
## risk ratio with 95% C.I. estimate lower upper
## Control 1.000000 NA NA
## Treatment 1.296875 1.228342 1.369231
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## Control NA NA NA
## Treatment 0 4.597469e-22 6.175089e-22
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
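The method attribute in the output above reads "Unconditional MLE & normal approximation (Wald) CI". The following is a minimal base-R sketch of that calculation, assuming the usual Wald formula in which the standard error of log(RR) is \(\sqrt{1/A - 1/(A+B) + 1/C - 1/(C+D)}\). The variable names are chosen for this illustration only.

```r
# counts from the re-oriented table: the event is "Cured"
cured_trt = 830; n_trt = 1000   # treatment group
cured_ctl = 640; n_ctl = 1000   # control (reference) group

# risk ratio of treatment vs. control
rr = (cured_trt / n_trt) / (cured_ctl / n_ctl)

# standard error of log(RR) under the normal (Wald) approximation
se_log_rr = sqrt(1 / cured_trt - 1 / n_trt + 1 / cured_ctl - 1 / n_ctl)

# 95% confidence interval, computed on the log scale and transformed back
z = qnorm(0.975)
ci = exp(log(rr) + c(-1, 1) * z * se_log_rr)

rr
ci
```

The interval is built on the log scale because the sampling distribution of log(RR) is closer to normal than that of RR itself, which is also why the interval is not symmetric around the estimate.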
A chickenpox outbreak started in an Oregon elementary school in October 2001. Tugwell et al. (2004) investigated students who were at risk of chickenpox prior to the outbreak and separated the subjects into vaccinated and unvaccinated groups. These data were also used in the CDC Principles of Epidemiology guide. The following data were observed:
\(\quad\) | Varicella | Non-case |
---|---|---|
Vaccinated | 18 | 134 |
Unvaccinated | 3 | 4 |
What is the effectiveness of the vaccine? Note that the effectiveness (effect size) of a vaccine is defined as 1 - RR. Hence, let’s focus on calculating the RR, its confidence interval and its significance. We first need to enter the data in `R`:
# Please be careful about the table construction
Chickenpox = matrix(c(4, 134, 3, 18),nrow = 2, ncol = 2)
rownames(Chickenpox) = c("Unvaccinated", "Vaccinated")
colnames(Chickenpox) = c("Non-case", "Varicella")
# use the risk ratio function
riskratio(Chickenpox)
## Warning in chisq.test(xx, correct = correction): Chi-squared approximation may be incorrect
## $data
## Non-case Varicella Total
## Unvaccinated 4 3 7
## Vaccinated 134 18 152
## Total 138 21 159
##
## $measure
## NA
## risk ratio with 95% C.I. estimate lower upper
## Unvaccinated 1.0000000 NA NA
## Vaccinated 0.2763158 0.105896 0.7209945
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## Unvaccinated NA NA NA
## Vaccinated 0.05554613 0.04934509 0.01780275
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
Hence the estimated RR is 27.6%, with a 95% confidence interval of (10.6%, 72.1%). This is significantly different from 1, with a p-value of 0.049. Note that we use Fisher’s exact test since the sample size is relatively small. The effect size of the vaccine is 1 - 27.6% = 72.4%. Its confidence interval, (27.9%, 89.4%), is obtained by subtracting the upper and lower limits of the RR interval from 1.
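To see why Fisher's exact test is preferred here, the sketch below uses only base R to compute the expected cell counts under independence and then both p-values. It assumes that the `fisher.exact` and `chi.square` columns correspond to `fisher.test()` and an uncorrected `chisq.test()`. The warning shown in the output above suggests the latter, but this has not been checked against the package source.

```r
# same orientation as above: reference group first, event in the second column
Chickenpox = matrix(c(4, 134, 3, 18), nrow = 2, ncol = 2,
                    dimnames = list(c("Unvaccinated", "Vaccinated"),
                                    c("Non-case", "Varicella")))

# expected counts under independence; one cell is well below 5
expected = outer(rowSums(Chickenpox), colSums(Chickenpox)) / sum(Chickenpox)
expected

# Fisher's exact test does not rely on a large-sample approximation
p_fisher = fisher.test(Chickenpox)$p.value

# chi-square test without continuity correction
# (the warning about small expected counts is suppressed here)
p_chisq = suppressWarnings(chisq.test(Chickenpox, correct = FALSE)$p.value)

c(fisher = p_fisher, chisq = p_chisq)
```

An expected count below 5 is a common rule of thumb for distrusting the chi-square approximation, which is consistent with the warning printed by `riskratio()`.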
\(\quad\) | Infected | Not Infected |
---|---|---|
Right Dormitory | 82 | 36 |
Left Dormitory | 22 | 93 |
From this table, replicate their results of RR (right against left) and confidence interval.
# Please be careful about the table construction
MT = matrix(c(93, 36, 22, 82),nrow = 2, ncol = 2)
rownames(MT) = c("Left", "Right")
colnames(MT) = c("Non-Infected", "Infected")
# use the risk ratio function
riskratio(MT)
When the original data are available, the frequency table can be built with the `table()` function. Try the RR calculation using the following data. However, be careful about your interpretation of the result due to the table orientation.
newdata = data.frame("Infected" = rbinom(100, 1, prob = 0.3),
                     "Vaccinated" = rbinom(100, 1, prob = 0.5))
datatable = table("Vaccinated"= newdata$Vaccinated, "Infected" = newdata$Infected)
riskratio(datatable)
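One way to make the orientation explicit is to label the 0/1 codes before tabulating. This is a sketch using base R only: the seed, the labels and the name `labeledtable` are arbitrary choices for this illustration, and it assumes that 1 codes "yes" in both simulated variables.

```r
set.seed(1)  # arbitrary seed, only to make the simulated data reproducible

newdata = data.frame("Infected"   = rbinom(100, 1, prob = 0.3),
                     "Vaccinated" = rbinom(100, 1, prob = 0.5))

# label the 0/1 codes explicitly; the first level becomes the first row/column
vaccinated = factor(newdata$Vaccinated, levels = c(0, 1),
                    labels = c("Unvaccinated", "Vaccinated"))
infected   = factor(newdata$Infected, levels = c(0, 1),
                    labels = c("Not Infected", "Infected"))

# rows: reference group first; columns: event ("Infected") second
labeledtable = table("Vaccinated" = vaccinated, "Infected" = infected)
labeledtable
```

With this layout, passing `labeledtable` to `riskratio()` should report the RR of infection for vaccinated vs. unvaccinated subjects, following the conventions described earlier (first row as reference, event in the second column). Since the two variables are simulated independently, the true RR is 1.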