Relative Risk (RR) is only appropriate for prospective studies, in which the two study groups are clearly defined and their outcomes are observed over a certain period of time. All of these data are needed to estimate the probability of the event in each group. In a case-control study, however, subjects are sampled by their outcome status, so this is no longer valid. Consider, for example, the following artificial case-control table of genotype and cancer status:
\(\quad\) | Have Cancer | No Cancer | Total |
---|---|---|---|
Genotype AA | 690 | 436 | 1126 |
Genotype Aa/aa | 310 | 564 | 874 |
Total | 1000 | 1000 | 2000 |
Why is this not valid for calculating the RR? Because in a case-control study we can arbitrarily decide the number of events (cases) and non-events (controls), as with the 1000 cases and 1000 controls above. Recall that the definition of RR is
\[\frac{A / (A+B)}{C / (C+D)}\]
\(\quad\) | Event | No Event | Total |
---|---|---|---|
Exposed | A | B | A+B |
Unexposed | C | D | C+D |
Total | A+C | B+D | A+B+C+D |
If we double the sample size of the No Event group, both \(B\) and \(D\) are doubled, and the RR becomes
\[\frac{A / (A+2 \times B)}{C / (C+2 \times D)},\]
which is usually not the same as the original value. Hence, we need a different quantity to describe the difference in probability across the groups.
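To make this concrete, here is a minimal sketch in base R (the helper `relative_risk()` is a hypothetical name introduced only for illustration) that computes the RR for the genotype table above, taking Genotype AA as the exposed group and Have Cancer as the event, and then recomputes it after doubling the No Cancer column:

```r
# Gene-cancer table from the text (assumed layout):
# rows = genotype (AA, Aa/aa), columns = outcome (Have Cancer, No Cancer)
gene <- matrix(c(690, 310, 436, 564), nrow = 2, ncol = 2,
               dimnames = list(c("AA", "Aa/aa"), c("Cancer", "NoCancer")))

# Hypothetical helper: RR = [A / (A + B)] / [C / (C + D)]
relative_risk <- function(tab) {
  (tab[1, 1] / sum(tab[1, ])) / (tab[2, 1] / sum(tab[2, ]))
}

# Recruit twice as many controls: B and D (the No Cancer column) double
gene_doubled <- gene
gene_doubled[, 2] <- 2 * gene_doubled[, 2]

rr_original <- relative_risk(gene)
rr_doubled  <- relative_risk(gene_doubled)
print(round(c(original = rr_original, doubled = rr_doubled), 3))
```

Under these assumptions the RR moves from roughly 1.73 to roughly 2.05, even though nothing about the underlying population has changed; only the number of sampled controls did.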
For a binary variable with probability \(p\) of being 1, the odds are defined as \(p / (1-p)\). The odds ratio (OR) is simply the ratio of the odds in the two groups (exposed / unexposed). In our previous table, the exposed group has \(p = A/(A+B)\), so its odds are \(A/B\); likewise, the odds in the unexposed group are \(C/D\). Hence
\[\text{OR} = \frac{A / B}{C / D} = \frac{AD}{BC}\]
If we again double the sample size of the No Event group, we have
\[\text{OR} = \frac{A / (2 \times B)}{C / (2 \times D)} = \frac{AD}{BC}\]
The factors of 2 cancel, so the OR does not depend on how many controls we choose to sample. Hence, the OR can be used in a case-control study. We can compute the OR in R with the `epitools` package, using the artificial gene-cancer association table above as an example. Note again that the orientation of the table only affects the numerical value, not the conclusion about significance (think about why?).
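Before turning to `epitools`, the cancellation can be checked numerically. This minimal base-R sketch (the helper `odds_ratio()` is hypothetical) computes the plain cross-product OR \(AD/BC\) for the genotype table, then again after doubling the No Cancer column. Note that `epitools` reports a median-unbiased estimate by default, so its value differs slightly from this cross-product value.

```r
# Gene-cancer table from the text: rows = genotype (AA, Aa/aa),
# columns = outcome (Have Cancer, No Cancer)
gene <- matrix(c(690, 310, 436, 564), nrow = 2, ncol = 2,
               dimnames = list(c("AA", "Aa/aa"), c("Cancer", "NoCancer")))

# Hypothetical helper: sample (cross-product) OR = AD / (BC)
odds_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

gene_doubled <- gene
gene_doubled[, 2] <- 2 * gene_doubled[, 2]  # twice as many controls

or_original <- odds_ratio(gene)
or_doubled  <- odds_ratio(gene_doubled)
print(round(c(original = or_original, doubled = or_doubled), 3))
```

Both values are about 2.879: the factor of 2 multiplies both \(B\) and \(D\) and cancels, unlike in the RR.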
library(epitools)
# matrix() fills column by column: first column (Have Cancer), then second column (No Cancer)
freqtable = matrix(c(690, 310, 436, 565), nrow = 2, ncol = 2)
# naming the rows and columns is optional;
# without names, oddsratio() assigns generic labels (Exposed1/Exposed2, Disease1/Disease2)
# here "Treatment" = genotype AA, "Control" = genotype Aa/aa,
# "Cured" = Have Cancer, "Not Cured" = No Cancer
rownames(freqtable) = c("Treatment", "Control")
colnames(freqtable) = c("Cured", "Not Cured")
freqtable
## Cured Not Cured
## Treatment 690 436
## Control 310 565
# use the odds ratio function
oddsratio(freqtable)
## $data
## Cured Not Cured Total
## Treatment 690 436 1126
## Control 310 565 875
## Total 1000 1001 2001
##
## $measure
## NA
## odds ratio with 95% C.I. estimate lower upper
## Treatment 1.000000 NA NA
## Control 2.882189 2.401274 3.464739
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## Treatment NA NA NA
## Control 0 1.143742e-30 1.820584e-30
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"
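The result has a structure similar to that of the RR calculation (`riskratio()`). The first row (Treatment) is the reference, so 2.882189 is the OR of Control vs. Treatment, and it is clearly significantly different from 1 (the 95% CI excludes 1 and the p-values are essentially 0). If we would like the OR of Treatment vs. Control instead, we simply take the reciprocal, 1/2.882189 = 0.3469585. The confidence interval is inverted in the same way, with the endpoints swapping places: (1/3.464739, 1/2.401274) = (0.2886220, 0.4164456). You can validate this using the following code: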
oddsratio(freqtable[c(2,1), ])$measure
## NA
## odds ratio with 95% C.I. estimate lower upper
## Control 1.0000000 NA NA
## Treatment 0.3469584 0.288622 0.4164455
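One way to see why the orientation does not change the conclusion: swapping the rows only changes which group is the reference. The sketch below uses base R's `fisher.test()` rather than `epitools` (so its conditional maximum-likelihood estimate will not exactly match the median-unbiased estimate above) and compares the two orientations of `freqtable`:

```r
# Same counts as `freqtable` above, entered column by column
freqtable <- matrix(c(690, 310, 436, 565), nrow = 2, ncol = 2)

ft_original <- fisher.test(freqtable)
ft_swapped  <- fisher.test(freqtable[c(2, 1), ])  # rows swapped

# Conditional MLEs of the OR in each orientation
or_original <- unname(ft_original$estimate)
or_swapped  <- unname(ft_swapped$estimate)
print(c(original = or_original, swapped = or_swapped,
        product = or_original * or_swapped))

# Two-sided p-values in each orientation
print(c(original = ft_original$p.value, swapped = ft_swapped$p.value))
```

The two estimates multiply to 1 up to the root-finding tolerance of `fisher.test()`, and the two-sided p-values agree: with the margins fixed, the count in one row determines the count in the other, so the hypergeometric distribution behind Fisher's test is only mirrored, not changed.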
Although vaccine effectiveness is not usually measured with the OR, we can still calculate the OR from the observed varicella (chickenpox) data below, since the OR is valid for both prospective and retrospective studies.
\(\quad\) | Varicella | Non-case |
---|---|---|
Vaccinated | 18 | 134 |
Unvaccinated | 3 | 4 |
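# matrix() fills column by column: column 1 = Non-case, column 2 = Varicella;
# row 1 = Unvaccinated (shown as Exposed1, the reference), row 2 = Vaccinated (Exposed2)
Chickenpox = matrix(c(4, 134, 3, 18),nrow = 2, ncol = 2)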
oddsratio(Chickenpox)
## Warning in chisq.test(xx, correct = correction): Chi-squared approximation may be incorrect
## $data
## Outcome
## Predictor Disease1 Disease2 Total
## Exposed1 4 3 7
## Exposed2 134 18 152
## Total 138 21 159
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed1 1.0000000 NA NA
## Exposed2 0.1804134 0.03492272 1.046236
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed1 NA NA NA
## Exposed2 0.05554613 0.04934509 0.01780275
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"
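The warning comes from the chi-squared test that `oddsratio()` also runs. As a sketch, the expected counts under independence can be computed by hand in base R. `chisq.test()` issues this warning when any expected count is below 5, in which case the exact p-values (`midp.exact`, `fisher.exact`) are generally preferred.

```r
# Same layout as `Chickenpox` above:
# rows = Unvaccinated, Vaccinated; columns = Non-case, Varicella
Chickenpox <- matrix(c(4, 134, 3, 18), nrow = 2, ncol = 2)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(Chickenpox), colSums(Chickenpox)) / sum(Chickenpox)
print(round(expected, 2))

# Number of cells with an expected count below 5
print(sum(expected < 5))
```

Here only the unvaccinated varicella cell falls below 5 (about 0.92), which is enough to trigger the warning.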
Among all the study designs mentioned (randomized trial, case-control, and observational), which are valid for the Relative Risk and which are valid for the Odds Ratio?
From Table 2 in McLaughlin (2003), we can also calculate the OR. Based on this information, can you directly calculate the OR of Left vs. Right (instead of Right vs. Left) and its confidence interval?
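# Be careful with the table construction: matrix() fills column by column,
# so column 1 = Non-Infected (Right, Left) and column 2 = Infected (Right, Left)
MT = matrix(c(36, 93, 82, 22),nrow = 2, ncol = 2)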
rownames(MT) = c("Right", "Left")
colnames(MT) = c("Non-Infected", "Infected")
MT
## Non-Infected Infected
## Right 36 82
## Left 93 22
# use the odds ratio function
oddsratio(MT)$measure
## NA
## odds ratio with 95% C.I. estimate lower upper
## Right 1.0000000 NA NA
## Left 0.1055596 0.0562827 0.1911488
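For comparison with the mid-p interval above, here is a minimal base-R sketch of a different, approximate method: Woolf's (Wald) interval, \(\exp\{\log \widehat{\text{OR}} \pm z_{0.975}\sqrt{1/a + 1/b + 1/c + 1/d}\}\), where \(a, b, c, d\) are the four cell counts. It also shows that the Right vs. Left OR and its interval follow by taking reciprocals and swapping the endpoints.

```r
# Same layout as `MT` above: rows = Right, Left; columns = Non-Infected, Infected
MT <- matrix(c(36, 93, 82, 22), nrow = 2, ncol = 2,
             dimnames = list(c("Right", "Left"), c("Non-Infected", "Infected")))

# Sample OR of infection for Left vs. Right: (36 * 22) / (82 * 93)
or_left <- (MT["Right", "Non-Infected"] * MT["Left", "Infected"]) /
           (MT["Right", "Infected"] * MT["Left", "Non-Infected"])

# Woolf (Wald) standard error of log(OR): sqrt of the sum of reciprocal counts
se_log_or <- sqrt(sum(1 / MT))
z <- qnorm(0.975)
ci_left <- exp(log(or_left) + c(-1, 1) * z * se_log_or)

# Right vs. Left: reciprocal estimate, reciprocal interval with endpoints swapped
or_right <- 1 / or_left
ci_right <- rev(1 / ci_left)

print(c(estimate = or_left, lower = ci_left[1], upper = ci_left[2]))
print(c(estimate = or_right, lower = ci_right[1], upper = ci_right[2]))
```

Under this approximation the Left vs. Right OR is about 0.104 with an interval of roughly (0.057, 0.191), close to but not identical with the median-unbiased estimate and mid-p interval reported by `oddsratio()`.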