When is Relative Risk not appropriate?

Relative Risk is only appropriate for prospective studies in which we have a clear definition of the two study groups and observe their outcome over a certain period of time. Having all these data is necessary to calculate the probability of the event in each group. However, when we have a case-control study, this is not valid anymore.

  • In a (retrospective) case control study for detecting cancer-associated genetic markers, we may collect the following data. Note that in this case, the total number in each column is pre-defined by the researcher.
\(\quad\) Have Cancer No Cancer Total
Genotype AA 690 436 1126
Genotype Aa/aa 310 564 874
Total 1000 1000 2000

Why this is not valid for calculating the RR? Because we can arbitorily decide the number of events and non-events. Recall that the definition of RR is

\[\frac{A / (A+B)}{C / (C+D)}\]

\(\quad\) Event No Event Total
Exposed A B A+B
Unexposed C D C+D
Total A+C B+D A+B+C+D

If we are going to double the sample size of the No Event group, this will double the size of both \(B\) and \(D\). Then, this becomes

\[\frac{A / (A+2 \times B)}{C / (C+2 \times D)}\] Which is usually not the same as the original one. Hence, we may want to use a different quantify to define the difference of probably across the groups.

Odds Ratio

For a binary variable with probability \(p\) of being 1, Odds is defined as \(p / (1-p)\). The Odds Ratio is simply the ratio of odds from the two groups (exposed / unexposed). Based on our previous table, we would have

\[\text{OR} = \frac{A / B}{C / D} = \frac{AD}{BC}\] Suppose we again want to double the sample size of the No Event group, we would have

\[\text{OR} = \frac{A / (2 \times B)}{C / (2 \times D)} = \frac{AD}{BC}\] Hence, odds ratio (OR) can be used in the case-control study. We can perform OR using R. We will use this artificial gene cancer association table as an example. Not again that the orientation of the data will only affect the numerical value, but not the conclusion of the significance (think about why?).

  library(epitools)
  
  # we need to specify the data of the first column, then second column
  freqtable = matrix(c(690, 310, 436, 565),nrow = 2, ncol = 2)
  
  # this is for naming the items properly (not a necessary step)
  # the R function will automatically assign them some name if you leave them empty. 
  rownames(freqtable) = c("Treatment", "Control")
  colnames(freqtable) = c("Cured", "Not Cured")  
  
  freqtable
##           Cured Not Cured
## Treatment   690       436
## Control     310       565
  
  # use the odds ratio function
  oddsratio(freqtable)
## $data
##           Cured Not Cured Total
## Treatment   690       436  1126
## Control     310       565   875
## Total      1000      1001  2001
## 
## $measure
##                         NA
## odds ratio with 95% C.I. estimate    lower    upper
##                Treatment 1.000000       NA       NA
##                Control   2.882189 2.401274 3.464739
## 
## $p.value
##            NA
## two-sided   midp.exact fisher.exact   chi.square
##   Treatment         NA           NA           NA
##   Control            0 1.143742e-30 1.820584e-30
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"

The result has a similar structure as the RR calculation (riskratio). And it is clear that the OR is significantly different from 1. If we would like to get the OR of treatment vs. control, simply take 1/2.882189 = 0.3469585, and the confidence interval would be 1 / (2.401274, 3.464739) = (0.4164456, 0.2886220). You can validate this using the following code:

  oddsratio(freqtable[c(2,1), ])$measure
##                         NA
## odds ratio with 95% C.I.  estimate    lower     upper
##                Control   1.0000000       NA        NA
##                Treatment 0.3469584 0.288622 0.4164455

Example: Vaccine Data

Although the effectiveness of a vaccine is not measured using OR, we can still calculate the quantity based on the observed data, since OR is valid for prospective and retrospective studies.

\(\quad\) Varicella Non-case
Vaccinated 18 134
Unvaccinated 3 4
  Chickenpox = matrix(c(4, 134, 3, 18),nrow = 2, ncol = 2)
  oddsratio(Chickenpox)
## Warning in chisq.test(xx, correct = correction): Chi-squared approximation may be incorrect
## $data
##           Outcome
## Predictor  Disease1 Disease2 Total
##   Exposed1        4        3     7
##   Exposed2      134       18   152
##   Total         138       21   159
## 
## $measure
##           odds ratio with 95% C.I.
## Predictor   estimate      lower    upper
##   Exposed1 1.0000000         NA       NA
##   Exposed2 0.1804134 0.03492272 1.046236
## 
## $p.value
##           two-sided
## Predictor  midp.exact fisher.exact chi.square
##   Exposed1         NA           NA         NA
##   Exposed2 0.05554613   0.04934509 0.01780275
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"

Practice questions

  1. Among all mentioned studies (randomized trial, case-control and observational), which is valid for Relative Risk and which is valid for Odds Ratio?

  2. From Table 2 in McLaughlin (2003), we can also calculate the OR. Based on this information, can you directly calculate the OR of Left vs. Right (instead of Right vs. Left) and it confidence interval?

  # Please be careful about the table construction
  MT = matrix(c(36, 93, 82, 22),nrow = 2, ncol = 2)
  rownames(MT) = c("Right", "Left")
  colnames(MT) = c("Non-Infected", "Infected")
  MT
##       Non-Infected Infected
## Right           36       82
## Left            93       22
  
  # use the risk ratio function
  oddsratio(MT)$measure
##                         NA
## odds ratio with 95% C.I.  estimate     lower     upper
##                    Right 1.0000000        NA        NA
##                    Left  0.1055596 0.0562827 0.1911488