Statistics in Population Health: Relative Risk

When is Relative Risk not appropriate?

Relative Risk is appropriate only for prospective studies, in which the two study groups are clearly defined and their outcomes are observed over a certain period of time. These data are needed to estimate the probability of the event in each group. In a case-control study, however, this is no longer possible.

In a (retrospective) case-control study for detecting cancer-associated genetic markers, we may collect the following data. Note that in this design, the total of each column is fixed in advance by the researcher.

\(\quad\)	Have Cancer	No Cancer	Total
Genotype AA	690	436	1126
Genotype Aa/aa	310	564	874
Total	1000	1000	2000

Why is this design not valid for calculating the RR? Because the researcher arbitrarily decides the numbers of events (cases) and non-events (controls). Recall that the RR is defined as

\[\frac{A / (A+B)}{C / (C+D)}\]

\(\quad\)	Event	No Event	Total
Exposed	A	B	A+B
Unexposed	C	D	C+D
Total	A+C	B+D	A+B+C+D

If we double the sample size of the No Event group, both \(B\) and \(D\) are doubled, and the ratio becomes

\[\frac{A / (A+2 \times B)}{C / (C+2 \times D)},\]
which is generally not equal to the original RR. Hence, we need a different quantity to measure the difference in the event probability across the groups.
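The effect of doubling the controls can be checked numerically. Below is a minimal base-R sketch (no extra packages); the helper name `rr()` is hypothetical, and the sketch assumes the Exposed/Event layout of the table above, treating genotype AA as "exposed" and cancer as the "event" in the case-control data.

```r
# Hypothetical helper: relative risk for a 2x2 table laid out as
#              Event  No Event
#   Exposed      A       B
#   Unexposed    C       D
rr <- function(tab) {
  risk_exposed   <- tab[1, 1] / sum(tab[1, ])  # A / (A + B)
  risk_unexposed <- tab[2, 1] / sum(tab[2, ])  # C / (C + D)
  risk_exposed / risk_unexposed
}

# case-control counts from the cancer table (filled column by column):
# column 1 = Have Cancer (AA, Aa/aa), column 2 = No Cancer (AA, Aa/aa)
cc <- matrix(c(690, 310, 436, 564), nrow = 2, ncol = 2)

# the same study, but with twice as many controls sampled
cc_double <- cc
cc_double[, 2] <- 2 * cc_double[, 2]

rr(cc)         # about 1.73
rr(cc_double)  # about 2.05
```

The two values differ only because of a design choice (how many controls were sampled), which is why the quantity computed this way cannot be interpreted as a relative risk.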

Odds Ratio

For a binary variable with probability \(p\) of being 1, the odds are defined as \(p / (1-p)\). The odds ratio is simply the ratio of the odds in the two groups (exposed / unexposed). Based on our previous table, we have

\[\text{OR} = \frac{A / B}{C / D} = \frac{AD}{BC}.\]

Suppose we again double the sample size of the No Event group. We then have

\[\text{OR} = \frac{A / (2 \times B)}{C / (2 \times D)} = \frac{AD}{BC}.\]

Hence, the odds ratio (OR) can be used in case-control studies. We can compute the OR in R with the oddsratio() function from the epitools package, using the artificial gene-cancer association table above as an example. Note again that the orientation of the table affects only the numerical value of the OR, not the conclusion about its significance (think about why).
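The invariance above can be checked numerically before turning to epitools. This is a base-R sketch with a hypothetical helper `sample_or()` that computes \(AD/BC\) from the table layout above. The last line also shows that transposing the table (swapping the roles of exposure and outcome) leaves \(AD/BC\) unchanged, which is another way to see why the OR can be estimated from a case-control design.

```r
# Hypothetical helper: sample odds ratio AD / BC for a 2x2 table laid out as
#              Event  No Event
#   Exposed      A       B
#   Unexposed    C       D
sample_or <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# case-control counts from the cancer table (filled column by column)
cc <- matrix(c(690, 310, 436, 564), nrow = 2, ncol = 2)

# double the No Event (control) column
cc_double <- cc
cc_double[, 2] <- 2 * cc_double[, 2]

sample_or(cc)         # about 2.88
sample_or(cc_double)  # same value: the factor 2 cancels
sample_or(t(cc))      # same value: transposing swaps B and C, giving AD / (CB)
```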

  library(epitools)
  
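  # matrix() fills column by column: first the "Have Cancer" column (AA, Aa/aa),
  # then the "No Cancer" column (AA, Aa/aa)
  # note: the last entry is 565 here, whereas the table above lists 564,
  # so the totals in the output below are 875 and 1001 rather than 874 and 1000
  freqtable = matrix(c(690, 310, 436, 565),nrow = 2, ncol = 2)
  
  # naming the rows and columns is optional;
  # the R function will automatically assign default names if you leave them empty.
  # the labels are placeholders: Treatment = genotype AA, Control = genotype Aa/aa,
  # Cured = have cancer, Not Cured = no cancer
  rownames(freqtable) = c("Treatment", "Control")
  colnames(freqtable) = c("Cured", "Not Cured")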
  
  freqtable
##           Cured Not Cured
## Treatment   690       436
## Control     310       565
  
  # use the odds ratio function
  oddsratio(freqtable)
## $data
##           Cured Not Cured Total
## Treatment   690       436  1126
## Control     310       565   875
## Total      1000      1001  2001
## 
## $measure
##                         NA
## odds ratio with 95% C.I. estimate    lower    upper
##                Treatment 1.000000       NA       NA
##                Control   2.882189 2.401274 3.464739
## 
## $p.value
##            NA
## two-sided   midp.exact fisher.exact   chi.square
##   Treatment         NA           NA           NA
##   Control            0 1.143742e-30 1.820584e-30
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"

The output has a similar structure to that of the RR calculation (riskratio()). All p-values are essentially 0, and the 95% confidence interval (2.401274, 3.464739) excludes 1, so the OR is significantly different from 1. To get the OR of Treatment vs. Control instead, simply take 1/2.882189 = 0.3469585; the confidence interval is obtained by taking the reciprocals of the bounds and swapping their order: (1/3.464739, 1/2.401274) = (0.2886220, 0.4164456). You can verify this with the following code:

  oddsratio(freqtable[c(2,1), ])$measure
##                         NA
## odds ratio with 95% C.I.  estimate    lower     upper
##                Control   1.0000000       NA        NA
##                Treatment 0.3469584 0.288622 0.4164455
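One way to see why reversing the rows changes only the numerical value is to work on the log scale. The base-R sketch below uses the large-sample (Wald) interval \(\log(\text{OR}) \pm 1.96 \times \text{SE}\) with \(\text{SE} = \sqrt{1/A + 1/B + 1/C + 1/D}\). This is a swapped-in method, not the median-unbiased estimate and mid-p interval that oddsratio() reports above, so its numbers differ slightly from that output. The helper name `wald_or()` is hypothetical.

```r
# same entries as freqtable above (including 565 in the last cell)
tab <- matrix(c(690, 310, 436, 565), nrow = 2, ncol = 2)

# Hypothetical helper: sample OR (A*D)/(B*C) with a Wald CI on the log scale
wald_or <- function(tab, level = 0.95) {
  log_or <- log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
  se <- sqrt(sum(1 / tab))              # sqrt(1/A + 1/B + 1/C + 1/D)
  z_crit <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
  c(estimate = exp(log_or),
    lower    = exp(log_or - z_crit * se),
    upper    = exp(log_or + z_crit * se),
    z        = log_or / se)             # Wald test statistic for OR = 1
}

wald_or(tab)             # estimate about 2.88
wald_or(tab[c(2, 1), ])  # estimate about 0.347; bounds are reciprocals, z flips sign
```

Reversing the rows turns \(\log(\text{OR})\) into \(-\log(\text{OR})\), while the standard error, which depends on the cell counts only through \(1/A + 1/B + 1/C + 1/D\), stays the same. Hence the estimate and interval are reciprocated, but \(|z|\) and the p-value do not change.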

Example: Vaccine Data

Although vaccine effectiveness is not measured using the OR, we can still calculate the OR from the observed data, since the OR is valid for both prospective and retrospective studies.

\(\quad\)	Varicella	Non-case
Vaccinated	18	134
Unvaccinated	3	4

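  # row 1 = unvaccinated (4 non-cases, 3 varicella), row 2 = vaccinated (134 non-cases, 18 varicella);
  # oddsratio() treats the first row as the reference group
  Chickenpox = matrix(c(4, 134, 3, 18),nrow = 2, ncol = 2)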
  oddsratio(Chickenpox)
## Warning in chisq.test(xx, correct = correction): Chi-squared approximation may be incorrect
## $data
##           Outcome
## Predictor  Disease1 Disease2 Total
##   Exposed1        4        3     7
##   Exposed2      134       18   152
##   Total         138       21   159
## 
## $measure
##           odds ratio with 95% C.I.
## Predictor   estimate      lower    upper
##   Exposed1 1.0000000         NA       NA
##   Exposed2 0.1804134 0.03492272 1.046236
## 
## $p.value
##           two-sided
## Predictor  midp.exact fisher.exact chi.square
##   Exposed1         NA           NA         NA
##   Exposed2 0.05554613   0.04934509 0.01780275
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"
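As a cross-check that needs only base R, the sketch below computes the sample odds ratio directly and runs Fisher's exact test with fisher.test() from the stats package. Two assumptions: the row and column names are added here for readability only, and the sample OR \((18/134)/(3/4)\) is a different estimator from the median-unbiased estimate reported above, so it differs slightly from 0.1804134. Because the unvaccinated group is very small (7 children, the source of the chi-squared warning), the exact test is the safer choice here.

```r
# same counts as Chickenpox above, with names added for readability
#   row 1 = unvaccinated, row 2 = vaccinated
#   column 1 = non-case,  column 2 = varicella
chickenpox <- matrix(c(4, 134, 3, 18), nrow = 2, ncol = 2,
                     dimnames = list(c("Unvaccinated", "Vaccinated"),
                                     c("Non-case", "Varicella")))

# sample odds of varicella: vaccinated 18/134, unvaccinated 3/4
sample_or <- (chickenpox["Vaccinated", "Varicella"] / chickenpox["Vaccinated", "Non-case"]) /
             (chickenpox["Unvaccinated", "Varicella"] / chickenpox["Unvaccinated", "Non-case"])
sample_or  # 72/402, about 0.179

# two-sided Fisher's exact test (base R, package stats)
fisher.test(chickenpox)$p.value  # about 0.049, the fisher.exact value reported above
```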

Practice questions

Among all the study designs mentioned (randomized trial, case-control, and observational), which are valid for the relative risk, and which are valid for the odds ratio?
From Table 2 in McLaughlin (2003), we can also calculate the OR. Based on this information, can you directly calculate the OR of Left vs. Right (instead of Right vs. Left) and its confidence interval?

  # Please be careful about the table construction
  MT = matrix(c(36, 93, 82, 22),nrow = 2, ncol = 2)
  rownames(MT) = c("Right", "Left")
  colnames(MT) = c("Non-Infected", "Infected")
  MT
##       Non-Infected Infected
## Right           36       82
## Left            93       22
  
  # use the odds ratio function
  oddsratio(MT)$measure
##                         NA
## odds ratio with 95% C.I.  estimate     lower     upper
##                    Right 1.0000000        NA        NA
##                    Left  0.1055596 0.0562827 0.1911488

Ruoqing Zhu

Last Updated: July 19, 2022