We use the bmiData dataset from the R package DynTxRegime. The data are generated to mimic a two-stage randomized clinical trial that studied the effect of meal replacement shakes on adolescent obesity. CD refers to conventional diet, and MR refers to meal replacements. The trial unfolds as follows: gender, race, parentBMI, and baselineBMI are observed, and the randomized treatment A1 is assigned to each patient; month4BMI is then observed, and the randomized treatment A2 is assigned; finally, month12BMI is observed. The overall reward is the percentage reduction of BMI at month 12 relative to baseline:

\[ \text{Reward} = -100*\frac{\text{month12BMI}-\text{baselineBMI}}{\text{baselineBMI}}. \]
Note that Q-learning can also be carried out by assuming that all the \(R_t\) are zero, so that only the last-stage (cumulative) reward is used. Here, to be consistent with our lecture, we calculate an intermediate reward at month 4.
library(DynTxRegime)
data(bmiData)
head(bmiData)
# Define the cumulative reward over two stages
R <- -100*(bmiData[,6] - bmiData[,4])/bmiData[,4]
# Define the reward at the first stage
R1 <- -100*(bmiData[,5] - bmiData[,4])/bmiData[,4]
# Define the reward at the second stage
R2 <- R - R1
Following our lecture notes, batch Q-learning is a backward induction procedure: we first fit the model at the second stage, then use the estimated optimal reward as the response when fitting the model at the first stage.
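The same backward recursion applies to any number of stages. Below is a minimal, self-contained sketch of the general procedure; the function name backward_qlearning, the treatment column name A, and the list-of-data-frames layout are all illustrative assumptions, not part of the bmiData analysis that follows.

```r
backward_qlearning <- function(stage_data, rewards) {
  # stage_data: list of data frames, one per stage; each contains that
  #   stage's covariates plus a treatment factor column named A
  # rewards: list of stage-wise reward vectors R_t
  K <- length(stage_data)
  fits <- vector("list", K)
  pseudo <- rewards[[K]]
  for (t in K:1) {
    dat <- stage_data[[t]]
    dat$pseudo <- pseudo
    # Q-model at stage t: main effects plus treatment interactions
    covs <- setdiff(names(stage_data[[t]]), "A")
    fml <- reformulate(c(covs, "A", paste0("A:", covs)), response = "pseudo")
    fits[[t]] <- lm(fml, data = dat)
    # predicted outcome for every patient under each treatment option
    lv <- levels(stage_data[[t]]$A)
    qmat <- sapply(lv, function(a) {
      nd <- stage_data[[t]]
      nd$A <- factor(a, levels = lv)
      predict(fits[[t]], newdata = nd)
    })
    # pseudo outcome passed back to the previous stage: R_{t-1} + max_a Q_t
    if (t > 1) pseudo <- rewards[[t - 1]] + apply(qmat, 1, max)
  }
  fits
}
```

The two-stage analysis below carries out exactly these steps by hand, which makes each intermediate object easy to inspect.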
At the second stage, we have already observed the results up to month 4. Besides the treatment assignment A2, we also have the covariates parentBMI, month4BMI, and race. We fit a model predicting the second-stage reward R2 from these covariates.
mod_sec = lm(R2 ~ parentBMI + month4BMI + A2 + A2*(race + parentBMI + month4BMI),
data=bmiData)
summary(mod_sec)
##
## Call:
## lm(formula = R2 ~ parentBMI + month4BMI + A2 + A2 * (race + parentBMI +
## month4BMI), data = bmiData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40198 -0.08702 0.00148 0.08869 0.31937
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.778e+00 1.182e-01 15.041 <2e-16 ***
## parentBMI -9.284e-05 2.952e-03 -0.031 0.975
## month4BMI -5.130e-02 4.519e-03 -11.351 <2e-16 ***
## A2MR -3.902e+00 1.690e-01 -23.087 <2e-16 ***
## race 9.858e-03 2.732e-02 0.361 0.719
## A2MR:race -1.601e-02 3.928e-02 -0.407 0.684
## parentBMI:A2MR 6.402e-03 4.377e-03 1.463 0.145
## month4BMI:A2MR 1.041e-01 5.984e-03 17.396 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1373 on 202 degrees of freedom
## Multiple R-squared: 0.7333, Adjusted R-squared: 0.724
## F-statistic: 79.33 on 7 and 202 DF, p-value: < 2.2e-16
The next task is to calculate the Q-function values. To do this, we predict each patient's outcome under each treatment option, then assign the treatment with the highest predicted outcome as that patient's optimal treatment.
# construct new data for prediction
new_data1 = bmiData
new_data2 = bmiData
new_data1$A2 = rep('CD', dim(bmiData)[1])
new_data2$A2 = rep('MR', dim(bmiData)[1])
# predict the outcome of each patient under each treatment
# using the model fitted previously
qvalue2 = matrix(NA, nrow = dim(bmiData)[1], ncol=2)
qvalue2[,1] = predict(mod_sec, new_data1)
qvalue2[,2] = predict(mod_sec, new_data2)
qvalue2 = data.frame(qvalue2)
# the fitted values
colnames(qvalue2) = c('CD','MR')
head(qvalue2)
# optimal treatment for each patient
tmp = apply(as.matrix(qvalue2), 1, which.max)
optimalTx2 = as.factor(ifelse(tmp == 1, 'CD', ifelse(tmp == 2, 'MR', tmp)))
# the first 6 optimal treatments
optimalTx2[1:6]
## [1] CD MR CD CD CD CD
## Levels: CD MR
We then fit the model at the first stage. Here, we have the covariates parentBMI, baselineBMI, race, and the treatment assignment A1. Recall our iterative formula

\[ \theta_t = \arg\min_{\theta} \, \mathbb{E}_n \Big[ \underbrace{R_t + \max_{a_{t+1}} Q_{t+1}(\bar{S}_{t+1}, \bar{A}_t, A_{t+1} = a_{t+1}; \theta_{t+1})}_{\text{pseudo outcome at Stage } t} - Q_t(\bar{S}_t, \bar{A}_t; \theta) \Big]^2 \]

We first compute the best predicted reward from the second-stage model. Then we add the observed reward from the first stage and treat the sum as the response to fit a model using the covariates parentBMI, baselineBMI, race, and A1.
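For the two-stage analysis, this general recursion reduces to a single pseudo outcome at the first stage:

\[ \widetilde{R}_1 = R_1 + \max_{a_2} Q_2(\bar{S}_2, A_1, A_2 = a_2; \widehat{\theta}_2), \]

the observed first-stage reward plus the best predicted second-stage outcome; this is exactly what the pseudoR computation below carries out.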
# Calculate the estimated reward based on model from the second stage.
pseudoR = apply(as.matrix(qvalue2),1,max) + R1
# fit the Q model that predicts the pseudo reward at the first stage
mod_fir = lm(pseudoR ~ parentBMI + baselineBMI + A1 + A1:(race + parentBMI + baselineBMI),
data=bmiData)
# summary of the model
summary(mod_fir)
##
## Call:
## lm(formula = pseudoR ~ parentBMI + baselineBMI + A1 + A1:(race +
## parentBMI + baselineBMI), data = bmiData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.4362 -3.9274 0.4742 4.0112 13.7624
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.80152 5.05898 -0.356 0.722135
## parentBMI -0.06477 0.10837 -0.598 0.550731
## baselineBMI 0.32096 0.14482 2.216 0.027784 *
## A1MR 9.96829 7.74214 1.288 0.199381
## A1CD:race -0.41011 1.06335 -0.386 0.700140
## A1MR:race -0.62478 1.07367 -0.582 0.561274
## parentBMI:A1MR -1.25129 0.14316 -8.740 8.99e-16 ***
## baselineBMI:A1MR 0.78765 0.21580 3.650 0.000334 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.371 on 202 degrees of freedom
## Multiple R-squared: 0.5311, Adjusted R-squared: 0.5148
## F-statistic: 32.68 on 7 and 202 DF, p-value: < 2.2e-16
# q-value for each patient
new_data1 = bmiData
new_data2 = bmiData
new_data1$A1 = rep('CD',dim(bmiData)[1])
new_data2$A1 = rep('MR',dim(bmiData)[1])
# predict the outcome of each patient under each treatment
# using the model fitted previously
qvalue1 = matrix(NA, nrow = dim(bmiData)[1], ncol=2)
qvalue1[,1] = predict(mod_fir, new_data1)
qvalue1[,2] = predict(mod_fir, new_data2)
qvalue1 = data.frame(qvalue1)
# the fitted values
colnames(qvalue1)=c('CD','MR')
head(qvalue1)
# optimal treatment for each patient at the first stage
tmp = apply(as.matrix(qvalue1), 1, which.max)
optimalTx1 = as.factor(ifelse(tmp == 1, 'CD', ifelse(tmp == 2, 'MR', tmp)))
optimalTx1[1:6]
## [1] CD MR MR MR MR MR
## Levels: CD MR
# the estimated optimal reward for each patient
Rbest = apply(qvalue1, 1, max)
# comparing the estimated optimal reward with the observed reward
mean(Rbest)
## [1] 9.45116
mean(R)
## [1] 6.455876
The DynTxRegime package

The above backward induction can also be implemented via the R package DynTxRegime; please see its reference manual for more details.
In this code, we consider a slightly different setting, in which all the immediate rewards \(R_t\) are zero and we simply observe the final reward at the last stage.
\[ \text{Reward} = -100*\frac{\text{month12BMI}-\text{baselineBMI}}{\text{baselineBMI}}. \]
This is also a very common setting for Q-learning when we do not have, or do not know how to define, immediate rewards at each stage. However, doing so does not change the estimation procedure. We first specify the second-stage models using functions from the DynTxRegime package.
- moMain specifies the main-effect covariates (those that do not interact with the treatment).
- moCont specifies the covariates that interact with the treatment at the current stage.

We then use the qLearn function to fit the model at the final stage; qLearn performs a single step of the Q-learning algorithm. moMain and moCont together determine the covariates of the model, response is the final reward, and txName is the name of the treatment variable assigned at this stage. Note that because the response here is the cumulative reward rather than the stage-specific reward \(R_2\), the fitted coefficients and recommended treatments can differ from our earlier second-stage model.
library(DynTxRegime)
# define the aggregated reward over two stages
y12 <- -100*(bmiData[,6L] - bmiData[,4L])/bmiData[,4L]
# Specify models at the second stage
moMain <- buildModelObj(model = ~ parentBMI + month4BMI,
solver.method = 'lm') # specify main effect
moCont <- buildModelObj(model = ~ race + parentBMI+month4BMI,
solver.method = 'lm') # specify interaction with treatment
# Second-Stage Analysis
fitSS <- qLearn(moMain = moMain, moCont = moCont,
data = bmiData,
response = y12,
txName = 'A2')
## First step of the Q-Learning Algorithm.
##
## Outcome regression.
## Combined outcome regression model: ~ parentBMI+month4BMI + A2 + A2:(race+parentBMI+month4BMI) .
## Regression analysis for Combined:
##
## Call:
## lm(formula = YinternalY ~ parentBMI + month4BMI + A2 + A2:race +
## parentBMI:A2 + month4BMI:A2, data = data)
##
## Coefficients:
## (Intercept) parentBMI month4BMI A2MR A2CD:race
## 48.04514 -0.35715 -0.83827 -14.54904 -0.25018
## A2MR:race parentBMI:A2MR month4BMI:A2MR
## 1.64329 0.39890 0.02297
##
##
## Recommended Treatments:
## CD MR
## 98 112
##
## Estimated value: 7.54403
show(fitSS)
## Q-Learning: step 1
## Outcome Regression Analysis
## Combined
##
## Call:
## lm(formula = YinternalY ~ parentBMI + month4BMI + A2 + A2:race +
## parentBMI:A2 + month4BMI:A2, data = data)
##
## Coefficients:
## (Intercept) parentBMI month4BMI A2MR A2CD:race
## 48.04514 -0.35715 -0.83827 -14.54904 -0.25018
## A2MR:race parentBMI:A2MR month4BMI:A2MR
## 1.64329 0.39890 0.02297
##
## Recommended Treatments:
## CD MR
## 98 112
##
## Estimated value: 7.54403
# you can extract the optimal treatment and decision function
optTx(fitSS)$optimalTx[1:6]
## [1] MR CD CD CD CD CD
## Levels: CD MR
optTx(fitSS)$decisionFunc[1:6,]
## CD MR
## [1,] 7.818515 8.553214
## [2,] 6.771519 5.095286
## [3,] 8.376248 6.696326
## [4,] 10.965522 8.130337
## [5,] 10.083604 8.743228
## [6,] 10.697010 8.576252
We then fit the model at the first stage. The only difference compared with the second stage is that we pass the fitted object from the previous stage, fitSS, as the response, and the treatment assigned at this stage is A1. We can check that the recommended first-stage treatments match those from our own implementation.
# First-stage Analysis
moMain <- buildModelObj(model = ~ parentBMI + baselineBMI,
solver.method = 'lm') # main effect
moCont <- buildModelObj(model = ~ race + parentBMI + baselineBMI,
solver.method = 'lm') # interaction effects
# fit the model at the first stage
fitFS <- qLearn(moMain = moMain, moCont = moCont,
data = bmiData,
response = fitSS,
txName = 'A1')
## Step 2 of the Q-Learning Algorithm.
##
## Outcome regression.
## Combined outcome regression model: ~ parentBMI+baselineBMI + A1 + A1:(race+parentBMI+baselineBMI) .
## Regression analysis for Combined:
##
## Call:
## lm(formula = YinternalY ~ parentBMI + baselineBMI + A1 + A1:race +
## parentBMI:A1 + baselineBMI:A1, data = data)
##
## Coefficients:
## (Intercept) parentBMI baselineBMI A1MR
## 36.73123 -0.08373 -0.68951 2.78926
## A1CD:race A1MR:race parentBMI:A1MR baselineBMI:A1MR
## 0.59410 0.65941 -0.36938 0.23110
##
##
## Recommended Treatments:
## CD MR
## 121 89
##
## Estimated value: 8.374786
optTx(fitFS)$optimalTx[1:6]
## [1] CD MR MR MR MR MR
## Levels: CD MR
optTx(fitFS)$decisionFunc[1:6,]
## CD MR
## [1,] 9.967447 9.433248
## [2,] 8.482970 8.746729
## [3,] 8.794969 8.913004
## [4,] 9.119389 10.236224
## [5,] 11.088749 12.234857
## [6,] 8.981291 9.422195
summary(fitFS)
## $outcome
## $outcome$Combined
##
## Call:
## lm(formula = YinternalY ~ parentBMI + baselineBMI + A1 + A1:race +
## parentBMI:A1 + baselineBMI:A1, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1295 -1.1803 -0.0718 1.3649 4.1127
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.73123 1.74405 21.061 < 2e-16 ***
## parentBMI -0.08373 0.03736 -2.241 0.02609 *
## baselineBMI -0.68951 0.04992 -13.811 < 2e-16 ***
## A1MR 2.78926 2.66905 1.045 0.29725
## A1CD:race 0.59410 0.36658 1.621 0.10666
## A1MR:race 0.65941 0.37014 1.782 0.07633 .
## parentBMI:A1MR -0.36938 0.04935 -7.484 2.17e-12 ***
## baselineBMI:A1MR 0.23110 0.07440 3.106 0.00217 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.851 on 202 degrees of freedom
## Multiple R-squared: 0.768, Adjusted R-squared: 0.7599
## F-statistic: 95.51 on 7 and 202 DF, p-value: < 2.2e-16
##
##
##
## $optTx
## CD MR
## 121 89
##
## $value
## [1] 8.374786