\(\newcommand{\ci}{\perp\!\!\!\perp}\) \(\newcommand{\cA}{\mathcal{A}}\) \(\newcommand{\cB}{\mathcal{B}}\) \(\newcommand{\cC}{\mathcal{C}}\) \(\newcommand{\cD}{\mathcal{D}}\) \(\newcommand{\cE}{\mathcal{E}}\) \(\newcommand{\cF}{\mathcal{F}}\) \(\newcommand{\cG}{\mathcal{G}}\) \(\newcommand{\cH}{\mathcal{H}}\) \(\newcommand{\cI}{\mathcal{I}}\) \(\newcommand{\cJ}{\mathcal{J}}\) \(\newcommand{\cK}{\mathcal{K}}\) \(\newcommand{\cL}{\mathcal{L}}\) \(\newcommand{\cM}{\mathcal{M}}\) \(\newcommand{\cN}{\mathcal{N}}\) \(\newcommand{\cO}{\mathcal{O}}\) \(\newcommand{\cP}{\mathcal{P}}\) \(\newcommand{\cQ}{\mathcal{Q}}\) \(\newcommand{\cR}{\mathcal{R}}\) \(\newcommand{\cS}{\mathcal{S}}\) \(\newcommand{\cT}{\mathcal{T}}\) \(\newcommand{\cU}{\mathcal{U}}\) \(\newcommand{\cV}{\mathcal{V}}\) \(\newcommand{\cW}{\mathcal{W}}\) \(\newcommand{\cX}{\mathcal{X}}\) \(\newcommand{\cY}{\mathcal{Y}}\) \(\newcommand{\cZ}{\mathcal{Z}}\) \(\newcommand{\bA}{\mathbf{A}}\) \(\newcommand{\bB}{\mathbf{B}}\) \(\newcommand{\bC}{\mathbf{C}}\) \(\newcommand{\bD}{\mathbf{D}}\) \(\newcommand{\bE}{\mathbf{E}}\) \(\newcommand{\bF}{\mathbf{F}}\) \(\newcommand{\bG}{\mathbf{G}}\) \(\newcommand{\bH}{\mathbf{H}}\) \(\newcommand{\bI}{\mathbf{I}}\) \(\newcommand{\bJ}{\mathbf{J}}\) \(\newcommand{\bK}{\mathbf{K}}\) \(\newcommand{\bL}{\mathbf{L}}\) \(\newcommand{\bM}{\mathbf{M}}\) \(\newcommand{\bN}{\mathbf{N}}\) \(\newcommand{\bO}{\mathbf{O}}\) \(\newcommand{\bP}{\mathbf{P}}\) \(\newcommand{\bQ}{\mathbf{Q}}\) \(\newcommand{\bR}{\mathbf{R}}\) \(\newcommand{\bS}{\mathbf{S}}\) \(\newcommand{\bT}{\mathbf{T}}\) \(\newcommand{\bU}{\mathbf{U}}\) \(\newcommand{\bV}{\mathbf{V}}\) \(\newcommand{\bW}{\mathbf{W}}\) \(\newcommand{\bX}{\mathbf{X}}\) \(\newcommand{\bY}{\mathbf{Y}}\) \(\newcommand{\bZ}{\mathbf{Z}}\) \(\newcommand{\ba}{\mathbf{a}}\) \(\newcommand{\bb}{\mathbf{b}}\) \(\newcommand{\bc}{\mathbf{c}}\) \(\newcommand{\bd}{\mathbf{d}}\) \(\newcommand{\be}{\mathbf{e}}\) \(\newcommand{\bg}{\mathbf{g}}\) \(\newcommand{\bh}{\mathbf{h}}\) \(\newcommand{\bi}{\mathbf{i}}\) \(\newcommand{\bj}{\mathbf{j}}\) \(\newcommand{\bk}{\mathbf{k}}\) \(\newcommand{\bl}{\mathbf{l}}\) \(\newcommand{\bm}{\mathbf{m}}\) \(\newcommand{\bn}{\mathbf{n}}\) \(\newcommand{\bo}{\mathbf{o}}\) \(\newcommand{\bp}{\mathbf{p}}\) \(\newcommand{\bq}{\mathbf{q}}\) \(\newcommand{\br}{\mathbf{r}}\) \(\newcommand{\bs}{\mathbf{s}}\) \(\newcommand{\bt}{\mathbf{t}}\) \(\newcommand{\bu}{\mathbf{u}}\) \(\newcommand{\bv}{\mathbf{v}}\) \(\newcommand{\bw}{\mathbf{w}}\) \(\newcommand{\bx}{\mathbf{x}}\) \(\newcommand{\by}{\mathbf{y}}\) \(\newcommand{\bz}{\mathbf{z}}\) \(\newcommand{\balpha}{\boldsymbol{\alpha}}\) \(\newcommand{\bbeta}{\boldsymbol{\beta}}\) \(\newcommand{\btheta}{\boldsymbol{\theta}}\) \(\newcommand{\bpi}{\boldsymbol{\pi}}\) \(\newcommand{\hbpi}{\widehat{\boldsymbol{\pi}}}\) \(\newcommand{\bxi}{\boldsymbol{\xi}}\) \(\newcommand{\bmu}{\boldsymbol{\mu}}\) \(\newcommand{\bepsilon}{\boldsymbol{\epsilon}}\) \(\newcommand{\bzero}{\mathbf{0}}\) \(\newcommand{\T}{\text{T}}\) \(\newcommand{\Trace}{\text{Trace}}\) \(\newcommand{\Cov}{\text{Cov}}\) \(\newcommand{\Var}{\text{Var}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\Pr}{\text{Pr}}\) \(\newcommand{\pr}{\text{pr}}\) \(\newcommand{\pdf}{\text{pdf}}\) \(\newcommand{\P}{\text{P}}\) \(\newcommand{\p}{\text{p}}\) \(\newcommand{\One}{\mathbf{1}}\) \(\newcommand{\argmin}{\operatorname*{arg\,min}}\) \(\newcommand{\argmax}{\operatorname*{arg\,max}}\)

Data Processing

We use the same bmiData dataset for this example. Unlike Q-learning, which uses only the final outcome, Backward Outcome Weighted Learning (BOWL) requires an immediate reward at each decision point. We therefore define the immediate reward at each stage as the decrease in BMI over that stage, expressed as a percentage of baseline BMI.

  library(DynTxRegime)
## Loading required package: modelObj
  # Load and process data set
  data(bmiData)
  # overall outcome: negative 12-month change in BMI from baseline
  y12 <- -100*(bmiData[,6] - bmiData[,4])/bmiData[,4]
  # first-stage reward: negative 4-month change in BMI from baseline
  rewardFS <- -100*(bmiData[,5] - bmiData[,4])/bmiData[,4]
  # second-stage reward: negative change in BMI from month 4 to month 12,
  # expressed as a percentage of baseline BMI
  rewardSS <- -100*(bmiData[,6] - bmiData[,5])/bmiData[,4]
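
Note that the two stage-wise rewards telescope to the overall 12-month outcome y12, since both are expressed as percentages of baseline BMI. A quick sanity check:

  # rewardFS + rewardSS = -100*(month12BMI - baselineBMI)/baselineBMI = y12
  all.equal(rewardFS + rewardSS, y12)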

Second Stage Analysis

We first fit the model at the second stage. moPropen specifies the model used to estimate the propensity score \(\pi_2(h_2) = P(A_2 = 1 \mid H_2 = h_2)\). Since the dataset comes from a randomized clinical trial, we fit a constant (intercept-only) propensity score model to estimate \(\pi_2\).
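
Because the model contains only an intercept, the fitted propensity reduces to the sample proportion receiving each second-stage treatment (the intercept estimate of 0 in the summary below corresponds to a 50/50 split). A quick check:

  # under an intercept-only logistic model, the fitted probability equals
  # the sample proportion of each treatment arm
  prop.table(table(bmiData$A2))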

We then fit the model for the second stage: we pass the propensity model moPropen, use parentBMI and month4BMI as the covariates defining the regime, and use rewardSS as the immediate reward at the second stage. surrogate specifies the surrogate for the 0-1 loss, which can be chosen from 'logit', 'exp', 'hinge', etc. kernel specifies the kernel used in solving the weighted SVM problem; linear, polynomial, and radial kernels are available. Note that we can also select tuning parameters (i.e., the scale of the penalty term and the kernel bandwidth) by cross-validation within the bowl function; see the reference manual for more details.
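
Concretely, with the hinge surrogate the second-stage step solves a weighted SVM problem. As a sketch (treatments coded \(A_2 \in \{-1, 1\}\); the exact penalty scaling is implementation-specific), the objective is

\[\hat{f}_2 = \argmin_{f} \; \frac{1}{n}\sum_{i=1}^{n} \frac{R_{2i}}{\pi_2(A_{2i} \mid H_{2i})}\, \max\{0,\, 1 - A_{2i} f(H_{2i})\} + \lambda \lVert f \rVert^2,\]

with the estimated rule \(\hat{d}_2(h) = \text{sign}\{\hat{f}_2(h)\}\).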

  #### Second-stage regression
  # Constant propensity model
  moPropen <- buildModelObj(model = ~ 1,
                            solver.method = 'glm',
                            solver.args = list('family' = 'binomial'),
                            predict.method = 'predict.glm',
                            predict.args = list(type = 'response'))

  fitSS <- bowl(moPropen = moPropen,
                data = bmiData, reward = rewardSS, txName = 'A2',
                regime = ~ parentBMI + month4BMI,
                surrogate = 'hinge', kernel = 'linear')
  # Summary of the fitted propensity score regression
  summary(propen(fitSS))
## 
## Call:
## glm(formula = YinternalY ~ 1, family = "binomial", data = data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.177  -1.177   0.000   1.177   1.177  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)    0.000      0.138       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 291.12  on 209  degrees of freedom
## Residual deviance: 291.12  on 209  degrees of freedom
## AIC: 293.12
## 
## Number of Fisher Scoring iterations: 2
  # Value object returned by propensity score regression method
  fitObject(fitSS)
## $propensity
## 
## Call:  glm(formula = YinternalY ~ 1, family = "binomial", data = data)
## 
## Coefficients:
## (Intercept)  
##           0  
## 
## Degrees of Freedom: 209 Total (i.e. Null);  209 Residual
## Null Deviance:       291.1 
## Residual Deviance: 291.1     AIC: 293.1
  # Estimated optimal treatment for training data
  optTx(fitSS)$optimalTx[1:10]
##  [1] "CD" "MR" "CD" "CD" "CD" "CD" "CD" "CD" "CD" "CD"
  optTx(fitSS)$decisionFunc[1:10]
##  [1] -0.6400218  0.9054282 -0.4962320 -1.8470729 -0.9782030 -2.1782005
##  [7] -1.8268140 -1.2021232 -0.1627973 -3.0233377
  # Estimated optimal treatment for new data
  optTx(fitSS, bmiData)$optimalTx[1:10]
##  [1] CD MR CD CD CD CD CD CD CD CD
## Levels: CD MR
  # Parameter estimates for decision function
  regimeCoef(fitSS)
## [1] -25.019351239  -0.003500141   0.715511239
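
As a check, we can reproduce the decision-function values by hand. This is a sketch assuming the first element of regimeCoef(fitSS) is the bias term and that positive scores map to 'MR', which matches the optTx output above:

  # linear decision function: bias + coef1*parentBMI + coef2*month4BMI
  beta <- regimeCoef(fitSS)
  score <- beta[1] + beta[2]*bmiData$parentBMI + beta[3]*bmiData$month4BMI
  # positive score -> 'MR', negative -> 'CD' (cf. optTx(fitSS) above)
  head(cbind(round(score, 4), ifelse(score > 0, "MR", "CD")), 10)
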
  # Show summary results of method
  summary(fitSS)
## $propensity
## 
## Call:
## glm(formula = YinternalY ~ 1, family = "binomial", data = data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.177  -1.177   0.000   1.177   1.177  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)    0.000      0.138       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 291.12  on 209  degrees of freedom
## Residual deviance: 291.12  on 209  degrees of freedom
## AIC: 293.12
## 
## Number of Fisher Scoring iterations: 2
## 
## 
## $outcome
## [1] NA
## 
## $optim
## $optim$par
## [1] -25.019351239  -0.003500141   0.715511239
## 
## $optim$convergence
## [1] 0
## 
## $optim$primal
##   [1] 5.073564e-01 4.145508e-02 7.827498e-02 4.207971e-02 7.170820e-02
##   [6] 6.649226e-10 2.860356e-01 4.438675e-09 4.181620e-02 3.017986e-10
##  [11] 5.749866e-11 5.268453e-10 1.132211e-09 2.395636e-10 4.430248e-11
##  [16] 3.311505e-09 3.579927e-09 2.956539e-02 5.250161e-11 1.077683e-09
##  [21] 7.027733e-01 8.355553e-02 7.860901e-02 8.967158e-11 3.522426e-10
##  [26] 3.732178e-02 3.871641e-01 2.080689e-02 4.503060e-10 3.103633e-10
##  [31] 5.767089e-11 4.674901e-10 4.097952e-10 4.577969e-10 9.118967e-02
##  [36] 2.937429e-01 1.124543e-10 1.788083e-10 1.030778e-08 1.134554e-10
##  [41] 3.565151e-10 2.962889e-01 1.962633e-03 4.394564e-10 1.013497e-08
##  [46] 6.107631e-01 7.683128e-11 6.502251e-11 6.570406e-10 7.443372e-10
##  [51] 7.064316e-01 5.316705e-01 1.413925e-09 4.304185e-11 6.207124e-01
##  [56] 1.968139e-10 5.359933e-10 1.004599e-09 4.581818e-10 7.891880e-10
##  [61] 4.468420e-10 9.546624e-10 1.943628e-10 1.279521e-10 5.439040e-01
##  [66] 1.039111e-09 2.419617e-10 5.009482e-10 2.100690e-10 4.544968e-10
##  [71] 6.957380e-10 1.232245e+00 3.321635e-10 1.409676e-10 5.582670e-10
##  [76] 1.636784e-01 8.254066e-02 1.024641e+00 5.865755e-01 2.882309e-09
##  [81] 1.900542e-10 5.615334e-10 2.664041e-10 5.292221e-01 2.825472e-10
##  [86] 2.814455e-01 6.273725e-10 6.543033e-10 2.808327e-10 3.946223e-01
##  [91] 1.653484e-10 5.477319e-10 8.333694e-11 5.138660e-01 1.489704e-09
##  [96] 4.923285e-01 6.251358e-01 1.015599e-10 2.092713e-10 3.083729e-10
## [101] 7.866167e-10 1.241218e-01 6.479165e-10 1.169472e+00 1.477159e-09
## [106] 6.951857e-10 1.458573e-10 4.399648e-11 2.153222e-10 5.086849e-10
## [111] 2.835800e-01 3.533252e-11 4.106247e-10 7.137385e-10 3.932829e-10
## [116] 1.325054e-09 2.574529e-10 2.731328e-02 2.856766e-01 5.358750e-01
## [121] 1.571512e-10 7.029169e-01 2.011908e-10 1.120904e-10 4.086537e-01
## [126] 1.121950e-09 1.879271e-10 2.446637e-01 2.702194e-10 2.777686e-10
## [131] 5.466790e-01 7.318744e-10 5.268547e-01 1.105102e-09 9.174907e-10
## [136] 8.585966e-10 3.176628e-10 3.007613e-11 6.305671e-10 3.225260e-11
## [141] 6.256892e-01 7.493946e-13 2.041249e-11 6.824285e-09 1.224050e-08
## [146] 4.852118e-10 8.389825e-10 8.067577e-11 1.695781e-09 3.868593e-02
## [151] 5.717661e-11 5.614777e-11 9.570607e-10 4.917331e-10 1.456901e-10
## [156] 1.906993e-10 2.253053e-09 5.336779e-08 9.038503e-10 4.008922e-10
## [161] 3.274602e-11 6.992749e-02 7.944668e-11 2.380440e-02 3.110805e-01
## [166] 1.108148e+00 1.371072e-10 5.868567e-01 9.938146e-11 2.914025e-10
## [171] 7.700556e-10 1.868332e-11 6.493064e-01 8.874167e-10 3.191684e-10
## [176] 4.402337e-10 4.931493e-10 4.954045e-10 5.070981e-10 4.569233e-10
## [181] 6.580895e-11 3.911502e-02 1.218280e-09 2.295520e-02 2.013072e-01
## [186] 1.465011e-09 4.525506e-01 1.209763e-09 1.039583e-09 8.688905e-01
## [191] 3.999977e-09 2.833162e-10 2.600771e-11 6.437644e-01 2.703993e-10
## [196] 7.699934e-04 2.005464e-09 9.023960e-12 2.083989e-01 1.126962e-10
## [201] 3.715691e-10 6.075642e-10 7.587926e-10 6.234873e-11 8.205569e-01
## [206] 5.171291e-01 9.429150e-11 1.900830e-10 6.427163e-10 1.814079e-09
## 
## $optim$dual
## [1] 25.01935
## 
## $optim$how
## [1] "converged"
## 
## $optim$lambda
## [1] 2
## 
## $optim$surrogate
## [1] "HingeSurrogate"
## 
## $optim$kernel
## [1] "linear"
## 
## $optim$kernelModel
## ~parentBMI + month4BMI - 1
## 
## 
## $optTx
##  CD  MR 
## 101 109 
## 
## $value
## [1] 0.1334694

First Stage Analysis

We then fit the model for the first stage. Passing the second-stage fit through BOWLObj links the two stages: the first-stage fit uses only those patients whose observed second-stage treatment agrees with the estimated optimal second-stage rule. We use 3-fold cross-validation to select the tuning parameter \(\lambda\) from the candidate values 0.5 and 2.
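
As a quick illustration of that restriction (a sketch; BOWLObj performs this bookkeeping internally), we can count the patients whose observed second-stage treatment agrees with the estimated optimal rule. Only these contribute to the first-stage fit, which is why the $optTx table in the summary below reports 112 <NA> entries alongside the 58 + 40 = 98 recommended treatments:

  # patients whose observed A2 matches the estimated optimal second-stage
  # treatment; only these enter the first-stage optimization
  agree <- bmiData$A2 == optTx(fitSS)$optimalTx
  table(agree)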

  #### First-stage regression
  fitFS <- bowl(moPropen = moPropen,
                data = bmiData, reward = rewardFS, txName = 'A1',
                regime = ~ gender + parentBMI,
                BOWLObj = fitSS, lambdas = c(0.5, 2.0), cvFolds = 3)
  summary(fitFS)
## $propensity
## 
## Call:
## glm(formula = YinternalY ~ 1, family = "binomial", data = data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.145  -1.145  -1.145   1.210   1.210  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.07623    0.13811  -0.552    0.581
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 290.82  on 209  degrees of freedom
## Residual deviance: 290.82  on 209  degrees of freedom
## AIC: 292.82
## 
## Number of Fisher Scoring iterations: 3
## 
## 
## $outcome
## [1] NA
## 
## $cvInfo
##      0.5        2 
## 19.03031 19.03031 
## 
## $optim
## $optim$par
## [1]  5.1227826  0.4836900 -0.1757596
## 
## $optim$convergence
## [1] 0
## 
## $optim$primal
##  [1] 4.751206e+00 1.368356e+01 2.195407e+01 6.119015e+00 1.898289e+01
##  [6] 3.910764e+00 1.294954e+01 1.096445e-06 8.481638e+00 2.685032e+01
## [11] 1.020338e+01 1.690145e+01 1.704372e+01 2.318597e+01 2.366513e+01
## [16] 1.696050e+01 6.860285e+00 2.529598e+01 1.493880e+01 1.746045e+01
## [21] 5.612182e-01 6.281782e+00 3.403030e+01 2.843757e+01 2.912612e+01
## [26] 3.111511e+01 1.234018e+01 2.754619e+01 1.101230e+01 3.443064e+01
## [31] 9.784252e+00 2.918273e+01 1.608592e+01 3.075954e+01 2.681805e+01
## [36] 8.054330e-07 3.080011e+00 5.848288e+00 8.069679e+00 1.529948e+01
## [41] 4.279468e+01 1.549430e+01 9.055217e+00 2.270853e-06 6.600296e+00
## [46] 5.946123e+00 9.418748e+00 2.417117e-01 2.126060e+01 2.519616e+01
## [51] 1.173683e+01 1.657749e-07 5.372867e-07 1.816776e+01 1.578392e-07
## [56] 2.248644e+01 2.042593e+01 1.766385e-07 6.869425e+00 1.586976e+01
## [61] 1.022026e+01 9.963288e-08 5.427320e-07 1.435757e-07 1.486555e-07
## [66] 1.910711e-07 1.194384e-07 5.900898e-07 3.622585e+01 7.947419e-07
## [71] 2.101276e+01 2.096299e-07 2.163136e-07 8.009782e-08 1.405454e+01
## [76] 3.138074e+01 4.098871e-07 2.159630e+01 1.881424e+00 4.620934e+00
## [81] 4.047320e+00 1.454811e-06 5.270578e-07 4.991825e+01 2.425881e-07
## [86] 9.088186e+00 2.298398e+01 4.385076e+00 2.015365e+00 3.160904e-07
## [91] 1.064321e+00 1.038251e-07 1.896968e+01 8.214160e+00 3.308658e+01
## [96] 1.965011e+01 2.686732e+01 1.329212e+00
## 
## $optim$dual
## [1] -5.122783
## 
## $optim$how
## [1] "converged"
## 
## $optim$lambda
## [1] 0.5
## 
## $optim$surrogate
## [1] "HingeSurrogate"
## 
## $optim$kernel
## [1] "linear"
## 
## $optim$kernelModel
## ~gender + parentBMI - 1
## 
## 
## $optTx
##   CD   MR <NA> 
##   58   40  112 
## 
## $value
## [1] 9.149605
  # Value object returned by propensity score regression method
  fitObject(fitFS)
## $propensity
## 
## Call:  glm(formula = YinternalY ~ 1, family = "binomial", data = data)
## 
## Coefficients:
## (Intercept)  
##    -0.07623  
## 
## Degrees of Freedom: 209 Total (i.e. Null);  209 Residual
## Null Deviance:       290.8 
## Residual Deviance: 290.8     AIC: 292.8
  # Estimated optimal treatment for new data
  optTx(fitFS, bmiData)$optimalTx[1:10]
##  [1] CD MR MR MR MR CD MR MR MR MR
## Levels: CD MR
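
Finally, the estimated value of the fitted regime can be retrieved directly. estimator() is the DynTxRegime accessor for this quantity; it should agree with the $value component reported by summary(fitFS) above.

  # Estimated value of the fitted two-stage regime
  estimator(fitFS)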