STAT 546 Homework 3

Instruction
Overview
Question 1: Direct Learning via Linear Model
Question 2: Outcome Weighted Learning
Question 3: Using the grf package
Question 4: Radon–Nikodym Theorem

Instruction

Please remove this section when submitting your homework.

Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.

You are required to submit the rendered file HWx_yourNetID.pdf, for example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; the .html format will not be accepted. Make all of your R code chunks visible for grading.
Include your Name and NetID in the report.
If you use this file or the example homework .Rmd file as a template, be sure to remove this instruction section.
Make sure that you set seed properly so that the results can be replicated if needed.
For some questions, there will be restrictions on what packages/functions you can use. Please read the requirements carefully. As long as the question does not specify such restrictions, you can use anything.
When using AI tools, you are encouraged to document your experience with them, especially when they have difficulty grasping the idea of the question.
On random seed and reproducibility: Make sure your R version is $\geq 4.0.0$. This will ensure that your random number generation is the same as everyone else’s. Please note that updating the R version may require you to re-install all of your packages.

Overview

The first three questions of this homework are quite simple. We will use three different models to estimate the personalized treatment rule, and we will use the Sepsis dataset in all three questions. Keep in mind that the Sepsis dataset contains a BEST variable, which is the doctor’s best guess for the treatment. Never use this variable in your model fitting; use it only when evaluating the performance of your model. The data is provided on our course website.

Question 1: Direct Learning via Linear Model

Implement the direct learning approach with linear regression. Since this is an observational study, you should incorporate propensity score weighting when estimating the treatment rule. After obtaining the estimated treatment rule, compare it with the best label Sepsis$BEST and report your results.
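As a rough illustration of the idea, the sketch below runs direct learning on simulated data rather than on the Sepsis dataset. It assumes the treatment is recoded to $A \in \{-1, 1\}$, estimates the propensity score by logistic regression, and then fits a weighted least squares regression of the modified outcome $2AY$ on the covariates with weights $1/\hat\pi(A \mid X)$; the sign of the fitted value gives the estimated rule. All variable names (`x1`, `x2`, `a01`, `rule_true`, and so on) are hypothetical, and the data-generating model is an assumption made only for this example.

```r
set.seed(546)

# Simulated observational data (hypothetical; NOT the Sepsis dataset)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
# Treatment assignment depends on x1, so propensity weighting is needed
a01 <- rbinom(n, 1, plogis(0.5 * x1))   # treatment coded 0/1
a   <- 2 * a01 - 1                      # recoded to -1/+1
# Assumed truth: treatment is beneficial exactly when x1 + x2 > 0
y <- 1 + x1 + a * (x1 + x2) + rnorm(n)

# Step 1: estimate the propensity score P(A = 1 | X) by logistic regression
ps_fit <- glm(a01 ~ x1 + x2, family = binomial())
ps_hat <- fitted(ps_fit)
# Estimated probability of the treatment each subject actually received
pi_hat <- ifelse(a01 == 1, ps_hat, 1 - ps_hat)

# Step 2: weighted least squares of the modified outcome 2 * A * Y on X
dl_fit <- lm(I(2 * a * y) ~ x1 + x2, weights = 1 / pi_hat)

# Step 3: the estimated rule treats when the fitted contrast is positive
rule_hat  <- ifelse(fitted(dl_fit) > 0, 1, 0)
rule_true <- ifelse(x1 + x2 > 0, 1, 0)

# Agreement with the known optimal rule (the analogue of comparing with BEST)
accuracy <- mean(rule_hat == rule_true)
print(table(estimated = rule_hat, optimal = rule_true))
print(accuracy)
```

On real data the true optimal rule is unknown, so the last step would instead tabulate the estimated rule against the reference label.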

Question 2: Outcome Weighted Learning

Use outcome weighted learning, via the WeightSVM package with a support vector machine, to analyze the Sepsis dataset. You may have to make the following decisions; try several options for each:

How to model the propensity score?
Which kernel to use? Tune the bandwidth if needed.

If the performance is not as good as you want, try the DTRlearn2 package to perform this task again. Note that the DTRlearn2 package has already incorporated some rough internal tuning procedures and you only need to try different kernels. Report your results.

Question 3: Using the `grf` package

Use the grf package to find the optimal treatment rule for the Sepsis dataset. Compare the estimated treatment rule with the best label Sepsis$BEST and report your results. Furthermore, report the estimated average treatment effect (ATE) of this entire population (based on our understanding of the relationship between the conditional average treatment effect and the average treatment effect). You may also use the grf package documentation.
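The following is a minimal sketch of one possible workflow, run on simulated data rather than on the Sepsis dataset. It assumes a binary treatment `W`, fits `causal_forest()`, takes the sign of the out-of-bag conditional average treatment effect (CATE) estimates as the rule, and relies on the identity $\text{ATE} = E[\tau(X)]$ to relate the two quantities. The data-generating model and the names `tau`, `tau_hat`, and `rule_hat` are assumptions for illustration only.

```r
library(grf)
set.seed(546)

# Simulated data (hypothetical; NOT the Sepsis dataset)
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, plogis(0.5 * X[, 1]))   # confounded binary treatment
tau <- 0.5 + X[, 1]                        # assumed true CATE, so true ATE = 0.5
Y <- X[, 3] + W * tau + rnorm(n)

# Fit the causal forest; nuisance models are estimated internally
cf <- causal_forest(X, Y, W)

# Out-of-bag CATE estimates for the training sample
tau_hat <- predict(cf)$predictions

# Estimated rule: treat when the estimated CATE is positive
rule_hat <- as.integer(tau_hat > 0)
agreement <- mean(rule_hat == as.integer(tau > 0))

# ATE over the whole population: plug-in average of the CATE estimates,
# and grf's doubly robust estimate with a standard error
ate_plugin <- mean(tau_hat)
ate_dr <- average_treatment_effect(cf, target.sample = "all")

print(agreement)
print(ate_plugin)
print(ate_dr)
```

The doubly robust estimate is usually the one to report, since it comes with a standard error; the plug-in average is shown only to make the link between the CATE and the ATE explicit.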

Question 4: Radon–Nikodym Theorem

Suppose that I want to estimate $E(X^3)$ when $X$ is a random variable from $N(1,1)$. However, I do not know how to perform integration, and all I can do is generate random samples and perform numerical approximation. In addition, my machine can only generate random samples from $N(2,2)$, but I do know the analytic density functions of both distributions. How can I estimate/approximate $E(X^3)$ numerically using the Radon–Nikodym theorem? Use 5000 samples to make your estimation stable. Validate your answer using simulated data from $N(1,1)$ or analytic integration.

Hint: if you ask GPT or write your prompt correctly with Copilot, it will give you the answer. But make sure to explain your procedure.
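One way to read the question is as importance sampling. If $P = N(1,1)$ has density $p$ and $Q = N(2,2)$ has density $q$, then $E_P[g(X)] = E_Q[g(X)\, p(X)/q(X)]$, where $p/q$ is the Radon–Nikodym derivative $dP/dQ$. The sketch below applies this with $g(x) = x^3$. It assumes the second parameter of $N(\cdot,\cdot)$ is the variance, so the sampling distribution has standard deviation $\sqrt{2}$; if it is meant as a standard deviation, change `sd_q` accordingly. The variable names are hypothetical.

```r
set.seed(546)
n <- 5000

# Assumption: N(mean, variance) parameterization
mu_p <- 1; sd_p <- 1          # target distribution   P = N(1, 1)
mu_q <- 2; sd_q <- sqrt(2)    # sampling distribution Q = N(2, 2)

# Draw from Q, the only distribution the machine can sample from
x <- rnorm(n, mean = mu_q, sd = sd_q)

# Radon-Nikodym derivative dP/dQ evaluated at the samples
w <- dnorm(x, mean = mu_p, sd = sd_p) / dnorm(x, mean = mu_q, sd = sd_q)

# Importance sampling estimate of E_P(X^3)
est_is <- mean(w * x^3)

# Validation 1: direct Monte Carlo using samples from P
est_mc <- mean(rnorm(n, mean = mu_p, sd = sd_p)^3)

# Validation 2: closed form, E(X^3) = mu^3 + 3 * mu * sigma^2 for a normal
exact <- mu_p^3 + 3 * mu_p * sd_p^2

print(c(importance = est_is, monte_carlo = est_mc, exact = exact))
```

Because the sampling distribution has heavier spread than the target, the weights are bounded, which keeps the variance of the estimate under control.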