Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
Your final submission must be a single file named HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; the .html format cannot be accepted. Make all of your R code chunks visible for grading. If you use this .Rmd file as a template, be sure to remove this instruction section. Make sure that your version of R is \(\geq 4.0.0\). This will ensure your random seed generation is the same as everyone else's. Please note that updating the R version may require you to reinstall all of your packages.

During our lecture, we considered a simulation model to analyze the variable selection property of the Lasso. Now let's further investigate the prediction error caused by the \(L_1\) penalty under this model, and understand the bias-variance trade-off. For this question, your underlying true data generator should be
\[\begin{align} Y &= X^\text{T} \boldsymbol \beta + \epsilon \\ &= \sum_{j = 1}^p 0.4^{\sqrt{j}} X_j + \epsilon, \end{align}\]
where \(p = 30\), each \(X_j\) is generated independently from \(\mathcal{N}(0, 1)\), and \(\epsilon\) also follows a standard normal distribution, independent of \(X\). The goal is to predict two target points and investigate how the prediction error changes under different penalties. The two target testing points are defined by the following code, while the training data are generated fresh in each simulation run.
# target testing points
p = 30
xa = xb = rep(0, p)
xa[2] = 1
xb[10] = 1
Perform the following questions:

1. Perform nsim \(= 200\) independent runs, with \(n = 100\) observations in each run. In each run, generate the training data and use the glmnet() function to fit the Lasso on the sequence of \(\lambda\) values exp(seq(-5, 5, 0.05)), recording the squared prediction error at both target points.
2. Plot the average prediction error against \(\lambda\) for each target point. You should observe that the optimal \(\lambda\) for xb is much larger than that for xa. What are the corresponding best \(\lambda\) and error for each target point?
3. Explain why the optimal \(\lambda\) for xb is much larger than that for xa. Hint: pay attention to their covariate values and the associated \(\widehat \beta\) parameters. Discuss how their predictions would trade bias and variance differently.

Answer:
set.seed(1)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
n = 100
p = 30
all_lambda = exp(seq(-5, 5, 0.05))
nsim = 200
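# true coefficient vector: beta_j = 0.4^sqrt(j)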
coefs = 0.4^sqrt(c(1:p))
# testing data
xtest = rbind(xa, xb)
# store the errors
errors = array(NA, dim = c(2, length(all_lambda), nsim))
# repeat the simulation
for (i in 1:nsim){
  # generate the training design matrix and outcomes
  x = matrix(rnorm(n*p), n, p)
  y = x %*% coefs + rnorm(n)
  # true mean outcomes at the two testing points
  ytest = as.vector(xtest %*% coefs)
  # fit the Lasso path and record squared prediction errors
  glmnetfit = glmnet(x, y, lambda = all_lambda)
  errors[,,i] = (predict(glmnetfit, xtest, s = all_lambda) - ytest)^2
}
# average the errors over all simulation runs
mean_errors = apply(errors, c(1, 2), mean)
xa_error = mean_errors[1, ]
xb_error = mean_errors[2, ]
# plot the errors
plot(log(all_lambda), xa_error, type = "l", xlab = "log(lambda)",
     ylab = "prediction error", ylim = c(0, 0.1))
lines(log(all_lambda), xb_error, col = "red")
legend("topleft", legend = c("xa", "xb"), col = c("black", "red"), lty = 1)
# optimal lambda and error
lambda_a = all_lambda[which.min(xa_error)]
best_err_a = min(xa_error)
lambda_b = all_lambda[which.min(xb_error)]
best_err_b = min(xb_error)
lambda_err = matrix(c(lambda_a, lambda_b, best_err_a, best_err_b), 2, 2)
rownames(lambda_err) = c("xa", "xb")
colnames(lambda_err) = c("optimal lambda", "optimal error")
knitr::kable(data.frame(lambda_err), table.attr = "style='width:40%;'",
format = "html", digits = 5)
|    | optimal.lambda | optimal.error |
|----|----------------|---------------|
| xa | 0.02872 | 0.02193 |
| xb | 0.10026 | 0.01161 |
The prediction for xa relies heavily on the estimation of a large parameter (the second entry of \(\boldsymbol \beta\)), so even a small amount of penalty causes a relatively large bias. In contrast, the prediction for xb does not use most of the parameters, and its one relevant coefficient is already close to zero, so even a large penalty will simply set that estimate to 0. Hence, a larger penalty only reduces the variance further without introducing much additional bias. This can also be seen in the figure: the maximum error for xa is much bigger than that for xb, because there is a lot of bias to introduce when a large penalty is used. It is therefore preferable to use a large \(\lambda\) when predicting xb and a smaller \(\lambda\) for xa, to better balance the bias-variance trade-off.
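To make this concrete, we can inspect the true coefficients that drive each prediction, reusing the coefs and xtest objects defined above (a quick sanity check, not part of the required solution):

# xa loads only on beta_2, while xb loads only on beta_10
coefs[2]                    # 0.4^sqrt(2),  approximately 0.274
coefs[10]                   # 0.4^sqrt(10), approximately 0.055
# true mean responses at the two target points
as.vector(xtest %*% coefs)  # approximately (0.274, 0.055)

Shrinking \(\widehat \beta_{10}\) all the way to zero changes the prediction at xb by at most about 0.055, while zeroing out \(\widehat \beta_2\) can bias the prediction at xa by as much as about 0.274.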
In HW3, we used the golub dataset from the multtest package. This dataset contains 3051 genes from 38 tumor mRNA samples from the leukemia microarray study of Golub et al. (1999). The outcome golub.cl is an indicator for two leukemia types: Acute Lymphoblastic Leukemia (ALL) or Acute Myeloid Leukemia (AML). In genetic analysis, many gene expressions are highly correlated. Hence, we could consider the Elastic-Net model, which induces sparsity while accommodating correlated predictors.
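As a reminder, for a mixing parameter \(\alpha \in [0, 1]\), glmnet adds the Elastic-Net penalty

\[\lambda \left[ \frac{1 - \alpha}{2} \| \boldsymbol \beta \|_2^2 + \alpha \| \boldsymbol \beta \|_1 \right]\]

to the (negative log-) likelihood, so \(\alpha = 1\) corresponds to the Lasso, \(\alpha = 0\) to Ridge, and intermediate values blend the two. The code below searches over a grid of \(\alpha\) values, using cross-validation to select \(\lambda\) for each.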
Answer:
# Load required packages
library(glmnet)
library(multtest)
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
# Load Golub leukemia data
data(golub)
# Extract expression data (transposed for glmnet) and class labels
X = t(as.matrix(golub))
y = as.factor(golub.cl)
# Initialize variables to keep track of results
best_alpha = NULL
best_lambda = NULL
min_cv_error = Inf
# List of alpha values to try
alpha_values = seq(0, 1, by=0.1)
# Loop over alpha values
for (alpha in alpha_values) {
  # Fit an elastic net model with the current alpha value using cv.glmnet()
  cvfit = cv.glmnet(X, y, family = "binomial", alpha = alpha, nfolds = 19)
  # Retrieve the mean cross-validated error at the optimal lambda
  cv_error = min(cvfit$cvm)
  # Update best_alpha and best_lambda if this model is better than the previous best
  if (cv_error < min_cv_error) {
    best_alpha = alpha
    best_lambda = cvfit$lambda.min
    min_cv_error = cv_error
    best_model = cvfit
  }
}
# Output the best alpha and lambda
cat("Best alpha:", best_alpha, "\n")
## Best alpha: 0.2
cat("Best lambda:", best_lambda, "\n")
## Best lambda: 0.01957254
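One caveat on the search above: each call to cv.glmnet() draws a fresh random fold assignment, so the cross-validated errors for different \(\alpha\) values are computed on different splits. A fairer comparison fixes the folds once via the foldid argument of cv.glmnet() and reuses them for every \(\alpha\); a minimal sketch (the final refit is illustrative, not part of the original output):

# Fix one fold assignment (19 folds of 2 observations each) and reuse it
foldid = sample(rep(1:19, length.out = nrow(X)))
for (alpha in alpha_values) {
  cvfit = cv.glmnet(X, y, family = "binomial", alpha = alpha, foldid = foldid)
  # ... track min(cvfit$cvm), best_alpha, and best_lambda exactly as before ...
}
# Refit the final model at the selected alpha and lambda
final_fit = glmnet(X, y, family = "binomial", alpha = best_alpha, lambda = best_lambda)
sum(coef(final_fit) != 0)   # number of nonzero coefficients, including the intercept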