Please remove this section when submitting your homework.

Students are encouraged to work together on homework and/or utilize advanced AI tools. However, there are two basic rules:

Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. Please refer to the course website for the late submission policy and grading rubrics.

Name your submission file HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; .html files will not be accepted because they are often not readable on Gradescope.

Make all of your R code chunks visible for grading.

Make sure your R version is \(\geq 4.0.0\). This will ensure your random seed generation is the same as everyone else's.

If you use this .Rmd file as a template, be sure to remove this instruction section.

Load the same MNIST dataset from HW5, the same way as in the previous HW.
load("mnist_first2000.RData")
dim(mnist)
## [1] 2000 785
[25 pts] We aim to fit an LDA (Linear Discriminant Analysis) model with our own function, following our understanding of LDA. An issue with this dataset, as we saw earlier, is that some pixels display little or no variation across all observations. This zero-variance issue poses a problem when inverting the estimated covariance matrix. Do the following to address this issue and fit the LDA model.
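As a starting point, here is a minimal sketch of such a function, under two assumptions you should adapt to your own setup: the first column of mnist holds the digit label (the remaining 784 columns are pixels), and the first 1000 rows form a hypothetical training split. It handles the zero-variance issue by keeping the 300 highest-variance pixels, then computes the class priors, class means, pooled covariance, and linear discriminant scores directly.

# assumed layout: column 1 = digit label, columns 2:785 = pixels;
# hypothetical split: first 1000 rows for training (use your own split)
y_train <- mnist[1:1000, 1]
x_train <- as.matrix(mnist[1:1000, -1])
y_test  <- mnist[1001:2000, 1]
x_test  <- as.matrix(mnist[1001:2000, -1])

# address the zero-variance issue: keep the 300 pixels with the
# largest variance on the training data
top300  <- order(apply(x_train, 2, var), decreasing = TRUE)[1:300]
x_train <- x_train[, top300]
x_test  <- x_test[, top300]

classes <- sort(unique(y_train))
n <- nrow(x_train); K <- length(classes)

# class priors, class means, and the pooled covariance estimate
pri <- sapply(classes, function(k) mean(y_train == k))
mu  <- t(sapply(classes, function(k) colMeans(x_train[y_train == k, ])))
Sigma <- Reduce(`+`, lapply(seq_len(K), function(k) {
  xc <- scale(x_train[y_train == classes[k], ], center = mu[k, ], scale = FALSE)
  t(xc) %*% xc
})) / (n - K)
Sigma_inv <- solve(Sigma)

# discriminant scores: x' Sigma^-1 mu_k - mu_k' Sigma^-1 mu_k / 2 + log pi_k
scores <- x_test %*% Sigma_inv %*% t(mu) -
  matrix(diag(mu %*% Sigma_inv %*% t(mu)) / 2 - log(pri),
         nrow(x_test), K, byrow = TRUE)
pred <- classes[max.col(scores)]
table(pred, y_test)   # confusion matrix
mean(pred != y_test)  # prediction error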
[20 pts] The result may not be ideal; at least compared with the one-vs-one SVM model from our previous HW, it is probably worse. Let's try to improve it. One issue could be that the inverse of the covariance matrix is not very stable. As we discussed in class, one possible choice is to add a ridge penalty to \(\Sigma\). Carry out this approach using \(\lambda = 1\) and re-calculate the confusion matrix and prediction error. Then try a few different penalty values of \(\lambda\) to observe how the prediction error changes. Comment on the effect of \(\lambda\), specifically in the context of this model. Is this related to the bias-variance trade-off?
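A minimal sketch of the ridge-regularized version, reusing Sigma, mu, pri, classes, x_test, and y_test from the sketch in part a.: the only change is inverting \(\Sigma + \lambda I\) in place of \(\Sigma\).

# regularized LDA: invert Sigma + lambda * I instead of Sigma
lda_err <- function(lambda) {
  Sigma_inv <- solve(Sigma + lambda * diag(ncol(Sigma)))
  scores <- x_test %*% Sigma_inv %*% t(mu) -
    matrix(diag(mu %*% Sigma_inv %*% t(mu)) / 2 - log(pri),
           nrow(x_test), length(pri), byrow = TRUE)
  mean(classes[max.col(scores)] != y_test)
}
lda_err(1)                           # lambda = 1
sapply(c(0.1, 1, 10, 100), lda_err)  # how the error moves with lambda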
[20 pts] Another approach is to perform PCA at the very beginning of this analysis, instead of screening for the top 300 variables, and then carry out the same type of analysis as in part a., but with the principal components as your variables (in both the training and testing data); a sketch of this construction follows below. Start with the mnist data and take digits 1, 6, and 7. Perform PCA on the pixels. Comment on why you think this approach would work well.
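Here is a sketch of the PCA construction under the same assumed column layout and a hypothetical 50/50 train/test split (substitute the split you used before). The key point is that the rotation and centering are estimated on the training pixels only and then applied to the testing pixels; keeping 50 PCs is an arbitrary choice for illustration.

# restrict to digits 1, 6, and 7 (label assumed in column 1)
keep <- mnist[, 1] %in% c(1, 6, 7)
y    <- mnist[keep, 1]
x    <- as.matrix(mnist[keep, -1])
tr   <- seq_len(floor(nrow(x) / 2))  # hypothetical 50/50 split

# fit PCA on the training pixels only; with scale. = FALSE the
# zero-variance pixels cause no trouble here
pc  <- prcomp(x[tr, ], center = TRUE, scale. = FALSE)
npc <- 50  # arbitrary number of PCs kept for illustration
z_train <- pc$x[, 1:npc]
z_test  <- predict(pc, newdata = x[-tr, ])[, 1:npc]
# z_train / z_test now replace the screened pixels in the part a. analysis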
We will use the same PCA data you constructed in Question 1(c). Use the training data for model fitting and the testing data for model evaluation. Fit a CART model with 5-fold cross-validation and answer the following questions. There are many packages that can do this; you could consider the rpart package, which provides cross-validation functionality and easy plotting. Do not print excessive output when using this package.
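A minimal rpart sketch, assuming the z_train, z_test, y, and tr objects from the PCA sketch in Question 1(c) above: xval = 5 requests 5-fold cross-validation, and cp = 0 grows a large tree so that cross-validation can later choose the pruning level.

library(rpart)

train_df <- data.frame(digit = factor(y[tr]), z_train)
test_df  <- data.frame(z_test)

# xval = 5 gives 5-fold cross-validation; cp = 0 grows a deep tree
fit <- rpart(digit ~ ., data = train_df, method = "class",
             control = rpart.control(xval = 5, cp = 0))

plotcp(fit)  # cross-validated error as a function of cp
pred <- predict(fit, test_df, type = "class")
mean(as.character(pred) != as.character(y[-tr]))  # test error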
Select the best cp value using the 1-SE rule (similar to lambda.1se in glmnet; see the lecture notes), obtain and plot the tree corresponding to that cp value. What is the optimal tree size (number of terminal nodes)?
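One way to implement this selection on the rpart cptable, continuing the sketch above: take the simplest tree whose cross-validated error (xerror) is within one standard error (xstd) of the minimum, prune to that cp, plot it, and count the terminal nodes.

cpt    <- fit$cptable
best   <- which.min(cpt[, "xerror"])
cutoff <- cpt[best, "xerror"] + cpt[best, "xstd"]
# first (simplest) tree whose xerror falls under the 1-SE cutoff
cp_1se <- cpt[which(cpt[, "xerror"] <= cutoff)[1], "CP"]

pruned <- prune(fit, cp = cp_1se)
plot(pruned, margin = 0.1); text(pruned)  # plot the selected tree
sum(pruned$frame$var == "<leaf>")         # number of terminal nodes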