Please remove this section when submitting your homework.

Students are encouraged to work together on homework and/or utilize advanced AI tools. However, there are two basic rules:

Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. Please refer to the course website for the late submission policy and grading rubrics.

Name your submission HWx_yourNetID.pdf; for example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; .html files will not be accepted because they are often not readable on Gradescope.

Make all of your R code chunks visible for grading. Make sure your version of R is \(\geq 4.0.0\); this will ensure your random seed generation is the same as everyone else's. If you use this .Rmd file as a template, be sure to remove this instruction section.

Load the same MNIST dataset from HW5, the same way as in previous homework.
load("mnist_first2000.RData")
dim(mnist)
## [1] 2000 785
[25 pts] We aim to fit an LDA (Linear Discriminant Analysis) model with our own function, following our understanding of LDA. An issue with this dataset, as we saw earlier, is that some pixels display little or no variation across all observations. This zero-variance issue poses a problem when inverting the estimated covariance matrix. Do the following to address this issue and fit the LDA model.
Answer:
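For reference, the scores computed in the code below follow the standard LDA derivation: each class \(k \in \{1, 4, 9\}\) is modeled as \(N(\mu_k, \Sigma)\) with a shared covariance \(\Sigma\) and prior \(\pi_k\). Taking the log posterior and dropping terms common to all classes leaves the linear score

\[
f_k(x) = x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k = x^\top w_k + b_k,
\]

with \(w_k = \Sigma^{-1} \mu_k\) and \(b_k = -\tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k\); the predicted class is the one with the largest \(f_k(x)\).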
# Part (a)
# your code here
train <- mnist[1:1000,]
train <- train[train$Digit == 1 | train$Digit == 4 | train$Digit == 9,]
test <- mnist[1001:2000,]
test <- test[test$Digit == 1 | test$Digit == 4 | test$Digit == 9,]
# marginal screening: keep the 300 pixels with the largest variance
pixel_var = apply(train[, -1], 2, var)
varuse = order(pixel_var, decreasing = TRUE)[1:300]
train = train[, c(1, varuse + 1)]
test = test[, c(1, varuse + 1)]
# estimate the parameters
# separate datasets
digit1 = train[train$Digit == 1, -1]
digit4 = train[train$Digit == 4, -1]
digit9 = train[train$Digit == 9, -1]
# prior
pik = c(nrow(digit1), nrow(digit4), nrow(digit9)) / nrow(train)
# mu
mu1 = colMeans(digit1)
mu4 = colMeans(digit4)
mu9 = colMeans(digit9)
# the centered data
C1_centered = scale(digit1, center = TRUE, scale = FALSE)
C4_centered = scale(digit4, center = TRUE, scale = FALSE)
C9_centered = scale(digit9, center = TRUE, scale = FALSE)
# pooled covariance matrix
Sigma = (t(C1_centered) %*% C1_centered +
t(C4_centered) %*% C4_centered +
t(C9_centered) %*% C9_centered) / (nrow(train) - 3)
# invert the pooled covariance once and reuse it
Sigma_inv = solve(Sigma)
# calculate wk
w1 = Sigma_inv %*% mu1
w4 = Sigma_inv %*% mu4
w9 = Sigma_inv %*% mu9
# calculate bk
b1 = - 0.5 * t(mu1) %*% Sigma_inv %*% mu1 + log(pik[1])
b4 = - 0.5 * t(mu4) %*% Sigma_inv %*% mu4 + log(pik[2])
b9 = - 0.5 * t(mu9) %*% Sigma_inv %*% mu9 + log(pik[3])
# predict the testing data: compute the discriminant score for each class
f1 = as.matrix(test[, -1]) %*% w1 + as.numeric(b1)
f4 = as.matrix(test[, -1]) %*% w4 + as.numeric(b4)
f9 = as.matrix(test[, -1]) %*% w9 + as.numeric(b9)
# predicted class = digit with the largest score; tabulate against the truth
pred = c(1, 4, 9)[apply(cbind(f1, f4, f9), 1, which.max)]
conf_tab = table(pred, test[, 1])
conf_tab
##
## pred 1 4 9
## 1 93 22 16
## 4 7 61 35
## 9 4 26 59
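As a quick sanity check, the overall test accuracy can be read off the diagonal of the confusion matrix above. A minimal sketch (the counts are re-entered by hand from the printed table):

```r
# confusion matrix from the output above: rows = predicted, cols = true digit
conf_a <- matrix(c(93,  7,  4,
                   22, 61, 26,
                   16, 35, 59),
                 nrow = 3,
                 dimnames = list(pred = c(1, 4, 9), truth = c(1, 4, 9)))
# overall accuracy = correct (diagonal) counts over all test points
sum(diag(conf_a)) / sum(conf_a)
```

About 66% of the test digits are classified correctly.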
Answer:
# Part (b): lambda = 1
Sigma_pen = Sigma + diag(nrow(Sigma)) * 1
# invert the penalized covariance once and reuse it
Sigma_pen_inv = solve(Sigma_pen)
# calculate wk
w1 = Sigma_pen_inv %*% mu1
w4 = Sigma_pen_inv %*% mu4
w9 = Sigma_pen_inv %*% mu9
# calculate bk
b1 = - 0.5 * t(mu1) %*% Sigma_pen_inv %*% mu1 + log(pik[1])
b4 = - 0.5 * t(mu4) %*% Sigma_pen_inv %*% mu4 + log(pik[2])
b9 = - 0.5 * t(mu9) %*% Sigma_pen_inv %*% mu9 + log(pik[3])
# predict the testing data: compute the discriminant score for each class
f1 = as.matrix(test[, -1]) %*% w1 + as.numeric(b1)
f4 = as.matrix(test[, -1]) %*% w4 + as.numeric(b4)
f9 = as.matrix(test[, -1]) %*% w9 + as.numeric(b9)
# predicted class = digit with the largest score; tabulate against the truth
pred = c(1, 4, 9)[apply(cbind(f1, f4, f9), 1, which.max)]
conf_tab = table(pred, test[, 1])
conf_tab
##
## pred 1 4 9
## 1 98 10 9
## 4 3 70 22
## 9 3 29 79
# Part (b): lambda = 10
Sigma_pen = Sigma + diag(nrow(Sigma)) * 10
# invert the penalized covariance once and reuse it
Sigma_pen_inv = solve(Sigma_pen)
# calculate wk
w1 = Sigma_pen_inv %*% mu1
w4 = Sigma_pen_inv %*% mu4
w9 = Sigma_pen_inv %*% mu9
# calculate bk
b1 = - 0.5 * t(mu1) %*% Sigma_pen_inv %*% mu1 + log(pik[1])
b4 = - 0.5 * t(mu4) %*% Sigma_pen_inv %*% mu4 + log(pik[2])
b9 = - 0.5 * t(mu9) %*% Sigma_pen_inv %*% mu9 + log(pik[3])
# predict the testing data: compute the discriminant score for each class
f1 = as.matrix(test[, -1]) %*% w1 + as.numeric(b1)
f4 = as.matrix(test[, -1]) %*% w4 + as.numeric(b4)
f9 = as.matrix(test[, -1]) %*% w9 + as.numeric(b9)
# predicted class = digit with the largest score; tabulate against the truth
pred = c(1, 4, 9)[apply(cbind(f1, f4, f9), 1, which.max)]
conf_tab = table(pred, test[, 1])
conf_tab
##
## pred 1 4 9
## 1 100 2 1
## 4 2 89 22
## 9 2 18 87
\(\lambda\) controls a bias-variance trade-off. When \(\lambda\) is small, the covariance matrix is estimated with little bias (if the true model is multivariate normal), but the estimate has high variance; when \(\lambda\) is large, the estimate is more biased but has lower variance. In this case, the model with \(\lambda = 10\) performs better than the model with \(\lambda = 1\) on the testing data.
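To make the comparison concrete, here is a small sketch recomputing the test accuracy at each \(\lambda\) from the three confusion matrices printed above (each test column sums to \(104 + 109 + 110 = 323\) observations; the diagonal sums are re-entered by hand):

```r
# correct classifications (diagonal sums) read off the three tables above
correct <- c("lambda = 0"  = 93 + 61 + 59,
             "lambda = 1"  = 98 + 70 + 79,
             "lambda = 10" = 100 + 89 + 87)
# test accuracy at each lambda: 0.659, 0.765, 0.854
round(correct / 323, 3)
```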
[20 pts] Another approach is to perform PCA at the very beginning of this analysis, instead of screening for the top 300 variables, and then perform the same type of analysis as in part (a) but with the principal components as your variables (in both the training and testing data).

Use the entire mnist data, and take digits 1, 6, and 7. Perform PCA on the pixels. Comment on why you think this approach would work well.
Answer:
# Part (c)
# get the first 50 PCs
mnist167 = mnist[mnist$Digit == 1 | mnist$Digit == 4 | mnist$Digit == 9, ]
mnist_pca = prcomp(mnist167[, -1], center = TRUE, scale = FALSE)$x[, 1:50]
# split the data
pca_trainx = mnist_pca[1:nrow(train), ]
pca_testx = mnist_pca[-(1:nrow(train)), ]
# estimate the parameters
# separate datasets
digit1 = pca_trainx[train$Digit == 1, ]
digit4 = pca_trainx[train$Digit == 4, ]
digit9 = pca_trainx[train$Digit == 9, ]
# prior
pik = c(nrow(digit1), nrow(digit4), nrow(digit9)) / nrow(train)
# mu
mu1 = colMeans(digit1)
mu4 = colMeans(digit4)
mu9 = colMeans(digit9)
# the centered data
C1_centered = scale(digit1, center = TRUE, scale = FALSE)
C4_centered = scale(digit4, center = TRUE, scale = FALSE)
C9_centered = scale(digit9, center = TRUE, scale = FALSE)
# pooled covariance matrix
Sigma = (t(C1_centered) %*% C1_centered +
t(C4_centered) %*% C4_centered +
t(C9_centered) %*% C9_centered) / (nrow(train) - 3)
# invert the pooled covariance once and reuse it
Sigma_inv = solve(Sigma)
# calculate wk
w1 = Sigma_inv %*% mu1
w4 = Sigma_inv %*% mu4
w9 = Sigma_inv %*% mu9
# calculate bk
b1 = - 0.5 * t(mu1) %*% Sigma_inv %*% mu1 + log(pik[1])
b4 = - 0.5 * t(mu4) %*% Sigma_inv %*% mu4 + log(pik[2])
b9 = - 0.5 * t(mu9) %*% Sigma_inv %*% mu9 + log(pik[3])
# predict the testing data: compute the discriminant score for each class
f1 = as.matrix(pca_testx) %*% w1 + as.numeric(b1)
f4 = as.matrix(pca_testx) %*% w4 + as.numeric(b4)
f9 = as.matrix(pca_testx) %*% w9 + as.numeric(b9)
# predicted class = digit with the largest score; tabulate against the truth
pred = c(1, 4, 9)[apply(cbind(f1, f4, f9), 1, which.max)]
conf_tab = table(pred, test$Digit)
conf_tab
##
## pred 1 4 9
## 1 103 3 0
## 4 1 101 6
## 9 0 5 104
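The same accuracy check applied to the PCA-based fit, using the counts printed above (re-entered by hand):

```r
# confusion matrix for the PCA + LDA fit: rows = predicted, cols = true digit
conf_pca <- matrix(c(103,   1,   0,
                       3, 101,   5,
                       0,   6, 104),
                   nrow = 3,
                   dimnames = list(pred = c(1, 4, 9), truth = c(1, 4, 9)))
# overall accuracy on the testing data
sum(diag(conf_pca)) / sum(conf_pca)
```

308 of 323 test digits (about 95%) are classified correctly, noticeably better than the variance-screening fits in parts (a) and (b).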
We will use the same PCA data you constructed in Question 1(c). Use the training data for model fitting and the testing data for model evaluation. Fit a 5-fold cross-validated CART model and answer the following questions. There are many packages that can do this; you could consider the rpart package, which provides cross-validation functionality and easy plotting. Do not print excessive output when using this package.

Using the 1-SD rule (similar to lambda.1se in glmnet; see the lecture notes), obtain and plot the tree corresponding to that cp value. What is the optimal tree size (number of terminal nodes)?

Answer:
library(rpart)
library(rpart.plot)
cart.fit <- rpart(as.factor(Digit)~.,
data = data.frame(Digit=as.factor(train$Digit), pca_trainx),
control = rpart.control(xval=5))
# get the cross-validation results
cart.fit$cptable
## CP nsplit rel error xerror xstd
## 1 0.49268293 0 1.0000000 1.0682927 0.04069263
## 2 0.22439024 1 0.5073171 0.5170732 0.04110227
## 3 0.03902439 2 0.2829268 0.3121951 0.03491807
## 4 0.01463415 4 0.2048780 0.2780488 0.03339897
## 5 0.01000000 9 0.1317073 0.2878049 0.03385090
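As a sketch of how the 1-SD rule reads off this table (the numbers below are re-entered by hand from the cptable above): take the minimum cross-validated error, add its standard deviation, and choose the smallest tree whose xerror falls under that cutoff.

```r
# xerror / xstd / nsplit columns copied from the cptable above
xerror <- c(1.0683, 0.5171, 0.3122, 0.2780, 0.2878)
xstd   <- c(0.0407, 0.0411, 0.0349, 0.0334, 0.0339)
nsplit <- c(0, 1, 2, 4, 9)
# 1-SD cutoff: 0.2780 + 0.0334 = 0.3114
cutoff <- min(xerror) + xstd[which.min(xerror)]
# smallest tree within one SD of the minimum: 4 splits, i.e., 5 terminal nodes
nsplit[min(which(xerror <= cutoff))]
```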
plotcp(cart.fit)
# the 1-SD rule selects the nsplit = 4 tree (row 4 of the cptable);
# use a cp value between rows 4 and 5 to prune back to that tree
cptarg = sqrt(cart.fit$cptable[4, 1] * cart.fit$cptable[5, 1])
prunedtree = prune(cart.fit, cp = cptarg)
# the pruned tree has 5 terminal nodes
rpart.plot(prunedtree)
# apply on the testing data
pred = predict(prunedtree, newdata = data.frame(pca_testx), type = "class")
conf_tab = table(pred, test$Digit)
conf_tab
##
## pred 1 4 9
## 1 96 2 0
## 4 0 68 20
## 9 8 39 90
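The same accuracy check for the pruned CART tree, using the counts printed above (re-entered by hand):

```r
# confusion matrix for the pruned CART fit: rows = predicted, cols = true digit
conf_cart <- matrix(c(96,  0,  8,
                       2, 68, 39,
                       0, 20, 90),
                    nrow = 3,
                    dimnames = list(pred = c(1, 4, 9), truth = c(1, 4, 9)))
# overall accuracy on the testing data
sum(diag(conf_cart)) / sum(conf_cart)
```

254 of 323 (about 79%) correct; the pruned tree trails both the regularized LDA and the PCA-based LDA fits above.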