Load the MNIST dataset, the same way as the previous HW.
# read in the data
# mnist <- read.csv("https://pjreddie.com/media/files/mnist_train.csv", nrows = 2000)
# colnames(mnist) = c("Digit", paste("Pixel", seq(1:784), sep = ""))
# save(mnist, file = "mnist_first2000.RData")
# you can load the data with the following code
load("mnist_first2000.RData")
dim(mnist)
## [1] 2000 785
We aim to fit an LDA (Linear Discriminant Analysis) model with our own function, following our understanding of LDA. An issue with this dataset, as we saw earlier, is that some pixels display little or no variation across all observations. This zero-variance issue makes the estimated covariance matrix singular, so it cannot be inverted. Do the following to address this issue and fit the LDA model.
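For reference, the code below implements the standard LDA discriminant functions: for each class \(k\) with prior \(\pi_k\), class mean \(\mu_k\), and pooled covariance \(\Sigma\),

```latex
f_k(x) = w_k^\top x + b_k,
\qquad w_k = \Sigma^{-1} \mu_k,
\qquad b_k = -\tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k
```

and a new observation \(x\) is assigned to the class with the largest \(f_k(x)\).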
# your code here
train <- mnist[1:1000,]
train <- train[train$Digit == 1 | train$Digit == 6 | train$Digit == 7,]
test <- mnist[1001:2000,]
test <- test[test$Digit == 1 | test$Digit == 6 | test$Digit == 7,]
# perform marginal screening: keep the 300 pixels with the largest variance
pixel_var = apply(train[, -1], 2, var)
varuse = order(pixel_var, decreasing = TRUE)[1:300]
train = train[, c(1, varuse + 1)]
test = test[, c(1, varuse + 1)]
# estimate the parameters
# separate datasets
digit1 = train[train$Digit == 1, -1]
digit6 = train[train$Digit == 6, -1]
digit7 = train[train$Digit == 7, -1]
# prior
pik = c(nrow(digit1), nrow(digit6), nrow(digit7)) / nrow(train)
# mu
mu1 = colMeans(digit1)
mu6 = colMeans(digit6)
mu7 = colMeans(digit7)
# the centered data
C1_centered = scale(digit1, center = TRUE, scale = FALSE)
C6_centered = scale(digit6, center = TRUE, scale = FALSE)
C7_centered = scale(digit7, center = TRUE, scale = FALSE)
# pooled covariance matrix
Sigma = (t(C1_centered) %*% C1_centered +
t(C6_centered) %*% C6_centered +
t(C7_centered) %*% C7_centered) / (nrow(train) - 3)
# calculate wk
w1 = solve(Sigma) %*% mu1
w6 = solve(Sigma) %*% mu6
w7 = solve(Sigma) %*% mu7
# calculating bk
b1 = - 0.5 * t(mu1) %*% solve(Sigma) %*% mu1 + log(pik[1])
b6 = - 0.5 * t(mu6) %*% solve(Sigma) %*% mu6 + log(pik[2])
b7 = - 0.5 * t(mu7) %*% solve(Sigma) %*% mu7 + log(pik[3])
# predicting new data
# compute and compare the posterior scores (log scale, up to a common constant)
f1 = as.matrix(test[, -1]) %*% w1 + as.numeric(b1)
f6 = as.matrix(test[, -1]) %*% w6 + as.numeric(b6)
f7 = as.matrix(test[, -1]) %*% w7 + as.numeric(b7)
# accuracy
pred = c(1, 6, 7)[apply(cbind(f1, f6, f7), 1, which.max)]
conf_tab = table(pred, test[, 1])
conf_tab
##
## pred 1 6 7
## 1 85 14 24
## 6 10 89 13
## 7 9 3 70
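Before the regularized fits, here is a minimal self-contained sketch (synthetic data, not the MNIST pixels; all names are hypothetical) of why adding a ridge penalty \(\lambda I\) to the covariance helps: when there are fewer observations than features, the sample covariance is rank-deficient and cannot be inverted, while \(\Sigma + \lambda I\) is positive definite for any \(\lambda > 0\).

```r
# synthetic example: fewer observations than features
set.seed(1)
n <- 10; p <- 20
X <- matrix(rnorm(n * p), n, p)
S <- cov(X)                       # rank at most n - 1 < p, hence singular
qr(S)$rank                        # smaller than p, so solve(S) would fail
lambda <- 1
S_pen <- S + lambda * diag(p)     # ridge penalty restores full rank
w <- solve(S_pen, rnorm(p))       # now solvable
length(w)                         # p coefficients
```

In the fit above, `solve(Sigma)` succeeds but the pooled covariance is likely ill-conditioned (the number of training images is close to the 300 screened pixels), so the penalty mainly stabilizes \(\Sigma^{-1}\).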
# regularized LDA: add a ridge penalty with lambda = 1 to the pooled covariance
Sigma_pen = Sigma + diag(nrow(Sigma)) * 1
# calculate wk
w1 = solve(Sigma_pen) %*% mu1
w6 = solve(Sigma_pen) %*% mu6
w7 = solve(Sigma_pen) %*% mu7
# calculating bk
b1 = - 0.5 * t(mu1) %*% solve(Sigma_pen) %*% mu1 + log(pik[1])
b6 = - 0.5 * t(mu6) %*% solve(Sigma_pen) %*% mu6 + log(pik[2])
b7 = - 0.5 * t(mu7) %*% solve(Sigma_pen) %*% mu7 + log(pik[3])
# predicting new data
# compute and compare the posterior scores (log scale, up to a common constant)
f1 = as.matrix(test[, -1]) %*% w1 + as.numeric(b1)
f6 = as.matrix(test[, -1]) %*% w6 + as.numeric(b6)
f7 = as.matrix(test[, -1]) %*% w7 + as.numeric(b7)
# accuracy
pred = c(1, 6, 7)[apply(cbind(f1, f6, f7), 1, which.max)]
conf_tab = table(pred, test[, 1])
conf_tab
##
## pred 1 6 7
## 1 96 12 19
## 6 2 90 7
## 7 6 4 81
# regularized LDA: add a ridge penalty with lambda = 10 to the pooled covariance
Sigma_pen = Sigma + diag(nrow(Sigma)) * 10
# calculate wk
w1 = solve(Sigma_pen) %*% mu1
w6 = solve(Sigma_pen) %*% mu6
w7 = solve(Sigma_pen) %*% mu7
# calculating bk
b1 = - 0.5 * t(mu1) %*% solve(Sigma_pen) %*% mu1 + log(pik[1])
b6 = - 0.5 * t(mu6) %*% solve(Sigma_pen) %*% mu6 + log(pik[2])
b7 = - 0.5 * t(mu7) %*% solve(Sigma_pen) %*% mu7 + log(pik[3])
# predicting new data
# compute and compare the posterior scores (log scale, up to a common constant)
f1 = as.matrix(test[, -1]) %*% w1 + as.numeric(b1)
f6 = as.matrix(test[, -1]) %*% w6 + as.numeric(b6)
f7 = as.matrix(test[, -1]) %*% w7 + as.numeric(b7)
# accuracy
pred = c(1, 6, 7)[apply(cbind(f1, f6, f7), 1, which.max)]
conf_tab = table(pred, test[, 1])
conf_tab
##
## pred 1 6 7
## 1 103 11 7
## 6 1 94 4
## 7 0 1 96
\(\lambda\) controls a bias-variance trade-off. When \(\lambda\) is small, the covariance matrix is estimated with little bias (if the true model is multivariate normal), but the estimate has high variance. When \(\lambda\) is large, the covariance estimate is biased toward the identity, but its variance is low. In this case, the model with \(\lambda = 10\) outperforms the model with \(\lambda = 1\) on the test data.
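As a quick sanity check, the test accuracies implied by the three confusion tables can be recomputed directly (counts copied from the printed output above; rows are predicted labels, columns are true digits):

```r
# accuracy = sum of the diagonal / total count of a confusion table
acc <- function(tab) sum(diag(tab)) / sum(tab)
# matrix() fills column-wise: each c(...) triple below is one true-digit column
tab_l0  <- matrix(c(85, 10, 9,  14, 89, 3,  24, 13, 70), 3, 3)  # lambda = 0
tab_l1  <- matrix(c(96, 2, 6,   12, 90, 4,  19, 7, 81),  3, 3)  # lambda = 1
tab_l10 <- matrix(c(103, 1, 0,  11, 94, 1,  7, 4, 96),   3, 3)  # lambda = 10
round(c(acc(tab_l0), acc(tab_l1), acc(tab_l10)), 3)
# 0.770 0.842 0.924 -> accuracy increases with lambda here
```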
Another approach is to perform PCA at the very beginning of this analysis, instead of screening for the top 300 variables, and then carry out the same type of analysis as in part a), using the principal components as the variables (in both the training and testing data). Take the digits 1, 6, and 7 from the mnist data, perform PCA on the pixels, and comment on why you think this approach would work well.
# get the first 20 PCs
mnist167 = mnist[mnist$Digit == 1 | mnist$Digit == 6 | mnist$Digit == 7, ]
mnist_pca = prcomp(mnist167[, -1], center = TRUE, scale = FALSE)$x[, 1:20]
# split the data
pca_trainx = mnist_pca[1:nrow(train), ]
pca_testx = mnist_pca[-(1:nrow(train)), ]
# estimate the parameters
# separate datasets
digit1 = pca_trainx[train$Digit == 1, ]
digit6 = pca_trainx[train$Digit == 6, ]
digit7 = pca_trainx[train$Digit == 7, ]
# prior
pik = c(nrow(digit1), nrow(digit6), nrow(digit7)) / nrow(train)
# mu
mu1 = colMeans(digit1)
mu6 = colMeans(digit6)
mu7 = colMeans(digit7)
# the centered data
C1_centered = scale(digit1, center = TRUE, scale = FALSE)
C6_centered = scale(digit6, center = TRUE, scale = FALSE)
C7_centered = scale(digit7, center = TRUE, scale = FALSE)
# pooled covariance matrix
Sigma = (t(C1_centered) %*% C1_centered +
t(C6_centered) %*% C6_centered +
t(C7_centered) %*% C7_centered) / (nrow(train) - 3)
# calculate wk
w1 = solve(Sigma) %*% mu1
w6 = solve(Sigma) %*% mu6
w7 = solve(Sigma) %*% mu7
# calculating bk
b1 = - 0.5 * t(mu1) %*% solve(Sigma) %*% mu1 + log(pik[1])
b6 = - 0.5 * t(mu6) %*% solve(Sigma) %*% mu6 + log(pik[2])
b7 = - 0.5 * t(mu7) %*% solve(Sigma) %*% mu7 + log(pik[3])
# predicting new data
# compute and compare the posterior scores (log scale, up to a common constant)
f1 = as.matrix(pca_testx) %*% w1 + as.numeric(b1)
f6 = as.matrix(pca_testx) %*% w6 + as.numeric(b6)
f7 = as.matrix(pca_testx) %*% w7 + as.numeric(b7)
# accuracy
pred = c(1, 6, 7)[apply(cbind(f1, f6, f7), 1, which.max)]
conf_tab = table(pred, test$Digit)
conf_tab
##
## pred 1 6 7
## 1 104 0 2
## 6 0 106 1
## 7 0 0 104
The PCA-based model classifies 314 of the 317 test images correctly (about 99% accuracy). This approach works well because the leading principal components concentrate most of the pixel variation into a small number of uncorrelated directions, so the pooled covariance matrix in the reduced space is small and well conditioned, and LDA no longer suffers from the zero-variance and near-singularity issues of the raw pixels.