Instruction

Please remove this section when submitting your homework.

Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.

Question 1: Linear SVM on Handwritten Digit Data

Load the MNIST dataset, the same way as HW5.

  # read in the data
  mnist <- read.csv("https://pjreddie.com/media/files/mnist_train.csv", nrows = 2000)
  colnames(mnist) <- c("Digit", paste0("Pixel", 1:784))
  save(mnist, file = "mnist_first2000.RData")

  # you can load the data with the following code
  # load("mnist_first2000.RData")
  dim(mnist)
## [1] 2000  785
  1. [15 pts] Since a standard SVM can only be used for binary classification problems, let's fit an SVM on digits 1 and 2. Complete the following tasks.
  2. [15 pts] Some researchers might be interested in knowing which pixels are more important in distinguishing the two digits. One way to do this is to calculate and examine the coefficients of the linear SVM model. Complete the following tasks.
  3. [10 pts] Perform Principal Component Analysis (PCA) on the training data.
  4. [10 pts] Perform a logistic regression with elastic net penalty (\(\alpha =\) 0.5) on the training data.
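
The coefficient-based view of pixel importance mentioned above can be illustrated with a small sketch. For a linear SVM, the weight vector is \(w = \sum_i \alpha_i y_i x_i\), summed over the support vectors. The code below assumes the e1071 package (the assignment text here does not name a package) and uses synthetic stand-in data with five made-up "pixels" instead of the MNIST columns. The names `x`, `y`, `fit`, `w`, `b`, and `importance` are illustrative only.

```r
library(e1071)  # assumption: e1071 is installed; not named in the assignment text

set.seed(1)
# synthetic stand-in for two digit classes with 5 "pixels" each
n <- 100
x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("Pixel", 1:5)))
y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(n, sd = 0.3) > 0, "1", "2"))

# scale = FALSE keeps the coefficients on the original feature scale
fit <- svm(x, y, type = "C-classification", kernel = "linear",
           cost = 1, scale = FALSE)

# fit$coefs holds alpha_i * y_i and fit$SV holds the support vectors,
# so their product gives the weight vector w; the intercept is -rho
w <- drop(t(fit$coefs) %*% fit$SV)
b <- -fit$rho

# features ranked by absolute coefficient
importance <- sort(abs(w), decreasing = TRUE)
print(importance)
```

In this sketch a larger absolute coefficient is read as a more influential feature. That reading is only meaningful when the features share a comparable scale, which is one reason `scale = FALSE` is set explicitly here.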

Question 2: Multi-class SVM

[25 pts] Our current SVM is only applicable to binary classification problems. In this question, we will extend it to multi-class classification problems. A simple idea is called one-vs-one (OVO) classification. For example, if we have 3 classes, we can fit 3 SVMs, each of which is trained on two classes. For a new observation, we throw that into each of the 3 SVMs and obtain three predictions. We then use the majority vote to determine its class. Carry out this approach using digits 1, 6, and 7 in our MNIST data. You still need to select the top pixels with the highest variance to avoid unnecessary warnings. But in this question, use only 100 pixels. For all models, keep the cost parameter \(C = 1\).
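
The voting step of the OVO scheme described above can be sketched in base R. This is only an illustration of the majority vote, not a full solution: the `votes` matrix holds made-up predictions standing in for the output of the three pairwise SVMs, and the helper name `ovo_vote` and the tie-breaking rule are assumptions of this sketch.

```r
# pairwise class combinations for digits 1, 6, and 7: (1,6), (1,7), (6,7)
classes <- c(1, 6, 7)
pairs <- combn(classes, 2)

# majority vote over pairwise predictions
# votes: one row per observation, one column per pairwise model
# assumption: a three-way tie is broken by taking the first class in `classes`
ovo_vote <- function(votes, classes) {
  apply(votes, 1, function(v) {
    counts <- table(factor(v, levels = classes))
    classes[which.max(counts)]
  })
}

# hypothetical predictions from the three pairwise SVMs for four observations
votes <- cbind(
  svm_1v6 = c(1, 6, 1, 1),
  svm_1v7 = c(1, 7, 7, 7),
  svm_6v7 = c(6, 6, 7, 6)
)
print(ovo_vote(votes, classes))
```

The fourth observation receives one vote for each class. With three classes such ties are possible, so whichever tie-breaking rule is used should be stated explicitly.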

Question 3: Nonlinear SVM

[25 pts] Load the spam dataset from the kernlab package. This is a classification example that consists of 4,601 instances and 57 features. The response variable is whether an email is spam or not. Use a nonlinear SVM with the Radial Basis Function (RBF) kernel. Evaluate the performance of the trained model. Complete the following tasks
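
A minimal sketch of fitting and evaluating an RBF-kernel SVM on this data is shown below. It uses `ksvm()` from kernlab, the same package that supplies the data. The 75/25 train/test split, the seed, and `C = 1` are placeholder assumptions of this sketch, not values prescribed by the assignment.

```r
library(kernlab)  # provides both the spam data and ksvm()

data(spam)
set.seed(1)

# assumption: a simple 75/25 train/test split
test_id <- sample(nrow(spam), size = round(0.25 * nrow(spam)))
train <- spam[-test_id, ]
test <- spam[test_id, ]

# RBF kernel; kpar = "automatic" lets kernlab estimate the bandwidth sigma
# assumption: C = 1 is a placeholder, not a tuned value
fit <- ksvm(type ~ ., data = train, kernel = "rbfdot",
            kpar = "automatic", C = 1)

# evaluate on the held-out data with a confusion matrix and accuracy
pred <- predict(fit, newdata = test)
conf_mat <- table(Predicted = pred, Actual = test$type)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

In practice the cost and kernel bandwidth would usually be chosen by cross-validation rather than fixed in advance.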