Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
HWx_yourNetID.pdf
. For example,
HW01_rqzhu.pdf
. Please note that this must be a
.pdf
file. .html
format
cannot be accepted. Make all of your R
code chunks visible for grading..Rmd
file
as a template, be sure to remove this instruction
section.R
is \(\geq
4.0.0\). This will ensure your random seed generation is the same
as everyone else. Please note that updating the R
version
may require you to reinstall all of your packages.We will again use the MNIST
dataset. We will use the
first 2400 observations of it:
# inputs to download file
fileLocation <- "https://pjreddie.com/media/files/mnist_train.csv"
numRowsToDownload <- 2400
localFileName <- paste0("mnist_first", numRowsToDownload, ".RData")
# download the data and add column names
mnist2400 <- read.csv(fileLocation, nrows = numRowsToDownload)
numColsMnist <- dim(mnist2400)[2]
colnames(mnist2400) <- c("Digit", paste("Pixel", seq(1:(numColsMnist - 1)), sep = ""))
# save file
# in the future we can read in from the local copy instead of having to redownload
save(mnist2400, file = localFileName)
# you can load the data with the following code
#load(file = localFileName)
[15 pts] Since a standard SVM can only be used for binary classification problems, let’s fit SVM on digits 4 and 5. Complete the following tasks.
e1071
package. Set the cost parameter \(C =
1\).[15 pts] Some researchers might be interested in knowing what pixels are more important in distinguishing the two digits. One way to do this is to extract the coefficients of the (linear) SVM model (they are fairly comparable in our case since all the variables have the same range). Keep in mind that the coefficients are those \(\beta\) parameter used to define the direction of the separation line, and they are can be recovered from the solution of the Lagrangian. Complete the following tasks.
[15 pts] Perform a logistic regression with elastic net penalty (\(\alpha =\) 0.5) on the training data. Start with the 250 pixels you have used in part a). You do not need to select the best \(\lambda\) value using cross-validation. Instead, select the model with just 30 variables in the solution path (what is this? you can refer to our lecture note on Lasso). What is the \(\lambda\) value corresponding to this model? Extract the pixels being selected by your elastic net model. Do these pixels overlap with the ones selected by the SVM model in part b)? Comment on your findings.
[10 pts] Compare the two 30-variable models you obtained from part b) and c). Use the area under the ROC curve (AUC) on the testing data as the performance metric.
This problem involves the OJ
data set which is part of
the ISLR2
package. We create a training set containing a
random sample of 800 observations, and a test set containing the
remaining observations. In the dataset, Purchase
variable
is the output variable and it indicates whether a customer purchased
Citrus Hill or Minute Maid Orange Juice. For the details of the datset
you can refer to its help file.
library(ISLR2)
data("OJ")
set.seed(7)
id=sample(nrow(OJ),800)
train=OJ[id,]
test=OJ[-id,]
[15 pts]** Fit a (linear) support vector machine by using
svm
function to the training data using
cost
\(= 0.01\) and using
all the input variables. Provide the training and test errors.
[15 pts]** Use the tune()
function to select an
optimal cost, C in the set of \(\{0.01, 0.1,
1, 2, 5, 7, 10\}\). Compute the training and test errors using
the best value for cost.
[15 pts]** Repeat parts 1 and 2 using a support vector machine
with radial
and polynomial
(with degree 2)
kernel. Use the default value for gamma
in the
radial
kernel. Comment on your results from parts b and
c.