Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, there are two basic rules:
Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. Please refer to the course website for late submission policy and grading rubrics.
Name your submission file HWx_yourNetID.pdf. For example,
HW01_rqzhu.pdf. Please note that this must be a
.pdf file; .html files will not be
accepted because they are often not readable on Gradescope.
Make all of your R code chunks visible for grading.

Make sure your version of R is \(\geq 4.0.0\). This
will ensure that your random seed generation is the same as everyone
else's.

If you use the .Rmd file as a template, be sure to remove this
instruction section.

We will use the Social Network Ads data, available on
Kaggle [link].
The .csv file is also available on our course website. The
goal is to classify the outcome Purchased using only the
two continuous variables EstimatedSalary and
Age. Scale and center both covariates before
proceeding with the following steps. For this question, you
should use the e1071 package. Complete the following
tasks:
… Purchased == 1 and 100 with Purchased == 0 to
form a training data set. The rest of the data will be used as the
testing data. Set the random seed before you do this.

… pch = 19 for the dots.

Fit a linear SVM with cost = 1. Since your data
is already scaled and centered, set scale = FALSE in the
svm() function.

… svm() fitted object.

… cex = 2).

Predict the testing data without using the predict() function. You need to calculate the predicted
outcome based on the decision line you obtained.

Use the ROCR package to
calculate the AUC and plot the ROC curve for the testing data. You need
to calculate \(f(x)\) yourself based on
the quantities you obtained from the fitted object.

In this question, we will use the same training and testing data from
the previous question. Complete the following tasks. For this question,
you can use the predict() function to make predictions.
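The Question 1 tasks ask for \(f(x)\) computed by hand from the fitted object, while this question allows predict(). The sketch below connects the two, assuming e1071's documented conventions that fit$coefs stores \(y_i\alpha_i\) for each support vector and fit$rho is the negative intercept, so a linear fit gives \(\beta = \sum_i y_i\alpha_i x_i\) over the support vectors and \(\beta_0 = -\rho\). To stay runnable on its own it uses stand-in data (two iris species and two features, not the Social Network Ads data); the 60-row training split, colors, and grid size are arbitrary choices. The grid step at the end works the same way for a kernel fit.

```r
library(e1071) # svm(), predict() for svm objects
library(ROCR)  # prediction(), performance()

# Stand-in data (assumption): two iris species and two centered/scaled features.
vars <- c("Sepal.Length", "Sepal.Width")
sub <- droplevels(iris[iris$Species != "setosa", ])
dat <- data.frame(scale(sub[, vars]), y = sub$Species)

set.seed(1)
idx <- sample(nrow(dat), 60)
train <- dat[idx, ]
test <- dat[-idx, ]

fit <- svm(y ~ ., data = train, kernel = "linear", cost = 1, scale = FALSE)

# fit$coefs holds y_i * alpha_i for each support vector and fit$rho = -beta0,
# so the linear decision function is f(x) = x' beta + beta0.
beta <- drop(t(fit$coefs) %*% fit$SV)
beta0 <- -fit$rho
f_test <- drop(as.matrix(test[, vars]) %*% beta + beta0)

# predict() reports the same values; its column name "A/B" means f(x) > 0 -> A.
dv <- attr(predict(fit, test, decision.values = TRUE), "decision.values")
pos_class <- strsplit(colnames(dv), "/")[[1]][1]
neg_class <- setdiff(levels(dat$y), pos_class)
stopifnot(isTRUE(all.equal(f_test, drop(dv), check.attributes = FALSE)))

# Predicted labels from the sign of f(x), without calling predict() for labels.
pred_class <- ifelse(f_test > 0, pos_class, neg_class)

# Training data with the decision line f(x) = 0 and margin lines f(x) = +/-1.
plot(train$Sepal.Length, train$Sepal.Width, pch = 19,
     col = c("blue", "red")[as.integer(train$y)], xlab = vars[1], ylab = vars[2])
for (lev in c(-1, 0, 1)) {
  abline(a = (lev - beta0) / beta[2], b = -beta[1] / beta[2],
         lty = if (lev == 0) 1 else 2)
}

# ROC curve and AUC on the test data, scoring the class that f(x) > 0 favors.
roc_pred <- prediction(f_test, as.numeric(test$y == pos_class))
auc <- performance(roc_pred, "auc")@y.values[[1]]
plot(performance(roc_pred, "tpr", "fpr"))

# Decision regions: classify every point of a grid spanning the data range.
grid <- expand.grid(
  Sepal.Length = seq(min(dat$Sepal.Length), max(dat$Sepal.Length), length.out = 100),
  Sepal.Width  = seq(min(dat$Sepal.Width), max(dat$Sepal.Width), length.out = 100))
grid_class <- predict(fit, grid)
plot(grid$Sepal.Length, grid$Sepal.Width, pch = 15, cex = 0.5,
     col = c("lightblue", "pink")[as.integer(grid_class)],
     xlab = vars[1], ylab = vars[2])
```

Because the sign of the decision value depends on how svm() encodes the two classes internally, the sketch reads the positive class from the column name of the decision values instead of assuming it.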
… cost = 1 and
coef0 = 1. Since your data is already scaled and centered,
set scale = FALSE in the svm() function.

Use the ROCR package to calculate the AUC and
plot the ROC curve for the testing data. Is this better than the linear
SVM?

Create a grid of points over EstimatedSalary and Age that covers the range
of the data. Use the fitted SVM model to predict the outcome for each
point in the grid. Plot these grid points with different colors for
different predicted outcomes so that you can visualize the decision
boundary.

Take digits 4 and 9 from zip.train and
zip.test in the ElemStatLearn package. For
this question, you should use the kernlab package in
combination with the caret package to tune the parameters.
Make sure that you specify the method = argument so that
the correct package and function are used to fit the model. You may consider
reading the details in this
documentation. Complete the following tasks.
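Taking digits 4 and 9 amounts to filtering rows on the label column. A minimal sketch, assuming (as in the usual description of this zip-code data) that the first column of zip.train and zip.test holds the digit and the remaining 256 columns hold pixel values. subset_digits() is a hypothetical helper, and the generic column names it assigns are an added convenience, since caret's train() expects column names on x.

```r
# Hypothetical helper: keep the rows whose label (first column) is one of
# `digits`, and return the pixel matrix and a factor response separately.
subset_digits <- function(z, digits = c(4, 9)) {
  z <- as.matrix(z)
  keep <- z[, 1] %in% digits
  x <- z[keep, -1, drop = FALSE]
  # Added assumption: generic pixel names, since caret's train() expects
  # column names on x.
  colnames(x) <- paste0("p", seq_len(ncol(x)))
  list(x = x, y = factor(z[keep, 1], levels = digits))
}

# ElemStatLearn has been archived on CRAN, so only load it if it is installed.
if (requireNamespace("ElemStatLearn", quietly = TRUE)) {
  data("zip.train", "zip.test", package = "ElemStatLearn")
  tr <- subset_digits(zip.train)
  te <- subset_digits(zip.test)
  print(table(tr$y))
}
```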
… kernlab package, and tune it using caret.
Use 10-fold cross-validation for this question. What is the best
C based on accuracy? Predict the testing
data using this model, and obtain the confusion table and the testing
accuracy.

… kernlab package, and tune it using
caret. Use 10-fold cross-validation repeated 3 times for this
question. You may need to adjust your code to get a
good range for the tuning parameters. What are the best C and
sigma based on accuracy? Predict the
testing data using this model, and obtain the confusion table and the testing
accuracy.
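Both tasks come down to caret's train() with a kernlab-backed method and a trainControl() object that describes the resampling. A sketch under these assumptions: method = "svmLinear" (tunes C only) and method = "svmRadial" (tunes sigma and C) are caret's kernlab-based SVM methods; fit_and_evaluate() is a hypothetical helper; the grids are only starting guesses; and x_train, y_train, x_test, y_test are placeholder names for the 4/9 pixel matrices and labels.

```r
library(caret)   # train(), trainControl()
library(kernlab) # ksvm(), the backend behind caret's "svmLinear" and "svmRadial"

# Hypothetical helper: tune an SVM through caret, then score the test set.
# x_*: numeric matrices with column names; y_*: factors with the same levels.
fit_and_evaluate <- function(method, grid, ctrl, x_train, y_train, x_test, y_test) {
  fit <- train(x = x_train, y = y_train, method = method,
               metric = "Accuracy", tuneGrid = grid, trControl = ctrl)
  pred <- predict(fit, newdata = x_test)
  list(best = fit$bestTune,                             # values chosen by CV accuracy
       table = table(predicted = pred, truth = y_test), # confusion table
       accuracy = mean(pred == y_test))                 # testing accuracy
}

# Linear kernel ("svmLinear" tunes only C), plain 10-fold CV.
cv10 <- trainControl(method = "cv", number = 10)
lin_grid <- expand.grid(C = 10^seq(-3, 1, length.out = 9))

# Gaussian kernel ("svmRadial" tunes sigma and C), 10-fold CV repeated 3 times.
rcv10x3 <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
rbf_grid <- expand.grid(sigma = 10^seq(-4, -1, length.out = 4),
                        C = 10^seq(-1, 2, length.out = 4))

# Example calls (x_train, y_train, x_test, y_test are placeholders):
# set.seed(1)
# lin <- fit_and_evaluate("svmLinear", lin_grid, cv10, x_train, y_train, x_test, y_test)
# rbf <- fit_and_evaluate("svmRadial", rbf_grid, rcv10x3, x_train, y_train, x_test, y_test)
# lin$best; lin$table; lin$accuracy
```

If the selected C or sigma sits on the edge of its grid, that suggests extending the grid in that direction, which is one way to act on the hint about finding a good tuning range.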