Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to compass2g. No email or hardcopy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
What is expected for the submission to Gradescope

- Your submission should be a single PDF file named HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file generated by a .Rmd file; .html format cannot be accepted.
- Please note that your homework file is a PDF report instead of a messy collection of R code. This report should include:
  - Your name and NetID (replace Ruoqing Zhu (rqzhu) by your name and NetID if you are using this template).
  - All of your R code chunks, visible for grading.
  - Necessary explanations and R code chunks that support your answers, e.g., "Answer: I fit SVM with the following choice of tuning parameters ...".
Requirements regarding the .Rmd file

- You do not need to submit the .Rmd file. However, your PDF file should be rendered directly from it.

In this HW, we will code both the primal and dual forms of SVM, and utilize a general quadratic programming solver (the quadprog package) to help us obtain the solution.
Install the quadprog package. The same package is also available in Python; however, make sure to read its documentation carefully. We will utilize the function solve.QP to solve SVM. This function solves the minimization problem: \[\begin{align}
\text{minimize} & \quad \frac{1}{2} b^T \mathbf{D} b - d^T b, \nonumber \\
\text{subject to} & \quad \mathbf{A}^T b \geq b_0, \nonumber
\end{align}\] where \(b\) is the unknown parameter vector. For more details, read the documentation of the package on CRAN. Use the provided training data. This is a linearly separable problem.
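To get familiar with the interface before building the SVM, here is a minimal sketch on a toy problem (the numbers are arbitrary illustrations, not part of the assignment). It minimizes the quadratic above with \(\mathbf{D} = \mathbf{I}\) and \(d = (2, 3)^T\), subject to \(b_1 \geq 0\), \(b_2 \geq 0\), and \(b_1 + b_2 \leq 4\):

library(quadprog)
# each COLUMN of Amat is one constraint, written in the form a^T b >= b0;
# a "<=" constraint must be negated on both sides to fit this form
Dmat <- diag(2)
dvec <- c(2, 3)
Amat <- cbind(c(1, 0), c(0, 1), c(-1, -1)) # last column encodes -b1 - b2 >= -4
bvec <- c(0, 0, -4)
solve.QP(Dmat, dvec, Amat, bvec, meq = 0)$solution # returns (1.5, 2.5)

The first meq constraints (columns of \(\mathbf{A}\)) are treated as equalities, which will matter for the dual form later.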
# this is how I generate the data
# set.seed(3)
# n = 30
# p = 2
# xpos = matrix(rnorm(n*p, mean=0, sd=1), n, p)
# xneg = matrix(rnorm(n*p, mean=3, sd=1), n, p)
# x = rbind(xpos, xneg)
# y = matrix(c(rep(1, n), rep(-1, n)))
#
# train = data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
# write.csv(train, "SVM-Q1.csv", row.names = FALSE)
train = read.csv("SVM-Q1.csv")
x = as.matrix(train[, 1:2])
y = train[, 3]
plot(x, col=ifelse(y>0,"darkorange", "deepskyblue"), pch = 19, xlab = "x1", ylab = "x2")
legend("topleft", c("Positive","Negative"), col=c("darkorange", "deepskyblue"),
pch=c(19, 19), text.col=c("darkorange", "deepskyblue"))
Use the formulation defined on page 13 of the SVM
lecture note. The primal problem is
\[ \begin{align} \quad \underset{\beta_{0}, \boldsymbol{\beta}}{\text{minimize}} &\quad \frac{1}{2} \|\boldsymbol{\beta}\|^{2} \\ \text{subject to} &\quad y_{i}\left(x_{i}^{\top} \boldsymbol{\beta}+\beta_{0}\right) \geq 1, \,\, \text{for} \,\, i=1, \ldots, n \end{align} \]
Perform the following:

- Read the documentation of the solve.QP() function. Properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\), and \(b_0\) corresponding to this \(b\) for the linearly separable SVM primal problem.
- Obtain the solution using the solve.QP() function.

Note: The package requires \(\mathbf{D}\) to be positive definite, which is not true in our case. To address this problem, add \(10^{-10}\) to the top-left element of your \(\mathbf{D}\) matrix, which is the one corresponding to \(\beta_0\). This will make \(\mathbf{D}\) invertible. This may affect your results slightly, so be careful when plotting your support vectors.
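Here is one possible construction of these quantities, as a minimal sketch assuming x and y are the objects loaded above and that the unknown vector is ordered as \(b = (\beta_0, \boldsymbol\beta)\); treat it as a starting point to verify against your own derivation, not a definitive implementation:

n <- nrow(x); p <- ncol(x)
Dmat <- diag(c(1e-10, rep(1, p)))   # penalize beta only; the 1e-10 makes D positive definite
dvec <- rep(0, p + 1)               # the primal objective has no linear term
Amat <- t(y * cbind(1, x))          # column i is y_i * (1, x_i^T)^T
bvec <- rep(1, n)                   # constraints y_i (beta_0 + x_i^T beta) >= 1
qp    <- solve.QP(Dmat, dvec, Amat, bvec)
beta0 <- qp$solution[1]
beta  <- qp$solution[-1]

The decision line is \(\beta_0 + x^T \boldsymbol\beta = 0\); support vectors are the observations with \(y_i(\beta_0 + x_i^T \boldsymbol\beta)\) close to 1, where exact equality may fail slightly because of the \(10^{-10}\) perturbation.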
Formulate the SVM dual problem defined on page 21 of the lecture note. The dual problem is
\[ \begin{align} \underset{\boldsymbol \alpha}{\text{maximize}} & \quad \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} y_{i} y_{j} \alpha_{i} \alpha_{j} x_{i}^{\top} x_{j} \\ \text{subject to} & \quad \alpha_{i} \geq 0, \,\, \text{for} \,\, i=1, \ldots, n \\ \text{and} & \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0 \end{align} \]
Perform the following:

- Properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\), and \(b_0\) for this dual problem. Pay attention to the equality constraint, which should be specified through the meq argument.
- Obtain the solution using the solve.QP() function, and convert the solution into \(\boldsymbol \beta\) and \(\beta_0\). You need to report these values.
Note: Again, \(\mathbf{D}\) may not be positive definite. This time, add \(10^{-10}\) to all diagonal elements of \(\mathbf{D}\). This may affect your results slightly, so be careful when plotting your support vectors.
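One possible construction for the dual, sketched under the same assumptions (x and y as loaded above); the \(10^{-5}\) cutoff used to flag support vectors is an arbitrary illustrative choice:

n <- nrow(x)
Dmat  <- (y * x) %*% t(y * x) + diag(1e-10, n)  # D_ij = y_i y_j x_i^T x_j, with ridged diagonal
dvec  <- rep(1, n)                              # maximizing sum(alpha) = minimizing -1^T alpha
Amat  <- cbind(y, diag(n))                      # first column: sum_i alpha_i y_i = 0 (meq = 1)
bvec  <- rep(0, n + 1)
alpha <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
beta  <- colSums(alpha * y * x)                 # beta = sum_i alpha_i y_i x_i
sv    <- which(alpha > 1e-5)                    # support vectors
beta0 <- mean(y[sv] - x[sv, , drop = FALSE] %*% beta)

Averaging \(y_i - x_i^T \boldsymbol\beta\) over the support vectors is one common way to recover \(\beta_0\); any single support vector would do in exact arithmetic.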
In this question, we will follow the formulation on page 30 of the lecture note to solve a linearly nonseparable SVM. The dual problem is given by
\[ \begin{align} \underset{\boldsymbol \alpha}{\text{maximize}} & \quad \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} y_{i} y_{j} \alpha_{i} \alpha_{j} x_{i}^{\top} x_{j} \\ \text{subject to} & \quad 0 \leq \alpha_{i} \leq C, \,\, \text{for} \,\, i=1, \ldots, n \\ \text{and} & \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0 \end{align} \]
Perform the following:

- Properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\), and \(b_0\) for this dual problem. Again, specify the equality constraint through the meq argument.
- Obtain the solution using the solve.QP() function, and convert the solution into \(\boldsymbol \beta\) and \(\beta_0\). Note: as in the previous question, you may need to add a small perturbation to the diagonal of \(\mathbf{D}\) to make it positive definite. A construction sketch is given after the data plot below.
- You need to report these values.
# set.seed(20)
# n = 200 # number of data points for each class
# p = 2 # dimension
# Generate the positive and negative examples
# xpos <- matrix(rnorm(n*p,mean=0,sd=1),n,p)
# xneg <- matrix(rnorm(n*p,mean=1.5,sd=1),n,p)
# x <- rbind(xpos,xneg)
# y <- c(rep(-1, n), rep(1, n))
# train = data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
# write.csv(train, "SVM-Q2.csv", row.names = FALSE)
train = read.csv("SVM-Q2.csv")
x = as.matrix(train[, 1:2])
y = train[, 3]
plot(x, col=ifelse(y>0,"darkorange", "deepskyblue"), pch = 19, xlab = "x1", ylab = "x2")
legend("topleft", c("Positive","Negative"), col=c("darkorange", "deepskyblue"),
pch=c(19, 19), text.col=c("darkorange", "deepskyblue"))
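Here is a sketch of one way to encode the box constraint, assuming x and y are the objects loaded from SVM-Q2.csv above; the value C = 1 is only an illustrative choice, not one prescribed by the question:

C <- 1
n <- nrow(x)
Dmat  <- (y * x) %*% t(y * x) + diag(1e-10, n)
dvec  <- rep(1, n)
Amat  <- cbind(y, diag(n), -diag(n))  # equality first, then alpha_i >= 0, then -alpha_i >= -C
bvec  <- c(0, rep(0, n), rep(-C, n))
alpha <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
beta  <- colSums(alpha * y * x)
# recover beta0 from points strictly inside the box (0 < alpha_i < C), i.e., on the margin
onmargin <- which(alpha > 1e-5 & alpha < C - 1e-5)
beta0 <- mean(y[onmargin] - x[onmargin, , drop = FALSE] %*% beta)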
We can also perform linear and nonlinear classification using the penalized loss framework. In this question, we will only use the linear version. Use the same dataset as in Question 2. Consider the following logistic loss function:

\[L(y, f(x)) = \log(1 + e^{- y f(x)}).\]

The rest of the job is to solve this optimization problem given the functional form of \(f(x)\). To do this, we will utilize a general-purpose optimization function. For example, in R, you can use the optim() function. Read the documentation of this function (or equivalent ones in Python) and set up the objective function properly to solve for the parameters. If you need an example of how to use the optim() function, read the corresponding part in the example file provided on our course website (Section 10).

If we let \(f(x)\) be a linear function, SVM can be solved by optimizing the penalized loss: \[ \underset{\beta_0, \boldsymbol\beta}{\arg\min} \quad \sum_{i=1}^n L(y_i, \beta_0 + x_i^T \boldsymbol\beta) + \lambda \lVert \boldsymbol\beta \rVert^2 \] Use the data from Question 2 and answer these questions:
- Write an objective function SVMfn(b, x, y, lambda) and its gradient SVMgn(b, x, y, lambda).
- Solve the optimization problem using optim() with your objective and gradient functions, with \(\lambda = 1\) and the BFGS method. Use 0 as the initial value, and report the results (see the sketch after this list).
- Try to run optim() without this gradient function and compare the parameters to your previous ones. Note that this will be much slower. You are not required to report this result.
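Here is a sketch of one possible setup. The names SVMfn and SVMgn and the argument order follow the question; the internal details (ordering b as \((\beta_0, \boldsymbol\beta)\), the vectorized gradient) are assumptions to verify against your own derivation:

# objective: sum_i log(1 + exp(-y_i f(x_i))) + lambda * ||beta||^2, with f(x) = beta0 + x^T beta
SVMfn <- function(b, x, y, lambda) {
  f <- b[1] + x %*% b[-1]
  sum(log(1 + exp(-y * f))) + lambda * sum(b[-1]^2)
}

# gradient: dL/df = -y / (1 + exp(y f)), chained through (1, x_i); beta0 is not penalized
SVMgn <- function(b, x, y, lambda) {
  f <- as.vector(b[1] + x %*% b[-1])
  w <- -y / (1 + exp(y * f))
  c(sum(w), colSums(w * x) + 2 * lambda * b[-1])
}

fit <- optim(par = rep(0, ncol(x) + 1), fn = SVMfn, gr = SVMgn,
             x = x, y = y, lambda = 1, method = "BFGS")
fit$par  # (beta0, beta)

A quick sanity check is to compare SVMgn against a finite-difference approximation of SVMfn at a random b before handing both to optim().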