Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to compass2g. No email or hardcopy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
What is expected for the submission to Gradescope

- Your submission should be a single PDF file named HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file generated by a .Rmd file; .html format cannot be accepted.
- Please note that your homework file is a PDF report instead of a messy collection of R code. This report should include:
  - Your name and NetID (replace Ruoqing Zhu (rqzhu) by your name and NetID if you are using this template).
  - All of your R code chunks, visible for grading.
  - Necessary explanations and R code chunks that support your answers, e.g., "Answer: I fit SVM with the following choice of tuning parameters ...".
Requirements regarding the .Rmd file

- You do not need to submit the .Rmd file. However, your PDF file should be rendered directly from it.

In this HW, we will code both the primal and dual forms of SVM, and utilize a general quadratic programming solver (the quadprog package) to help us obtain the solution.
Install the quadprog package. The same package is also available in Python; however, make sure to read its documentation carefully. We will utilize the function solve.QP to solve SVM. This function solves the minimization problem: \[\begin{align}
\text{minimize} & \quad \frac{1}{2} b^T \mathbf{D} b - d^T b, \nonumber \\
\text{subject to} & \quad \mathbf{A}^T b \geq b_0, \nonumber
\end{align}\] where \(b\) is the unknown parameter vector. For more details, read the documentation of the package on CRAN. Use the provided training data. This is a linearly separable problem.
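To get familiar with the interface before building the SVM, here is a minimal sketch on a toy problem (the numbers are arbitrary illustrations, not part of the assignment). It minimizes the quadratic above with \(\mathbf{D} = \mathbf{I}\) and \(d = (2, 3)^T\), subject to \(b_1 \geq 0\), \(b_2 \geq 0\), and \(b_1 + b_2 \leq 4\):

library(quadprog)
# each COLUMN of Amat is one constraint, written in the form a^T b >= b0;
# a "<=" constraint must be negated on both sides to fit this form
Dmat <- diag(2)
dvec <- c(2, 3)
Amat <- cbind(c(1, 0), c(0, 1), c(-1, -1)) # last column encodes -b1 - b2 >= -4
bvec <- c(0, 0, -4)
solve.QP(Dmat, dvec, Amat, bvec, meq = 0)$solution # returns (1.5, 2.5)

The first meq constraints (columns of \(\mathbf{A}\)) are treated as equalities, which will matter for the dual form later.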
# this is how I generate the data
# set.seed(3)
# n = 30
# p = 2
# xpos = matrix(rnorm(n*p, mean=0, sd=1), n, p)
# xneg = matrix(rnorm(n*p, mean=3, sd=1), n, p)
# x = rbind(xpos, xneg)
# y = matrix(c(rep(1, n), rep(-1, n)))
#
# train = data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
# write.csv(train, "SVM-Q1.csv", row.names = FALSE)
train = read.csv("SVM-Q1.csv")
x = as.matrix(train[, 1:2])
y = train[, 3]
plot(x, col=ifelse(y>0,"darkorange", "deepskyblue"), pch = 19, xlab = "x1", ylab = "x2")
legend("topleft", c("Positive","Negative"), col=c("darkorange", "deepskyblue"),
pch=c(19, 19), text.col=c("darkorange", "deepskyblue"))
Use the formulation defined on page 13 of the SVM
lecture note. The primal problem is
\[ \begin{align} \quad \underset{\beta_{0}, \boldsymbol{\beta}}{\text{minimize}} &\quad \frac{1}{2} \|\boldsymbol{\beta}\|^{2} \\ \text{subject to} &\quad y_{i}\left(x_{i}^{\top} \boldsymbol{\beta}+\beta_{0}\right) \geq 1, \,\, \text{for} \,\, i=1, \ldots, n \end{align} \]
Perform the following:

- Read the documentation of the solve.QP() function. Properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\), and \(b_0\) corresponding to this \(b\) for the linearly separable SVM primal problem.
- Obtain the solution using the solve.QP() function.

Note: The package requires \(\mathbf{D}\) to be positive definite, which is not true in our case. To address this problem, add \(10^{-10}\) to the top-left element of your \(\mathbf{D}\) matrix, which is the one corresponding to \(\beta_0\). This will make \(\mathbf{D}\) invertible. This may affect your results slightly, so be careful when plotting your support vectors.
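Here is one possible construction of these quantities, as a minimal sketch assuming x and y are the objects loaded above and that the unknown vector is ordered as \(b = (\beta_0, \boldsymbol\beta)\); treat it as a starting point to verify against your own derivation, not a definitive implementation:

n <- nrow(x); p <- ncol(x)
Dmat <- diag(c(1e-10, rep(1, p)))   # penalize beta only; the 1e-10 makes D positive definite
dvec <- rep(0, p + 1)               # the primal objective has no linear term
Amat <- t(y * cbind(1, x))          # column i is y_i * (1, x_i^T)^T
bvec <- rep(1, n)                   # constraints y_i (beta_0 + x_i^T beta) >= 1
qp    <- solve.QP(Dmat, dvec, Amat, bvec)
beta0 <- qp$solution[1]
beta  <- qp$solution[-1]

The decision line is \(\beta_0 + x^T \boldsymbol\beta = 0\); support vectors are the observations with \(y_i(\beta_0 + x_i^T \boldsymbol\beta)\) close to 1, where exact equality may fail slightly because of the \(10^{-10}\) perturbation.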
Formulate the SVM dual problem defined on page 21 of the lecture note. The dual problem is
\[ \begin{align} \underset{\boldsymbol \alpha}{\text{maximize}} & \quad \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} y_{i} y_{j} \alpha_{i} \alpha_{j} x_{i}^{\top} x_{j} \\ \text{subject to} & \quad \alpha_{i} \geq 0, \,\, \text{for} \,\, i=1, \ldots, n \\ \text{and} & \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0 \end{align} \]
Perform the following:

- Properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\), and \(b_0\) for this dual problem. Pay attention to the equality constraint, which should be specified through the meq argument.
- Obtain the solution using the solve.QP() function, and convert the solution into \(\boldsymbol \beta\) and \(\beta_0\). You need to report these values.
Note: Again, \(\mathbf{D}\) may not be positive definite. This time, add \(10^{-10}\) to all diagonal elements of \(\mathbf{D}\). This may affect your results slightly, so be careful when plotting your support vectors.
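One possible construction for the dual, sketched under the same assumptions (x and y as loaded above); the \(10^{-5}\) cutoff used to flag support vectors is an arbitrary illustrative choice:

n <- nrow(x)
Dmat  <- (y * x) %*% t(y * x) + diag(1e-10, n)  # D_ij = y_i y_j x_i^T x_j, with ridged diagonal
dvec  <- rep(1, n)                              # maximizing sum(alpha) = minimizing -1^T alpha
Amat  <- cbind(y, diag(n))                      # first column: sum_i alpha_i y_i = 0 (meq = 1)
bvec  <- rep(0, n + 1)
alpha <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
beta  <- colSums(alpha * y * x)                 # beta = sum_i alpha_i y_i x_i
sv    <- which(alpha > 1e-5)                    # support vectors
beta0 <- mean(y[sv] - x[sv, , drop = FALSE] %*% beta)

Averaging \(y_i - x_i^T \boldsymbol\beta\) over the support vectors is one common way to recover \(\beta_0\); any single support vector would do in exact arithmetic.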
In this question, we will follow the formulation on page 30 of the lecture note to solve a linearly nonseparable SVM. The dual problem is given by
\[ \begin{align} \underset{\boldsymbol \alpha}{\text{maximize}} & \quad \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} y_{i} y_{j} \alpha_{i} \alpha_{j} x_{i}^{\top} x_{j} \\ \text{subject to} & \quad 0 \leq \alpha_{i} \leq C, \,\, \text{for} \,\, i=1, \ldots, n \\ \text{and} & \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0 \end{align} \]
Perform the following:

- Properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\), and \(b_0\) for this dual problem. Again, specify the equality constraint through the meq argument.
- Obtain the solution using the solve.QP() function, and convert the solution into \(\boldsymbol \beta\) and \(\beta_0\). Note: as in the previous question, you may need to add a small perturbation to the diagonal of \(\mathbf{D}\) to make it positive definite. A construction sketch is given after the data plot below.
- You need to report these values.
# set.seed(20)
# n = 200 # number of data points for each class
# p = 2 # dimension
# Generate the positive and negative examples
# xpos <- matrix(rnorm(n*p,mean=0,sd=1),n,p)
# xneg <- matrix(rnorm(n*p,mean=1.5,sd=1),n,p)
# x <- rbind(xpos,xneg)
# y <- c(rep(-1, n), rep(1, n))
# train = data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
# write.csv(train, "SVM-Q2.csv", row.names = FALSE)
train = read.csv("SVM-Q2.csv")
x = as.matrix(train[, 1:2])
y = train[, 3]
plot(x, col=ifelse(y>0,"darkorange", "deepskyblue"), pch = 19, xlab = "x1", ylab = "x2")
legend("topleft", c("Positive","Negative"), col=c("darkorange", "deepskyblue"),
pch=c(19, 19), text.col=c("darkorange", "deepskyblue"))
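Here is a sketch of one way to encode the box constraint, assuming x and y are the objects loaded from SVM-Q2.csv above; the value C = 1 is only an illustrative choice, not one prescribed by the question:

C <- 1
n <- nrow(x)
Dmat  <- (y * x) %*% t(y * x) + diag(1e-10, n)
dvec  <- rep(1, n)
Amat  <- cbind(y, diag(n), -diag(n))  # equality first, then alpha_i >= 0, then -alpha_i >= -C
bvec  <- c(0, rep(0, n), rep(-C, n))
alpha <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
beta  <- colSums(alpha * y * x)
# recover beta0 from points strictly inside the box (0 < alpha_i < C), i.e., on the margin
onmargin <- which(alpha > 1e-5 & alpha < C - 1e-5)
beta0 <- mean(y[onmargin] - x[onmargin, , drop = FALSE] %*% beta)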
We can also perform linear and nonlinear classification using the penalized loss framework. In this question, we will only use the linear version. Use the same dataset as in Question 2. Consider the following logistic loss function:

\[L(y, f(x)) = \log(1 + e^{- y f(x)}).\]

The rest of the job is to solve this optimization problem given the functional form of \(f(x)\). To do this, we will utilize a general-purpose optimization function. For example, in R, you can use the optim() function. Read the documentation of this function (or equivalent ones in Python) and set up the objective function properly to solve for the parameters. If you need an example of how to use the optim() function, read the corresponding part in the example file provided on our course website (Section 10).

If we let \(f(x)\) be a linear function, SVM can be solved by optimizing the penalized loss: \[ \underset{\beta_0, \boldsymbol\beta}{\arg\min} \quad \sum_{i=1}^n L(y_i, \beta_0 + x_i^T \boldsymbol\beta) + \lambda \lVert \boldsymbol\beta \rVert^2 \] Use the data from Question 2 and answer these questions:
- Write an objective function SVMfn(b, x, y, lambda) and its gradient SVMgn(b, x, y, lambda).
- Solve the optimization problem using optim() with your objective and gradient functions, with \(\lambda = 1\) and the BFGS method. Use 0 as the initial value, and report the results (see the sketch after this list).
- Try to run optim() without this gradient function and compare the parameters to your previous ones. Note that this will be much slower. You are not required to report this result.
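Here is a sketch of one possible setup. The names SVMfn and SVMgn and the argument order follow the question; the internal details (ordering b as \((\beta_0, \boldsymbol\beta)\), the vectorized gradient) are assumptions to verify against your own derivation:

# objective: sum_i log(1 + exp(-y_i f(x_i))) + lambda * ||beta||^2, with f(x) = beta0 + x^T beta
SVMfn <- function(b, x, y, lambda) {
  f <- b[1] + x %*% b[-1]
  sum(log(1 + exp(-y * f))) + lambda * sum(b[-1]^2)
}

# gradient: dL/df = -y / (1 + exp(y f)), chained through (1, x_i); beta0 is not penalized
SVMgn <- function(b, x, y, lambda) {
  f <- as.vector(b[1] + x %*% b[-1])
  w <- -y / (1 + exp(y * f))
  c(sum(w), colSums(w * x) + 2 * lambda * b[-1])
}

fit <- optim(par = rep(0, ncol(x) + 1), fn = SVMfn, gr = SVMgn,
             x = x, y = y, lambda = 1, method = "BFGS")
fit$par  # (beta0, beta)

A quick sanity check is to compare SVMgn against a finite-difference approximation of SVMfn at a random b before handing both to optim().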