Instruction

Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to compass2g. No email or hardcopy will be accepted. For late submission policy and grading rubrics, please refer to the course website.

About HW8

In this HW, we will code both the primal and dual forms of SVM and utilize a general quadratic programming solver (the quadprog package) to obtain the solution.

Question 1 [50 Points] Solving SVM using Quadratic Programming

Install the quadprog package. The same package is also available in Python. However, make sure to read its documentation carefully. We will utilize the function solve.QP to solve SVM. This function performs the minimization problem: \[\begin{align} \text{minimize} & \quad \frac{1}{2} b^T \mathbf{D} b - d^T b, \nonumber \\ \text{subject to} & \quad \mathbf{A}^T b \geq b_0, \nonumber \end{align}\] where \(b\) is the unknown parameter. For more details, read the documentation of the package on CRAN. Use the provided training data. This is a linearly separable problem.
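
As a quick sanity check of the argument layout, consider a toy problem (made up for illustration, not part of the homework): minimize \(\frac{1}{2}(b_1^2 + b_2^2) - (b_1 + 2b_2)\) subject to \(b_1 + b_2 \geq 1\). Note that solve.QP() reads the constraints in \(\mathbf{A}\) column by column.

  # a toy example of solve.QP(); the problem itself is made up for illustration
  library(quadprog)
  Dmat = diag(2)                     # D: identity, so the quadratic term is (b1^2 + b2^2)/2
  dvec = c(1, 2)                     # d
  Amat = matrix(c(1, 1), nrow = 2)   # one constraint per column: t(Amat) %*% b >= bvec
  bvec = 1                           # b_0
  solve.QP(Dmat, dvec, Amat, bvec)$solution  # returns (1, 2): the unconstrained minimum is feasible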

  # this is how I generate the data
  # set.seed(3)
  # n = 30
  # p = 2
  # xpos = matrix(rnorm(n*p, mean=0, sd=1), n, p)
  # xneg = matrix(rnorm(n*p, mean=3, sd=1), n, p)
  # x = rbind(xpos, xneg)
  # y = matrix(c(rep(1, n), rep(-1, n)))
  # 
  # train = data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
  # write.csv(train, "SVM-Q1.csv", row.names = FALSE)

  train = read.csv("SVM-Q1.csv")
  x = as.matrix(train[, 1:2])
  y = train[, 3]
  
  plot(x, col=ifelse(y>0,"darkorange", "deepskyblue"), pch = 19, xlab = "x1", ylab = "x2")
  legend("topleft", c("Positive","Negative"), col=c("darkorange", "deepskyblue"), 
         pch=c(19, 19), text.col=c("darkorange", "deepskyblue"))

a) [25 points] The Primal Form

Use the formulation defined on page 13 of the SVM lecture note. The primal problem is

\[ \begin{align} \quad \underset{\beta_{0}, \boldsymbol{\beta}}{\text{minimize}} &\quad \frac{1}{2} \|\boldsymbol{\beta}\|^{2} \\ \text{subject to} &\quad y_{i}\left(x_{i}^{\top} \boldsymbol{\beta}+\beta_{0}\right) \geq 1, \,\, \text{for} \,\, i=1, \ldots, n \end{align} \]

Perform the following:

  • Let \(b = (\beta_0, \boldsymbol \beta)\) in the solve.QP() function. Properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\) and \(b_0\) corresponding to this \(b\) for the linearly separable SVM primal problem (a setup sketch is provided after the note below).
  • Calculate the decision function by solving this optimization problem with the solve.QP() function.
  • Report your \(\beta_0\) and \(\boldsymbol \beta\).
  • Plot the decision line on top of the previous training data scatter plot. Include the two margin lines. Clearly mark the support vectors.

Note: The package requires \(\mathbf{D}\) to be positive definite, which is not true in our case. To address this problem, add \(10^{-10}\) to the top-left element of your \(\mathbf{D}\) matrix, which is the one corresponding to \(\beta_0\). This will make \(\mathbf{D}\) invertible. This may affect your results slightly, so be careful when plotting your support vectors.
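
For concreteness, here is a minimal sketch of one possible primal setup, assuming x and y are the data loaded above; the variable names are illustrative, not prescribed.

  # a minimal sketch of the primal setup; x is the n x p data matrix, y the +1/-1 labels
  library(quadprog)
  n = nrow(x); p = ncol(x)
  Dmat = diag(c(1e-10, rep(1, p)))   # (1/2) b^T D b = (1/2) ||beta||^2, with the small ridge on beta_0
  dvec = rep(0, p + 1)               # the primal objective has no linear term
  Amat = t(y * cbind(1, x))          # column i is y_i * (1, x_i), so t(Amat) %*% b >= 1 encodes all constraints
  bvec = rep(1, n)
  b = solve.QP(Dmat, dvec, Amat, bvec)$solution
  beta0 = b[1]; beta = b[-1]
  # support vectors: points with y_i * (x_i^T beta + beta0) close to 1 (use a small tolerance)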

b) [25 points] The Dual Form

Formulate the SVM dual problem as defined on page 21 of the lecture note. The dual problem is

\[ \begin{align} \underset{\boldsymbol \alpha}{\text{maximize}} & \quad \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} y_{i} y_{j} \alpha_{i} \alpha_{j} x_{i}^{\top} x_{j} \\ \text{subject to} & \quad \alpha_{i} \geq 0, \,\, \text{for} \,\, i=1, \ldots, n \\ \text{and} & \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0 \end{align} \]

Perform the following:

  • Let \(b = (\alpha_1, \ldots, \alpha_n)^T\). Then properly define \(\mathbf{D}\), \(d\), \(\mathbf{A}\) and \(b_0\) corresponding to this \(b\) for our SVM problem (a setup sketch is provided after the note below).
  • Note: Equality constraints can be addressed using the meq argument.
  • Obtain the solution using the solve.QP() function, and convert the solution into \(\boldsymbol \beta\) and \(\beta_0\).

You need to report:

  • A table including \((\beta_0, \beta_1, \beta_2)\) from both Q1a and Q1b. Keep only the first three digits after the decimal point.
  • Plot the decision line on top of the scatter plot. Include the two margin lines. Clearly mark the support vectors.
  • Report the \(\ell_1\) norm of \(\beta_{Q1a} - \beta_{Q1b}\), where \(\beta_{Q1a}\) and \(\beta_{Q1b}\) are the 3-dimensional solutions obtained in Q1a and Q1b, respectively.

Note: Again, \(\mathbf{D}\) may not be positive definite. This time, add \(10^{-10}\) to all diagonal elements of \(\mathbf{D}\). This may affect your results slightly, so be careful when plotting your support vectors.
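
Again for concreteness, a minimal sketch of one possible dual setup is given below (illustrative names only; the \(10^{-10}\) ridge from the note is included).

  # a minimal sketch of the dual setup; x and y are the data loaded above
  library(quadprog)
  n = nrow(x)
  Dmat = outer(y, y) * (x %*% t(x)) + diag(1e-10, n)  # D_ij = y_i y_j x_i^T x_j, plus the ridge
  dvec = rep(1, n)                   # maximizing sum(alpha) = minimizing -d^T alpha with d = 1
  Amat = cbind(y, diag(n))           # first column: sum(alpha_i y_i) = 0; remaining columns: alpha_i >= 0
  bvec = rep(0, n + 1)
  alpha = solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution  # meq = 1 makes the first constraint an equality
  beta = colSums(alpha * y * x)      # beta = sum_i alpha_i y_i x_i
  sv = which(alpha > 1e-4)           # support vectors have alpha_i > 0, up to numerical tolerance
  beta0 = mean(y[sv] - x[sv, , drop = FALSE] %*% beta)  # from y_i (x_i^T beta + beta0) = 1 on the margin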

Question 2 [20 Points] Linearly nonseparable SVM

In this question, we will follow the formulation on page 30 of the lecture note to solve a linearly nonseparable SVM. The dual problem is given by

\[ \begin{align} \underset{\boldsymbol \alpha}{\text{maximize}} & \quad \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} y_{i} y_{j} \alpha_{i} \alpha_{j} x_{i}^{\top} x_{j} \\ \text{subject to} & \quad 0 \leq \alpha_{i} \leq C, \,\, \text{for} \,\, i=1, \ldots, n \\ \text{and} & \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0 \end{align} \]

Perform the following and report your results; a minimal sketch of the setup is provided after the plot below.

  # set.seed(20)
  # n = 200 # number of data points for each class
  # p = 2 # dimension

  # Generate the positive and negative examples
  # xpos <- matrix(rnorm(n*p,mean=0,sd=1),n,p)
  # xneg <- matrix(rnorm(n*p,mean=1.5,sd=1),n,p)
  # x <- rbind(xpos,xneg)
  # y <- c(rep(-1, n), rep(1, n))
  # train = data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
  # write.csv(train, "SVM-Q2.csv", row.names = FALSE)

  train = read.csv("SVM-Q2.csv")
  x = as.matrix(train[, 1:2])
  y = train[, 3]
  
  plot(x, col=ifelse(y>0,"darkorange", "deepskyblue"), pch = 19, xlab = "x1", ylab = "x2")
  legend("topleft", c("Positive","Negative"), col=c("darkorange", "deepskyblue"), 
         pch=c(19, 19), text.col=c("darkorange", "deepskyblue"))
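
The setup parallels Q1b; only the constraints change to include the upper bound. A minimal sketch follows, where \(C = 1\) is an illustrative choice only, not a value specified by this question.

  # a minimal sketch of the box-constrained dual; C = 1 is illustrative only
  library(quadprog)
  n = nrow(x)
  C = 1
  Dmat = outer(y, y) * (x %*% t(x)) + diag(1e-10, n)
  dvec = rep(1, n)
  Amat = cbind(y, diag(n), -diag(n)) # equality; alpha_i >= 0; and -alpha_i >= -C, i.e. alpha_i <= C
  bvec = c(0, rep(0, n), rep(-C, n))
  alpha = solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution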

Question 3 [30 Points] Penalized Loss Linear SVM

We can also perform linear and nonlinear classification using the penalized loss framework. In this question, we will only use the linear version. Use the same dataset as in Question 2. Consider the following logistic loss function:

\[L(y, f(x)) = \log(1 + e^{- y f(x)}).\] The rest of the job is to solve this optimization problem once the functional form of \(f(x)\) is given. To do this, we will utilize a general-purpose optimization package/function. For example, in R, you can use the optim() function. Read the documentation of this function (or equivalent ones in Python) and set up the objective function properly to solve for the parameters. If you need an example of how to use the optim() function, read the corresponding part in the example file provided on our course website here (Section 10).

If we let \(f(x)\) be a linear function, the SVM can be solved by optimizing the penalized loss: \[ \underset{\beta_0, \boldsymbol\beta}{\arg\min} \quad \sum_{i=1}^n L(y_i, \beta_0 + x_i^T \boldsymbol\beta) + \lambda \lVert \boldsymbol\beta \rVert^2\] Use the data from Question 2. A minimal sketch of one possible optim() setup is given below.
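
This sketch is illustrative only: the function and argument names are my own, and \(\lambda = 1\) is an arbitrary placeholder rather than a value specified by the assignment.

  # a minimal sketch of minimizing the penalized logistic loss with optim()
  penalized_loss = function(b, xmat, yvec, lambda) {
    f = b[1] + xmat %*% b[-1]        # f(x_i) = beta_0 + x_i^T beta
    sum(log(1 + exp(-yvec * f))) + lambda * sum(b[-1]^2)
  }
  fit = optim(par = rep(0, ncol(x) + 1), fn = penalized_loss,
              xmat = x, yvec = y, lambda = 1, method = "BFGS")
  fit$par                            # the fitted (beta_0, beta_1, beta_2)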

Report the following: