Instructions

Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to compass2g. No email or hardcopy will be accepted. For late submission policy and grading rubrics, please refer to the course website.

About HW9

In this HW, we use two examples to demonstrate how the RKHS framework and the Representer Theorem are used in practice. In both questions, we will use the radial basis kernel, defined as

\[K(\mathbf{x}, \mathbf{z}) = e^{- \frac{\lVert \mathbf{x} - \mathbf{z}\rVert^2}{2\sigma^2}}.\] You are not required to tune the parameter \(\sigma\); its value will be provided.
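
For reference, a minimal R sketch of computing such a kernel matrix is given below. The function name rbf_kernel() and its arguments are illustrative choices, not part of the assignment; the sketches in the two questions below reuse this helper.

  # radial basis kernel matrix between the rows of X1 and the rows of X2
  # sigma is the bandwidth parameter; use the value given in each question
  rbf_kernel = function(X1, X2, sigma) {
    # squared Euclidean distances between all pairs of rows
    d2 = outer(rowSums(X1^2), rowSums(X2^2), "+") - 2 * X1 %*% t(X2)
    exp(- d2 / (2 * sigma^2))
  }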

Question 1 [45 Points] Kernel Ridge Regression

Let’s first generate a set of data using the following code (or some similar code in Python):

  set.seed(1)
  n = 500
  p = 2
  X = matrix(rnorm(n*p, 0, 1), n, p)
  y = 2*sin(X[, 1]*2) + 2*atan(X[, 2]*4) + rnorm(n, 0, 0.5)
  
  alldata = cbind(X, y)
  train = alldata[1:200, ]
  test = alldata[201:500, ]

[5 Points] As a comparison, we can first fit a ridge regression to this data. Use the training data to fit a ridge regression with cv.glmnet() from the glmnet package, using 10-fold cross-validation. Use lambda.min as your tuning parameter to predict the testing data. What is the prediction error?
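
As a rough sketch (not the required solution), such a fit could look like the following, assuming the prediction error refers to the mean squared error on the testing data.

  library(glmnet)

  # 10-fold cross-validated ridge regression (alpha = 0) on the training data
  ridge.cv = cv.glmnet(x = train[, 1:2], y = train[, 3], alpha = 0, nfolds = 10)

  # predict the testing data at lambda.min and report the mean squared error
  ridge.pred = predict(ridge.cv, newx = test[, 1:2], s = "lambda.min")
  mean((test[, 3] - ridge.pred)^2)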

To fit a kernel ridge regression, we use the following formulation:

\[\underset{\boldsymbol \alpha}{\text{minimize}} \quad \frac{1}{n} \big\lVert \mathbf{y} - \mathbf{K} \boldsymbol \alpha \big\rVert^2 + \lambda \boldsymbol \alpha^\text{T} \mathbf{K} \boldsymbol \alpha\]

It should be noted that you could add an \(\alpha_0\) term to this regression to rigorously incorporate the intercept. However, the theoretical mean of \(Y\) is zero, so this is not a crucial issue and is not required here. Following our derivation in class, perform the following steps to complete this kernel ridge regression:
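
For illustration only, a minimal sketch of one way to carry out this fit is given below; it is not the required step-by-step solution. It assumes the rbf_kernel() helper sketched earlier, uses placeholder values sigma = 1 and lambda = 0.01 (substitute the values given for the assignment), and relies on the closed-form solution \(\widehat{\boldsymbol \alpha} = (\mathbf{K} + n\lambda \mathbf{I})^{-1} \mathbf{y}\) obtained by setting the gradient of the objective above to zero.

  sigma = 1      # placeholder bandwidth; use the value given in the assignment
  lambda = 0.01  # placeholder tuning parameter

  # kernel matrix on the training data and the corresponding alpha
  K = rbf_kernel(train[, 1:2], train[, 1:2], sigma)
  ntrain = nrow(train)
  alpha = solve(K + ntrain * lambda * diag(ntrain), train[, 3])

  # predictions on the testing data use the kernel between testing and training points
  Ktest = rbf_kernel(test[, 1:2], train[, 1:2], sigma)
  yhat = Ktest %*% alpha
  mean((test[, 3] - yhat)^2)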

Question 2 [55 Points] Non-linear SVM as a Penalized Version

Recall that in HW8, we solved SVM using a penalized loss framework, with the logistic loss function: \[L(y, f(x)) = \log(1 + e^{- y f(x)}).\]

Again, we can specify the form of \(f(\cdot)\) as an element of an RKHS, and the penalized objective function becomes

\[\frac{1}{n} \sum_{i=1}^n L(y_i, K_i^\text{T} \boldsymbol \alpha) + \lambda \boldsymbol \alpha^\text{T} \mathbf{K} \boldsymbol \alpha\] where \(\mathbf{K}\) is the same \(n \times n\) kernel matrix as before and \(K_i\) is the \(i\)th column of \(\mathbf{K}\). The data are generated by the following code.

  set.seed(1)
  n = 500
  p = 2
  X = matrix(runif(n*p, -2, 2), n, p)
  y = sign( X[, 2] - sin(pi*X[, 1]))
  alldata = cbind(X, y)
  train = alldata[1:200, ]
  test = alldata[201:500, ]

  # visualize the data 
  plot(train[, 1], train[, 2], col = ifelse(train[, 3] > 0, "red", "blue"),
       pch = 19)
  lines(seq(-3, 3, 0.01), sin(pi*seq(-3, 3, 0.01)))

Similar to the penalized linear SVM in HW8, we perform the following steps to complete the model fitting:

Note: 1) In a real problem, you can perform cross-validation to find the best tuning parameters. 2) You may also add a constant term \(\alpha_0\) to the problem to take care of the intercept. Neither is required for this HW.
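
As a non-authoritative sketch, the objective above could be minimized numerically as follows, again assuming the rbf_kernel() helper from earlier and placeholder values sigma = 1 and lambda = 0.01 (substitute the values given for the assignment); optim() with numerical gradients is adequate for this problem size.

  sigma = 1      # placeholder bandwidth; use the value given in the assignment
  lambda = 0.01  # placeholder tuning parameter

  K = rbf_kernel(train[, 1:2], train[, 1:2], sigma)
  ytrain = train[, 3]
  ntrain = nrow(train)

  # penalized logistic loss as a function of alpha
  svm_obj = function(alpha, K, y, lambda) {
    f = drop(K %*% alpha)
    mean(log(1 + exp(- y * f))) + lambda * sum(alpha * (K %*% alpha))
  }

  # minimize over alpha, starting from the zero vector
  fit = optim(rep(0, ntrain), svm_obj, K = K, y = ytrain, lambda = lambda,
              method = "BFGS")

  # predicted labels and classification error on the testing data
  Ktest = rbf_kernel(test[, 1:2], train[, 1:2], sigma)
  pred = sign(Ktest %*% fit$par)
  mean(pred != test[, 3])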