Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
Name your submission file HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; the .html format cannot be accepted since it may not be readable in Gradescope. All proofs must be typed in LaTeX format. Make all of your R code chunks visible for grading. If you use this .Rmd file as a template, be sure to remove this instruction section. Use R version \(\geq 4.0.0\). This will ensure that your random seed generation is the same as everyone else's. Please note that updating the R version may require you to reinstall all of your packages. In the markdown file, set the seed properly so that the results can be reproduced.
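For example, a chunk near the top of the file can fix the seed; the value 1 below is only an illustration.

set.seed(1)   # illustrative value; any fixed seed makes the results reproducible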
Load an image from the ElemStatLearn package. This is an archived package on CRAN. If you do not have this package, download and install the 2015.6.26.2 version from the CRAN archive, or use the install_github() function from the devtools package. Here is the plot of the first image in the zip.train dataset, which is a digit 6. The resolution of this image is \(16 \times 16\). We then “blow it up” to \(48 \times 48\) by replicating each pixel into a \(3 \times 3\) block. Here is a plot of the original and the enlarged image; they look exactly the same, just with different dimensions.
# Handwritten Digit Recognition Data
library(ElemStatLearn)
# plot two images
par(mfrow=c(1,2), mar=c(0,0,1,0))
# look at the first sample
img16 <- zip2image(zip.train, 1)
## [1] "digit 6 taken"
image(img16, col=gray(256:0/256), zlim=c(0,1),
xlab="", ylab="", axes = FALSE)
# change the resolution of this image to 48 x 48
img48 <- img16[rep(1:16, each = 3), rep(1:16, each = 3)]
# plot the enlarged image
image(img48, col = gray(256:0/256), zlim=c(0,1),
xlab="", ylab="", axes=FALSE)
Although the second image is larger, it is still very pixelated. Let’s consider a way to improve it. Treat the two dimensions of the \(48 \times 48\) image as two covariates, and apply a (two-dimensional) RKHS smoothing method to obtain smoothed pixel grayscale values and make the image look better. For this question, you should implement this smoothing and present the smoothed image.
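One possible starting point is kernel ridge regression on the two pixel coordinates. This is only a sketch: the Gaussian kernel, the bandwidth sigma, and the penalty lambda below are illustrative assumptions that you would need to choose and tune yourself.

# treat the row and column positions of the 48 x 48 image as two covariates
X <- as.matrix(expand.grid(seq(0, 1, length.out = 48),
                           seq(0, 1, length.out = 48)))
y <- as.vector(img48)                 # column-major order matches expand.grid

sigma  <- 0.05                        # illustrative Gaussian kernel bandwidth
lambda <- 1e-3                        # illustrative ridge penalty
K <- exp(-as.matrix(dist(X))^2 / (2 * sigma^2))

# kernel ridge regression: alpha = (K + lambda * I)^{-1} y
alpha <- solve(K + lambda * diag(nrow(K)), y)
img_smooth <- matrix(pmin(pmax(K %*% alpha, 0), 1), 48, 48)  # clamp to [0, 1]

image(img_smooth, col = gray(256:0/256), zlim = c(0, 1),
      xlab = "", ylab = "", axes = FALSE)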
Let \(\cal X\) be a set, \(\cal F\) a Hilbert space, and \(\Phi\) a map from \(\cal X\) to \(\cal F\). If we define a kernel function \(k(\cdot, \cdot)\) as \[ k(x, x') = \langle \Phi(x), \Phi(x') \rangle_{\cal F}, \quad \forall x, x' \in \cal X, \] show that \(k\) is a positive definite kernel (Hint: use the definition).
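Recall the definition referenced in the hint: \(k\) is positive definite if it is symmetric and, for any \(n \in \mathbb{N}\), any points \(x_1, \ldots, x_n \in \cal X\), and any coefficients \(c_1, \ldots, c_n \in \mathbb{R}\), \[ \sum_{i=1}^n \sum_{j=1}^n c_i c_j \, k(x_i, x_j) \geq 0. \]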
Suppose \(k_1\) and \(k_2\) are two positive definite kernels on \(\cal X\). Show that the kernel \(k = k_1 + k_2\) is also a positive definite kernel.
Consider the kernel function \[ k(x, x') = 1\{ | x - x' | < \sigma \}. \] Is this a positive definite kernel? If yes, prove it. If no, give a counterexample.
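A quick numerical probe can guide your answer, although it is a heuristic and not a proof: evaluate the kernel matrix on a set of points and inspect its eigenvalues. A negative eigenvalue certifies that a kernel is not positive definite. The points and threshold below are arbitrary illustrations.

# heuristic check: eigenvalues of the kernel matrix at random points
set.seed(1)
x <- sort(runif(20))                  # 20 illustrative points in [0, 1]
sigma <- 0.2                          # illustrative threshold
K <- outer(x, x, function(a, b) as.numeric(abs(a - b) < sigma))
min(eigen(K, symmetric = TRUE)$values)  # negative value => not positive definite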
Show that if a reproducing kernel \(k(\cdot, \cdot)\) exists for a Hilbert space \(\cal H \subseteq \mathbb{R}^{\cal X}\), then it is unique. Hint: you may assume that there are two different kernel functions \(k_1\) and \(k_2\) for \(\cal H\), and then show that the Hilbert norm of their difference, \(\|k_1(x, \cdot) - k_2(x, \cdot)\|_{\cal H}\), is zero for any \(x \in \cal X\). To do that, expand the squared norm and use the properties of a Hilbert space and the reproducing property.
By the same approach as kernel ridge regression, we can also use the kernel method to do logistic regression. For a logistic regression with a linear predictor, the \(-2\) log-likelihood function is given by
\[ -2 \ell(\beta) = -2 \sum_{i=1}^n \left[ y_i x_i^T \beta - \log(1 + \exp(x_i^T \beta)) \right], \] where we used \(x_i^T \beta\) to model the effect. We can replace this linear effect with \(f(x_i)\) for some function \(f \in \cal H\); all we need to do is specify a kernel function and add a penalty term \(\lambda \|f\|_{\cal H}^2\). By the representer theorem, the minimizer takes the form \(f(\cdot) = \sum_{j=1}^n \alpha_j k(x_j, \cdot)\), so that \(f(x_i) = K_i^T \alpha\) and \(\|f\|_{\cal H}^2 = \alpha^T K \alpha\). The optimization problem therefore becomes
\[ \widehat\alpha = \arg\min_{\alpha \in \mathbb{R}^n} \left\{ -2 \sum_{i=1}^n \left[ y_i K_i^T \alpha - \log(1 + \exp(K_i^T \alpha)) \right] + \lambda \alpha^T K \alpha \right\}, \]
where \(K_i\) is the \(i\)-th row of the kernel matrix \(K\). An analytic solution does not exist, but you can define this loss function and throw it into the optim() function in R to get the solution. For this question, use the following training and testing data from the handwritten digit dataset, with digits 5 and 6 only. You should use optim() to get the solution for the coefficients; a sketch is given after the data-setup code below.
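If you use a gradient-based method such as BFGS inside optim(), it can help to supply the gradient of the objective as well. Differentiating the objective above (this derivation is an optional aid, not part of the original question) and using the symmetry of \(K\) gives \[ \nabla_\alpha \left\{ -2\ell(\alpha) + \lambda \alpha^T K \alpha \right\} = -2 K (y - p) + 2 \lambda K \alpha, \quad \text{where } p_i = \frac{\exp(K_i^T \alpha)}{1 + \exp(K_i^T \alpha)}. \]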
# get the first 100 observations with digits 5 and 6
digits5 <- zip.train[zip.train[,1] == 5, ]
digits5 <- digits5[1:50, ]
digits6 <- zip.train[zip.train[,1] == 6, ]
digits6 <- digits6[1:50, ]
train56 <- rbind(digits5, digits6)
# get the testing data
digits5_test <- zip.test[zip.test[,1] == 5, ]
digits5_test <- digits5_test[1:50, ]
digits6_test <- zip.test[zip.test[,1] == 6, ]
digits6_test <- digits6_test[1:50, ]
test56 <- rbind(digits5_test, digits6_test)
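Here is a rough sketch of one way to set this up. The Gaussian kernel, its bandwidth, the value of \(\lambda\), and the helper name gauss_kernel are all illustrative assumptions; you should choose and tune these yourself.

# response: code digit 6 as 1 and digit 5 as 0 (first column is the label)
x_train <- train56[, -1]
y_train <- as.numeric(train56[, 1] == 6)

# Gaussian kernel matrix between the rows of A and the rows of B
gauss_kernel <- function(A, B, sigma = 5) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  exp(-d2 / (2 * sigma^2))
}

K <- gauss_kernel(x_train, x_train)
lambda <- 0.01                        # illustrative penalty; should be tuned

# the -2 penalized log-likelihood from the objective above
obj <- function(alpha, K, y, lambda) {
  f <- as.vector(K %*% alpha)
  as.numeric(-2 * sum(y * f - log(1 + exp(f))) +
               lambda * crossprod(alpha, K %*% alpha))
}

fit <- optim(par = rep(0, nrow(K)), fn = obj, K = K, y = y_train,
             lambda = lambda, method = "BFGS")
alpha_hat <- fit$par

# predict on the testing data: f > 0 corresponds to predicting digit 6
K_test <- gauss_kernel(test56[, -1], x_train)
pred <- as.numeric(K_test %*% alpha_hat > 0)
mean(pred == as.numeric(test56[, 1] == 6))   # testing accuracy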