Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, there are two basic rules:
Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. Please refer to the course website for late submission policy and grading rubrics.
Your submission should be a single rendered PDF file named HWx_yourNetID.pdf; for example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; the .html format will not be accepted because it is often not readable on Gradescope.
Make all of your R code chunks visible for grading. Make sure the version of your R is \(\geq 4.0.0\); this will ensure your random seed generation is the same as everyone else's. If you use this .Rmd file as a template, be sure to remove this instruction section.

In this question, you are required to write your own function for
kernel regression using the Nadaraya-Watson estimator. You are not
allowed to use any existing functions in R
to perform the
kernel regression. Let’s first generate our data.
# generate the training data
set.seed(432)
n = 3000
p = 2
x = matrix(rnorm(n*p), n, p)
y = x[, 1]^2 + rnorm(n)
# define testing data
x0 = c(1.5, rep(0, p-1))
x0
Your function should be in the form of MyNW(x0, x, y, h), where x (\(n \times p\)) and y (\(n \times 1\)) are the training data, x0 is the target point (of dimension \(p\)), and h is the bandwidth. Within the function, you should use a multivariate Gaussian kernel: \[
K(x_0, x_i) = \exp\left( -\frac{\lVert x_0 - x_i \rVert_2^2}{2 h^2} \right)
\] where \(\lVert \cdot \rVert_2^2\) denotes the squared Euclidean distance. The normalizing constant is omitted here since it cancels in the Nadaraya-Watson estimator (appearing in both the numerator
and denominator). You should then use this kernel function in the
Nadaraya-Watson kernel estimator, defined in our lecture. Please
make sure that your code would automatically work for different values
of dimension \(p\) (see part
d). To test your function, use it on your training data, with
h = n^(-1/6)
, which is an optimal choice of bandwidth for
\(p = 2\) under some smoothness
conditions. Apply your function to predict x0, and report your predicted value along with its squared error against the true value \(f(x_0)\). Note that the idea of this error metric is similar to HW4, but in the non-parametric sense. The prediction should be reasonably close to the true value due to the large sample size.
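As a reference for the estimator itself, recall from lecture that the Nadaraya-Watson prediction at \(x_0\) is \[
\hat{f}(x_0) = \frac{\sum_{i=1}^{n} K(x_0, x_i)\, y_i}{\sum_{i=1}^{n} K(x_0, x_i)}.
\] A minimal sketch of the required interface, using the kernel above, could look like the following (one possible implementation, not the only acceptable one):

# a minimal sketch of MyNW (one possible implementation)
MyNW = function(x0, x, y, h) {
  # squared Euclidean distances from x0 to every training point
  dist2 = rowSums(sweep(x, 2, x0, "-")^2)
  # Gaussian kernel weights; the normalizing constant is omitted since it cancels
  w = exp(-dist2 / (2 * h^2))
  # Nadaraya-Watson estimator: kernel-weighted average of the responses
  sum(w * y) / sum(w)
}
# apply it to the training data generated above
MyNW(x0, x, y, h = n^(-1/6))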
[15 pts] Now, let’s perform a simulation study to calculate the
Bias\(^2\) and Variance of this kernel
regression estimator. We will use the same model as in part a), but
change the setting to \(n = 100\). The
idea is to repeat a) many times (nsim \(= 200\)), and then approximate the Bias\(^2\) and Variance based on what we have learned previously. Report your estimated
Bias\(^2\), Variance (see our previous
homework) and the average squared error \[
\frac{1}{\text{nsim}} \sum_{i=1}^{\text{nsim}} \left( \hat{f}_i(x_0) -
f(x_0) \right)^2
\]
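To make the setup concrete, a possible simulation skeleton is sketched below. It assumes the MyNW() sketch above; the true value \(f(x_0) = 1.5^2\) follows from the data-generating model since \(f(x) = x_1^2\).

# sketch of the part b) simulation (assumes MyNW from above)
set.seed(432)
nsim = 200
n = 100
p = 2
x0 = c(1.5, rep(0, p-1))
f0 = x0[1]^2                       # true value f(x0) = 2.25
fhat = rep(NA, nsim)
for (i in 1:nsim) {
  x = matrix(rnorm(n*p), n, p)     # re-generate the training data each run
  y = x[, 1]^2 + rnorm(n)
  fhat[i] = MyNW(x0, x, y, h = n^(-1/6))
}
bias2    = (mean(fhat) - f0)^2             # squared bias
variance = mean((fhat - mean(fhat))^2)     # variance
avg_err  = mean((fhat - f0)^2)             # average squared error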
[20 pts] We cannot understand the bias-variance trade-off with
just one choice of \(h\). Hence, let’s
consider a range of 50 different \(h\)
values, with h = n^(-1/6)*seq(0.1, 2, length.out = 50)
. You
should then construct a matrix of size nsim
by
50
to record the prediction of each \(h\) in each simulation run. After that,
plot your bias\(^2\), variance and
prediction error against the \(h\)
values in a single figure. Summarize the pattern that you see in this simulation. Does it match our understanding of the bias-variance trade-off?
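One way to organize this computation is sketched below, reusing the objects (n, p, nsim, x0, f0, and MyNW) from the previous sketch; the object names are only for illustration.

# sketch: record predictions for each of the 50 bandwidths in every run
h_grid = n^(-1/6) * seq(0.1, 2, length.out = 50)
pred = matrix(NA, nsim, length(h_grid))
for (i in 1:nsim) {
  x = matrix(rnorm(n*p), n, p)
  y = x[, 1]^2 + rnorm(n)
  for (j in seq_along(h_grid))
    pred[i, j] = MyNW(x0, x, y, h_grid[j])
}
bias2    = (colMeans(pred) - f0)^2
variance = apply(pred, 2, function(v) mean((v - mean(v))^2))
error    = colMeans((pred - f0)^2)
# plot all three curves against the bandwidth values
matplot(h_grid, cbind(bias2, variance, error), type = "l", lty = 1,
        xlab = "h", ylab = "value")
legend("topright", c("Bias^2", "Variance", "Error"), col = 1:3, lty = 1)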
[15 pts] Now we want to see how the performance of the kernel
regression changes as the dimension increases. We will use the same mechanism to generate the training data, but with different values of \(p\), ranging from \(2\) to \(31\), and with \(h\) fixed as n^(-1/6). Please
note that the true value for predicting x0
will remain the
same since it only depends on the first variable. Do the
following:

- Construct a matrix of size nsim by \(30\) to record the prediction of each simulation under each choice of dimension.
- For each dimension \(j\), predict x0 (using just its first \(j\) coordinates) with the training data restricted to the first \(j\) dimensions, and record the prediction for each choice of \(j\).
- Use nsim = 500, and increase the sample size to \(n = 300\).

After your simulation, calculate the bias\(^2\), variance, and prediction error in the same way as in the previous question, and plot the results against the number of dimensions used. What do you observe? If your results are not stable enough to draw conclusions, you can consider increasing the number of simulations or slightly increasing the sample size.
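A possible layout for this part is sketched below, again reusing the MyNW() sketch; generating the data once per run with the largest dimension and then subsetting columns is one way to satisfy the "first \(j\) dimensions" requirement.

# sketch: predictions across dimensions p = 2, ..., 31 with h fixed at n^(-1/6)
set.seed(432)
nsim = 500
n = 300
p_grid = 2:31
h = n^(-1/6)
f0 = 1.5^2                          # true value, the same for every dimension
pred = matrix(NA, nsim, length(p_grid))
for (i in 1:nsim) {
  x = matrix(rnorm(n * max(p_grid)), n, max(p_grid))
  y = x[, 1]^2 + rnorm(n)
  for (j in seq_along(p_grid)) {
    pj = p_grid[j]
    x0j = c(1.5, rep(0, pj - 1))    # target point restricted to the first pj dimensions
    pred[i, j] = MyNW(x0j, x[, 1:pj, drop = FALSE], y, h)
  }
}
# bias^2, variance, and error are then computed column-wise as in the previous part
bias2    = (colMeans(pred) - f0)^2
variance = apply(pred, 2, function(v) mean((v - mean(v))^2))
error    = colMeans((pred - f0)^2)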
For this question, let's consider the effect of a low-dimensional manifold on KNN regression. We will consider a setting with some latent structure. You should generate \(n = 400\) observations with \(p = 100\). Use the first 200 as the training data and the rest as testing data. Perform the following steps:
Hence, the expected outcome depends only on the first two latent variables. The goal of this experiment is to observe how the KNN model could be affected by the dimension of the latent space. Please keep in mind that you do not observe \(Z\) in the real world; you can only observe \(X\) and \(\mathbf{y}\). Perform the following:
[15 pts] Fit a KNN regression using the generated data with \(m = 3\), and predict the outcomes of the corresponding testing data. Vary \(k\) over the grid seq(2, 82, 4). What are the testing errors? Repeat this
experiment with \(m = 30\), and report
the testing errors.
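For the fitting itself, a sketch along the following lines could be used. It assumes objects xtrain, xtest, ytrain, and ytest produced by your data-generation steps above (these names are placeholders), and uses knn.reg() from the FNN package, though any correct KNN regression implementation is acceptable.

# sketch: KNN testing error over a grid of k
# assumes xtrain, xtest, ytrain, ytest from the data-generation steps above
library(FNN)
k_grid = seq(2, 82, 4)
test_err = rep(NA, length(k_grid))
for (j in seq_along(k_grid)) {
  fit = knn.reg(train = xtrain, test = xtest, y = ytrain, k = k_grid[j])
  test_err[j] = mean((ytest - fit$pred)^2)   # mean squared testing error
}
test_err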
[15 pts] Now let’s perform a simulation study that repeats 50 times. You should re-generate \(A\) each time. Average your prediction errors (of each \(k\)) over the simulation runs, just like the simulations we have done in previous HW. At the end, you should show a plot that summarizes and compare the prediction errors of two different settings, respectively. For example, using \(k\) as the horizontal axis, and prediction errors in the vertical axis, with two line, each representing a setting of \(m\).
[10 pts] In both settings, we are still using 100 variables to fit the KNN model, yet the performances are very different. Can you comment on the results? In which setting is it easier for KNN to obtain a good fit? Why is that?