Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
Name your submission HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; the .html format cannot be accepted. Make all of your R code chunks visible for grading.
If you use the .Rmd file as a template, be sure to remove this instruction section.
Make sure your R version is \(\geq 4.0.0\). This will ensure your random seed generation is the same as everyone else's. Please note that updating the R version may require you to reinstall all of your packages.
Similar to the previous homework, we will use simulated datasets to evaluate a kernel regression model. You should write your own code to complete this question. We use the following two-dimensional data generator:
\[ Y = \exp(\beta^T x) + \epsilon \]
where \(\beta = c(1.5, 1.5)\) (the value used in the code below), \(x\) is generated uniformly from \([0, 1]^2\), and the errors \(\epsilon\) are i.i.d. standard Gaussian. Use the following code to generate a set of training and testing data:
set.seed(2)
trainn <- 200
testn <- 1
p = 2
beta <- c(1.5, 1.5)
# generate data
Xtrain <- matrix(runif(trainn * p), ncol = p)
Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
Xtest <- matrix(runif(testn * p), ncol = p)
# the first testing observation
Xtest
## [,1] [,2]
## [1,] 0.4152441 0.5314388
# the true expectation of the first testing observation
exp(Xtest %*% beta)
## [,1]
## [1,] 4.137221
[10 pts] For this question, you need to write your own code to implement a two-dimensional Nadaraya-Watson kernel regression estimator and predict just the first testing observation. For this task, we will use the independent Gaussian kernel function introduced during the lecture. Use the same bandwidth \(h\) for both dimensions. As a starting point, use \(h = 0.07\). What is your predicted value?
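The estimator described in this part can be sketched as follows. This is a minimal sketch, not a reference solution: it assumes the "independent Gaussian kernel" means a product of univariate Gaussian densities, one per dimension, with a shared bandwidth. The lecture definition is not reproduced here, so check it against your notes. The function names `gauss_kernel` and `nw_predict` are hypothetical.

```r
# Assumed kernel form: product of univariate Gaussian densities with bandwidth h.
gauss_kernel <- function(x0, X, h) {
  # scaled differences between every row of X and the target point x0
  U <- sweep(X, 2, x0, "-") / h
  # multiply the per-dimension densities within each row
  apply(dnorm(U) / h, 1, prod)
}

# Nadaraya-Watson estimate at x0: a kernel-weighted average of the responses
nw_predict <- function(x0, X, y, h) {
  w <- gauss_kernel(x0, X, h)
  sum(w * as.vector(y)) / sum(w)
}

# Same data-generating code as in the question
set.seed(2)
trainn <- 200
testn <- 1
p <- 2
beta <- c(1.5, 1.5)
Xtrain <- matrix(runif(trainn * p), ncol = p)
Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
Xtest <- matrix(runif(testn * p), ncol = p)

yhat <- nw_predict(Xtest[1, ], Xtrain, Ytrain, h = 0.07)
yhat
```

Because the estimate is a ratio of weighted sums, any constant factor in the kernel cancels, so the normalizing constant of the Gaussian density does not affect the prediction.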
[20 pts] Based on our previous understanding of the bias-variance trade-off of KNN, do the same simulation analysis for the kernel regression model. Again, you only need to consider the prediction at this one testing point. Your simulation needs to be able to calculate the following quantities:
Use at least 5000 simulation runs. Based on your simulation, answer the following questions:
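A simulation of this kind might be organized as in the sketch below. The quantities computed here (squared bias, variance, and mean squared error of the prediction at the fixed testing point) are assumptions about what such an analysis typically reports; the assignment's own list takes precedence. The kernel is again assumed to be a Gaussian product kernel, written without its normalizing constant because the constant cancels in the Nadaraya-Watson ratio.

```r
set.seed(2)
trainn <- 200
p <- 2
beta <- c(1.5, 1.5)
h <- 0.07
nsim <- 5000

# fixed target point: the first testing observation printed in the question
x0 <- c(0.4152441, 0.5314388)
f0 <- exp(sum(x0 * beta))          # true expectation at x0

pred <- numeric(nsim)
for (i in seq_len(nsim)) {
  # draw a fresh training set in every run
  X <- matrix(runif(trainn * p), ncol = p)
  y <- as.vector(exp(X %*% beta) + rnorm(trainn))
  # unnormalized Gaussian product kernel weights (assumed kernel form)
  U <- sweep(X, 2, x0, "-") / h
  w <- exp(-rowSums(U^2) / 2)
  pred[i] <- sum(w * y) / sum(w)
}

bias2 <- (mean(pred) - f0)^2              # squared bias of the prediction
variance <- mean((pred - mean(pred))^2)   # variance across simulation runs
mse <- mean((pred - f0)^2)                # equals bias2 + variance
c(bias2 = bias2, variance = variance, mse = mse)
```

Wrapping the loop in a function of `h` and calling it over a grid of bandwidths is one way to see how the two components trade off.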
We introduced the local polynomial regression in the lecture, with the objective function for predicting a target point \(x_0\) defined as
\[ (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}_{x_0})^\text{T} \mathbf{W} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}_{x_0}), \]
where \(\mathbf{W}\) is a diagonal weight matrix whose \(i\)th diagonal element is \(K_h(x_0, x_i)\), the kernel weight between \(x_i\) and \(x_0\). In this question, we will write our own code to implement this model. We will use the same simulated data provided at the beginning of Question 1.
set.seed(2)
trainn <- 200
testn <- 1
p = 2
beta <- c(1.5, 1.5)
# generate data
Xtrain <- matrix(runif(trainn * p), ncol = p)
Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
Xtest <- matrix(runif(testn * p), ncol = p)
[10 pts] Using the same kernel function as Question 1, calculate the kernel weights of \(x_0\) against all observed training data points. Report the 25th, 50th and 75th percentiles of the weights so we can check your answer.
[15 pts] Based on the objective function, derive the normal equation for estimating the local polynomial regression in matrix form, and then define the estimated \(\boldsymbol{\beta}_{x_0}\). Write your answer in LaTeX.
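As a hedged sketch of the kind of derivation requested here, the standard weighted least squares argument proceeds as follows, assuming \(\mathbf{W}\) is symmetric (it is diagonal) and \(\mathbf{X}^\text{T}\mathbf{W}\mathbf{X}\) is invertible. Setting the gradient of the objective with respect to \(\boldsymbol{\beta}_{x_0}\) to zero gives
\[ -2\,\mathbf{X}^\text{T}\mathbf{W}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}_{x_0}) = \mathbf{0} \quad\Longrightarrow\quad \mathbf{X}^\text{T}\mathbf{W}\mathbf{X}\,\boldsymbol{\beta}_{x_0} = \mathbf{X}^\text{T}\mathbf{W}\mathbf{y}, \]
so that
\[ \widehat{\boldsymbol{\beta}}_{x_0} = (\mathbf{X}^\text{T}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^\text{T}\mathbf{W}\mathbf{y}. \]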
[10 pts] Based on the observed data provided in Question 1, calculate the estimated \(\boldsymbol{\beta}_{x_0}\) for the testing point Xtest using the formula you derived. Report the estimated \(\boldsymbol{\beta}_{x_0}\). Calculate the prediction on the testing point and compare it with the true expectation.
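One way to carry out this calculation is sketched below, under several assumptions that are not stated in the question: the bandwidth is taken to be \(h = 0.07\) as in Question 1, the kernel is a Gaussian product kernel, and the local polynomial is of degree one (an intercept plus the two covariates). The weighted least squares estimate is obtained by solving the weighted normal equations rather than forming an explicit inverse.

```r
set.seed(2)
trainn <- 200
testn <- 1
p <- 2
beta <- c(1.5, 1.5)
Xtrain <- matrix(runif(trainn * p), ncol = p)
Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
Xtest <- matrix(runif(testn * p), ncol = p)

h <- 0.07                      # assumed bandwidth, carried over from Question 1
x0 <- Xtest[1, ]

# assumed Gaussian product kernel weights K_h(x0, x_i)
U <- sweep(Xtrain, 2, x0, "-") / h
w <- exp(-rowSums(U^2) / 2) / (2 * pi * h^2)

# design matrix with an intercept column (local linear fit)
Xd <- cbind(1, Xtrain)

# t(Xd * w) equals t(Xd) %*% diag(w) without building the diagonal matrix
XtW <- t(Xd * w)
beta_hat <- solve(XtW %*% Xd, XtW %*% Ytrain)

pred <- sum(c(1, x0) * beta_hat)   # prediction at the testing point
truth <- exp(sum(x0 * beta))       # true expectation at the testing point
c(prediction = pred, truth = truth)
```

The same coefficients can be cross-checked with `lm(as.vector(Ytrain) ~ Xtrain, weights = w)`, which fits the identical weighted least squares problem.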
[20 pts] Now, let’s use this model to predict the following 100 testing points. After you fit the model, provide a scatter plot of the true expectation versus the predicted values on these testing points. Does this seem to be a good fit? As a comparison, fit a global linear regression model to the training data and predict the testing points. Does your local linear model outperform the global linear model? Note: this is not a simulation study. You should use the same training data provided previously.
set.seed(432)
testn <- 100
Xtest <- matrix(runif(testn * p), ncol = p)
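The comparison in this part could be set up along the following lines. This is only a sketch: the bandwidth \(h = 0.07\), the Gaussian product kernel, the helper name `local_linear`, and the use of mean squared error against the true expectation as the comparison metric are all assumptions, not requirements stated in the question.

```r
# training data: same seed and code as before
set.seed(2)
trainn <- 200
p <- 2
beta <- c(1.5, 1.5)
Xtrain <- matrix(runif(trainn * p), ncol = p)
Ytrain <- as.vector(exp(Xtrain %*% beta) + rnorm(trainn))

# the 100 testing points from the question
set.seed(432)
testn <- 100
Xtest <- matrix(runif(testn * p), ncol = p)
truth <- as.vector(exp(Xtest %*% beta))

h <- 0.07  # assumed bandwidth

# hypothetical helper: local linear prediction at a single point x0
local_linear <- function(x0, X, y, h) {
  U <- sweep(X, 2, x0, "-") / h
  w <- exp(-rowSums(U^2) / 2)      # kernel constants cancel in the solution
  Xd <- cbind(1, X)
  XtW <- t(Xd * w)
  b <- solve(XtW %*% Xd, XtW %*% y)
  sum(c(1, x0) * b)
}

pred_local <- sapply(seq_len(testn), function(i) {
  local_linear(Xtest[i, ], Xtrain, Ytrain, h)
})

# global linear regression fitted once on all training data
train_df <- data.frame(y = Ytrain, x1 = Xtrain[, 1], x2 = Xtrain[, 2])
test_df <- data.frame(x1 = Xtest[, 1], x2 = Xtest[, 2])
pred_global <- as.vector(predict(lm(y ~ x1 + x2, data = train_df), newdata = test_df))

mse_local <- mean((pred_local - truth)^2)
mse_global <- mean((pred_global - truth)^2)
c(local = mse_local, global = mse_global)
```

For the scatter plot, `plot(truth, pred_local)` followed by `abline(0, 1)` shows how closely the predictions track the true expectation; adding `points(truth, pred_global, col = "red")` overlays the global fit for a visual comparison.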