Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
Name your submission file HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file; an .html file cannot be accepted. Make all of your R code chunks visible for grading. If you use the .Rmd file as a template, be sure to remove this instruction section. Make sure your R version is \(\geq 4.0.0\). This will ensure your random seed generation is the same as everyone else's. Please note that updating the R version may require you to reinstall all of your packages.

Similar to the previous homework, we will use simulated datasets to evaluate a kernel regression model. You should write your own code to complete this question. We use a two-dimensional data generator:
\[ Y = \exp(\beta^T x) + \epsilon \]
where \(\beta = (1.5, 1.5)^\text{T}\), \(X\) is generated uniformly from \([0, 1]^2\), and \(\epsilon\) follows an i.i.d. standard Gaussian distribution. Use the following code to generate a set of training and testing data:
set.seed(2)
trainn <- 200
testn <- 1
p = 2
beta <- c(1.5, 1.5)
# generate data
Xtrain <- matrix(runif(trainn * p), ncol = p)
Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
Xtest <- matrix(runif(testn * p), ncol = p)
# the first testing observation
Xtest
## [,1] [,2]
## [1,] 0.4152441 0.5314388
# the true expectation of the first testing observation
exp(Xtest %*% beta)
## [,1]
## [1,] 4.137221
Solution:
# Gaussian kernel weights based on the Euclidean distance between
# each training point and the first testing point
h = 0.07
w <- exp(-rowSums(sweep(Xtrain, 2, Xtest, "-")^2)/h^2/2) # the normalizing constant is skipped since it cancels in the weighted average
yhat <- sum(w * Ytrain) / sum(w)
yhat
## [1] 4.198552
# compare with the true mean
yhat - exp(Xtest %*% beta)
## [,1]
## [1,] 0.06133096
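Since the simulation below repeats this same calculation many times, it can help readability to wrap the Nadaraya-Watson estimator in a small helper function. This is only an optional sketch; the function name nw_predict and its arguments are our own choice and not part of the assignment.

nw_predict <- function(x0, X, y, h) {
  # Gaussian kernel weights between x0 and every row of X (constant dropped)
  w <- exp(-rowSums(sweep(X, 2, x0, "-")^2) / h^2 / 2)
  # kernel-weighted average of the responses
  sum(w * y) / sum(w)
}
# reproduces the prediction above (4.198552)
nw_predict(Xtest[1, ], Xtrain, Ytrain, h = 0.07)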
Use at least 5000 simulation runs. Based on your simulation, answer the following questions:
Solution:
# Calculate the bias, variance, and MSE
nsim <- 5000
yhat <- numeric(nsim)
errors <- numeric(nsim)
for (i in 1:nsim) {
  # regenerate the training data
  Xtrain <- matrix(runif(trainn * p), ncol = p)
  Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
  # kernel regression prediction at the testing point
  w <- exp(-rowSums(sweep(Xtrain, 2, Xtest, "-")^2)/h^2/2)
  yhat[i] <- sum(w * Ytrain) / sum(w)
  # generate a noisy testing outcome and record the prediction error
  ytest <- exp(Xtest %*% beta) + rnorm(1)
  errors[i] <- yhat[i] - ytest
}
bias <- mean(yhat) - exp(Xtest %*% beta)
variance <- var(yhat)
mse <- mean(errors^2)
# report results and validate our theory
bias^2
## [,1]
## [1,] 0.001825033
variance
## [1] 0.09545267
bias^2 + variance + 1
## [,1]
## [1,] 1.097278
mse
## [1] 1.076495
Based on our theory, the MSE should be the sum of the squared bias, the variance, and the irreducible error. Here, bias\(^2\) + variance + 1 equals 1.0972777, which is very close to the simulated MSE (1.0764953). This confirms the theory. Comparing the two components, the variance is much larger than the squared bias, so we should increase the bandwidth \(h\) to reduce the MSE.
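For reference, the decomposition being checked here is

\[ \text{MSE} = E\big[(\hat{f}(x_0) - Y_{\text{test}})^2\big] = \underbrace{\big(E[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{bias}^2} + \underbrace{\text{Var}\big(\hat{f}(x_0)\big)}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible error}}, \]

where \(f(x_0) = \exp(\beta^\text{T} x_0)\) and \(\sigma^2 = 1\) for the standard Gaussian noise used in this simulation.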
Solution:
# define the bandwidth to consider
hseq <- seq(0.02, 0.3, by = 0.02)
biasall <- numeric(length(hseq))
varianceall <- numeric(length(hseq))
nsim <- 5000
# Calculate the bias and variance for each bandwidth
for (j in 1:length(hseq)) {
  h <- hseq[j]
  for (i in 1:nsim) {
    Xtrain <- matrix(runif(trainn * p), ncol = p)
    Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
    w <- exp(-rowSums(sweep(Xtrain, 2, Xtest, "-")^2)/h^2/2)
    yhat[i] <- sum(w * Ytrain) / sum(w)
  }
  biasall[j] <- mean(yhat) - exp(Xtest %*% beta)
  varianceall[j] <- var(yhat)
}
# plot the MSE
plot(hseq, biasall^2 + varianceall, type = "l", xlab = "Bandwidth", ylab = "MSE")
# find the optimal bandwidth
hseq[which.min(biasall^2 + varianceall)]
## [1] 0.12
Our simulation seems to be stable enough, and with a fine grid of bandwidths we can see that the MSE is minimized at \(h = 0.12\). This bandwidth is (almost) optimal for this particular model and sample size.
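A rough asymptotic heuristic (not required by the question) explains why an interior bandwidth is optimal: for kernel regression in \(d\) dimensions the squared bias typically grows on the order of \(h^4\) while the variance shrinks on the order of \(1/(n h^d)\), so

\[ \text{MSE}(h) \approx c_1 h^4 + \frac{c_2}{n h^d}, \qquad h_{\text{opt}} \asymp n^{-1/(d+4)}, \]

where the constants \(c_1\) and \(c_2\) depend on the kernel and the true regression function.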
We introduced the local polynomial regression in the lecture, with the objective function for predicting a target point \(x_0\) defined as
\[ (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}_{x_0})^\text{T} \mathbf{W} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}_{x_0}), \]
where \(\mathbf{W}\) is a diagonal weight matrix, with the \(i\)th diagonal element defined as \(K_h(x_0, x_i)\), the kernel weight between \(x_i\) and \(x_0\). In this question, we will write our own code to implement this model. We will use the same simulated data provided at the beginning of Question 1.
set.seed(2)
trainn <- 200
testn <- 1
p = 2
beta <- c(1.5, 1.5)
# generate data
Xtrain <- matrix(runif(trainn * p), ncol = p)
Ytrain <- exp(Xtrain %*% beta) + rnorm(trainn)
Xtest <- matrix(runif(testn * p), ncol = p)
# the first testing observation
Xtest
## [,1] [,2]
## [1,] 0.4152441 0.5314388
# the true expectation of the first testing observation
exp(Xtest %*% beta)
## [,1]
## [1,] 4.137221
Solution:
# Calculate the weight matrix
h = 0.07
W <- diag(exp(-rowSums(sweep(Xtrain, 2, Xtest, "-")^2)/h^2/2) / (2*pi*h) )
quantile(diag(W), c(0.25, 0.5, 0.75))
## 25% 50% 75%
## 1.312951e-12 1.136713e-07 8.826478e-05
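The quantiles show that most kernel weights are numerically negligible, so the local fit is effectively driven by only a handful of training points near the testing point. As an optional sanity check (the relative cutoff 1e-3 below is an arbitrary choice of ours), one could count the effectively contributing points:

# number of training points with non-negligible weight (cutoff is arbitrary)
sum(diag(W) > 1e-3 * max(diag(W)))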
Solution:
Setting the derivative of the objective function with respect to \(\boldsymbol{\beta}_{x_0}\) to zero gives the normal equation in matrix form:
\[ \mathbf{X}^\text{T} \mathbf{W} \mathbf{X} \boldsymbol{\beta}_{x_0} = \mathbf{X}^\text{T} \mathbf{W} \mathbf{y} \]
Hence, the estimated coefficient vector is:
\[ \widehat{\boldsymbol{\beta}}_{x_0} = (\mathbf{X}^\text{T} \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^\text{T} \mathbf{W} \mathbf{y}. \]
Estimate \(\boldsymbol{\beta}_{x_0}\) at the testing point Xtest using the formula you derived. Report the estimated \(\boldsymbol{\beta}_{x_0}\). Calculate the prediction on the testing point and compare it with the true expectation.

Solution:
# Calculate the estimated beta
beta_hat <- solve(t(Xtrain) %*% W %*% Xtrain) %*% t(Xtrain) %*% W %*% Ytrain
beta_hat
## [,1]
## [1,] 3.978612
## [2,] 4.907357
# Predict the testing point
yhat <- Xtest %*% beta_hat
# the truth
exp(Xtest %*% beta)
## [,1]
## [1,] 4.137221
set.seed(432)
testn <- 100
Xtest <- matrix(runif(testn * p), ncol = p)
Solution:
# Predict the testing points
yhat <- numeric(testn)
for (i in 1:testn) {
  # kernel weight matrix for the i-th testing point
  W <- diag(exp(-rowSums(sweep(Xtrain, 2, Xtest[i,], "-")^2)/h^2/2) / (2*pi*h) )
  # local linear coefficients and prediction at the i-th testing point
  beta_hat <- solve(t(Xtrain) %*% W %*% Xtrain) %*% t(Xtrain) %*% W %*% Ytrain
  yhat[i] <- Xtest[i,] %*% beta_hat
}
ymean <- exp(Xtest %*% beta)
plot(ymean, yhat, xlab = "True Expectation", ylab = "Predicted Value")
abline(coef = c(0,1))
# Fit a global linear regression model
lmfit <- lm(y ~ ., data = data.frame(Xtrain, "y" = Ytrain))
yhat_lm <- predict(lmfit, data.frame(Xtest))
plot(ymean, yhat_lm, xlab = "True Expectation", ylab = "Predicted Value")
abline(coef = c(0,1))
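To complement the two plots, one could also summarize each fit with a single number, for example the average squared difference from the true expectation. This is only an optional sketch; the question does not ask for it and no numbers are reported here.

# average squared error against the true expectation
mean((yhat - ymean)^2)     # local linear regression
mean((yhat_lm - ymean)^2)  # global linear regression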