Abstract
This is the supplementary R file for ridge regression in the lecture note “Penalized”.
Ridge regression solves the following \(\ell_2\) penalized linear model
\[\widehat {\boldsymbol\beta}^{\,\text{ridge}} = \underset{{\boldsymbol\beta}}{\arg\min} \,\, \frac{1}{n}\lVert \mathbf y- \mathbf X{\boldsymbol\beta}\rVert^2 + \lambda \lVert {\boldsymbol\beta}\rVert^2\]
The solution has the closed form
\[\widehat {\boldsymbol\beta}^{\,\text{ridge}} = \big(\mathbf X^\text{T}\mathbf X+ n \lambda \mathbf I\big)^{-1} \mathbf X^\text{T}\mathbf y\]
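To make the closed-form solution concrete, here is a minimal sketch that computes it directly on simulated data; the names X, y, lam, and beta_ridge are illustrative and not part of the lecture code, and no intercept or standardization is used, matching the formula above.
# a minimal sketch: closed-form ridge solution on simulated data
set.seed(1)
n = 100; p = 5
X = matrix(rnorm(n * p), n, p)
y = X %*% c(1, -1, 0.5, 0, 0) + rnorm(n)
lam = 0.5
# beta = (X'X + n * lambda * I)^{-1} X'y
beta_ridge = solve(t(X) %*% X + n * lam * diag(p), t(X) %*% y)
beta_ridge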
We use the prostate cancer data prostate from the ElemStatLearn package. The dataset contains 8 explanatory variables and one outcome, lpsa, the log prostate-specific antigen value.
library(ElemStatLearn)
head(prostate)
## lcavol lweight age lbph svi lcp gleason pgg45 lpsa train
## 1 -0.5798185 2.769459 50 -1.386294 0 -1.386294 6 0 -0.4307829 TRUE
## 2 -0.9942523 3.319626 58 -1.386294 0 -1.386294 6 0 -0.1625189 TRUE
## 3 -0.5108256 2.691243 74 -1.386294 0 -1.386294 7 20 -0.1625189 TRUE
## 4 -1.2039728 3.282789 58 -1.386294 0 -1.386294 6 0 -0.1625189 TRUE
## 5 0.7514161 3.432373 62 -1.386294 0 -1.386294 6 0 0.3715636 TRUE
## 6 -1.0498221 3.228826 50 -1.386294 0 -1.386294 6 0 0.7654678 TRUE
We fit ridge regression on a grid of \(\lambda\) values. For each \(\lambda\), the coefficients of all variables are recorded. The left plot shows how these coefficients change as a function of \(\lambda\). We can easily see that as \(\lambda\) becomes larger, the coefficients are shrunken towards 0. To select the best \(\lambda\) value, we use the GCV (generalized cross-validation) criterion. The right plot shows how GCV changes as a function of \(\lambda\). Be careful that the coefficients stored in the fitted object, fit$coef, are scaled by the standard deviations of the covariates. If you need the original scale, use coef(fit); a short check of this relationship is given after the plotting code below.
library(MASS)   # lm.ridge() comes from the MASS package

# ridge fit over a grid of 100 lambda values; column 10 (train) is dropped
fit = lm.ridge(lpsa ~ ., prostate[, -10], lambda = seq(0, 30, length.out = 100))

par(mfrow = c(1, 2))
# left panel: coefficient paths on the original scale (intercept removed)
matplot(coef(fit)[, -1], type = "l", xlab = "Lambda", ylab = "Coefficients")
text(rep(50, 8), coef(fit)[1, -1], colnames(prostate)[1:8])
title("Prostate Cancer Data: Ridge Coefficients")
# right panel: use GCV to select the best lambda
plot(fit$lambda[1:100], fit$GCV[1:100], type = "l", col = "darkorange",
     ylab = "GCV", xlab = "Lambda", lwd = 3)
title("Prostate Cancer Data: GCV")
We select the best \(\lambda\) that produces the smallest GCV.
fit$lambda[which.min(fit$GCV)]
## [1] 6.666667
round(coef(fit)[which.min(fit$GCV), ], 4)
##           lcavol lweight     age    lbph     svi     lcp gleason   pgg45
##   0.0190  0.4960  0.6054 -0.0170  0.0864  0.6896 -0.0429  0.0632  0.0035
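lm.ridge does not provide a predict method, so fitted values at the selected \(\lambda\) have to be assembled by hand. Below is a minimal sketch using the original-scale coefficients from coef(fit); the in-sample mean squared error is shown only as an illustration.
# coefficients (intercept first) at the GCV-selected lambda
beta_best = coef(fit)[which.min(fit$GCV), ]
# fitted values: add an intercept column and multiply
X_design = cbind(1, data.matrix(prostate[, 1:8]))
yhat = X_design %*% beta_best
# in-sample mean squared error (illustration only)
mean((prostate$lpsa - yhat)^2)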
An alternative approach to selecting the best \(\lambda\) is cross-validation, which can be done with the glmnet package. Note that for ridge regression, we need to specify alpha = 0.
library(glmnet)
set.seed(3)
# 10-fold cross-validation; alpha = 0 selects the ridge (pure L2) penalty
fit2 = cv.glmnet(data.matrix(prostate[, 1:8]), prostate$lpsa, nfolds = 10, alpha = 0)
coef(fit2, s = "lambda.min")
## 9 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.011566731
## lcavol 0.492211875
## lweight 0.604155671
## age -0.016727236
## lbph 0.085820464
## svi 0.685477646
## lcp -0.039717080
## gleason 0.063806235
## pgg45 0.003411982
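Predictions at the cross-validated \(\lambda\) can then be obtained with predict on the cv.glmnet object; the in-sample error below is shown only as an illustration.
# predictions at the lambda that minimizes the cross-validation error
pred = predict(fit2, newx = data.matrix(prostate[, 1:8]), s = "lambda.min")
mean((prostate$lpsa - pred)^2)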
# coefficient paths from the full glmnet fit, plotted against log(lambda)
plot(fit2$glmnet.fit, xvar = "lambda")
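The cross-validation curve itself can be plotted directly from the cv.glmnet object; the vertical dotted lines mark lambda.min and lambda.1se.
# cross-validation error as a function of log(lambda)
plot(fit2)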