Exercises D
Data mining - CdL CLAMSES
Theoretical exercises
D.1 - Bias-variance of KNN
Consider the k-nearest neighbours estimator \hat{f}(x) for a regression problem Y_i = f(x_i) + \epsilon_i, under the usual assumptions. Find the bias and the variance of \hat{f}(x) and discuss their behavior as a function of k.
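One standard formulation, which may be assumed here, is \hat{f}(x) = \frac{1}{k}\sum_{i \in N_k(x)} y_i, where N_k(x) collects the indices of the k observations closest to x, and the errors \epsilon_i are independent with mean zero and constant variance \sigma^2.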
D.2 - Differentiability of Nadaraya-Watson
Show that the Nadaraya-Watson estimator with fixed bandwidth h and a Gaussian kernel is differentiable.
What can be said about the Epanechnikov kernel?
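As a purely numerical illustration (simulated data, arbitrary bandwidth), the following sketch evaluates the Nadaraya-Watson estimator with both kernels on a fine grid; the Gaussian fit is smooth everywhere, while the fit based on the compactly supported Epanechnikov kernel typically shows kinks where observations enter or leave the local window.

```r
# Illustrative sketch on simulated data (setup and bandwidth chosen arbitrarily):
# evaluate the Nadaraya-Watson estimator with a Gaussian and an Epanechnikov kernel.
set.seed(123)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)

gauss <- function(u) dnorm(u)
epan  <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

nw <- function(x0, x, y, h, K) {
  w <- K((x0 - x) / h)
  if (sum(w) == 0) return(NA_real_)  # empty window (possible with compact support)
  sum(w * y) / sum(w)
}

xg <- seq(0, 1, length.out = 1000)
fit_gauss <- sapply(xg, nw, x = x, y = y, h = 0.1, K = gauss)
fit_epan  <- sapply(xg, nw, x = x, y = y, h = 0.1, K = epan)

plot(x, y, col = "grey", pch = 16)
lines(xg, fit_gauss, col = "blue", lwd = 2)  # smooth everywhere
lines(xg, fit_epan,  col = "red",  lwd = 2)  # kinks where points enter/leave the window
```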
D.3 - Local linear regression (explicit formula)
Prove the theorem about the explicit formula for the local linear regression estimator stated in this slide.
D.4 - Preservation of linear trends
Show that local linear regression applied to (x_i, y_i) preserves the linear part of the fit. In other words, decompose y_i = \hat{y}_{i,\text{ols}} + r_i, where \hat{y}_{i,\text{ols}} = \hat{\beta}_0 + \hat{\beta}_1 x_i is the ordinary least squares fit, and let \bm{S} denote the smoothing matrix; then \bm{S}\bm{y} = \bm{S}\hat{\bm{y}}_\text{ols} + \bm{S}\bm{r} = \hat{\bm{y}}_\text{ols} + \bm{S}\bm{r}. Another way of looking at this property is the following: if the points (x_i, y_i) lie on a line, then the fitted values of a local linear regression coincide with the y_i. More formally, show that if y_i = \alpha + \beta x_i, then \hat{f}(x) = \sum_{i=1}^n s_i(x) y_i = \alpha + \beta x.
Does the same property hold for the Nadaraya-Watson estimator?
Hint. Begin by showing that \sum_{i=1}^n s_i(x) = 1 and that \sum_{i=1}^n s_i(x)(x_i - x) = 0, then exploit these two properties to complete the proof.
D.5 - Conceptual exercise
Every nonparametric regression model involves a smoothing parameter. For example, consider the parameter \lambda of smoothing splines or h of local linear regression.
Can we estimate h and \lambda using a standard method such as maximum likelihood?
D.6 - Leave-one-out cross-validation
Prove the theorem about leave-one-out cross-validation for “projective” linear smoothers, described in this slide.
Then, show that the theorem is valid for the following estimators:
- Nadaraya-Watson estimator. Hint: use the fact that Nadaraya-Watson is “projective” almost by definition.
- Local linear regression. Hint: prove that local linear regression is “projective”.
- Regression splines. Hint: this is a special instance of a linear model, so…
- Smoothing splines. Hint: you may want to use the Sherman-Morrison formula again.
Finally, prove that in all the above cases |y_i - \hat{y}_{-i}| \ge |y_i - \hat{y}_i|. That is, the leave-one-out residuals are never smaller, in absolute value, than the ordinary residuals.
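For reference, the identity for linear smoothers is usually stated as y_i - \hat{y}_{-i} = \frac{y_i - \hat{y}_i}{1 - S_{ii}}, where S_{ii} is the i-th diagonal element of the smoothing matrix, so that the leave-one-out cross-validation score becomes \frac{1}{n}\sum_{i=1}^n \left(\frac{y_i - \hat{y}_i}{1 - S_{ii}}\right)^2; this form is assumed to match the statement of the slide.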
D.7 - Truncated power basis
A cubic spline with one knot \xi can be obtained using a basis of the form x, x^2, x^3 and (x - \xi)_+^3, as a consequence of the truncated power basis theorem of this slide.
In this exercise we will show that a function f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)_+^3 is a cubic spline, regardless of the values of \beta_0,\dots,\beta_4.
Find a cubic polynomial f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3, such that f(x) = f_1(x) for all x \le \xi. Express a_1, b_1, c_1, d_1 in terms of \beta_0,\dots,\beta_4.
Find another cubic polynomial f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3, such that f(x) = f_2(x) for all x > \xi. Express a_2, b_2, c_2, d_2 in terms of \beta_0,\dots,\beta_4. This establishes that f(x) is piecewise polynomial.
Show that f_1(\xi) = f_2(\xi). That is, f(x) is continuous at \xi.
Show that f'_1(\xi) = f'_2(\xi). That is, the first derivative f'(x) is continuous at \xi.
Show that f''_1(\xi) = f''_2(\xi). That is, the second derivative f''(x) is continuous at \xi.
Conclude that f(x) is indeed a cubic spline.
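As a quick numerical sanity check of the claim (the exercise itself asks for an analytic argument), the following sketch evaluates the two polynomial pieces and their first two derivatives at an arbitrarily chosen knot, with arbitrary coefficients:

```r
# Numerical sanity check: the two polynomial pieces of f agree at the knot xi
# in value, first and second derivative. Coefficients and knot are arbitrary.
beta <- c(1, -2, 0.5, 0.3, 4)   # beta_0, beta_1, beta_2, beta_3, beta_4
xi   <- 2

f1 <- function(x) beta[1] + beta[2] * x + beta[3] * x^2 + beta[4] * x^3  # x <= xi
f2 <- function(x) f1(x) + beta[5] * (x - xi)^3                           # x >  xi

# Analytic first and second derivatives of the two pieces
f1_d1 <- function(x) beta[2] + 2 * beta[3] * x + 3 * beta[4] * x^2
f2_d1 <- function(x) f1_d1(x) + 3 * beta[5] * (x - xi)^2
f1_d2 <- function(x) 2 * beta[3] + 6 * beta[4] * x
f2_d2 <- function(x) f1_d2(x) + 6 * beta[5] * (x - xi)

c(f1(xi) - f2(xi), f1_d1(xi) - f2_d1(xi), f1_d2(xi) - f2_d2(xi))  # all zero
```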
D.8 - Smoothing splines estimator
Derive the smoothing spline estimator \hat{\beta} = (\bm{N}^T\bm{N} + \lambda \bm{\Omega})^{-1}\bm{N}^T\bm{y}, taking for granted the validity of the Green and Silverman theorem about the optimality of natural cubic splines, as stated here.
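A natural starting point, assuming the usual notation in which \bm{N} is the basis matrix of the natural cubic spline with knots at the observed x_i and \bm{\Omega} is the penalty matrix with entries \Omega_{jk} = \int N_j''(t)\,N_k''(t)\,\mathrm{d}t, is the penalized least squares criterion (\bm{y} - \bm{N}\beta)^T(\bm{y} - \bm{N}\beta) + \lambda\,\beta^T \bm{\Omega}\,\beta, to be minimized over \beta.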
Practical exercises
D.9 - Implementation of local linear regression
Write a function called loclin(x, y, h) that implements local linear regression using the formula of this slide. You can use any kernel function of your choice.
Compare the results of your function loclin on the auto dataset with those of the libraries KernSmooth and sm, along the lines of what has been done in class.
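The following is a minimal sketch of one possible implementation, using a Gaussian kernel and solving a weighted least squares problem at each evaluation point; the xgrid argument and the returned data frame are choices made for this sketch, not part of the required interface.

```r
# Minimal sketch of local linear regression with a Gaussian kernel.
# At each evaluation point x0 it solves a weighted least squares problem
# in (1, x - x0) and returns the intercept as the fitted value.
loclin <- function(x, y, h, xgrid = seq(min(x), max(x), length.out = 200)) {
  fit_one <- function(x0) {
    w <- dnorm((x - x0) / h)            # kernel weights
    X <- cbind(1, x - x0)               # local design matrix centered at x0
    beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))
    beta[1]                             # intercept = estimate of f(x0)
  }
  data.frame(x = xgrid, y = sapply(xgrid, fit_one))
}
```

The resulting curve can then be compared with, for instance, KernSmooth::locpoly(x, y, degree = 1, bandwidth = h) and sm::sm.regression(x, y, h = h), keeping in mind that the bandwidth conventions of these functions may differ from the one used in the sketch.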
D.10 - Implementation of leave-one-out cross-validation
Write a function called loo_cv(X, y, h) that computes the leave-one-out cross-validation for the loclin(x, y, h) function, using the result of this slide.
Identify the optimal bandwidth for the loclin local linear regression, using the auto dataset.
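A possible sketch, which assumes the shortcut formula for linear smoothers recalled under exercise D.6 and recomputes, at each observed x_i, the same Gaussian-kernel local fit used in the loclin sketch above; the commented lines at the end, including the bandwidth grid and the variable names of the auto dataset, are only placeholders.

```r
# Sketch of leave-one-out CV for a Gaussian-kernel local linear fit, using
# CV = mean( ((y - yhat) / (1 - diag(S)))^2 ). The fitted value and the
# diagonal element of the smoothing matrix come from the same local fit.
loo_cv <- function(X, y, h) {
  x <- X
  n <- length(x)
  yhat <- numeric(n)
  Sii  <- numeric(n)
  for (i in seq_len(n)) {
    w  <- dnorm((x - x[i]) / h)        # kernel weights at x_i
    Xi <- cbind(1, x - x[i])           # local design matrix centered at x_i
    A  <- solve(t(Xi) %*% (w * Xi))    # (X^T W X)^{-1}
    l  <- drop(A[1, ] %*% t(Xi * w))   # equivalent weights: yhat_i = sum(l * y)
    yhat[i] <- sum(l * y)
    Sii[i]  <- l[i]                    # i-th diagonal element of S
  }
  mean(((y - yhat) / (1 - Sii))^2)     # LOOCV via the shortcut formula
}

# Hypothetical bandwidth search on the auto data (variable names are a guess):
# hs <- seq(0.5, 5, by = 0.1)
# cv <- sapply(hs, loo_cv, X = auto$engine.size, y = auto$city.distance)
# hs[which.min(cv)]
```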