Exercises D

Data mining - CdL CLAMSES

Tommaso Rigon

Università degli Studi di Milano-Bicocca

The theoretical exercises described below can be quite difficult. At the exam, you can expect a simplified version of them; otherwise, they would represent a formidable challenge for most of you.

Theoretical exercises

Consider the k-nearest neighbours estimator \hat{f}(x) for a regression problem Y_i = f(x_i) + \epsilon_i, under the usual assumptions. Find the bias and the variance of \hat{f}(x) and discuss their behaviour as a function of k.
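As a starting point (standard notation, which may differ slightly from the course slides), recall that the k-nearest neighbours estimator averages the responses of the k observations closest to x:

```latex
\hat{f}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i,
\qquad
N_k(x) = \{ i : x_i \text{ is among the } k \text{ points nearest to } x \}.
```

Combine this expression with the assumptions \mathbb{E}(\epsilon_i) = 0 and \text{Var}(\epsilon_i) = \sigma^2 to compute the two quantities.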

Show that the Nadaraya-Watson estimator with fixed bandwidth h and a Gaussian kernel is differentiable as a function of x.

What can be said about the Epanechnikov kernel?
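For reference (in its common normalisation; conventions vary), the Epanechnikov kernel is

```latex
K(u) = \frac{3}{4}\,(1 - u^2)\,\mathbb{1}(|u| \le 1),
```

so the question amounts to studying the smoothness of the resulting estimator at the points where |x - x_i| = h.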

Show that local linear regression applied to (x_i, y_i) preserves the linear part of the fit. In other words, decompose y_i = \hat{y}_{i,\text{ols}} + r_i, where \hat{y}_{i,\text{ols}} = \hat{\beta}_0 + \hat{\beta}_1 x_i is the ordinary least squares estimate, and let \bm{S} denote the smoothing matrix of local linear regression; then \bm{S}\bm{y} = \bm{S}\hat{\bm{y}}_\text{ols} + \bm{S}\bm{r} = \hat{\bm{y}}_\text{ols} + \bm{S}\bm{r}. Another way of looking at this property is the following: if the points (x_i, y_i) lie on a line, then the fitted values of a local linear regression coincide with the y_i. More formally, show that if y_i = \alpha + \beta x_i, then \hat{f}(x) = \sum_{i=1}^n s_i(x) y_i = \alpha + \beta x.

Does the same property hold for the Nadaraya-Watson estimator?

Hint. Begin by showing that \sum_{i=1}^n s_i(x) = 1 and that \sum_{i=1}^n s_i(x)(x_i - x) = 0, then exploit these properties to complete the proof.

Every nonparametric regression model involves a smoothing parameter: for example, the penalty parameter \lambda of smoothing splines or the bandwidth h of local linear regression.

Can we estimate h and \lambda using a standard method such as maximum likelihood?
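The following R sketch illustrates the issue empirically (made-up data and a hand-rolled Nadaraya–Watson fit, not the course's code): the in-sample residual sum of squares keeps decreasing as h shrinks, so any criterion evaluated on the training data alone pushes the bandwidth towards a degenerate value.

```r
# Made-up smooth data (any dataset would show the same behaviour)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + 0.3 * cos(17 * x)

# Nadaraya-Watson fit at a single point x0, Gaussian kernel
nw <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)
  sum(w * y) / sum(w)
}

# In-sample residual sum of squares as a function of the bandwidth h
rss <- function(h) sum((y - sapply(x, nw, x = x, y = y, h = h))^2)

rss(0.5)   # heavy smoothing: large RSS
rss(0.02)  # near-interpolation: much smaller RSS
```

Think about what this implies for a criterion, such as the likelihood, that rewards in-sample fit.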

Consider the theorem on leave-one-out cross-validation for “projective” linear smoothers, as described in this slide.

Show that the theorem holds for the following estimators:

  1. Nadaraya–Watson estimator. Hint: use the fact that the Nadaraya–Watson estimator is “projective” almost by definition.
  2. Local linear regression. Hint: prove that local linear regression is “projective.”

A cubic spline with one knot \xi can be obtained using a basis of the form 1, x, x^2, x^3 and (x - \xi)_+^3, as a consequence of the truncated power basis theorem of this slide.

In this exercise we will show that a function f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)_+^3 is a cubic spline, regardless of the values of \beta_0,\dots,\beta_4.

  1. Find a cubic polynomial f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3, such that f(x) = f_1(x) for all x \le \xi. Express a_1, b_1, c_1, d_1 in terms of \beta_0,\dots,\beta_4.

  2. Find another cubic polynomial f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3, such that f(x) = f_2(x) for all x > \xi. Express a_2, b_2, c_2, d_2 in terms of \beta_0,\dots,\beta_4. This establishes that f(x) is piecewise polynomial.

  3. Show that f_1(\xi) = f_2(\xi). That is, f(x) is continuous at \xi.

  4. Show that f'_1(\xi) = f'_2(\xi). That is, the first derivative f'(x) is continuous at \xi.

  5. Show that f''_1(\xi) = f''_2(\xi). That is, the second derivative f''(x) is continuous at \xi.

Conclude that f(x) is indeed a cubic spline.
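A quick numerical sanity check in R (with arbitrary, made-up coefficients; this complements but does not replace the algebraic argument):

```r
# Arbitrary coefficients beta_0, ..., beta_4 and knot xi
b  <- c(1, -2, 0.5, 3, -4)
xi <- 0.7

f1 <- function(x) b[1] + b[2] * x + b[3] * x^2 + b[4] * x^3  # piece for x <= xi
f2 <- function(x) f1(x) + b[5] * (x - xi)^3                  # piece for x > xi

# Central finite differences for first and second derivatives
d1 <- function(g, x, h = 1e-5) (g(x + h) - g(x - h)) / (2 * h)
d2 <- function(g, x, h = 1e-4) (g(x + h) - 2 * g(x) + g(x - h)) / h^2

# All three differences are numerically zero: f, f', f'' match at xi
c(f1(xi) - f2(xi), d1(f1, xi) - d1(f2, xi), d2(f1, xi) - d2(f2, xi))
```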

Derive the smoothing spline estimator \hat{\beta} = (\bm{N}^T\bm{N} + \lambda \bm{\Omega})^{-1}\bm{N}^T\bm{y}, taking for granted the validity of the Green and Silverman theorem about the optimality of natural cubic splines, as stated here.
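Sketch of the starting point, assuming the usual notation (\bm{N} the basis matrix with entries N_j(x_i), \bm{\Omega} the penalty matrix): by the Green and Silverman theorem the minimiser can be written as f(x) = \sum_{j=1}^n \beta_j N_j(x), so the problem reduces to the penalised least-squares criterion

```latex
\hat{\beta} = \arg\min_{\beta} \;
(\bm{y} - \bm{N}\beta)^T (\bm{y} - \bm{N}\beta)
+ \lambda\, \beta^T \bm{\Omega} \beta,
```

and the estimator follows by setting the gradient with respect to \beta equal to zero.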

Coding exercises

Write a function called loclin(x, y, h) that implements local linear regression using the formula of this slide. You can use any kernel function of your choice.
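A minimal sketch of one possible implementation, with a Gaussian kernel (the name and arguments follow the exercise; the internals are one choice among many, not the reference solution):

```r
# Local linear regression: at each evaluation point x0, fit a weighted
# least squares line and return its intercept (the fitted value at x0).
loclin <- function(x, y, h, xgrid = x) {
  sapply(xgrid, function(x0) {
    w <- dnorm((x - x0) / h)   # Gaussian kernel weights around x0
    X <- cbind(1, x - x0)      # local design: intercept + centred slope
    beta <- solve(t(X * w) %*% X, t(X * w) %*% y)
    beta[1]
  })
}
```

On points lying exactly on a line, loclin reproduces the line, consistently with the theoretical exercise above.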

Compare the results of your function loclin on the auto dataset with those of the libraries KernSmooth and sm, along the lines of what has been done in class.

Write a function called loo_cv(x, y, h) that computes the leave-one-out cross-validation error for the loclin(x, y, h) function, using the result of this slide.
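A self-contained sketch (written with lowercase x to match loclin; Gaussian kernel, one possible implementation): it builds each row of the smoothing matrix, computes the fitted values, and applies the leave-one-out shortcut \text{CV}(h) = n^{-1}\sum_{i=1}^n \{(y_i - \hat{y}_i)/(1 - S_{ii})\}^2.

```r
loo_cv <- function(x, y, h) {
  n    <- length(x)
  yhat <- numeric(n)
  Sii  <- numeric(n)
  for (j in seq_len(n)) {
    w   <- dnorm((x - x[j]) / h)       # kernel weights around x[j]
    X   <- cbind(1, x - x[j])          # local design matrix
    XtW <- t(X * w)                    # X^T W
    s   <- solve(XtW %*% X, XtW)[1, ]  # j-th row of the smoothing matrix S
    yhat[j] <- sum(s * y)
    Sii[j]  <- s[j]
  }
  mean(((y - yhat) / (1 - Sii))^2)
}
```

For a linear smoother of this kind, the shortcut agrees with explicitly refitting the model n times, each time leaving one observation out.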

Identify the optimal bandwidth for the loclin local linear regression, using the auto dataset.