Exercises D

Data mining - CdL CLAMSES

Tommaso Rigon

Università degli Studi di Milano-Bicocca

The theoretical exercises described below can be quite difficult. At the exam, you can expect a simplified version of them; otherwise, they would represent a formidable challenge for most of you.

Theoretical exercises

Consider the k-nearest neighbours estimator \hat{f}(x) for a regression problem Y_i = f(x_i) + \epsilon_i, under the usual assumptions. Find the bias and the variance of \hat{f}(x) and discuss their behaviour as a function of k.
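As a starting point (standard notation, which may differ slightly from the course slides), recall that the k-nearest neighbours estimator averages the responses of the k observations closest to x:

```latex
\hat{f}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i,
\qquad
N_k(x) = \{ i : x_i \text{ is among the } k \text{ points nearest to } x \}.
```

Combine this expression with the assumptions \mathbb{E}(\epsilon_i) = 0 and \text{Var}(\epsilon_i) = \sigma^2 to compute the two quantities.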

Show that the Nadaraya-Watson estimator with fixed bandwidth h and a Gaussian kernel is differentiable as a function of x.

What can be said about the Epanechnikov kernel?
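For reference (in its common normalisation; conventions vary), the Epanechnikov kernel is

```latex
K(u) = \frac{3}{4}\,(1 - u^2)\,\mathbb{1}(|u| \le 1),
```

so the question amounts to studying the smoothness of the resulting estimator at the points where |x - x_i| = h.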

Show that local linear regression applied to (x_i, y_i) preserves the linear part of the fit. In other words, decompose y_i = \hat{y}_{i,\text{ols}} + r_i, where \hat{y}_{i,\text{ols}} = \hat{\beta}_0 + \hat{\beta}_1 x_i is the ordinary least squares estimate, and let \bm{S} denote the smoothing matrix of local linear regression; then \bm{S}\bm{y} = \bm{S}\hat{\bm{y}}_\text{ols} + \bm{S}\bm{r} = \hat{\bm{y}}_\text{ols} + \bm{S}\bm{r}. Another way of looking at this property is the following: if the points (x_i, y_i) lie on a line, then the fitted values of a local linear regression coincide with the y_i. More formally, show that if y_i = \alpha + \beta x_i, then \hat{f}(x) = \sum_{i=1}^n s_i(x) y_i = \alpha + \beta x.

Does the same property hold for the Nadaraya-Watson estimator?

Hint. Begin by showing that \sum_{i=1}^n s_i(x) = 1 and that \sum_{i=1}^n s_i(x)(x_i - x) = 0, then exploit these properties to complete the proof.

Every nonparametric regression model involves a smoothing parameter: for example, the penalty parameter \lambda of smoothing splines or the bandwidth h of local linear regression.

Can we estimate h and \lambda using a standard method such as maximum likelihood?
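The following R sketch illustrates the issue empirically (made-up data and a hand-rolled Nadaraya–Watson fit, not the course's code): the in-sample residual sum of squares keeps decreasing as h shrinks, so any criterion evaluated on the training data alone pushes the bandwidth towards a degenerate value.

```r
# Made-up smooth data (any dataset would show the same behaviour)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + 0.3 * cos(17 * x)

# Nadaraya-Watson fit at a single point x0, Gaussian kernel
nw <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)
  sum(w * y) / sum(w)
}

# In-sample residual sum of squares as a function of the bandwidth h
rss <- function(h) sum((y - sapply(x, nw, x = x, y = y, h = h))^2)

rss(0.5)   # heavy smoothing: large RSS
rss(0.02)  # near-interpolation: much smaller RSS
```

Think about what this implies for a criterion, such as the likelihood, that rewards in-sample fit.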

Consider the theorem on leave-one-out cross-validation for “projective” linear smoothers, as described in this slide.

Show that the theorem holds for the following estimators:

  1. Nadaraya–Watson estimator. Hint: use the fact that the Nadaraya–Watson estimator is “projective” almost by definition.
  2. Local linear regression. Hint: prove that local linear regression is “projective.”

A cubic spline with one knot \xi can be obtained using a basis of the form 1, x, x^2, x^3 and (x - \xi)_+^3, as a consequence of the truncated power basis theorem of this slide.

In this exercise we will show that a function f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)_+^3 is a cubic spline, regardless of the values of \beta_0,\dots,\beta_4.

  1. Find a cubic polynomial f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3, such that f(x) = f_1(x) for all x \le \xi. Express a_1, b_1, c_1, d_1 in terms of \beta_0,\dots,\beta_4.

  2. Find another cubic polynomial f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3, such that f(x) = f_2(x) for all x > \xi. Express a_2, b_2, c_2, d_2 in terms of \beta_0,\dots,\beta_4. This establishes that f(x) is piecewise polynomial.

  3. Show that f_1(\xi) = f_2(\xi). That is, f(x) is continuous at \xi.

  4. Show that f'_1(\xi) = f'_2(\xi). That is, the first derivative f'(x) is continuous at \xi.

  5. Show that f''_1(\xi) = f''_2(\xi). That is, the second derivative f''(x) is continuous at \xi.

Conclude that f(x) is indeed a cubic spline.
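A quick numerical sanity check in R (with arbitrary, made-up coefficients; this complements but does not replace the algebraic argument):

```r
# Arbitrary coefficients beta_0, ..., beta_4 and knot xi
b  <- c(1, -2, 0.5, 3, -4)
xi <- 0.7

f1 <- function(x) b[1] + b[2] * x + b[3] * x^2 + b[4] * x^3  # piece for x <= xi
f2 <- function(x) f1(x) + b[5] * (x - xi)^3                  # piece for x > xi

# Central finite differences for first and second derivatives
d1 <- function(g, x, h = 1e-5) (g(x + h) - g(x - h)) / (2 * h)
d2 <- function(g, x, h = 1e-4) (g(x + h) - 2 * g(x) + g(x - h)) / h^2

# All three differences are numerically zero: f, f', f'' match at xi
c(f1(xi) - f2(xi), d1(f1, xi) - d1(f2, xi), d2(f1, xi) - d2(f2, xi))
```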

Derive the smoothing spline estimator \hat{\beta} = (\bm{N}^T\bm{N} + \lambda \bm{\Omega})^{-1}\bm{N}^T\bm{y}, taking for granted the validity of the Green and Silverman theorem about the optimality of natural cubic splines, as stated here.
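Sketch of the starting point, assuming the usual notation (\bm{N} the basis matrix with entries N_j(x_i), \bm{\Omega} the penalty matrix): by the Green and Silverman theorem the minimiser can be written as f(x) = \sum_{j=1}^n \beta_j N_j(x), so the problem reduces to the penalised least-squares criterion

```latex
\hat{\beta} = \arg\min_{\beta} \;
(\bm{y} - \bm{N}\beta)^T (\bm{y} - \bm{N}\beta)
+ \lambda\, \beta^T \bm{\Omega} \beta,
```

and the estimator follows by setting the gradient with respect to \beta equal to zero.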

Coding exercises

Write a function called loclin(x, y, h) that implements local linear regression using the formula of this slide. You can use any kernel function of your choice.
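A minimal sketch of one possible implementation, with a Gaussian kernel (the name and arguments follow the exercise; the internals are one choice among many, not the reference solution):

```r
# Local linear regression: at each evaluation point x0, fit a weighted
# least squares line and return its intercept (the fitted value at x0).
loclin <- function(x, y, h, xgrid = x) {
  sapply(xgrid, function(x0) {
    w <- dnorm((x - x0) / h)   # Gaussian kernel weights around x0
    X <- cbind(1, x - x0)      # local design: intercept + centred slope
    beta <- solve(t(X * w) %*% X, t(X * w) %*% y)
    beta[1]
  })
}
```

On points lying exactly on a line, loclin reproduces the line, consistently with the theoretical exercise above.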

Compare the results of your function loclin on the auto dataset with those of the libraries KernSmooth and sm, along the lines of what has been done in class.

Write a function called loo_cv(x, y, h) that computes the leave-one-out cross-validation error for the loclin(x, y, h) function, using the result of this slide.
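A self-contained sketch (written with lowercase x to match loclin; Gaussian kernel, one possible implementation): it builds each row of the smoothing matrix, computes the fitted values, and applies the leave-one-out shortcut \text{CV}(h) = n^{-1}\sum_{i=1}^n \{(y_i - \hat{y}_i)/(1 - S_{ii})\}^2.

```r
loo_cv <- function(x, y, h) {
  n    <- length(x)
  yhat <- numeric(n)
  Sii  <- numeric(n)
  for (j in seq_len(n)) {
    w   <- dnorm((x - x[j]) / h)       # kernel weights around x[j]
    X   <- cbind(1, x - x[j])          # local design matrix
    XtW <- t(X * w)                    # X^T W
    s   <- solve(XtW %*% X, XtW)[1, ]  # j-th row of the smoothing matrix S
    yhat[j] <- sum(s * y)
    Sii[j]  <- s[j]
  }
  mean(((y - yhat) / (1 - Sii))^2)
}
```

For a linear smoother of this kind, the shortcut agrees with explicitly refitting the model n times, each time leaving one observation out.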

Identify the optimal bandwidth for the loclin local linear regression, using the auto dataset.