Exercises B

Data mining - CdL CLAMSES

Author: Tommaso Rigon
Affiliation: Università degli Studi di Milano-Bicocca


Theoretical exercises

B.1 - Bias-variance decomposition

Prove the bias-variance decomposition stated in this slide.
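A possible starting point, sketched here under the usual assumptions (a new observation \tilde{Y} = f(\bm{x}_0) + \tilde{\epsilon} at the point \bm{x}_0, with \tilde{\epsilon} independent of the fitted \hat{f}, \mathbb{E}(\tilde{\epsilon}) = 0 and \text{var}(\tilde{\epsilon}) = \sigma^2; the precise statement to prove is the one on the slide), is to expand the squared error: \begin{aligned} \mathbb{E}\left[\{\tilde{Y} - \hat{f}(\bm{x}_0)\}^2\right] &= \mathbb{E}\left[\{\tilde{\epsilon} + f(\bm{x}_0) - \hat{f}(\bm{x}_0)\}^2\right] \\ &= \sigma^2 + \mathbb{E}\left[\{f(\bm{x}_0) - \hat{f}(\bm{x}_0)\}^2\right] \\ &= \sigma^2 + \{f(\bm{x}_0) - \mathbb{E}[\hat{f}(\bm{x}_0)]\}^2 + \text{var}\{\hat{f}(\bm{x}_0)\}, \end{aligned} where the cross term in the second line vanishes because \tilde{\epsilon} is independent of \hat{f}, and the third line follows by adding and subtracting \mathbb{E}[\hat{f}(\bm{x}_0)] inside the square.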

B.2 - Optimism

Prove that, as stated in this slide, the optimism, defined as \text{Opt} = \mathbb{E}(\text{MSE}_\text{test}) - \mathbb{E}(\text{MSE}_\text{train}), can be equivalently expressed as \text{Opt} = \frac{2}{n}\sum_{i=1}^n\text{cov}(Y_i, \hat{f}(\bm{x}_i)). Refer also to Exercise 3.3 of the textbook A&S (2011), which implicitly computes similar quantities.

To be added. Refer to the notes available on e-learning.
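In the meantime, a possible sketch (assuming the convention that \tilde{Y}_i = f(\bm{x}_i) + \tilde{\epsilon}_i is an independent copy of Y_i at the same input \bm{x}_i, so that \text{MSE}_\text{test} = n^{-1}\sum_{i=1}^n\{\tilde{Y}_i - \hat{f}(\bm{x}_i)\}^2 and \text{MSE}_\text{train} = n^{-1}\sum_{i=1}^n\{Y_i - \hat{f}(\bm{x}_i)\}^2): expand both squares and compare them term by term, \mathbb{E}\left[\{\tilde{Y}_i - \hat{f}(\bm{x}_i)\}^2\right] - \mathbb{E}\left[\{Y_i - \hat{f}(\bm{x}_i)\}^2\right] = 2\,\mathbb{E}\{Y_i\hat{f}(\bm{x}_i)\} - 2\,\mathbb{E}\{\tilde{Y}_i\hat{f}(\bm{x}_i)\} = 2\,\text{cov}(Y_i, \hat{f}(\bm{x}_i)), because \mathbb{E}(\tilde{Y}_i^2) = \mathbb{E}(Y_i^2) and, by independence of \tilde{Y}_i and \hat{f}, \mathbb{E}\{\tilde{Y}_i\hat{f}(\bm{x}_i)\} = \mathbb{E}(Y_i)\,\mathbb{E}\{\hat{f}(\bm{x}_i)\}. Averaging over i = 1, \dots, n gives the stated identity.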

B.3 - Leave-one-out

Prove the leave-one-out formula for linear models stated in this slide.

Hint: many steps are similar to those needed to obtain recursive least squares. Begin the proof by applying the Sherman–Morrison formula to \hat{\beta}_{-i}.

To be added. Refer to the notes available on e-learning.
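In the meantime, a possible outline (a sketch only, where h_i = \bm{x}_i^T(\bm{X}^T\bm{X})^{-1}\bm{x}_i denotes the leverage of the i-th observation and \hat{\beta}_{-i} the least squares estimate computed without it). Since \bm{X}_{-i}^T\bm{X}_{-i} = \bm{X}^T\bm{X} - \bm{x}_i\bm{x}_i^T, the Sherman–Morrison formula gives (\bm{X}_{-i}^T\bm{X}_{-i})^{-1} = (\bm{X}^T\bm{X})^{-1} + \frac{(\bm{X}^T\bm{X})^{-1}\bm{x}_i\bm{x}_i^T(\bm{X}^T\bm{X})^{-1}}{1 - h_i}. Substituting this into \hat{\beta}_{-i} = (\bm{X}_{-i}^T\bm{X}_{-i})^{-1}(\bm{X}^T\bm{Y} - \bm{x}_iY_i) and simplifying yields Y_i - \bm{x}_i^T\hat{\beta}_{-i} = \frac{Y_i - \bm{x}_i^T\hat{\beta}}{1 - h_i}, from which the leave-one-out cross-validation formula follows by averaging the squared deleted residuals.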

B.4 - An estimator with minimal variance

Let us consider the inequality \bm{B} \preccurlyeq \bm{A} between two square matrices \bm{A} and \bm{B} of dimension p \times p. The symbol \preccurlyeq means that \bm{A} - \bm{B} is positive semi-definite.

Assume the true model for the data (\bm{x}_i, y_i) is linear, that is, \bm{Y} = \bm{X}\beta + \bm{\epsilon}. Let us assume, as usual, that the errors \epsilon_i are iid with \mathbb{E}(\epsilon_i) = 0 and \text{var}(\epsilon_i) = \sigma^2.

Let \bm{V} = \sigma^2 (\bm{X}^T\bm{X})^{-1} be the covariance matrix of the ordinary least squares estimator \hat{\beta}_\text{ols} and let \tilde{\bm{V}} be the covariance matrix of another linear and unbiased estimator \tilde{\beta} = \bm{A}\bm{Y}.

Prove that \bm{V} \preccurlyeq \tilde{\bm{V}}. Hence, does it make sense to look for other (linear and unbiased) estimators of \beta?
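A possible route, sketching the classical Gauss–Markov argument: unbiasedness of \tilde{\beta} = \bm{A}\bm{Y} for every value of \beta forces \bm{A}\bm{X} = \bm{I}_p, so we may write \bm{A} = (\bm{X}^T\bm{X})^{-1}\bm{X}^T + \bm{D} with \bm{D}\bm{X} = \bm{0}. Then \tilde{\bm{V}} = \sigma^2\bm{A}\bm{A}^T = \sigma^2\left\{(\bm{X}^T\bm{X})^{-1} + \bm{D}\bm{D}^T\right\} = \bm{V} + \sigma^2\bm{D}\bm{D}^T, and since \bm{D}\bm{D}^T is positive semi-definite we conclude that \bm{V} \preccurlyeq \tilde{\bm{V}}.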

B.5 - Heteroscedastic errors

Let us consider a regression problem in which Y_i = f(\bm{x}_i) + v(\bm{x}_i) \epsilon_i, \qquad \tilde{Y}_i = f(\bm{x}_i) + v(\bm{x}_i)\tilde{\epsilon}_i, \quad i=1,\dots,n, where \epsilon_i and \tilde{\epsilon}_i are iid, with \mathbb{E}(\epsilon_i)=0 and \text{var}(\epsilon_i)=\sigma^2. Hence, the error terms v(\bm{x}_i) \epsilon_i have zero mean and variance \sigma^2v^2(\bm{x}_i). In other words, the errors are heteroscedastic.

Show that the in-sample prediction error under squared loss can be decomposed as \begin{aligned} \text{ErrF} &= \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^n \{\tilde{Y}_i- \hat{f}(\bm{x}_i)\}^2\right] \\ & = \frac{\sigma^2}{n}\sum_{i=1}^nv^2(\bm{x}_i) + \frac{1}{n}\sum_{i=1}^n\left\{\mathbb{E}[\hat{f}(\bm{x}_i)] - f(\bm{x}_i)\right\}^2 + \frac{1}{n}\sum_{i=1}^n\text{var}\{\hat{f}(\bm{x}_i)\}. \end{aligned} Conclude that \text{ErrF} is minimized by choosing \hat{f}(\bm{x}_i) = f(\bm{x}_i) = \mathbb{E}(\tilde{Y}_i). Discuss the implications of the above results.
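A possible approach (a sketch only): proceed as in Exercise B.1, observation by observation. For each i, expand \mathbb{E}[\{\tilde{Y}_i - \hat{f}(\bm{x}_i)\}^2] using \tilde{Y}_i = f(\bm{x}_i) + v(\bm{x}_i)\tilde{\epsilon}_i and the independence of \tilde{\epsilon}_i from \hat{f}, obtaining \mathbb{E}\left[\{\tilde{Y}_i - \hat{f}(\bm{x}_i)\}^2\right] = \sigma^2v^2(\bm{x}_i) + \left\{\mathbb{E}[\hat{f}(\bm{x}_i)] - f(\bm{x}_i)\right\}^2 + \text{var}\{\hat{f}(\bm{x}_i)\}, then average over i = 1, \dots, n. Compared with the homoscedastic case, the only change is that the irreducible error \sigma^2 is replaced by \sigma^2v^2(\bm{x}_i).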

Practical exercises

B.6 - The trawl dataset

Consider the dataset trawl contained in the sm library. Implement a function that fits a polynomial regression using Score1 as the response variable (y) and Longitude as the input variable (x). Then, implement a procedure that selects the optimal degree of the polynomial based on leave-one-out cross-validation.
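A minimal sketch in R of one possible implementation (the function name loo_poly and the grid of candidate degrees are illustrative choices, not prescribed by the exercise); it uses the leave-one-out shortcut for linear models from Exercise B.3 instead of refitting the model n times:

library(sm)                        # provides the trawl dataset
data(trawl)                        # load the data shipped with the package

# LOO cross-validation error of a polynomial regression of a given degree
loo_poly <- function(degree, x, y) {
  fit <- lm(y ~ poly(x, degree))   # polynomial fit of the chosen degree
  res <- residuals(fit)            # ordinary residuals
  h <- hatvalues(fit)              # leverages h_i
  mean((res / (1 - h))^2)          # leave-one-out formula for linear models
}

# Response: Score1 (y); input: Longitude (x), keeping complete cases only
ok <- complete.cases(trawl$Score1, trawl$Longitude)
x <- trawl$Longitude[ok]
y <- trawl$Score1[ok]

# Select the degree minimising the LOO cross-validation error
degrees <- 1:10
cv_err <- sapply(degrees, loo_poly, x = x, y = y)
best_degree <- degrees[which.min(cv_err)]
best_degree

The same selection could be obtained by explicitly refitting the model n times, once per left-out observation; the shortcut formula simply avoids that extra cost.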