Exam - 4 July 2025

Data Mining - CdL CLAMSES

Author: Tommaso Rigon

Affiliation: Università degli Studi di Milano-Bicocca

The time available to the candidate is 2 hours and 30 minutes.

Problem 1

Let \hat{y}_{-i} = \bm{x}_i^T\hat{\beta}_{-i} be the leave-one-out predictions of a linear model, where \hat{\beta}_{-i} is the least squares estimate computed without the i-th observation. Let h_i = [\bm{H}]_{ii} and \hat{y}_i be, respectively, the leverages and the predictions of the full model estimated using least squares.

Prove that the leave-one-out residuals are y_i - \hat{y}_{-i} = \frac{y_i - \hat{y}_i}{1 - h_i}, \qquad i=1,\dots,n.
Why is this result useful? Discuss.
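The identity above can be checked numerically before attempting the proof. The following is a minimal sketch on simulated data, not part of the exam: the design matrix `X`, the sample size, and the random seed are illustrative assumptions. It compares the shortcut formula with a brute-force refit that drops one observation at a time.

```python
import numpy as np

# Simulated data (illustrative assumption, not from the exam)
rng = np.random.default_rng(0)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Full-model least squares fit, hat matrix, and leverages h_i = [H]_ii
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# Shortcut: leave-one-out residuals from a single fit
shortcut = (y - X @ beta_hat) / (1.0 - h)

# Brute force: refit n times, each time without observation i
brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_minus_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    brute[i] = y[i] - X[i] @ beta_minus_i

max_abs_diff = np.max(np.abs(shortcut - brute))
print(max_abs_diff)
```

The two vectors should agree up to floating-point error, and the shortcut needs one fit instead of n.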

Problem 2

Discuss similarities and differences between cross-validation and generalized cross-validation.
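As a hedged illustration of the comparison, the sketch below computes both criteria for an ordinary least squares fit on simulated data. It assumes the common definition of generalized cross-validation for linear smoothers, in which each leverage h_i is replaced by the average leverage \text{tr}(\bm{H})/n; the data and variable names are illustrative assumptions.

```python
import numpy as np

# Simulated data (illustrative assumption, not from the exam)
rng = np.random.default_rng(0)
n, p = 40, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Hat matrix, residuals, and leverages of the least squares fit
H = X @ np.linalg.solve(X.T @ X, X.T)
resid = y - H @ y
h = np.diag(H)

# Training error, leave-one-out CV, and generalized CV
train_mse = np.mean(resid**2)
loo_cv = np.mean((resid / (1.0 - h)) ** 2)                  # individual leverages
gcv = np.mean((resid / (1.0 - np.trace(H) / n)) ** 2)       # average leverage

print(train_mse, loo_cv, gcv)
```

Both criteria inflate the training error; they differ only in whether the correction is observation-specific or averaged.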

Problem 3

Consider lasso and ridge regression when the predictors are mutually orthogonal. More precisely, let \bm{Z} = (\tilde{\bm{z}}_1,\dots,\tilde{\bm{z}}_p) be the design matrix and suppose its columns are orthogonal and standardized, that is, \bm{Z}^T\bm{Z} = I_p. Moreover, suppose the predictors and the response have been centered, that is, \sum_{i=1}^n y_i = \sum_{i=1}^n z_{ij} = 0 for j=1,\dots,p.

Find an explicit expression for \hat{\beta}_\text{ridge}.
Find an explicit expression for \hat{\beta}_\text{lasso}.
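A candidate answer can be sanity-checked numerically. The sketch below is one possible check under assumptions the exam does not state: the ridge objective is taken as \|\bm{y} - \bm{Z}\beta\|^2 + \lambda\|\beta\|^2 and the lasso objective as \frac{1}{2}\|\bm{y} - \bm{Z}\beta\|^2 + \lambda\|\beta\|_1. With a different scaling of the penalty, the constants change accordingly. The simulated data and the value of `lam` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 5, 1.0

# Build a centered design with orthonormal columns via a QR decomposition
A = rng.normal(size=(n, p))
A -= A.mean(axis=0)
Z, _ = np.linalg.qr(A)  # Z is n x p with Z^T Z = I_p
y = Z @ np.array([3.0, -2.0, 0.0, 0.0, 0.5]) + 0.3 * rng.normal(size=n)
y -= y.mean()

beta_ols = Z.T @ y  # least squares estimate when Z^T Z = I_p

# Ridge (assumed objective: ||y - Z b||^2 + lam * ||b||^2): proportional shrinkage
beta_ridge = beta_ols / (1.0 + lam)
beta_ridge_direct = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

# Lasso (assumed objective: 0.5 * ||y - Z b||^2 + lam * ||b||_1): soft-thresholding
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# Check the lasso optimality (subgradient) conditions
grad = Z.T @ (y - Z @ beta_lasso)
active = beta_lasso != 0
kkt_active = np.allclose(grad[active], lam * np.sign(beta_lasso[active]))
kkt_inactive = np.all(np.abs(grad[~active]) <= lam + 1e-10)

print(beta_ridge, beta_lasso, kkt_active, kkt_inactive)
```

Under these assumptions ridge rescales every least squares coefficient by the same factor, while the lasso shifts each coefficient toward zero and sets the small ones exactly to zero.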

Problem 4

Let (x_i, y_i) \in \mathbb{R}^2 for i=1,\dots,n be pairs of iid realizations from an unknown density function f(x, y), namely (X_i, Y_i) \overset{\text{iid}}{\sim} f. A common estimator for f(x, y) is the kernel density estimator (KDE), which is defined as follows: \hat{f}(x, y) = \frac{1}{n}\sum_{i=1}^n\frac{1}{h_1 h_2}\phi\left(\frac{x - x_i}{h_1}\right)\phi\left(\frac{y - y_i}{h_2}\right), where \phi(\cdot) is the density function of a standard Gaussian and h_1, h_2 > 0 are fixed bandwidths.

Verify that \hat{f}(x, y) > 0 and \hat{f}(x, y) integrates to 1 on \mathbb{R}^2. In other words, show that \hat{f}(x, y) is a bivariate density function.
Compute the estimated marginal density \hat{f}(x) and the estimated conditional density \hat{f}(y \mid x) associated with \hat{f}(x, y).
Let g(x) = \mathbb{E}(Y \mid X = x) be the conditional mean associated with f(y \mid x). Obtain the estimator \hat{g}(x) for g(x) from the estimated conditional density \hat{f}(y \mid x).
Discuss similarities between \hat{g}(x) and other known nonparametric estimators of g(x).
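The quantities in this problem can also be explored numerically. The sketch below is an illustration on simulated data, with the sample, the bandwidths, the grids, and the evaluation point `x0` all chosen arbitrarily. It evaluates the KDE on a grid, approximates its total mass with a Riemann sum, and compares the conditional mean obtained by numerical integration with a kernel-weighted average of the y_i, which has the form of the Nadaraya-Watson estimator with a Gaussian kernel and bandwidth h_1.

```python
import numpy as np

def phi(u):
    """Standard Gaussian density."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde2d(x, y, xs, ys, h1, h2):
    """Product-Gaussian KDE at (x, y); x and y must be broadcastable arrays."""
    x = np.asarray(x, dtype=float)[..., None]
    y = np.asarray(y, dtype=float)[..., None]
    return np.mean(phi((x - xs) / h1) * phi((y - ys) / h2), axis=-1) / (h1 * h2)

# Simulated sample and bandwidths (illustrative assumptions)
rng = np.random.default_rng(1)
n = 50
xs = rng.normal(size=n)
ys = np.sin(xs) + 0.3 * rng.normal(size=n)
h1, h2 = 0.4, 0.3

# Grid wide enough to capture essentially all of the mass
gx = np.linspace(xs.min() - 6 * h1, xs.max() + 6 * h1, 201)
gy = np.linspace(ys.min() - 6 * h2, ys.max() + 6 * h2, 201)
dx, dy = gx[1] - gx[0], gy[1] - gy[0]

# Total mass of the estimated density (Riemann sum approximation)
F = kde2d(gx[:, None], gy[None, :], xs, ys, h1, h2)
total_mass = F.sum() * dx * dy

# Conditional mean at x0 by numerical integration over y
x0 = 0.5
fy = kde2d(x0, gy, xs, ys, h1, h2)
cond_mean_numeric = np.sum(gy * fy) / np.sum(fy)

# Kernel-weighted average of the responses (Nadaraya-Watson form)
w = phi((x0 - xs) / h1)
cond_mean_nw = np.sum(w * ys) / np.sum(w)

print(total_mass, cond_mean_numeric, cond_mean_nw)
```

The total mass should be close to 1, and the two conditional means should nearly coincide; note that h_2 does not enter the weighted average.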