Exercises A

Data mining - CdL CLAMSES

Tommaso Rigon

Università degli Studi di Milano-Bicocca

The theoretical exercises described below can be quite difficult. At the exam, you can expect a simplified version of them; otherwise, they would represent a formidable challenge for most of you.

Theoretical exercises

Suppose the n\times p design matrix \bm{X} has full rank, that is \text{rk}(\bm{X}) = p, with p < n, and let \bm{H} = \bm{X}(\bm{X}^T\bm{X})^{-1}\bm{X}^T denote the corresponding projection (hat) matrix. Prove that \text{rk}(\bm{H}) = \text{tr}(\bm{H}) = p. Moreover, show that \bm{H} = \bm{H}^T and that \bm{H}^2 = \bm{H}. Finally, show that if the intercept is included in the model, then the sum of the elements of each row of \bm{H} (and hence of each column, by symmetry) equals 1, that is \sum_{j=1}^n[\bm{H}]_{ij} = 1, \qquad i=1,\dots,n.

Hint. You may want to look up the properties of projection matrices in your favorite linear algebra textbook; otherwise, this exercise becomes quite hard. A numerical sanity check, like the one sketched below, can also help you convince yourself that the statements are true before proving them.
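The following R snippet is a minimal numerical check of these properties on simulated data; it is not a proof, and the design matrix and dimensions are arbitrary choices made here for illustration.

set.seed(123)
n <- 50; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1)) # design with intercept
H <- X %*% solve(crossprod(X)) %*% t(X)             # hat matrix

qr(H)$rank            # should equal p
sum(diag(H))          # trace: should equal p
max(abs(H - t(H)))    # symmetry: should be ~0
max(abs(H %*% H - H)) # idempotence: should be ~0
range(rowSums(H))     # row sums: should be ~1, since the intercept is included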

Prove the statement of Proposition A.1. In other words, suppose the n\times p matrix \bm{X} has full rank, that is \text{rk}(\bm{X}) = p, with p < n. Then, show that \bm{X}^T\bm{X} is positive definite.
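As a sanity check (not a proof), one can verify in R that the eigenvalues of \bm{X}^T\bm{X} are all strictly positive when \bm{X} has full rank; the simulated design below is an arbitrary example.

set.seed(123)
X <- matrix(rnorm(50 * 4), 50, 4)             # full rank with probability one
eigen(crossprod(X), symmetric = TRUE)$values  # all should be strictly positive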

Prove the statement of Proposition A.2. In other words, show that when the predictors are orthogonal, the least squares estimate is \hat{\beta}_j = \frac{\tilde{\bm{z}}_j^T\bm{y}}{\tilde{\bm{z}}_j^T\tilde{\bm{z}}_j}, \qquad j=1,\dots,p.

Assume, in addition, that \bm{Y} = \bm{Z}\beta + \bm{\epsilon} and that the errors \epsilon_i \overset{\text{iid}}{\sim} \text{N}(0, \sigma^2). Then, obtain the covariance matrix of \hat{\beta} and conclude that the estimators \hat{\beta}_j and \hat{\beta}_{j'}, with j \neq j', are independent.
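A quick empirical check of the coordinate-wise formula is sketched below in R; the orthogonal columns are built with poly(), and the data-generating coefficients are arbitrary.

set.seed(123)
n <- 100
x <- rnorm(n)
Z <- cbind(1, poly(x, 2))   # mutually orthogonal columns (constant included)
y <- drop(Z %*% c(1, 2, -1)) + rnorm(n)

# Coordinate-wise formula vs joint least squares: they should coincide
beta_marginal <- drop(crossprod(Z, y)) / colSums(Z^2)
beta_joint <- drop(solve(crossprod(Z), crossprod(Z, y)))
cbind(beta_marginal, beta_joint)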

Verify the correctness of the recursive least squares equations described in this slide.

The proof is already concisely written in the slides; you just need to convince yourself of the correctness of every step and add the missing details.

The second part of this exercise is quite hard and optional, because it involves many algebraic steps. Verify the correctness of the deviance formula ||\bm{y}_{(n+1)} - \bm{X}_{(n+1)}\hat{\beta}_{(n+1)}||^2 = ||\bm{y}_{(n)} - \bm{X}_{(n)}\hat{\beta}_{(n)}||^2 + v_{(n)} e_{n+1}^2, which is mentioned here without proof. A numerical check of the identity, sketched below, can be useful before attempting the algebra.
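In the R check below we assume, consistently with the usual recursive least squares notation, that e_{n+1} = y_{n+1} - \bm{x}_{n+1}^T\hat{\beta}_{(n)} and v_{(n)} = 1/\{1 + \bm{x}_{n+1}^T(\bm{X}_{(n)}^T\bm{X}_{(n)})^{-1}\bm{x}_{n+1}\}; if the slides define these quantities differently, adapt the snippet accordingly.

set.seed(123)
n <- 30; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- rnorm(n)
# The first n - 1 rows play the role of the "old" sample y_(n),
# the last row is the new observation y_(n+1)
Xn <- X[1:(n - 1), ]; yn <- y[1:(n - 1)]
xnew <- X[n, ]; ynew <- y[n]

beta_n  <- solve(crossprod(Xn), crossprod(Xn, yn))
beta_n1 <- solve(crossprod(X), crossprod(X, y))

e_new <- ynew - drop(crossprod(xnew, beta_n))   # prediction error
v_n <- 1 / (1 + drop(t(xnew) %*% solve(crossprod(Xn)) %*% xnew))

# The two sides of the deviance identity: they should coincide
sum((y - X %*% beta_n1)^2)
sum((yn - Xn %*% beta_n)^2) + v_n * e_new^2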

Consider a logistic regression model for binary data with a single predictor x_i \in \mathbb{R}, so that f(x_i) = \beta_0 + \beta_1 x_i and \mathbb{P}(Y_i = 1) = \pi(x_i) = 1/[1 + \exp\{-f(x_i)\}]. Suppose there exists a point x_0 \in \mathbb{R} that perfectly separates the two binary outcomes.

Investigate the behavior of the maximum likelihood estimates (\hat{\beta}_0, \hat{\beta}_1) in this scenario. Then, generalize this result to the multivariate case, when \bm{x}_i \in \mathbb{R}^p. That is, suppose there exists a vector \bm{x}_0 \in \mathbb{R}^p that perfectly separates the binary outcomes.

A detailed description of this separability issue is provided in the Biometrika paper by Albert and Anderson (1984). See also the paper by Rigon and Aliverti (2023) for a simple correction.
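To see the phenomenon in practice, one can fit a logistic regression on a perfectly separated toy dataset; the data below are an arbitrary illustration, not part of the exercise.

# Perfectly separated data: y = 1 exactly when x > 0
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# glm() warns that fitted probabilities numerically 0 or 1 occurred;
# the estimates are huge and keep diverging as the iterations increase
fit <- glm(y ~ x, family = binomial, control = glm.control(maxit = 100))
coef(fit)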

In a linear model, suppose the original input values x_{ij} are standardized, namely the transformed variables z_{ij} are such that z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, where \bar{x}_j and s_j are the mean and the standard deviation of the jth variable. We denote by \hat{\beta} the OLS estimate based on the original data and by \hat{\gamma} the OLS estimate based on the standardized data.

Prove that the predicted values coincide, that is: \hat{y}_i = \bm{x}_i^T\hat{\beta} = \bm{z}_i^T\hat{\gamma}, \qquad i=1,\dots,n, where \bm{x}_i = (x_{i1},\dots,x_{ip})^T and \bm{z}_i = (z_{i1},\dots,z_{ip})^T, and both models are assumed to include an intercept. Hence, standardization of the inputs in ordinary least squares has no effect on the predictions. Similar affine transformations of the inputs, such as normalization, would lead to the same conclusion.
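The invariance is easy to check empirically; a minimal R verification on simulated data (arbitrary dimensions) might look as follows.

set.seed(123)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
Z <- scale(X)                 # standardized inputs

fit_x <- lm(y ~ X)            # original inputs, intercept included
fit_z <- lm(y ~ Z)            # standardized inputs, intercept included

max(abs(fitted(fit_x) - fitted(fit_z)))   # should be ~0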

Coding exercises

Implement a function ols_function(X, y) that computes the least squares estimate using the recursive least squares algorithm, described in this slide.

A detailed description is also provided in Algorithm 2.2 of Azzalini & Scarpa (2011). One possible sketch is given below.
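The following is a minimal sketch of one possible implementation, not necessarily the algorithm as presented in the slides: the initialization strategy (solving a small batch of p observations directly) and all variable names are choices made here, and the loop avoids matrix inversions via the Sherman-Morrison update.

ols_function <- function(X, y) {
  # Assumes the first p rows of X are linearly independent,
  # so the initial batch can be solved directly
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  V <- solve(crossprod(X[1:p, , drop = FALSE]))   # (X_p^T X_p)^{-1}
  beta <- V %*% crossprod(X[1:p, , drop = FALSE], y[1:p])
  for (i in (p + 1):n) {
    x <- X[i, ]
    e <- y[i] - drop(crossprod(x, beta))   # prediction error
    Vx <- V %*% x
    v <- 1 / (1 + drop(crossprod(x, Vx)))
    beta <- beta + v * e * Vx              # coefficient update
    V <- V - v * tcrossprod(Vx)            # Sherman-Morrison update of (X^T X)^{-1}
  }
  drop(beta)
}

# Quick check against lm(); the -1 removes the automatic intercept,
# since X already contains a column of ones
set.seed(123)
X <- cbind(1, matrix(rnorm(200 * 2), 200, 2))
y <- rnorm(200)
max(abs(ols_function(X, y) - coef(lm(y ~ X - 1))))   # should be ~0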

Implement a function called logistic_mle(X, y) which computes the maximum likelihood estimate for a logistic regression model using iteratively re-weighted least squares (IRLS), as described here.

Verify that the output of logistic_mle and the built-in glm function coincide, using the heart dataset.
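As a reference point, here is a minimal IRLS sketch; the starting value, tolerance, and stopping rule are choices made here. Since the heart dataset is course-specific, the comparison below uses simulated data, but the same check applies verbatim to heart with the corresponding design matrix and response.

logistic_mle <- function(X, y, tol = 1e-10, maxit = 100) {
  X <- as.matrix(X)
  beta <- rep(0, ncol(X))                  # starting value
  for (it in 1:maxit) {
    eta <- drop(X %*% beta)
    prob <- 1 / (1 + exp(-eta))            # current fitted probabilities
    w <- prob * (1 - prob)                 # IRLS weights
    z <- eta + (y - prob) / w              # working response
    # Weighted least squares step: solve (X^T W X) beta = X^T W z
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Comparison with glm() on simulated data; should be ~0
set.seed(123)
X <- cbind(1, rnorm(200))
y <- rbinom(200, 1, 1 / (1 + exp(-drop(X %*% c(-0.5, 1)))))
max(abs(logistic_mle(X, y) - coef(glm(y ~ X - 1, family = binomial))))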