Exercises A
Data mining - CdL CLAMSES
Theoretical exercises
A.1 - Properties of the projection matrix \bm{H}
Suppose the n\times p design matrix \bm{X} has full rank, that is \text{rk}(\bm{X}) = p, with p < n. Prove that \text{rk}(\bm{H}) = \text{tr}(\bm{H}) = p. Moreover, show that \bm{H} = \bm{H}^T and that \bm{H}^2 = \bm{H}. Finally, show that if the intercept is included in the model, then the sum of the elements of each row of \bm{H} (and hence each column, because of symmetry) equals 1, that is \sum_{j=1}^n[\bm{H}]_{ij} = 1, \qquad i=1,\dots,n.
Hint. You may want to look up the properties of projection matrices in your favorite linear algebra textbook; otherwise this exercise becomes quite hard.
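As a quick numerical sanity check (not a proof), the following R sketch verifies all four properties on simulated data, assuming \bm{H} denotes the usual hat matrix \bm{X}(\bm{X}^T\bm{X})^{-1}\bm{X}^T and that the first column of \bm{X} is the intercept.

```r
set.seed(123)
n <- 50; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # design matrix with intercept
H <- X %*% solve(crossprod(X)) %*% t(X)              # H = X (X^T X)^{-1} X^T

qr(H)$rank              # rank of H: equals p
sum(diag(H))            # trace of H: equals p
max(abs(H - t(H)))      # symmetry: zero up to rounding error
max(abs(H %*% H - H))   # idempotency: zero up to rounding error
range(rowSums(H))       # each row sums to 1 when the intercept is included
```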
A.2 - Positive definiteness of \bm{X}^T\bm{X}
Prove the statement of Proposition A.1. In other words, suppose the n\times p matrix \bm{X} has full rank, that is \text{rk}(\bm{X}) = p, with p < n. Then, show that \bm{X}^T\bm{X} is positive definite.
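A minimal numerical illustration in R (again, not a proof): for a randomly generated full-rank \bm{X}, all eigenvalues of \bm{X}^T\bm{X} are strictly positive and the quadratic form \bm{a}^T\bm{X}^T\bm{X}\bm{a} = \lVert\bm{X}\bm{a}\rVert^2 is positive for any \bm{a}\neq\bm{0}.

```r
set.seed(123)
n <- 50; p <- 4
X <- matrix(rnorm(n * p), n, p)     # full rank with probability one
eigen(crossprod(X))$values          # all eigenvalues are strictly positive
a <- rnorm(p)                       # an arbitrary non-zero vector
drop(t(a) %*% crossprod(X) %*% a)   # a^T X^T X a
sum((X %*% a)^2)                    # same quantity, computed as ||X a||^2
```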
A.3 - OLS with orthogonal predictors
Prove the statement of Proposition A.2. In other words, show that when the predictors are orthogonal, the least squares estimate is \hat{\beta}_j = \frac{\tilde{\bm{z}}_j^T\bm{y}}{\tilde{\bm{z}}_j^T\tilde{\bm{z}}_j}, \qquad j=1,\dots,p.
Assume, in addition, that \bm{Y} = \bm{Z}\beta + \bm{\epsilon} and that the errors \epsilon_i \overset{\text{iid}}{\sim} \text{N}(0, \sigma^2). Then, obtain the covariance matrix of \hat{\beta} and conclude that the estimators \hat{\beta}_j and \hat{\beta}_{j'} are independent for j \neq j'.
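Both claims can be checked numerically with a short R sketch, assuming \bm{Z} has orthogonal columns (here generated as an orthonormal basis via the QR decomposition of a random matrix).

```r
set.seed(123)
n <- 30; p <- 3
Z <- qr.Q(qr(matrix(rnorm(n * p), n, p)))   # orthogonal (indeed orthonormal) columns
y <- rnorm(n)

coef(lm(y ~ Z - 1))                    # joint OLS fit
drop(crossprod(Z, y)) / colSums(Z^2)   # componentwise formula z_j^T y / (z_j^T z_j)
round(vcov(lm(y ~ Z - 1)), 10)         # numerically diagonal: the estimators are uncorrelated
```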
A.4 - Final step of the Gram-Schmidt algorithm
Consider the last equation of this slide, that is \hat{\beta}_p = (\tilde{\bm{z}}_p^T\bm{y}) / (\tilde{\bm{z}}_p^T \tilde{\bm{z}}_p). Realize that this estimate can be regarded as the final (additional) step of the Gram-Schmidt algorithm, as mentioned in Algorithm 2.1 of the textbook Azzalini & Scarpa (2011).
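The same fact can be checked numerically in R: the coefficient of the last predictor in the full OLS fit coincides with the simple regression of \bm{y} on the residual of that predictor after orthogonalizing it against all the other columns (computed below via an auxiliary regression, which is equivalent to the Gram-Schmidt step). A sketch with simulated data:

```r
set.seed(123)
n <- 40; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))   # intercept plus two predictors
y <- rnorm(n)

z_p <- resid(lm(X[, p] ~ X[, -p] - 1))   # last column orthogonalized against the others
sum(z_p * y) / sum(z_p^2)                # z_p^T y / (z_p^T z_p)
coef(lm(y ~ X - 1))[p]                   # last coefficient of the full OLS fit: identical
```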
A.5 - Recursive least squares
Verify the correctness of the recursive least squares equations described in this slide.
The proof is already concisely written in the slides; you just need to convince yourself of the correctness of every step and add the missing details.
The second part of this exercise is optional and quite hard, because it involves a lot of algebraic steps. Verify the correctness of the deviance formula: ||\bm{y}_{(n + 1)} - \bm{X}_{(n+1)}\hat{\beta}_{(n+1)}||^2 = ||\bm{y}_{(n)} - \bm{X}_{(n)}\hat{\beta}_{(n)}||^2 + v_{(n)} e_{n+1}^2, which is mentioned here without proof.
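Before attempting the algebra, it may help to check the identity numerically. A sketch in R, assuming v_{(n)} = (1 + \bm{x}_{n+1}^T(\bm{X}_{(n)}^T\bm{X}_{(n)})^{-1}\bm{x}_{n+1})^{-1} and e_{n+1} = y_{n+1} - \bm{x}_{n+1}^T\hat{\beta}_{(n)}; check these definitions against the notation used in the slides.

```r
set.seed(123)
n <- 30; p <- 3
X <- cbind(1, matrix(rnorm((n + 1) * (p - 1)), n + 1, p - 1))
y <- rnorm(n + 1)
Xn <- X[1:n, ]; yn <- y[1:n]

fit_n  <- lm(yn ~ Xn - 1)    # fit with the first n observations
fit_n1 <- lm(y ~ X - 1)      # fit with all n + 1 observations
x_new  <- X[n + 1, ]
e_new  <- y[n + 1] - sum(x_new * coef(fit_n))                     # prediction error
v_n    <- 1 / (1 + drop(t(x_new) %*% solve(crossprod(Xn)) %*% x_new))

sum(resid(fit_n1)^2)                  # left-hand side: deviance with n + 1 observations
sum(resid(fit_n)^2) + v_n * e_new^2   # right-hand side of the claimed identity
```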
A.6 - Separability in logistic regression
Consider a logistic regression model for binary data with a single predictor x_i \in \mathbb{R}, so that f(x_i) = \beta_0 + \beta_1 x_i and \mathbb{P}(Y_i = 1) = \pi(x_i) = 1/[1 + \exp\{-f(x_i)\}]. Suppose there exists a point x_0 \in \mathbb{R} that perfectly separates the two binary outcomes.
Investigate the behavior of the maximum likelihood estimate \hat{\beta}_0, \hat{\beta}_1 in this scenario. Then, generalize this result to the multivariate case, when \bm{x}_i \in \mathbb{R}^p. That is, suppose there exists a hyperplane in \mathbb{R}^p that perfectly separates the binary outcomes.
A detailed description of this separability issue is provided in the Biometrika paper by Albert and Anderson (1984). See also the paper by Rigon and Aliverti (2023) for a simple correction.
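A small R experiment can make the issue concrete: with perfectly separated data, glm typically warns that fitted probabilities numerically 0 or 1 occurred, and the estimated slope is huge, growing further as the number of IRLS iterations increases. A hedged illustration with simulated data:

```r
x <- c(seq(-2, -0.1, length.out = 20), seq(0.1, 2, length.out = 20))
y <- rep(c(0, 1), each = 20)           # x0 = 0 perfectly separates the two outcomes

fit <- glm(y ~ x, family = binomial)   # typically warns about fitted probabilities of 0 or 1
coef(fit)                              # the slope estimate is extremely large
coef(glm(y ~ x, family = binomial, control = glm.control(maxit = 100)))  # typically larger still
```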
A.7 - Standardization of the input variables
In a linear model, suppose the original input values x_{ij} are standardized, namely the transformed variables z_{ij} are such that z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, where \bar{x}_j and s_j are the mean and the standard deviation of the jth variable. We denote by \hat{\beta} the OLS estimate based on the original data and by \hat{\gamma} the OLS estimate based on the standardized data.
Prove that the predicted values coincide, that is: \hat{y}_i = \bm{x}_i^T\hat{\beta} = \bm{z}_i^T\hat{\gamma}, \qquad i=1,\dots,n, where \bm{x}_i = (x_{i1},\dots,x_{ip})^T and \bm{z}_i = (z_{i1},\dots,z_{ip})^T. Hence, standardization of the inputs in ordinary least squares has no effect on the predictions. Similar linear operations on the inputs, such as normalization, would lead to the same conclusion.
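This invariance is easy to verify empirically in R (a sketch, assuming an intercept is included in both fits):

```r
set.seed(123)
n <- 50
X <- matrix(rnorm(n * 3), n, 3)
y <- rnorm(n)
Z <- scale(X)                             # standardized inputs

fit_x <- lm(y ~ X)                        # fit on the original inputs
fit_z <- lm(y ~ Z)                        # fit on the standardized inputs
max(abs(fitted(fit_x) - fitted(fit_z)))   # zero up to rounding: identical predictions
```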
Practical exercises
A.8 - Recursive least squares
Implement a function ols_function(X, y) that computes the least squares estimate using the recursive least squares algorithm described in this slide. A detailed description of this procedure is also given in Algorithm 2.2 of Azzalini & Scarpa (2011).
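A possible sketch of such a function is given below, written in R. It updates (\bm{X}^T\bm{X})^{-1} via the Sherman-Morrison formula and is initialized on the first p observations, which are assumed to form a full-rank block; the exact notation in the slides and in Algorithm 2.2 may differ slightly.

```r
ols_function <- function(X, y) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  idx <- 1:p                                               # initial block, assumed full rank
  W <- solve(crossprod(X[idx, , drop = FALSE]))            # (X^T X)^{-1} on the initial block
  beta <- W %*% crossprod(X[idx, , drop = FALSE], y[idx])  # initial least squares estimate
  for (i in (p + 1):n) {
    x_new <- X[i, ]
    e_new <- y[i] - sum(x_new * beta)                 # prediction error with current estimate
    Wx <- W %*% x_new
    W <- W - tcrossprod(Wx) / (1 + sum(x_new * Wx))   # Sherman-Morrison update of (X^T X)^{-1}
    beta <- beta + W %*% x_new * e_new                # recursive update of the estimate
  }
  drop(beta)
}

# Quick check against the direct least squares solution
set.seed(123)
X <- cbind(1, matrix(rnorm(100 * 2), 100, 2)); y <- rnorm(100)
cbind(ols_function(X, y), solve(crossprod(X), crossprod(X, y)))
```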
A.9 - Iteratively re-weighted least squares for logistic regression
Implement a function called logistic_mle(X, y) which computes the maximum likelihood estimate for a logistic regression model using iteratively re-weighted least squares, as described here. Verify that the output of logistic_mle and that of the built-in glm function coincide, using the heart dataset.
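A minimal sketch of the IRLS implementation in R is given below; it assumes the design matrix X already includes the intercept column and that y is a 0/1 vector. The comparison with glm uses simulated data here purely for illustration; the same check can be repeated on the dataset used in the course.

```r
logistic_mle <- function(X, y, tol = 1e-8, maxit = 100) {
  X <- as.matrix(X)
  beta <- rep(0, ncol(X))                       # starting values
  for (it in 1:maxit) {
    eta <- drop(X %*% beta)                     # linear predictor
    prob <- 1 / (1 + exp(-eta))                 # fitted probabilities
    w <- prob * (1 - prob)                      # IRLS weights
    z <- eta + (y - prob) / w                   # working response
    beta_new <- solve(crossprod(X, w * X), crossprod(X, w * z))  # weighted least squares step
    if (max(abs(beta_new - beta)) < tol) return(drop(beta_new))
    beta <- beta_new
  }
  warning("IRLS did not converge")
  drop(beta)
}

# Comparison with glm on simulated data (the intercept column is included in X)
set.seed(123)
X <- cbind(1, matrix(rnorm(200 * 2), 200, 2))
y <- rbinom(200, 1, 1 / (1 + exp(-X %*% c(-0.5, 1, -1))))
cbind(logistic_mle(X, y), coef(glm(y ~ X - 1, family = binomial)))
```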