Exercises F

Data mining - CdL CLAMSES

Author: Tommaso Rigon

Affiliation: Università degli Studi di Milano-Bicocca

Theoretical exercises

Exercise F.1 - Degrees of freedom of MARS

Given data y_i, realizations of random variables Y_i with mean f(x_i) and variance \sigma^2, and a fitting procedure producing \hat{y}_i = \hat{f}(x_i), let us define the effective degrees of freedom as \text{df} = \frac{1}{\sigma^2}\sum_{i=1}^n \text{cov}(Y_i, \hat{f}(x_i)), as in the slides.
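
As a reference point for the definition above, consider a linear smoother, that is, a fit of the form \hat{f}(x_i) = \sum_{j=1}^n s_{ij} Y_j where the weights s_{ij} do not depend on the responses. The matrix \bm{S} = (s_{ij}) is notation introduced here for illustration, and the derivation assumes uncorrelated Y_i with common variance \sigma^2:

```latex
\begin{aligned}
\text{cov}(Y_i, \hat{f}(x_i))
  &= \text{cov}\Big(Y_i, \sum_{j=1}^n s_{ij} Y_j\Big)
   = s_{ii}\,\sigma^2, \\
\text{df}
  &= \frac{1}{\sigma^2}\sum_{i=1}^n s_{ii}\,\sigma^2
   = \text{tr}(\bm{S}).
\end{aligned}
```

For ordinary least squares with a full-rank n \times p design matrix, \bm{S} is the projection (hat) matrix and \text{tr}(\bm{S}) = p. MARS selects its basis functions using the responses, so it is not a linear smoother in this sense and the covariance generally has to be approximated by simulation.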

Consider the MARS estimate \hat{f}(x_i) obtained from a set of predictors \tilde{\bm{x}}_1,\dots,\tilde{\bm{x}}_p.

  1. Generate n = 100 observations with predictors \tilde{\bm{x}}_1,\dots,\tilde{\bm{x}}_p as independent standard Gaussian variates and fix these values.

  2. Generate response values y_i also as standard Gaussian (\sigma^2 = 1), independent of the predictors.

  3. Fit several MARS models using the earth R package and compare the final number of basis functions of each model with its effective degrees of freedom. Run about 50 simulations of the response, keeping the predictors fixed, and average the results to obtain a reasonable Monte Carlo approximation of the degrees of freedom. Repeat this as a function of the following tuning parameters:

    1. A sufficiently large grid of values for nk, the maximum number of terms to be included in the forward pass.

    2. Different maximum degrees of MARS: degree = 1, degree = 2, and degree = 3.

    3. Different pruning strategies: pmethod = "none" (no pruning) and pmethod = "backward" (backward regression).
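
The Monte Carlo step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not a solution based on earth: MARS is replaced by a hypothetical stand-in procedure (`best_single_predictor_fit`, which picks the single predictor with the smallest residual sum of squares), and all function names are assumptions made for this sketch. Since the responses have mean zero here, cov(Y_i, \hat{f}(x_i)) equals E[Y_i \hat{f}(x_i)], which is what the code averages.

```python
import random


def simple_ols_fit(x, y):
    """Fitted values of the least squares line of y on a single predictor x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [ybar + slope * (xi - xbar) for xi in x]


def best_single_predictor_fit(columns, y):
    """Adaptive stand-in for MARS (an assumption of this sketch):
    fit one line per predictor and keep the one with the smallest RSS."""
    best_rss, best_fitted = None, None
    for x in columns:
        fitted = simple_ols_fit(x, y)
        rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
        if best_rss is None or rss < best_rss:
            best_rss, best_fitted = rss, fitted
    return best_fitted


def monte_carlo_df(fit, columns, n_sim, rng, sigma=1.0):
    """Approximate (1 / sigma^2) * sum_i cov(Y_i, fhat(x_i)) with the
    predictors held fixed and the responses drawn as N(0, sigma^2)."""
    n = len(columns[0])
    total = 0.0
    for _ in range(n_sim):
        y = [rng.gauss(0.0, sigma) for _ in range(n)]
        fitted = fit(columns, y)
        # E[Y_i] = 0, so the covariance is the expectation of the product.
        total += sum(yi * fi for yi, fi in zip(y, fitted))
    return total / (n_sim * sigma ** 2)


rng = random.Random(123)
n, p = 100, 10
# Predictors are generated once and then kept fixed across simulations.
columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]

df_fixed = monte_carlo_df(lambda cols, y: simple_ols_fit(cols[0], y), columns, 50, rng)
df_adaptive = monte_carlo_df(best_single_predictor_fit, columns, 50, rng)
print("fixed predictor:", round(df_fixed, 2))
print("selected predictor:", round(df_adaptive, 2))
```

With a fixed predictor the estimate should fluctuate around 2, the number of fitted coefficients. When the predictor is selected using the responses, the estimate is typically larger than 2, even though the final model still has two coefficients. In the exercise, the same averaging would be wrapped around the MARS fit for each combination of nk, degree, and pmethod.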

Practical exercise

Exercise F.2 - Implementation of the backfitting algorithm

Implement the backfitting algorithm that is described in this slide. Use it to predict the Salary of the baseball players on the Hitters dataset.
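
As a starting point, the general shape of a backfitting loop can be sketched as below. This is a hedged illustration in Python with the standard library only, and it may differ in details from the version in the slide. It assumes the usual scheme: the intercept is the mean of the response, each component is updated in turn by smoothing the partial residuals against its predictor, and each component is centered after the update. The names `backfit` and `linear_smoother` are hypothetical, and the least squares line is only a placeholder for a genuine univariate smoother.

```python
import random


def linear_smoother(x, r):
    """Placeholder univariate smoother (an assumption of this sketch):
    least squares line of the partial residuals r on x, evaluated at x."""
    n = len(x)
    xbar = sum(x) / n
    rbar = sum(r) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (ri - rbar) for xi, ri in zip(x, r)) / sxx
    return [rbar + slope * (xi - xbar) for xi in x]


def backfit(columns, y, smoother=linear_smoother, max_iter=100, tol=1e-8):
    """Fit an additive model y ~ alpha + sum_j f_j(x_j) by backfitting.

    Returns the intercept, the list of fitted components, and the fitted values.
    """
    n, p = len(y), len(columns)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residuals: remove the intercept and all other components.
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            new_fj = smoother(columns[j], partial)
            # Center the component so that the intercept stays identifiable.
            mean_fj = sum(new_fj) / n
            new_fj = [v - mean_fj for v in new_fj]
            change = max(abs(a - b) for a, b in zip(new_fj, f[j]))
            max_change = max(max_change, change)
            f[j] = new_fj
        if max_change < tol:
            break
    fitted = [alpha + sum(f[j][i] for j in range(p)) for i in range(n)]
    return alpha, f, fitted


# Small synthetic check: with linear smoothers and a noise-free linear
# response, backfitting should recover the response.
rng = random.Random(1)
x1 = [rng.gauss(0.0, 1.0) for _ in range(50)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
alpha, components, fitted = backfit([x1, x2], y)
print(max(abs(fi - yi) for fi, yi in zip(fitted, y)))
```

Passing the smoother as an argument keeps the loop independent of the smoothing method, so a spline or local regression smoother can be substituted without changing `backfit`.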