Exercises F
Data mining - CdL CLAMSES
Theoretical exercises
Exercise F.1 - Degrees of freedom of MARS
Given the data y_i with mean f(x_i) and variance \sigma^2 and a fitting operation \hat{y}_i = \hat{f}(x_i), let us define the effective degrees of freedom as \frac{1}{\sigma^2}\sum_{i=1}^n \text{cov}(Y_i, \hat{f}(x_i)), as in the slides.
Consider the estimate \hat{f}(x_i) of a MARS model, using a set of predictors \tilde{\bm{x}}_1,\dots,\tilde{\bm{x}}_p.
Generate n = 100 observations with predictors \tilde{\bm{x}}_1,\dots,\tilde{\bm{x}}_p as independent standard Gaussian variates and fix these values.
Generate response values y_i also as standard Gaussian (\sigma^2 = 1), independent of the predictors.
Fit several MARS models using the earth R package and compare the final number of basis functions of each model with the associated effective degrees of freedom. Do about 50 simulations of the response and average the results to get a decent Monte Carlo approximation of the degrees of freedom. Perform this operation as a function of the following tuning parameters:
- A sufficiently large grid of values for nk, the maximum number of terms to be included in the forward pass.
- Different maximum degrees of MARS: degree = 1, degree = 2, and degree = 3.
- Different pruning strategies: pmethod = "none" (no pruning) and pmethod = "backward" (backward regression).
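As a possible starting point, the Monte Carlo approximation of the effective degrees of freedom could be sketched as follows. This is only a sketch under stated assumptions, not a full solution: the function mc_df and its argument fit_predict are hypothetical names introduced here, p = 5 and nk = 21 are arbitrary choices, and the covariance is estimated by the sample covariance between each Y_i and its fitted value across simulations. The MARS part runs only if the earth package is installed.

```r
set.seed(123)
n <- 100
p <- 5                                   # assumed number of predictors
x <- matrix(rnorm(n * p), n, p)          # predictors, generated once and kept fixed
colnames(x) <- paste0("x", seq_len(p))

# Monte Carlo estimate of (1 / sigma^2) * sum_i cov(Y_i, f_hat(x_i)).
# fit_predict is a hypothetical helper: it takes (x, y) and returns fitted values.
mc_df <- function(x, fit_predict, n_sim = 50, sigma2 = 1) {
  n <- nrow(x)
  # One simulated response per column, independent of the predictors
  Y <- matrix(rnorm(n * n_sim, sd = sqrt(sigma2)), nrow = n)
  Y_hat <- apply(Y, 2, function(y) fit_predict(x, y))
  # Sample covariance of (Y_i, f_hat(x_i)) across simulations, summed over i
  sum((Y - rowMeans(Y)) * (Y_hat - rowMeans(Y_hat))) / (n_sim - 1) / sigma2
}

if (requireNamespace("earth", quietly = TRUE)) {
  n_terms <- c()                         # number of selected terms in each simulation
  fit_mars <- function(x, y) {
    fit <- earth::earth(x = x, y = y, nk = 21, degree = 1, pmethod = "none")
    n_terms <<- c(n_terms, length(fit$selected.terms))
    as.numeric(fit$fitted.values)
  }
  df_hat <- mc_df(x, fit_mars)
  print(c(effective_df = df_hat, mean_terms = mean(n_terms)))
}
```

A quick sanity check of the estimator is to plug in ordinary least squares, for which the effective degrees of freedom equal the number of coefficients: mc_df(x, function(x, y) fitted(lm(y ~ x)), n_sim = 200) should be close to p + 1. Looping the MARS call over a grid of nk, degree and pmethod then gives the requested comparison.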
Practical exercise
Exercise F.2 - Implementation of the backfitting algorithm
Implement the backfitting algorithm that is described in this slide. Use it to predict the Salary of the baseball players on the Hitters dataset.
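A minimal sketch of one way to organise the implementation is given below. Since the slide is not reproduced here, the sketch assumes the usual backfitting scheme for additive models: the intercept is the mean of the response, and each component is updated in turn by smoothing the partial residuals and then centred. The function name backfit, the use of smooth.spline as the smoother, the value of df, and the stopping rule are all assumptions made for illustration. The Hitters part runs only if the ISLR package is installed, and the three predictors used there are an arbitrary choice.

```r
# Backfitting for an additive model y = alpha + sum_j f_j(x_j) + error.
# Smoother: smooth.spline with a fixed number of degrees of freedom (an assumption).
backfit <- function(X, y, df = 4, max_iter = 50, tol = 1e-6) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  alpha <- mean(y)                       # intercept, kept fixed
  f <- matrix(0, n, p)                   # f[, j] holds f_j evaluated at the data
  for (iter in seq_len(max_iter)) {
    f_old <- f
    for (j in seq_len(p)) {
      # Partial residuals: remove the intercept and all other components
      partial_res <- y - alpha - rowSums(f[, -j, drop = FALSE])
      sm <- smooth.spline(X[, j], partial_res, df = df)
      f[, j] <- predict(sm, X[, j])$y
      f[, j] <- f[, j] - mean(f[, j])    # centre each component for identifiability
    }
    # Stop when the components change very little between two cycles
    if (sum((f - f_old)^2) < tol * (1 + sum(f_old^2))) break
  }
  list(alpha = alpha, f = f, fitted = alpha + rowSums(f), iterations = iter)
}

# Simulated check with two additive components
set.seed(123)
n <- 200
X <- cbind(runif(n, -2, 2), runif(n, -2, 2))
y <- sin(2 * X[, 1]) + X[, 2]^2 + rnorm(n, sd = 0.2)
fit <- backfit(X, y, df = 6)

# In-sample predictions on Hitters, with an arbitrary subset of predictors
if (requireNamespace("ISLR", quietly = TRUE)) {
  hitters <- na.omit(ISLR::Hitters)      # Salary contains missing values
  fit_h <- backfit(hitters[, c("Years", "Hits", "CRBI")], hitters$Salary)
  print(sqrt(mean((hitters$Salary - fit_h$fitted)^2)))  # in-sample RMSE
}
```

This version returns only the fitted values at the observed points. Predicting for new players would require storing the fitted smoothers and the centring constants, which is left out to keep the sketch short.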