Exercises E

Statistics III - CdL SSE

Author: Tommaso Rigon

Affiliation: Università degli Studi di Milano-Bicocca

The theoretical exercises described below are quite difficult. At the exam, you can expect a simplified version of them; otherwise, they would represent a formidable challenge for most of you.

On the other hand, the data analyses are more or less aligned with what you may encounter in the final examination.

The vast majority of these exercises are taken from the textbooks Salvan et al. (2020) and Agresti (2015), possibly with a few minor modifications. You can consult these textbooks if you need additional exercises.

Data analysis

Basketball dataset - Partial solution: Basketball.R

The Basketball dataset can be downloaded here and shows the three-point shooting, by game, of Ray Allen of the Boston Celtics during the 2010 NBA (basketball) playoffs (e.g., he made 0 of 4 shots in game 1). Commentators remarked that his shooting varied dramatically from game to game.

In the ith game, suppose that the number of three-point shots made out of m_i attempts, S_i = m_i Y_i (so that Y_i is the proportion of successful shots), is distributed as S_i \sim \text{Binomial}(m_i, \pi_i), and that the S_i are independent. The original source of the dataset is the textbook:

Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Wiley.

Import the data and then:

Fit the null model, with \pi_i = \pi. Find and interpret the estimate \hat{\pi}, obtain its standard error and a confidence interval.
Is there evidence of overdispersion? Motivate your answer based on the data.
Describe a factor that could cause overdispersion. Adjust the standard error for overdispersion and obtain another confidence interval based on this correction.
Compare the two confidence intervals and interpret the results.
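The computations behind these steps can be sketched as follows. This is only a minimal illustration, assuming Wald-type intervals and the inflated-variance (quasi-likelihood) correction with the dispersion estimated by the Pearson statistic divided by its degrees of freedom. It is written in plain Python rather than following the linked R solution; the function name `null_binomial_fit` is hypothetical, and the counts are made-up placeholders, not the actual Basketball data (only "0 of 4 in game 1" comes from the text above).

```python
import math

def null_binomial_fit(made, attempts, z=1.96):
    """Null model pi_i = pi for grouped binomial counts,
    with a Pearson-based dispersion correction (assumed approach)."""
    total_made = sum(made)
    total_attempts = sum(attempts)
    pi_hat = total_made / total_attempts  # pooled proportion (ML estimate)
    se = math.sqrt(pi_hat * (1 - pi_hat) / total_attempts)

    # Pearson statistic: each game's count against its fitted mean m_i * pi_hat
    x2 = sum((s - m * pi_hat) ** 2 / (m * pi_hat * (1 - pi_hat))
             for s, m in zip(made, attempts))
    df = len(made) - 1                # number of games minus one parameter
    phi_hat = x2 / df                 # values well above 1 suggest overdispersion
    se_adj = se * math.sqrt(phi_hat)  # inflated-variance standard error

    return {
        "pi_hat": pi_hat, "se": se, "x2": x2, "df": df,
        "phi_hat": phi_hat, "se_adj": se_adj,
        "ci": (pi_hat - z * se, pi_hat + z * se),
        "ci_adj": (pi_hat - z * se_adj, pi_hat + z * se_adj),
    }

# Hypothetical placeholder counts (NOT the real dataset).
made = [0, 5, 2, 6, 1]
attempts = [4, 8, 7, 9, 6]

fit = null_binomial_fit(made, attempts)
for key, value in fit.items():
    print(key, value)
```

With the real data, `made` and `attempts` would be replaced by the corresponding columns of the downloaded file. The adjusted interval is wider than the unadjusted one whenever the estimated dispersion exceeds 1.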

Bioassay dataset - Partial solution: Bioassay.R

The data in the Bioassay dataset, available in the MLGdata library, refer to a biological experiment. The variable y represents the number of observed events out of the total number of subjects (den) exposed to a dose z. The original source of the data is:

Finney, D. J. (1947). Probit Analysis. Cambridge: Cambridge University Press.

Import the data and then:

Fit a binomial model with a probit link function.
To check for possible overdispersion, consider a quasi-likelihood model. Compute the standard errors using a robust approach.
Compute the variance of the standardized Pearson residuals for the probit model and the quasi-likelihood model. Comment on the results.

Germination dataset (continuation)

Reconsider the Germination dataset previously analyzed in Exercises C.

Evaluate the possible presence of overdispersion, and compare the results obtained from fitting the different models that account for it. Propose a final model and interpret the results.

Heart dataset (continuation)

Reconsider the Heart dataset previously analyzed in Exercises C.

Evaluate the possible presence of overdispersion, and compare the results obtained from fitting the different models that account for it. Propose a final model and interpret the results.

Theoretical

Exercise A

Does the inflated-variance quasi-likelihood approach make sense as a way to generalize the ordinary normal model with v(\mu_i) = \sigma^2? Why or why not?

Exercise B

Altham (1978) introduced the discrete distribution

f(x; \pi, \theta) = c(\pi, \theta) \binom{n}{x}\pi^x(1 - \pi)^{n - x}\theta^{x(n-x)}, \qquad x = 0, 1, \dots, n,

where c(\pi, \theta) is a normalizing constant.

Show that this is in the two-parameter exponential family and that the binomial occurs when \theta = 1.

Comment: Altham noted that overdispersion occurs when \theta < 1. Lindsey and Altham (1998) used this as the basis of an alternative model to the beta-binomial.
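Before attempting the proof, the two claims above can be checked numerically. The sketch below is a plain Python illustration, not part of the original exercise; the helper names `altham_pmf` and `mean_var` and the parameter values are hypothetical. It evaluates the distribution by normalizing the unnormalized weights, so the sum of the weights plays the role of 1 / c(\pi, \theta), and compares the variance with that of a binomial having the same mean.

```python
import math

def altham_pmf(n, pi, theta):
    """Probabilities of Altham's distribution for x = 0, ..., n."""
    weights = [
        math.comb(n, x) * pi ** x * (1 - pi) ** (n - x) * theta ** (x * (n - x))
        for x in range(n + 1)
    ]
    total = sum(weights)  # equals 1 / c(pi, theta)
    return [w / total for w in weights]

def mean_var(pmf):
    """Mean and variance of a distribution supported on 0, 1, ..., len(pmf) - 1."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var

n, pi = 10, 0.5  # illustrative values only
for theta in (1.0, 0.8):
    mean, var = mean_var(altham_pmf(n, pi, theta))
    p = mean / n  # success probability of the binomial with the same mean
    print(f"theta={theta}: mean={mean:.4f}, var={var:.4f}, "
          f"binomial var={n * p * (1 - p):.4f}")
```

For \theta = 1 the factor \theta^{x(n-x)} is identically 1, so the weights reduce to the binomial probabilities. For \theta < 1 the factor downweights central values of x, which pushes mass towards the extremes.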

References

Agresti, A. (2015), Foundations of Linear and Generalized Linear Models, Wiley.

Salvan, A., Sartori, N., and Pace, L. (2020), Modelli lineari generalizzati, Springer.