The theoretical exercises described below are quite difficult. At the exam, you can expect a simplified version of them; otherwise, they would represent a formidable challenge for most of you.
On the other hand, the data analyses are more or less aligned with what you may encounter in the final examination.
The vast majority of these exercises are taken from the textbooks Salvan et al. (2020) and Agresti (2015), possibly with a few minor modifications. You can consult these textbooks if you need additional exercises.
Data analysis
The Basketball dataset can be downloaded here and shows the three-point shooting, by game, of Ray Allen of the Boston Celtics during the 2010 NBA (basketball) playoffs (e.g., he made 0 of 4 shots in game 1). Commentators remarked that his shooting varied dramatically from game to game.
In the ith game, suppose that S_i = m_i Y_i, the number of three-point shots made out of m_i attempts, is distributed as S_i \sim \text{Binomial}(m_i, \pi_i), and that the S_i are independent.
Import the data and then:
Fit the null model, with \pi_i = \pi. Find and interpret the estimate \hat{\pi}, obtain its standard error and a confidence interval.
Is there evidence of overdispersion? Motivate your answer based on the data.
Describe a factor that could cause overdispersion. Adjust the standard error for overdispersion and obtain another confidence interval based on this correction.
Compare the two confidence intervals and interpret the results.
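The computations behind the steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pooled_binomial_summary` and the toy counts are hypothetical (they are not Ray Allen's actual record), and the dispersion is estimated with the Pearson statistic divided by its degrees of freedom, which is one common choice among several.

```python
import math

def pooled_binomial_summary(made, attempts, z=1.96):
    """Null binomial model (pi_i = pi) with a Pearson-based overdispersion correction."""
    total_made = sum(made)
    total_attempts = sum(attempts)

    # Maximum likelihood estimate under the null model: pooled proportion.
    pi_hat = total_made / total_attempts
    se = math.sqrt(pi_hat * (1 - pi_hat) / total_attempts)

    # Pearson statistic comparing each game with the common proportion.
    x2 = sum(
        (s - m * pi_hat) ** 2 / (m * pi_hat * (1 - pi_hat))
        for s, m in zip(made, attempts)
    )
    df = len(made) - 1            # one parameter estimated
    phi_hat = x2 / df             # estimated dispersion (1 means no overdispersion)

    # Inflated-variance correction: multiply the standard error by sqrt(phi_hat).
    se_adj = se * math.sqrt(phi_hat)

    return {
        "pi_hat": pi_hat,
        "se": se,
        "ci": (pi_hat - z * se, pi_hat + z * se),
        "pearson_x2": x2,
        "df": df,
        "phi_hat": phi_hat,
        "se_adj": se_adj,
        "ci_adj": (pi_hat - z * se_adj, pi_hat + z * se_adj),
    }

# Hypothetical toy counts, used only to exercise the function.
toy_made = [0, 6, 2, 5, 1, 7]
toy_attempts = [4, 8, 9, 7, 6, 9]
summary = pooled_binomial_summary(toy_made, toy_attempts)
print(summary)
```

With the real data, the same quantities can be read off a binomial and a quasi-binomial fit in the software used in the course; the sketch only makes the formulas explicit.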
The data in the Bioassay dataset, available in the MLGdata library, refer to a biological experiment (Finney, 1947, Table 9). The variable y represents the number of observed events out of the total number of subjects (den) exposed to a dose z.
Fit a binomial model with a probit link function.
To check for possible overdispersion, consider a quasi-likelihood model, also evaluating the standard errors using a robust approach.
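As a rough illustration of the three kinds of standard errors involved (model-based, quasi-likelihood, and robust), the following Python sketch fits a probit model with a single covariate by iteratively reweighted least squares. Everything here is an assumption made for illustration: the function `fit_probit`, the toy values of `y`, `den` and `z` (not the Bioassay data), the Pearson-based dispersion estimate, and the particular sandwich estimator, which uses the expected information as "bread" and the outer product of the scores as "meat".

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(y, den, z, n_iter=100, tol=1e-10):
    """Binomial probit fit of y/den on (1, z) via IRLS, with three covariance estimates."""
    y, den, z = (np.asarray(a, dtype=float) for a in (y, den, z))
    X = np.column_stack([np.ones_like(z), z])
    beta = np.zeros(X.shape[1])

    for _ in range(n_iter):
        eta = X @ beta
        p, d = norm_cdf(eta), norm_pdf(eta)
        w = den * d ** 2 / (p * (1 - p))       # IRLS weights
        work = eta + (y / den - p) / d         # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * work))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break

    # Recompute all quantities at the final estimate.
    eta = X @ beta
    p, d = norm_cdf(eta), norm_pdf(eta)
    w = den * d ** 2 / (p * (1 - p))
    cov_model = np.linalg.inv(X.T @ (w[:, None] * X))   # inverse expected information

    # Pearson-based dispersion estimate for the quasi-likelihood correction.
    pearson = np.sum((y - den * p) ** 2 / (den * p * (1 - p)))
    phi = pearson / (len(y) - X.shape[1])

    # Sandwich estimator built from the individual score contributions.
    scores = X * ((y - den * p) * d / (p * (1 - p)))[:, None]
    cov_robust = cov_model @ (scores.T @ scores) @ cov_model

    return {
        "beta": beta,
        "phi": phi,
        "se_model": np.sqrt(np.diag(cov_model)),
        "se_quasi": np.sqrt(phi * np.diag(cov_model)),
        "se_robust": np.sqrt(np.diag(cov_robust)),
        "cov_robust": cov_robust,
    }

# Hypothetical toy data, used only to exercise the function.
fit = fit_probit(y=[2, 5, 10, 15, 18], den=[20] * 5, z=[0, 1, 2, 3, 4])
print(fit["beta"], fit["phi"], fit["se_model"], fit["se_quasi"], fit["se_robust"])
```

If the three sets of standard errors are close and the estimated dispersion is near 1, there is little sign of overdispersion. With only a handful of dose levels, the robust version should be read with caution.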
Reconsider the Germination dataset previously analyzed in Exercises C.
Evaluate the possible presence of overdispersion, and compare the results obtained from fitting the different models that account for it. Propose a final model and interpret the results.
Reconsider the Homicide dataset previously analyzed in Exercises D.
Evaluate the possible presence of overdispersion.
Construct an approximate 95% confidence interval for the ratio of the mean number of reported homicides between White and Black respondents, using both the Poisson and a quasi-likelihood approach. Comment on the results.
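For a Poisson log-linear model with a single binary covariate, the interval for the ratio of means has a closed form, which the following Python sketch makes explicit. It assumes the data are available as two lists of individual counts, one per group. The function name `rate_ratio_ci` and the toy counts are hypothetical, and the dispersion is again estimated by the Pearson statistic divided by the residual degrees of freedom.

```python
import math

def rate_ratio_ci(y_ref, y_other, z=1.96):
    """Wald intervals for mean(y_other) / mean(y_ref): Poisson and quasi-Poisson."""
    n_ref, n_other = len(y_ref), len(y_other)
    mean_ref = sum(y_ref) / n_ref
    mean_other = sum(y_other) / n_other

    # Log ratio of means: the slope of a Poisson log-linear model with a group dummy.
    log_ratio = math.log(mean_other / mean_ref)

    # Model-based standard error of the log ratio: sqrt(1/sum(y_ref) + 1/sum(y_other)).
    se_pois = math.sqrt(1 / sum(y_ref) + 1 / sum(y_other))

    # Pearson dispersion estimate with n - 2 residual degrees of freedom.
    x2 = sum((y - mean_ref) ** 2 / mean_ref for y in y_ref)
    x2 += sum((y - mean_other) ** 2 / mean_other for y in y_other)
    phi_hat = x2 / (n_ref + n_other - 2)
    se_quasi = se_pois * math.sqrt(phi_hat)

    def interval(se):
        # Build the interval on the log scale, then exponentiate.
        return (math.exp(log_ratio - z * se), math.exp(log_ratio + z * se))

    return {
        "ratio": math.exp(log_ratio),
        "phi_hat": phi_hat,
        "ci_poisson": interval(se_pois),
        "ci_quasi": interval(se_quasi),
    }

# Hypothetical toy counts, used only to exercise the function.
toy_ref = [0, 0, 0, 1, 0, 0, 2, 0, 0, 0]
toy_other = [0, 1, 0, 3, 0, 0, 2, 1, 0, 5]
res = rate_ratio_ci(toy_ref, toy_other)
print(res)
```

The point estimate is the same under both approaches; only the width of the interval changes, by a factor of the square root of the estimated dispersion on the log scale.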
Theoretical
Does the inflated-variance quasi-likelihood approach make sense as a way to generalize the ordinary normal model with v(\mu_i) = \sigma^2? Why or why not?
Altham (1978) introduced the discrete distribution
f(x; \pi, \theta) = c(\pi, \theta) \binom{n}{x}\pi^x(1 - \pi)^{n - x}\theta^{x(n-x)}, \qquad x = 0, 1,\dots, n,
where c(\pi, \theta) is a normalizing constant.
Show that this is in the two-parameter exponential family and that the binomial occurs when \theta = 1.
Comment: Altham noted that overdispersion occurs when \theta < 1. Lindsey and Altham (1998) used this as the basis of an alternative model to the beta-binomial.
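A numerical check can help build intuition before attempting the proof. The Python sketch below evaluates the distribution by normalizing the unnormalized weights directly, so that c(\pi, \theta) is simply the reciprocal of their sum. The function names and the parameter values n = 10, \pi = 0.5, \theta = 0.9 are arbitrary choices made for illustration; the check does not replace the analytical argument.

```python
from math import comb

def altham_pmf(n, pi, theta):
    """Probabilities f(x; pi, theta) for x = 0, ..., n, normalized numerically."""
    weights = [
        comb(n, x) * pi ** x * (1 - pi) ** (n - x) * theta ** (x * (n - x))
        for x in range(n + 1)
    ]
    total = sum(weights)          # equals 1 / c(pi, theta)
    return [w / total for w in weights]

def mean_and_variance(pmf):
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var

n, pi = 10, 0.5

# theta = 1 should reproduce the binomial probabilities.
binomial = [comb(n, x) * pi ** x * (1 - pi) ** (n - x) for x in range(n + 1)]
altham_one = altham_pmf(n, pi, theta=1.0)

# theta < 1 downweights central values of x, spreading mass toward 0 and n.
mean_od, var_od = mean_and_variance(altham_pmf(n, pi, theta=0.9))
print(mean_od, var_od, n * pi * (1 - pi))
```

With \pi = 0.5 the distribution stays symmetric around n/2, so the variance can be compared directly with the binomial value n\pi(1 - \pi).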
References
Agresti, A. (2015), Foundations of Linear and Generalized Linear Models, Wiley.
Salvan, A., Sartori, N., and Pace, L. (2020), Modelli lineari generalizzati, Springer.