Generalized Linear Models
Extracted from [Murphy, 2019]:
In earlier chapters, we discussed logistic regression, which, in the binary case, corresponds to the model \(\mathbb{P}(y \mid \boldsymbol{x}, \boldsymbol{w})=\operatorname{Ber}\left(y \mid \sigma\left(\boldsymbol{w}^{\top} \boldsymbol{x}\right)\right)\). In Chapter 11, we discussed linear regression, which corresponds to the model \(\mathbb{P}(y \mid \boldsymbol{x}, \boldsymbol{w})=\mathcal{N}\left(y \mid \boldsymbol{w}^{\top} \boldsymbol{x}, \sigma^2\right)\). These models are clearly very similar to each other. In particular, in both cases the mean of the output, \(\mathbb{E}[y \mid \boldsymbol{x}, \boldsymbol{w}]\), depends on the inputs \(\boldsymbol{x}\) only through the linear combination \(\boldsymbol{w}^{\top} \boldsymbol{x}\).
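A minimal sketch of this shared structure, with illustrative (not source-given) weights and inputs: both models compute the same linear predictor \(\boldsymbol{w}^{\top} \boldsymbol{x}\) and differ only in how that scalar is mapped to the mean of \(y\).

```python
import numpy as np

w = np.array([0.5, -1.0])   # assumed weight vector (illustrative)
x = np.array([2.0, 1.0])    # assumed input (illustrative)

eta = w @ x                 # shared linear predictor, eta = w^T x

mean_linear   = eta                       # linear regression: E[y] = w^T x
mean_logistic = 1 / (1 + np.exp(-eta))    # logistic regression: E[y] = sigma(w^T x)

print(mean_linear, mean_logistic)  # 0.0 0.5
```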
It turns out that there is a broad family of models with this property, known as generalized linear models or GLMs [MN89].
A GLM is a conditional version of an exponential family distribution (Section 3.4), in which the natural parameters are a linear function of the input. More precisely, the model has the following form:

\[
\mathbb{P}\left(y_n \mid \boldsymbol{x}_n, \boldsymbol{w}, \sigma^2\right)=\exp \left[\frac{y_n \eta_n-A\left(\eta_n\right)}{\sigma^2}+\log h\left(y_n, \sigma^2\right)\right]
\]
where \(\eta_n \triangleq \boldsymbol{w}^{\top} \boldsymbol{x}_n\) is the (input-dependent) natural parameter, \(A\left(\eta_n\right)\) is the log normalizer, \(\mathcal{T}(y)=y\) is the sufficient statistic, \(\sigma^2\) is the dispersion term, and \(h\left(y_n, \sigma^2\right)\) is a base measure that does not depend on \(\boldsymbol{w}\).
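As a concrete check, here is a sketch of the Bernoulli case written in this GLM form, with \(\sigma^2=1\), \(h=1\), and \(A(\eta)=\log\left(1+e^\eta\right)\); the weights and inputs are illustrative assumptions, not values from the source.

```python
import numpy as np

def A(eta):
    # log normalizer of the Bernoulli distribution in natural-parameter form
    return np.log1p(np.exp(eta))

w = np.array([0.5, -1.0])
x = np.array([2.0, 3.0])
eta = w @ x                      # natural parameter eta_n = w^T x_n

for y in (0.0, 1.0):
    log_p_glm = y * eta - A(eta)                 # GLM form: y*eta - A(eta)
    mu = 1 / (1 + np.exp(-eta))                  # mean parameter, sigmoid(eta)
    log_p_direct = y * np.log(mu) + (1 - y) * np.log(1 - mu)
    print(y, np.isclose(log_p_glm, log_p_direct))  # True for both values of y
```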
We will denote the mapping from the linear inputs to the mean of the output using \(\mu_n=\ell^{-1}\left(\eta_n\right)\), where the function \(\ell\) is known as the link function, and \(\ell^{-1}\) is known as the mean function.
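A short sketch of one such link/mean pair: for logistic regression the mean function \(\ell^{-1}\) is the sigmoid and the link \(\ell\) is the logit, while for linear regression both are the identity. The value of \(\eta\) below is an arbitrary assumption.

```python
import numpy as np

def sigmoid(eta):          # mean function l^{-1} for logistic regression
    return 1 / (1 + np.exp(-eta))

def logit(mu):             # link function l for logistic regression
    return np.log(mu / (1 - mu))

eta = 1.5                              # assumed value of w^T x
mu = sigmoid(eta)                      # mu_n = l^{-1}(eta_n)
print(np.isclose(logit(mu), eta))      # True: applying l to mu_n recovers eta_n

# For linear regression the link is the identity, so mu_n = eta_n directly.
```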
Based on the results in Section 3.4.3, we can show that the mean and variance of the response variable are as follows:

\[
\mathbb{E}\left[y_n \mid \boldsymbol{x}_n, \boldsymbol{w}, \sigma^2\right]=A^{\prime}\left(\eta_n\right) \triangleq \ell^{-1}\left(\eta_n\right)
\]

\[
\mathbb{V}\left[y_n \mid \boldsymbol{x}_n, \boldsymbol{w}, \sigma^2\right]=A^{\prime \prime}\left(\eta_n\right) \sigma^2
\]
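These identities can be checked numerically. The sketch below uses finite differences to approximate \(A^{\prime}(\eta)\) and \(A^{\prime\prime}(\eta)\) for the Bernoulli case (where \(\sigma^2=1\)), and compares them to the known mean \(\sigma(\eta)\) and variance \(\sigma(\eta)(1-\sigma(\eta))\); the values of \(\eta\) and the step size are arbitrary choices.

```python
import numpy as np

def A(eta):
    # Bernoulli log normalizer
    return np.log1p(np.exp(eta))

eta, h = 0.7, 1e-5
mean = (A(eta + h) - A(eta - h)) / (2 * h)             # central difference ~ A'(eta)
var  = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2   # second difference ~ A''(eta)

mu = 1 / (1 + np.exp(-eta))
print(np.isclose(mean, mu, atol=1e-6))             # E[y] = A'(eta) = sigma(eta)
print(np.isclose(var, mu * (1 - mu), atol=1e-4))   # V[y] = A''(eta) = sigma(eta)(1 - sigma(eta))
```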