Generalized linear model


In statistics, the generalized linear model (GLM) generalizes ordinary least squares regression. A GLM has the following three components:

(1) A description of the distribution of the data (<math>Y</math>):

<math>E(Y) = \mu</math> and <math> Var(Y) = f(\mu) </math>

It is convenient if the distribution of <math>Y</math> belongs to the exponential family, which covers a very large range of distributions and in which the variance is a known function of the mean.
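As an illustration, the function <math>f(\mu)</math> can be written as a dispersion parameter <math>\phi</math> times a variance function <math>V(\mu)</math>. This factorization is a common convention (an assumption here, not stated above) and matches the binomial and Poisson formulas given in the examples. A minimal Python sketch; the names `VARIANCE_FUNCTIONS` and `variance` are hypothetical:

```python
# Hypothetical helper: Var(Y) = phi * V(mu) for three common families.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                 # constant: Var(Y) = phi = sigma^2
    "binomial": lambda mu: mu * (1.0 - mu),   # mu is a probability in (0, 1)
    "poisson": lambda mu: mu,                 # variance proportional to the mean
}

def variance(family: str, mu: float, phi: float = 1.0) -> float:
    """Return phi * V(mu) for the named family."""
    return phi * VARIANCE_FUNCTIONS[family](mu)

print(variance("binomial", 0.5))          # 0.25
print(variance("poisson", 3.0, phi=2.0))  # 6.0
print(variance("normal", 10.0, phi=4.0))  # 4.0
```

Only the binomial and Poisson entries differ in how the variance depends on <math>\mu</math>; for the normal family the mean has no effect on the variance.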

(2) A linear predictor <math>\eta</math>, computed from the predictors (<math>X</math>) and some unknown parameters (<math>\beta</math>):

<math> \eta = X^T\beta </math>

This has the same linear form as ordinary least squares.

(3) A link function (<math>g</math>) that links the expected value of the data (<math>\mu</math>) with the linear predictor:

<math> \mu = g^{-1}(\eta), </math> so that <math> g(\mu) = \eta. </math>

This implies that the transformed mean <math>g(\mu)</math> is linear in the parameters <math>\beta</math>, while <math>\mu</math>, the expected value of <math>Y</math>, is not necessarily linear in the predictors.
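Components (2) and (3) can be sketched for a single observation. The log link and the coefficient values below are illustrative assumptions, and the function names are hypothetical. Equal steps in the predictor shift <math>\eta</math> by equal amounts, but shift <math>\mu</math> by unequal amounts:

```python
import math

def linear_predictor(x: list[float], beta: list[float]) -> float:
    """eta = x^T beta for a single observation."""
    return sum(xi * bi for xi, bi in zip(x, beta))

# Log link (an example choice): g(mu) = log(mu), so mu = g^{-1}(eta) = exp(eta).
def link(mu: float) -> float:
    return math.log(mu)

def inverse_link(eta: float) -> float:
    return math.exp(eta)

beta = [0.5, -0.25]            # hypothetical coefficients: intercept, slope
for x1 in (0.0, 2.0, 4.0):
    x = [1.0, x1]              # leading 1.0 is the intercept term
    eta = linear_predictor(x, beta)
    mu = inverse_link(eta)
    # eta drops by 0.5 at each step; mu shrinks by a constant factor instead.
    print(x1, eta, round(mu, 4), math.isclose(link(mu), eta, abs_tol=1e-12))
```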


The parameters <math>\beta</math> are estimated by maximum likelihood or quasi-maximum likelihood.
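In practice, the maximum-likelihood estimates of a GLM are usually computed by iteratively reweighted least squares (IRLS), a form of Fisher scoring. Each iteration forms a working response <math>z = \eta + (y-\mu)\, g'(\mu)</math> and weights <math>w = 1 / (V(\mu)\, g'(\mu)^2)</math>, where <math>V(\mu)</math> is the variance divided by the dispersion <math>\phi</math> (which cancels out of the estimate of <math>\beta</math>). It then solves a weighted least-squares problem. Below is a minimal pure-Python sketch. The names `irls` and `solve`, the tolerance, and the starting value <math>\beta = 0</math> are illustrative assumptions, and the small hand-written solver stands in for a proper linear-algebra library:

```python
import math
from typing import Callable

def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def irls(X: list[list[float]], y: list[float],
         inv_link: Callable[[float], float],
         link_deriv: Callable[[float], float],
         var_fn: Callable[[float], float],
         beta0: list[float], max_iter: int = 50, tol: float = 1e-10) -> list[float]:
    """Fit a GLM by iteratively reweighted least squares (Fisher scoring)."""
    beta = list(beta0)
    p = len(beta)
    for _ in range(max_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for xi, yi in zip(X, y):
            eta = sum(a * b for a, b in zip(xi, beta))
            mu = inv_link(eta)
            d = link_deriv(mu)              # g'(mu)
            w = 1.0 / (var_fn(mu) * d * d)  # working weight
            z = eta + (yi - mu) * d         # working response
            for j in range(p):
                xtwz[j] += w * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += w * xi[j] * xi[k]
        new_beta = solve(xtwx, xtwz)        # weighted least-squares step
        if max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Poisson regression with the log link: g(mu) = log(mu), g'(mu) = 1/mu, V(mu) = mu.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]  # intercept + 0/1 group indicator
y = [1.0, 3.0, 4.0, 8.0]                              # hypothetical counts
beta = irls(X, y, inv_link=math.exp, link_deriv=lambda mu: 1.0 / mu,
            var_fn=lambda mu: mu, beta0=[0.0, 0.0])
print(beta)  # close to [log(2), log(3)]
```

Because the covariate is a 0/1 group indicator, the fitted means <math>e^{\beta_0}</math> and <math>e^{\beta_0 + \beta_1}</math> should equal the group sample means, 2 and 6, which makes the result easy to check.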



Examples

The simplest example of a GLM is linear regression. Here the link function is the identity, the data are assumed to be normally distributed, and the variance is constant; that is, it does not depend on <math>\mu</math>.
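With a normal distribution and the identity link, maximizing the likelihood is the same as minimizing the sum of squared residuals, so the GLM estimate coincides with ordinary least squares. A minimal sketch for one predictor, using the closed-form least-squares solution; the helper name `ols_simple` and the data are hypothetical:

```python
def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Identity link: mu = eta = b0 + b1 * x, with constant variance.
intercept, slope = ols_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)  # 1.0 2.0
```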

Binomial data

When the response data (<math>Y</math>) are binary, the distribution is generally taken to be binomial, and <math>\mu_i</math> is then interpreted as the probability of <math>Y_i</math> taking the value one. The variance function is given by:

<math>Var(Y)= \phi \mu (1-\mu)</math>

where the dispersion parameter <math>\phi</math> is often exactly one. When it is not, the model is often described as quasibinomial.
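For a single binary observation with <math>P(Y=1) = \mu</math>, the case <math>\phi = 1</math> follows directly from the two-point distribution of <math>Y</math>. The sketch below computes the mean and variance exactly from the two outcomes and compares them with <math>\mu(1-\mu)</math>. The helper name `bernoulli_moments` is hypothetical:

```python
def bernoulli_moments(mu: float) -> tuple[float, float]:
    """Exact mean and variance of a binary Y with P(Y = 1) = mu."""
    outcomes = [(0.0, 1.0 - mu), (1.0, mu)]  # (value, probability)
    mean = sum(value * prob for value, prob in outcomes)
    var = sum((value - mean) ** 2 * prob for value, prob in outcomes)
    return mean, var

for mu in (0.1, 0.5, 0.9):
    mean, var = bernoulli_moments(mu)
    # var agrees with mu * (1 - mu), i.e. the variance function with phi = 1
    print(mu, mean, var, mu * (1.0 - mu))
```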

There are many popular link functions for binomial data. They include the logit link, whose inverse is the logistic function:

<math>g(p) = \log\left( { p \over 1-p } \right) </math>
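Here is a minimal Python sketch of the logit link and its inverse, the logistic function, using only the standard library. The function names are illustrative. The two-branch form of the inverse is one common way to avoid overflow in `math.exp`; it is not prescribed by the text above:

```python
import math

def logit(p: float) -> float:
    """Logit link g(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Logistic function, the inverse of the logit; maps any real eta into (0, 1)."""
    # Two branches keep the argument of math.exp non-positive, so it cannot overflow.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

print(logit(0.5))             # 0.0
print(logit(0.8))             # log(4), about 1.386
print(inv_logit(logit(0.8)))  # 0.8, up to rounding
print(inv_logit(-1000.0))     # 0.0 (underflows, but no OverflowError)
```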

In addition, the inverse of any cumulative distribution function (CDF) can be used as a link. The normal distribution is a popular choice and gives the probit model, with link

<math>g(p) = \Phi^{-1}(p)</math>

where <math>\Phi</math> is the cumulative distribution function of the standard normal distribution.
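The probit link and its inverse can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). Its `inv_cdf` and `cdf` methods give <math>\Phi^{-1}</math> and <math>\Phi</math>. The wrapper names `probit` and `inv_probit` are illustrative:

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def probit(p: float) -> float:
    """Probit link g(p) = Phi^{-1}(p); p must lie strictly between 0 and 1."""
    return STANDARD_NORMAL.inv_cdf(p)

def inv_probit(eta: float) -> float:
    """Inverse link: the standard normal CDF Phi(eta)."""
    return STANDARD_NORMAL.cdf(eta)

print(probit(0.5))              # 0.0
print(round(probit(0.975), 4))  # 1.96
print(inv_probit(probit(0.3)))  # 0.3, up to rounding
```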

The identity link is also sometimes used for binomial data (this is equivalent to using the uniform distribution instead of the normal as the CDF), but it runs into problems when the predicted probabilities are greater than one or less than zero. This can be worked around in implementation, but the coefficients of such a model can be difficult to interpret. Near p = 0.5, however, the logit and probit links are themselves close to linear in p, so the identity-link model is not too distant from the logit or probit model there.
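One way to see why the identity link is close to the logit and probit near p = 0.5 is to compare each link with its tangent line at that point. The derivative of the logit at 0.5 is <math>1/(p(1-p)) = 4</math>, and that of the probit is <math>1/\varphi(0) = \sqrt{2\pi}</math>, where <math>\varphi</math> is the standard normal density. A ratio near 1 therefore means the link behaves like a rescaled identity link. The sketch below also shows the identity link's range problem. The helper names and the coefficients in the last lines are hypothetical:

```python
import math
from statistics import NormalDist

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def probit(p: float) -> float:
    return NormalDist().inv_cdf(p)

def linearity_ratios(p: float) -> tuple[float, float]:
    """Ratio of each link to its tangent line at p = 0.5 (slopes 4 and sqrt(2*pi))."""
    d = p - 0.5
    return logit(p) / (4.0 * d), probit(p) / (math.sqrt(2.0 * math.pi) * d)

for p in (0.51, 0.6, 0.75, 0.9):
    r_logit, r_probit = linearity_ratios(p)
    print(f"p={p}: logit ratio {r_logit:.4f}, probit ratio {r_probit:.4f}")

# With the identity link, mu = eta directly, so nothing keeps predictions in [0, 1].
eta = 0.4 + 0.3 * 3.0  # hypothetical intercept 0.4, slope 0.3, predictor value 3
print("identity-link 'probability':", eta)  # greater than one
```

The ratios stay close to 1 near p = 0.5 and grow as p moves toward 0 or 1, which is where the choice of link matters most.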

Count data

Another example of a generalized linear model is Poisson regression, which models count data. In this case, the variance is proportional to the mean:

<math>Var(Y) = \phi \mu </math>

where <math>\phi</math>, the dispersion parameter, is often equal to one. When it is not, the data are often described as Poisson with overdispersion, and the model as quasi-Poisson.
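A common estimate of <math>\phi</math> is the Pearson statistic <math>X^2 = \sum_i (y_i - \hat\mu_i)^2 / V(\hat\mu_i)</math> divided by <math>n - p</math>, where <math>p</math> is the number of fitted parameters and <math>V(\mu) = \mu</math> for counts. The sketch below applies it to an intercept-only model, whose fitted mean is the sample mean. In that case the estimate equals the sample variance divided by the sample mean. The counts and the helper name `pearson_dispersion` are hypothetical:

```python
def pearson_dispersion(y: list[float], mu: list[float], n_params: int) -> float:
    """Pearson estimate of phi for a Poisson-type model: X^2 / (n - p), with V(mu) = mu."""
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / (len(y) - n_params)

counts = [0, 2, 2, 4, 4, 6, 10, 12]  # hypothetical counts
mean = sum(counts) / len(counts)     # 5.0: fitted mu of an intercept-only model
fitted = [mean] * len(counts)
phi_hat = pearson_dispersion(counts, fitted, n_params=1)
print(round(phi_hat, 3))             # 3.429, well above 1: overdispersed
```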

