Poisson distribution

In probability theory and statistics, the Poisson distribution is a discrete probability distribution. It expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event. The distribution was discovered by Siméon-Denis Poisson (1781–1840) and published, together with his probability theory, in 1838 in his work Recherches sur la probabilité des jugements en matières criminelles et matière civile ("Research on the Probability of Judgments in Criminal and Civil Matters"). It is the distribution of certain random variables N that count, among other things, a number of discrete occurrences (sometimes called "arrivals") that take place during a time interval of given length. The probability that there are exactly k occurrences (k being a non-negative integer, k = 0, 1, 2, ...) is

<math>f(k;\lambda)=\frac{e^{-\lambda} \lambda^k}{k!},\,\!</math>

where

  • e is the base of the natural logarithm (e = 2.71828...),
  • k! is the factorial of k,
  • λ is a positive real number, equal to the expected number of occurrences during the given interval. For instance, if the events occur on average once every 4 minutes, and you are interested in the number of events occurring in a 10-minute interval, you would use as a model a Poisson distribution with λ = 10/4 = 2.5.

As a function of k, this is the probability mass function.
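
As an illustration, the probability mass function can be evaluated directly. The following is a minimal Python sketch, not a reference implementation; the function name poisson_pmf is chosen here for illustration, and the computation is done in log space (an implementation choice, not part of the definition) so that large k does not overflow. It uses the example of events every 4 minutes observed over 10 minutes, i.e. λ = 2.5.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k occurrences when the expected count is lam > 0."""
    if k < 0:
        return 0.0
    # log of exp(-lam) * lam**k / k!, using lgamma(k + 1) = log(k!)
    log_p = -lam + k * math.log(lam) - math.lgamma(k + 1)
    return math.exp(log_p)

lam = 10 / 4  # one event every 4 minutes on average, observed for 10 minutes
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```

Summing poisson_pmf(k, lam) over all k gives 1, as required of a probability mass function.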

Poisson processes

Sometimes <math>\lambda</math> is taken to be the rate, i.e., the average number of occurrences per unit time. In that case, if <math>N_t</math> is the number of occurrences before time t, then we have

<math>\Pr(N_t=k)=f(k;\lambda t)=\frac{e^{-\lambda t} (\lambda t)^k}{k!},\,\!</math>

and the waiting time T until the first occurrence is a continuous random variable with an exponential distribution (with parameter <math>\lambda</math>). This probability distribution may be deduced from the fact that

<math>\Pr(T>t)=\Pr(N_t=0)=e^{-\lambda t}.\,</math>

When time becomes involved, then we have a 1-dimensional Poisson process, which involves both the discrete Poisson-distributed random variables that count the number of arrivals in each time interval, and the continuous Erlang-distributed waiting times. There are also Poisson processes of dimension higher than 1.
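
The link between counts and waiting times can be checked by simulation. The sketch below assumes the standard construction of a 1-dimensional Poisson process from independent exponential waiting times between successive arrivals (the text above states this only for the first arrival); the function name count_arrivals, the seed and the parameter values are illustrative choices.

```python
import math
import random

def count_arrivals(rate: float, t: float, rng: random.Random) -> int:
    """Count arrivals in [0, t] by accumulating exponential waiting times."""
    elapsed = rng.expovariate(rate)  # waiting time until the first arrival
    count = 0
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(rate)  # waiting time until the next arrival
    return count

rng = random.Random(0)
rate, t, runs = 0.25, 10.0, 20000
counts = [count_arrivals(rate, t, rng) for _ in range(runs)]

mean_count = sum(counts) / runs    # should be close to rate * t
frac_zero = counts.count(0) / runs  # should be close to exp(-rate * t)
print(mean_count, rate * t)
print(frac_zero, math.exp(-rate * t))
```

The fraction of runs with no arrival estimates Pr(N_t = 0), which is the same event as T > t.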

Related distributions

  • If <math>X_m \sim \mathrm{Poi}(\lambda_m)</math>, m = 1, ..., N, are independent Poisson random variables, then their sum <math>Y = \sum_{m=1}^N X_m</math> is also Poisson distributed: <math>Y \sim \mathrm{Poi}(\bar{\lambda})</math> with <math>\bar{\lambda} = \sum_{m=1}^N \lambda_m</math>.
  • Assume <math>X_1 \sim \mathrm{Poi}(\lambda_1)</math> and <math>X_2 \sim \mathrm{Poi}(\lambda_2)</math> are independent, and let <math>Y = X_1 + X_2</math>. Then the distribution of <math>X_1</math> conditional on <math>Y=y</math> is binomial; specifically, <math>X_1|Y=y \sim \mathrm{Binom}(y, \lambda_1/(\lambda_1+\lambda_2))</math>.
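
Both relations can be verified numerically for particular parameter values. The following sketch assumes independent X_1 and X_2 and uses illustrative values λ_1 = 1.5, λ_2 = 3 and y = 6; the helper names are not from any library.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

lam1, lam2, y = 1.5, 3.0, 6

# Pr(Y = y) for Y = X1 + X2, obtained by convolving the two mass functions
pr_y = sum(poisson_pmf(j, lam1) * poisson_pmf(y - j, lam2) for j in range(y + 1))

# Pr(X1 = j | Y = y) = Pr(X1 = j) Pr(X2 = y - j) / Pr(Y = y)
conditional = [poisson_pmf(j, lam1) * poisson_pmf(y - j, lam2) / pr_y
               for j in range(y + 1)]
binomial = [binom_pmf(j, y, lam1 / (lam1 + lam2)) for j in range(y + 1)]

print(pr_y, poisson_pmf(y, lam1 + lam2))
print(max(abs(c - b) for c, b in zip(conditional, binomial)))
```

The first line printed compares the convolution with the Poisson mass function for λ_1 + λ_2; the second is the largest discrepancy between the conditional and binomial probabilities, which should be at the level of rounding error.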

Occurrence

The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete nature (that is, those that may happen 0, 1, 2, 3, ... times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that can be modelled as Poisson distributions include:

  • The number of cars that pass through a certain point on a road during a given period of time.
  • The number of spelling mistakes a secretary makes while typing a single page.
  • The number of phone calls at a call center per minute.
  • The number of times a web server is accessed per minute.
    • For instance, the number of edits per hour recorded on Wikipedia's Recent Changes page follows an approximately Poisson distribution.
  • The number of roadkill found per unit length of road.
  • The number of mutations in a given stretch of DNA after a certain amount of radiation.
  • The number of unstable nuclei that decayed within a given period of time in a piece of radioactive substance. The radioactivity of the substance will weaken with time, so the total time interval used in the model should be significantly less than the mean lifetime of the substance.
  • The number of pine trees per unit area of mixed forest.
  • The number of stars in a given volume of space.
  • The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry. This example was made famous by a book by Ladislaus Josephovich Bortkiewicz (1868–1931).
  • The distribution of visual receptor cells in the retina of the human eye.
  • The number of V2 rocket attacks per area in England, according to the fictionalized account in Thomas Pynchon's Gravity's Rainbow.
  • The number of light bulbs that burn out in a certain amount of time.

How does this distribution arise? – The law of rare events

In several of the above examples, such as the number of mutations in a given sequence of DNA, the events being counted are actually the outcomes of discrete trials, and would more precisely be modelled using the binomial distribution. However, the binomial distribution with parameters n and λ/n, i.e., the probability distribution of the number of successes in n trials with probability λ/n of success on each trial, approaches the Poisson distribution with expected value λ as n approaches infinity. This limit is sometimes known as the law of rare events. It provides a means of approximating such random variables with the Poisson distribution rather than the more cumbersome binomial distribution.

Here are the details. First, recall from calculus that

<math>\lim_{n\to\infty}\left(1-{\lambda \over n}\right)^n=e^{-\lambda}.</math>

Let p = λ/n. Then we have

<math>\lim_{n\to\infty} \Pr(X=k)=\lim_{n\to\infty}{n \choose k} p^k (1-p)^{n-k}

=\lim_{n\to\infty}{n! \over (n-k)!k!} \left({\lambda \over n}\right)^k \left(1-{\lambda\over n}\right)^{n-k}</math>

<math>=\lim_{n\to\infty} \underbrace{\left({n \over n}\right)\left({n-1 \over n}\right)\left({n-2 \over n}\right) \cdots \left({n-k+1 \over n}\right)}\ \underbrace{\left({\lambda^k \over k!}\right)}\ \underbrace{\left(1-{\lambda \over n}\right)^n}\ \underbrace{\left(1-{\lambda \over n}\right)^{-k}}.</math>

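As n approaches ∞, the expression over the first underbrace approaches 1, since it is a product of k factors, each of which approaches 1; the second remains constant since n does not appear in it at all; the third approaches <math>e^{-\lambda}</math> by the limit recalled above; and the fourth approaches 1, since k is fixed.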

Consequently the limit is

<math>{\lambda^k e^{-\lambda} \over k!}.\,\!</math>

More generally, whenever a sequence of binomial random variables with parameters n and <math>p_n</math> is such that

<math>\lim_{n\rightarrow\infty} np_n = \lambda,</math>

the sequence converges in distribution to a Poisson random variable with mean λ (see, e.g., law of rare events).
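
The convergence can be observed numerically. The sketch below compares the binomial probability with parameters n and λ/n against the Poisson probability for one illustrative choice, λ = 2.5 and k = 3; the helper names and the values of n are assumptions made for the example.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, k = 2.5, 3
gaps = []
for n in (10, 100, 1000, 10000):
    # absolute difference between Binom(n, lam/n) and Poi(lam) at k
    gap = abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    gaps.append(gap)
    print(n, gap)
```

For these values the gap shrinks roughly in proportion to 1/n.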

Properties

The expected value of a Poisson distributed random variable is equal to λ and so is its variance. The higher moments of the Poisson distribution are Touchard polynomials in λ, whose coefficients have a combinatorial meaning. In fact, when the expected value of the Poisson distribution is 1, Dobiński's formula says that the nth moment equals the number of partitions of a set of size n.
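
These statements can be checked by summing the series for the moments directly. In the sketch below, poisson_moment truncates the infinite sum after a fixed number of terms (an approximation that is ample for small λ), and bell_numbers counts set partitions with the Bell triangle; both names are illustrative.

```python
import math

def poisson_moment(n: int, lam: float, terms: int = 100) -> float:
    """n-th raw moment E(X**n), by truncated summation over k."""
    return sum(k**n * math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(terms))

def bell_numbers(count: int) -> list:
    """First `count` Bell numbers (numbers of set partitions), via the Bell triangle."""
    row, bells = [1], [1]
    for _ in range(count - 1):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

lam = 4.0
mean = poisson_moment(1, lam)
variance = poisson_moment(2, lam) - mean**2
print(mean, variance)

# With expected value 1, the n-th moment should equal the n-th Bell number
print([round(poisson_moment(n, 1.0)) for n in range(6)], bell_numbers(6))
```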

The mode of a Poisson distributed random variable with non-integer <math>\lambda</math> is equal to <math>\lfloor \lambda \rfloor</math>, which is the largest integer less than or equal to <math>\lambda</math>. This is also written as floor(<math>\lambda</math>). When <math>\lambda</math> is a positive integer, the modes are <math>\lambda</math> and <math>\lambda - 1</math>.
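
A brute-force search over k illustrates the rule for the mode. This is only a sketch: poisson_modes is a hypothetical helper that scans the first 100 values of k, which suffices for small λ, and treats probabilities equal up to rounding as tied.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_modes(lam: float, k_max: int = 100) -> list:
    """All k below k_max whose probability ties for the maximum."""
    probs = [poisson_pmf(k, lam) for k in range(k_max)]
    peak = max(probs)
    return [k for k, p in enumerate(probs)
            if math.isclose(p, peak, rel_tol=1e-12)]

print(poisson_modes(2.5))  # non-integer lambda: a single mode at floor(lambda)
print(poisson_modes(3.0))  # integer lambda: two modes, lambda - 1 and lambda
```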

For sufficiently large values of <math>\lambda</math> (say <math>\lambda >1000</math>), the normal distribution with mean <math>\lambda</math> and variance <math>\lambda</math> is an excellent approximation to the Poisson distribution. If <math>\lambda</math> is greater than about 10, then the normal distribution is a good approximation if an appropriate continuity correction is performed, i.e., P(X ≤ x), where (lower-case) x is a non-negative integer, is replaced by P(X ≤ x + 0.5).
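
The effect of the continuity correction can be seen by comparing the exact value of P(X ≤ x) with the normal approximation evaluated at x and at x + 0.5. The sketch below uses the illustrative values λ = 20 and x = 20; the function names are chosen for this example.

```python
import math

def poisson_cdf(x: int, lam: float) -> float:
    """Exact P(X <= x), summing the probability mass function in log space."""
    return sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
               for k in range(x + 1))

def normal_cdf(z: float, mean: float, variance: float) -> float:
    """Normal distribution function, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((z - mean) / math.sqrt(2.0 * variance)))

lam, x = 20.0, 20
exact = poisson_cdf(x, lam)
plain = normal_cdf(x, lam, lam)            # no continuity correction
corrected = normal_cdf(x + 0.5, lam, lam)  # with continuity correction
print(exact, plain, corrected)
```

For these values the corrected approximation is noticeably closer to the exact probability than the uncorrected one.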

If N and M are two independent random variables, both following a Poisson distribution with parameters <math>\lambda</math> and <math>\mu</math>, respectively, then N + M follows a Poisson distribution with parameter <math>\lambda+\mu</math>.

The moment-generating function of the Poisson distribution with expected value <math>\lambda</math> is

<math>\mathrm{E}\left(e^{tX}\right)=\sum_{k=0}^\infty e^{tk} f(k;\lambda)=\sum_{k=0}^\infty e^{tk} {\lambda^k e^{-\lambda} \over k!} =e^{\lambda(e^t-1)}.</math>

All of the cumulants of the Poisson distribution are equal to the expected value <math>\lambda</math>. The nth factorial moment of the Poisson distribution is <math>\lambda^n</math>.
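
The closed form of the moment-generating function and the value of the factorial moments can be confirmed by truncated summation. The sketch below uses illustrative values t = 0.3, λ = 2.5 and n = 3; the helper names are assumptions, and 150 terms is an arbitrary cut-off that is more than sufficient here.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

def mgf_by_summation(t: float, lam: float, terms: int = 150) -> float:
    """E(exp(tX)) computed as a truncated sum over k."""
    return sum(math.exp(t * k) * poisson_pmf(k, lam) for k in range(terms))

def mgf_closed_form(t: float, lam: float) -> float:
    return math.exp(lam * (math.exp(t) - 1.0))

def factorial_moment(n: int, lam: float, terms: int = 150) -> float:
    """E(X (X-1) ... (X-n+1)); math.perm(k, n) is the falling factorial, 0 if n > k."""
    return sum(math.perm(k, n) * poisson_pmf(k, lam) for k in range(terms))

t, lam, n = 0.3, 2.5, 3
print(mgf_by_summation(t, lam), mgf_closed_form(t, lam))
print(factorial_moment(n, lam), lam**n)
```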

The Poisson distributions are infinitely divisible probability distributions.

Parameter estimation

Given a sample of N measured values <math>k_i</math>, we wish to estimate the value of the parameter <math>\lambda</math> of the Poisson population from which the sample was drawn. To calculate the maximum likelihood value, we form the likelihood function

<math>L(\lambda)=\prod_{i=1}^N f(k_i;\lambda) = \prod_{i=1}^N \frac{e^{-\lambda}\lambda^{k_i}}{k_i!}

= \frac{e^{-N\lambda}\lambda^{\Sigma k_i}}{\prod k_i!}</math>

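where the sums and products are from <math>i=1</math> to <math>N</math>. Taking the logarithm of L gives <math>\ln L(\lambda) = -N\lambda + \ln\lambda\,\Sigma k_i - \Sigma \ln(k_i!)</math>. Differentiating with respect to <math>\lambda</math> and equating to zero gives <math>-N + \Sigma k_i/\lambda = 0</math>, which yields the maximum likelihood estimate of <math>\lambda</math>: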

<math>\lambda_\mathrm{MLE}=\frac{1}{N}\sum_{i=1}^N k_i</math>

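Let <math>\varphi(t)=\exp(\lambda(e^{it}-1))</math> denote the characteristic function of the Poisson distribution with parameter <math>\lambda</math>. From the properties of characteristic functions, it is seen that the characteristic function of the distribution of <math>\lambda_\mathrm{MLE}</math> is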

<math>\varphi_\mathrm{MLE}(t)=\left(\prod_{i=1}^N \varphi(t/N)\right)=\varphi^N(t/N)=\exp(N\lambda(e^{it/N}-1))</math>

The expected value of <math>\lambda_\mathrm{MLE}</math> is then found to be:

<math>\operatorname{E} (\lambda_\mathrm{MLE}) =

-i\left(\frac{d}{dt}\,\varphi_\mathrm{MLE}(t)\right)_{t=0}=\lambda</math>

Since the average value of <math>\lambda_\mathrm{MLE}</math> is equal to <math>\lambda</math>, it is therefore an unbiased estimator of <math>\lambda</math>.
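
The estimator is simply the sample mean, and its unbiasedness can be illustrated by simulation. The sketch below draws Poisson samples with Knuth's multiplication method (a sampling technique chosen for this example, not something the derivation above depends on) and averages the estimate over many repeated samples; all names, the seed and the parameter values are illustrative.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson variate by multiplying uniforms until the product drops below exp(-lam)."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def poisson_mle(sample) -> float:
    """Maximum likelihood estimate of lambda: the sample mean."""
    return sum(sample) / len(sample)

rng = random.Random(1)
true_lam, sample_size, repeats = 2.5, 50, 2000
estimates = [
    poisson_mle([sample_poisson(true_lam, rng) for _ in range(sample_size)])
    for _ in range(repeats)
]
average_estimate = sum(estimates) / repeats
print(average_estimate)  # should be close to true_lam
```

Individual estimates scatter around the true value; it is their average over repeated samples that settles near λ.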

The "law of small numbers"

The word law is sometimes used as a synonym of probability distribution, and convergence in law means convergence in distribution. Accordingly, the Poisson distribution is sometimes called the law of small numbers because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen. The Law of Small Numbers is a book by Ladislaus Bortkiewicz about the Poisson distribution, published in 1898. Some historians of mathematics have argued that the Poisson distribution should have been called the Bortkiewicz distribution.
