Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function.
Finite form
For a real convex function φ, numbers x1, x2, ..., xn in its domain, and positive real weights ai,
- <math>\varphi\left(\frac{\sum a_{i} x_{i}}{\sum a_{i}}\right) \le \frac{\sum a_{i} \varphi (x_{i})}{\sum a_{i}} </math>
The inequality is reversed if φ is concave.
As a particular case, if the weights ai are all equal to unity,
- <math>\varphi\left(\frac{\sum x_{i}}{n}\right) \le \frac{\sum \varphi (x_{i})}{n}</math>
The function log(x) is concave, so substituting φ(x) = log(x) reverses the inequality; exponentiating both sides then gives the familiar arithmetic mean–geometric mean inequality:
- <math> \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.</math>
The variable x may, if required, be a function of another variable (or set of variables) t, so that xi = g(ti).
All of this carries over directly to the general continuous case: the weights ai are replaced by a non-negative integrable function f(x), such as a probability density, and the summations are replaced by integrals.
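As a quick numerical illustration of the finite form, the following Python sketch checks the weighted inequality for the convex function φ(x) = x^2 and then the arithmetic mean–geometric mean special case; the particular values and weights are arbitrary choices made only for this check.

```python
import math

# Illustrative values and positive weights (arbitrary choices for the check).
x = [0.5, 2.0, 3.0, 7.5]
a = [1.0, 2.0, 0.5, 4.0]

def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Finite form with the convex function phi(x) = x**2:
# phi(weighted mean of x) <= weighted mean of phi(x).
phi = lambda t: t * t
lhs = phi(weighted_mean(x, a))
rhs = weighted_mean([phi(v) for v in x], a)
print(f"phi(mean) = {lhs:.4f} <= mean(phi) = {rhs:.4f}: {lhs <= rhs}")

# AM-GM special case (equal weights): arithmetic mean >= geometric mean.
am = sum(x) / len(x)
gm = math.prod(x) ** (1 / len(x))
print(f"AM = {am:.4f} >= GM = {gm:.4f}: {am >= gm}")
```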
General statement
The inequality can be stated quite generally using measure theory. It can also be stated equally generally in the language of probability theory. The two statements say exactly the same thing.
In the language of measure theory
Let (Ω,A,μ) be a measure space such that μ(Ω) = 1. If g is a real-valued μ-integrable function and φ is a convex function on the range of g, then
- <math>\varphi\left(\int_{\Omega} g\, d\mu\right) \le \int_\Omega \varphi \circ g\, d\mu. </math>
In the language of probability theory
In the terminology of probability theory, μ is a probability measure. The function g is replaced by a real-valued random variable X (just another name for the same thing, as long as the context remains one of "pure" mathematics). The integral of any function over the space Ω with respect to the probability measure μ becomes an expected value. The inequality then says that if φ is any convex function, then
- <math>\varphi\left(\Bbb{E}\{X\}\right) \leq \Bbb{E}\{\varphi(X)\}.\,</math>
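A minimal Monte Carlo sketch of this statement, taking φ(x) = x^2 and drawing X from an exponential distribution purely for illustration:

```python
import random

random.seed(0)

# Samples of X from an illustrative distribution (exponential with rate 1).
samples = [random.expovariate(1.0) for _ in range(100_000)]

phi = lambda t: t * t   # a convex function

# Sample estimates of E[X] and E[phi(X)].
mean_x = sum(samples) / len(samples)
mean_phi = sum(phi(s) for s in samples) / len(samples)

# Jensen: phi(E[X]) <= E[phi(X)]  (roughly 1 <= 2 for Exp(1)).
print(f"phi(E[X]) ~= {phi(mean_x):.3f}  <=  E[phi(X)] ~= {mean_phi:.3f}")
```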
Proof
Let g be a μ-integrable function on a measure space Ω, and let φ be a convex function on the range of g. Define the right-hand derivative of φ at x as
- <math>\varphi^\prime(x):=\lim_{t\to0^+}\frac{\varphi(x+t)-\varphi(x)}{t}</math>
Since φ is convex, the difference quotient on the right-hand side is non-increasing as t decreases to 0 from the right, and it is bounded below by any quotient of the form
- <math>\frac{\varphi(x+t)-\varphi(x)}{t}</math>
where t < 0; therefore, the limit always exists.
Now, let us define the following:
- <math>x_0:=\int_\Omega g\, d\mu,</math>
- <math>a:=\varphi^\prime(x_0),</math>
- <math>b:=\varphi(x_0)-x_0\varphi^\prime(x_0).</math>
Then for all x, <math>ax+b\leq\varphi(x)</math>. To see this, take x > x0 and define t = x − x0 > 0. Then,
- <math>\varphi^\prime(x_0)\leq\frac{\varphi(x_0+t)-\varphi(x_0)}{t}.</math>
Therefore,
- <math>\varphi^\prime(x_0)(x-x_0)+\varphi(x_0)\leq\varphi(x)</math>
as desired. The case for x < x0 is proven similarly, and clearly <math>ax_0+b=\varphi(x_0)</math>.
φ(x0) can then be rewritten as
- <math>ax_0+b=a\left(\int_\Omega g\,d\mu\right)+b.</math>
Since μ(Ω) = 1, for every real number k we have
- <math>\int_\Omega k\,d\mu=k.</math>
In particular,
- <math>a\left(\int_\Omega g\,d\mu\right)+b=\int_\Omega(ag+b)\,d\mu\leq\int_\Omega\varphi\circ g\,d\mu.</math>
Since the left-hand side equals <math>\varphi(x_0)=\varphi\left(\int_\Omega g\,d\mu\right)</math>, this is exactly the desired inequality.
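The heart of the proof is the supporting line ax + b, which touches φ at x0 and never exceeds it. A small numerical sketch of that construction, with φ(x) = e^x and an arbitrary illustrative value standing in for x0 = ∫Ω g dμ:

```python
import math

phi = math.exp          # a convex function
x0 = 0.7                # stands in for the integral of g dmu; illustrative value

# Numerical estimate of the right-hand derivative of phi at x0.
t = 1e-6
a = (phi(x0 + t) - phi(x0)) / t
b = phi(x0) - x0 * a

# The supporting line a*x + b should never exceed phi(x).
grid = [x0 + 0.1 * k for k in range(-30, 31)]
assert all(a * x + b <= phi(x) + 1e-9 for x in grid)
print(f"a = {a:.6f}, b = {b:.6f}: the line stays below phi on the grid")
```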
Applications and special cases
Form involving a probability density function
Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function such that
- <math>\int_{-\infty}^\infty f(x)\,dx = 1.</math>
In probabilistic language, f is a probability density function.
Then Jensen's inequality becomes the following statement about convex integrals:
If g is any real-valued measurable function and φ is convex over the range of g, then
- <math> \varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx. </math>
If g(x) = x, then this form of the inequality reduces to a commonly used special case:
- <math>\varphi\left(\int_{-\infty}^\infty x\, f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x)\,f(x)\, dx.</math>
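As a concrete check of this special case, the sketch below takes f to be the standard exponential density on [0, ∞) and φ(x) = x^2, and compares both sides by simple numerical integration; all specific choices here are illustrative.

```python
import math

f = lambda x: math.exp(-x)      # standard exponential density on [0, infinity)
phi = lambda y: y * y           # a convex function

def integrate(h, lo=0.0, hi=50.0, n=100_000):
    """Simple trapezoidal rule; the upper cutoff truncates the negligible tail."""
    dx = (hi - lo) / n
    total = 0.5 * (h(lo) + h(hi))
    total += sum(h(lo + k * dx) for k in range(1, n))
    return total * dx

lhs = phi(integrate(lambda x: x * f(x)))          # phi(E[X]) = 1
rhs = integrate(lambda x: phi(x) * f(x))          # E[phi(X)] = 2
print(f"{lhs:.4f} <= {rhs:.4f}")
```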
Alternative finite form
If <math>\Omega</math> is a finite set <math>\{x_1,x_2,\ldots,x_n\}</math>, and if <math>\mu</math> is a probability measure on <math>\Omega</math> assigning weight <math>\lambda_i</math> to each point <math>x_i</math>, then the general form reduces to a statement about sums:
- <math> \varphi\left(\sum_{i=1}^{n} g(x_i)\lambda_i \right) \le \sum_{i=1}^{n} \varphi(g(x_i))\lambda_i, </math>
provided that <math> \lambda_1 + \lambda_2 + \cdots + \lambda_n = 1, \lambda_i \ge 0. </math>
There is also an infinite discrete form.
Statistical physics
Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:
- <math> e^{\langle X \rangle} \leq \left\langle e^X \right\rangle, </math>
where angle brackets denote expected values with respect to some probability distribution in the random variable X.
The proof in this case is very simple (cf. Chandler, Sec. 5.5). The desired inequality follows directly, by writing
- <math> \left\langle e^X \right\rangle
= e^{\langle X \rangle} \left\langle e^{X - \langle X \rangle} \right\rangle </math>
and then applying the inequality
- <math> e^X \geq 1+X \, </math>
to the final exponential: since <math>\langle X - \langle X \rangle \rangle = 0</math>, it follows that <math>\left\langle e^{X - \langle X \rangle} \right\rangle \ge 1</math>, and the result follows.
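A brief numerical sketch of this bound, with samples from a standard normal distribution standing in for the angle-bracket averages (the choice of distribution is purely illustrative):

```python
import math, random

random.seed(1)

# Samples of X from an illustrative distribution (standard normal).
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

avg_x = sum(xs) / len(xs)
avg_exp_x = sum(math.exp(x) for x in xs) / len(xs)

# exp(<X>) <= <exp(X)>; for a standard normal, roughly 1 <= e**0.5.
print(f"exp(<X>) ~= {math.exp(avg_x):.3f}  <=  <exp(X)> ~= {avg_exp_x:.3f}")
```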
Information theory
If p(x) is the true probability distribution for x, and q(x) is another distribution, then applying Jensen's inequality for the random variable Y(x) = q(x)/p(x) and the function φ(y) = −log(y) gives
- <math>\Bbb{E}\{\varphi(Y)\} \ge \varphi(\Bbb{E}\{Y\})</math>
- <math>\Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge - \log \int p(x) \frac{q(x)}{p(x)}\, dx </math>
- <math>\Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge 0 </math>
- <math>\Rightarrow - \int p(x) \log q(x)\, dx \ge - \int p(x) \log p(x)\, dx, </math>
a result called Gibbs' inequality.
It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities p rather than any other distribution q. The non-negative quantity <math>\int p(x) \log \frac{p(x)}{q(x)}\, dx</math> is called the Kullback–Leibler divergence of q from p.
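The sketch below computes this Kullback–Leibler divergence for two small discrete distributions, chosen arbitrarily for illustration, and confirms that it is non-negative and vanishes when q = p.

```python
import math

def kl_divergence(p, q):
    """Sum of p(x) * log(p(x)/q(x)) over a discrete alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]   # "true" distribution (illustrative)
q = [0.25, 0.25, 0.25, 0.25]    # another distribution

print(f"D(p || q) = {kl_divergence(p, q):.4f}  (>= 0)")
print(f"D(p || p) = {kl_divergence(p, p):.4f}  (= 0)")
```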
Rao-Blackwell theorem
If L is a convex function, then Jensen's inequality gives
- <math>L(\Bbb{E}\{\delta(X)\}) \le \Bbb{E}\{L(\delta(X))\}</math>
- <math>\Rightarrow \Bbb{E}\{L(\Bbb{E}\{\delta(X)\})\} \le \Bbb{E}\{L(\delta(X))\}.</math>
So if δ(X) is some estimator of an unobserved parameter θ given a vector of observables X, and if T(X) is a sufficient statistic for θ, then an improved estimator, in the sense of having a smaller expected loss L, can be obtained by calculating
- <math>\delta_{1}(X) = \Bbb{E}_{\theta}\{\delta(X') \,|\, T(X')= T(X)\},</math>
the expectation value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed.
This result is known as the Rao-Blackwell theorem.
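As an illustration, consider estimating the success probability θ of a Bernoulli sample: a crude estimator δ(X) = X1 uses only the first observation, while conditioning on the sufficient statistic T(X) = X1 + ... + Xn yields δ1(X) = T(X)/n. The simulation below compares their mean squared-error losses; the values of θ, n, and the number of trials are illustrative choices.

```python
import random

random.seed(2)

theta, n, trials = 0.3, 10, 50_000

crude_losses, rb_losses = [], []
for _ in range(trials):
    xs = [1 if random.random() < theta else 0 for _ in range(n)]
    delta = xs[0]                 # crude estimator: first observation only
    delta1 = sum(xs) / n          # E[X_1 | sum of X_i] = sample mean
    crude_losses.append((delta - theta) ** 2)
    rb_losses.append((delta1 - theta) ** 2)

print(f"mean loss, delta   ~= {sum(crude_losses) / trials:.4f}")
print(f"mean loss, delta_1 ~= {sum(rb_losses) / trials:.4f}  (smaller)")
```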
References
- D. Chandler, Introduction to Modern Statistical Mechanics, Oxford University Press, 1987.
External links
- Jensen's inequality on MathWorld
- Jensen's inequality serves as the logo for the Mathematics department of Copenhagen University