Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function.
Finite form
For a real convex function φ, numbers x1, x2, ..., xn in its domain, and positive real weights ai,
- <math>\varphi\left(\frac{\sum a_{i} x_{i}}{\sum a_{i}}\right) \le \frac{\sum a_{i} \varphi (x_{i})}{\sum a_{i}} </math>
The inequality is reversed if φ is concave.
As a particular case, if the weights ai are all equal to unity,
- <math>\varphi\left(\frac{\sum x_{i}}{n}\right) \le \frac{\sum \varphi (x_{i})}{n}</math>
The function log(x) is concave, so substituting φ(x) = log(x) reverses the inequality; exponentiating both sides then gives the familiar arithmetic mean–geometric mean inequality:
- <math> \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.</math>
The variable x may, if required, be a function of another variable (or set of variables) t, so that xi = g(ti).
All of this carries over directly to the general continuous case: the weights ai are replaced by a non-negative integrable function f(x), such as a probability density, and the summations are replaced by integrals.
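As a quick numerical illustration of the finite form, the following Python sketch checks the weighted inequality for the convex function φ(x) = x^2 and then the arithmetic mean–geometric mean special case; the particular values and weights are arbitrary choices made only for this check.

```python
import math

# Illustrative values and positive weights (arbitrary choices for the check).
x = [0.5, 2.0, 3.0, 7.5]
a = [1.0, 2.0, 0.5, 4.0]

def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Finite form with the convex function phi(x) = x**2:
# phi(weighted mean of x) <= weighted mean of phi(x).
phi = lambda t: t * t
lhs = phi(weighted_mean(x, a))
rhs = weighted_mean([phi(v) for v in x], a)
print(f"phi(mean) = {lhs:.4f} <= mean(phi) = {rhs:.4f}: {lhs <= rhs}")

# AM-GM special case (equal weights): arithmetic mean >= geometric mean.
am = sum(x) / len(x)
gm = math.prod(x) ** (1 / len(x))
print(f"AM = {am:.4f} >= GM = {gm:.4f}: {am >= gm}")
```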
General statement
The inequality can be stated quite generally using measure theory. It can also be stated equally generally in the language of probability theory. The two statements say exactly the same thing.
In the language of measure theory
Let (Ω,A,μ) be a measure space such that μ(Ω) = 1. If g is a real-valued μ-integrable function and φ is a convex function on the range of g, then
- <math>\varphi\left(\int_{\Omega} g\, d\mu\right) \le \int_\Omega \varphi \circ g\, d\mu. </math>
In the language of probability theory
In the terminology of probability theory, μ is a probability measure. The function g is replaced by a real-valued random variable X (just another name for the same thing, as long as the context remains one of "pure" mathematics). The integral of any function over the space Ω with respect to the probability measure μ becomes an expected value. The inequality then says that if φ is any convex function, then
- <math>\varphi\left(\Bbb{E}\{X\}\right) \leq \Bbb{E}\{\varphi(X)\}.\,</math>
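A minimal Monte Carlo sketch of this statement, taking φ(x) = x^2 and drawing X from an exponential distribution purely for illustration:

```python
import random

random.seed(0)

# Samples of X from an illustrative distribution (exponential with rate 1).
samples = [random.expovariate(1.0) for _ in range(100_000)]

phi = lambda t: t * t   # a convex function

# Sample estimates of E[X] and E[phi(X)].
mean_x = sum(samples) / len(samples)
mean_phi = sum(phi(s) for s in samples) / len(samples)

# Jensen: phi(E[X]) <= E[phi(X)]  (roughly 1 <= 2 for Exp(1)).
print(f"phi(E[X]) ~= {phi(mean_x):.3f}  <=  E[phi(X)] ~= {mean_phi:.3f}")
```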
Proof
Let g be a μ-integrable function on a measure space Ω, and let φ be a convex function on the range of g. Define the right-hand derivative of φ at x as
- <math>\varphi^\prime(x):=\lim_{t\to0^+}\frac{\varphi(x+t)-\varphi(x)}{t}</math>
Since φ is convex, the difference quotient on the right-hand side is non-increasing as t decreases to 0 from the right, and it is bounded below by any quotient of the form
- <math>\frac{\varphi(x+t)-\varphi(x)}{t}</math>
where t < 0; therefore, the limit always exists.
Now, let us define the following:
- <math>x_0:=\int_\Omega g\, d\mu,</math>
- <math>a:=\varphi^\prime(x_0),</math>
- <math>b:=\varphi(x_0)-x_0\varphi^\prime(x_0).</math>
Then for all x, <math>ax+b\leq\varphi(x)</math>. To see this, take x > x0 and define t = x − x0 > 0. Then,
- <math>\varphi^\prime(x_0)\leq\frac{\varphi(x_0+t)-\varphi(x_0)}{t}.</math>
Therefore,
- <math>\varphi^\prime(x_0)(x-x_0)+\varphi(x_0)\leq\varphi(x)</math>
as desired. The case for x < x0 is proven similarly, and clearly <math>ax_0+b=\varphi(x_0)</math>.
φ(x0) can then be rewritten as
- <math>ax_0+b=a\left(\int_\Omega g\,d\mu\right)+b.</math>
Since μ(Ω) = 1, for every real number k we have
- <math>\int_\Omega k\,d\mu=k.</math>
In particular,
- <math>a\left(\int_\Omega g\,d\mu\right)+b=\int_\Omega(ag+b)\,d\mu\leq\int_\Omega\varphi\circ g\,d\mu.</math>
Since the left-hand side equals <math>\varphi(x_0)=\varphi\left(\int_\Omega g\,d\mu\right)</math>, this is exactly the desired inequality.
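The heart of the proof is the supporting line ax + b, which touches φ at x0 and never exceeds it. A small numerical sketch of that construction, with φ(x) = e^x and an arbitrary illustrative value standing in for x0 = ∫Ω g dμ:

```python
import math

phi = math.exp          # a convex function
x0 = 0.7                # stands in for the integral of g dmu; illustrative value

# Numerical estimate of the right-hand derivative of phi at x0.
t = 1e-6
a = (phi(x0 + t) - phi(x0)) / t
b = phi(x0) - x0 * a

# The supporting line a*x + b should never exceed phi(x).
grid = [x0 + 0.1 * k for k in range(-30, 31)]
assert all(a * x + b <= phi(x) + 1e-9 for x in grid)
print(f"a = {a:.6f}, b = {b:.6f}: the line stays below phi on the grid")
```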
Applications and special cases
Form involving a probability density function
Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function such that
- <math>\int_{-\infty}^\infty f(x)\,dx = 1.</math>
In probabilistic language, f is a probability density function.
Then Jensen's inequality becomes the following statement about convex integrals:
If g is any real-valued measurable function and φ is convex over the range of g, then
- <math> \varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx. </math>
If g(x) = x, then this form of the inequality reduces to a commonly used special case:
- <math>\varphi\left(\int_{-\infty}^\infty x\, f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x)\,f(x)\, dx.</math>
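As a concrete check of this special case, the sketch below takes f to be the standard exponential density on [0, ∞) and φ(x) = x^2, and compares both sides by simple numerical integration; all specific choices here are illustrative.

```python
import math

f = lambda x: math.exp(-x)      # standard exponential density on [0, infinity)
phi = lambda y: y * y           # a convex function

def integrate(h, lo=0.0, hi=50.0, n=100_000):
    """Simple trapezoidal rule; the upper cutoff truncates the negligible tail."""
    dx = (hi - lo) / n
    total = 0.5 * (h(lo) + h(hi))
    total += sum(h(lo + k * dx) for k in range(1, n))
    return total * dx

lhs = phi(integrate(lambda x: x * f(x)))          # phi(E[X]) = 1
rhs = integrate(lambda x: phi(x) * f(x))          # E[phi(X)] = 2
print(f"{lhs:.4f} <= {rhs:.4f}")
```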
Alternative finite form
If <math>\Omega</math> is a finite set <math>\{x_1,x_2,\ldots,x_n\}</math>, and if <math>\mu</math> is a probability measure on <math>\Omega</math> assigning weight <math>\lambda_i</math> to each point <math>x_i</math>, then the general form reduces to a statement about sums:
- <math> \varphi\left(\sum_{i=1}^{n} g(x_i)\lambda_i \right) \le \sum_{i=1}^{n} \varphi(g(x_i))\lambda_i, </math>
provided that <math> \lambda_1 + \lambda_2 + \cdots + \lambda_n = 1, \lambda_i \ge 0. </math>
There is also an infinite discrete form.
Statistical physics
Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:
- <math> e^{\langle X \rangle} \leq \left\langle e^X \right\rangle, </math>
where angle brackets denote expected values with respect to some probability distribution in the random variable X.
The proof in this case is very simple (cf. Chandler, Sec. 5.5). The desired inequality follows directly, by writing
- <math> \left\langle e^X \right\rangle
= e^{\langle X \rangle} \left\langle e^{X - \langle X \rangle} \right\rangle </math>
and then applying the inequality
- <math> e^X \geq 1+X \, </math>
to the final exponential: since <math>\langle X - \langle X \rangle \rangle = 0</math>, it follows that <math>\left\langle e^{X - \langle X \rangle} \right\rangle \ge 1</math>, and the result follows.
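A brief numerical sketch of this bound, with samples from a standard normal distribution standing in for the angle-bracket averages (the choice of distribution is purely illustrative):

```python
import math, random

random.seed(1)

# Samples of X from an illustrative distribution (standard normal).
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

avg_x = sum(xs) / len(xs)
avg_exp_x = sum(math.exp(x) for x in xs) / len(xs)

# exp(<X>) <= <exp(X)>; for a standard normal, roughly 1 <= e**0.5.
print(f"exp(<X>) ~= {math.exp(avg_x):.3f}  <=  <exp(X)> ~= {avg_exp_x:.3f}")
```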
Information theory
If p(x) is the true probability distribution for x, and q(x) is another distribution, then applying Jensen's inequality for the random variable Y(x) = q(x)/p(x) and the function φ(y) = −log(y) gives
- <math>\Bbb{E}\{\varphi(Y)\} \ge \varphi(\Bbb{E}\{Y\})</math>
- <math>\Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge - \log \int p(x) \frac{q(x)}{p(x)}\, dx </math>
- <math>\Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge 0 </math>
- <math>\Rightarrow - \int p(x) \log q(x)\, dx \ge - \int p(x) \log p(x)\, dx, </math>
a result called Gibbs' inequality.
It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities p rather than any other distribution q. The non-negative quantity <math>\int p(x) \log \frac{p(x)}{q(x)}\, dx</math> is called the Kullback–Leibler divergence of q from p.
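The sketch below computes this Kullback–Leibler divergence for two small discrete distributions, chosen arbitrarily for illustration, and confirms that it is non-negative and vanishes when q = p.

```python
import math

def kl_divergence(p, q):
    """Sum of p(x) * log(p(x)/q(x)) over a discrete alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]   # "true" distribution (illustrative)
q = [0.25, 0.25, 0.25, 0.25]    # another distribution

print(f"D(p || q) = {kl_divergence(p, q):.4f}  (>= 0)")
print(f"D(p || p) = {kl_divergence(p, p):.4f}  (= 0)")
```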
Rao-Blackwell theorem
If L is a convex function, then Jensen's inequality gives
- <math>L(\Bbb{E}\{\delta(X)\}) \le \Bbb{E}\{L(\delta(X))\}</math>
- <math>\Rightarrow \Bbb{E}\{L(\Bbb{E}\{\delta(X)\})\} \le \Bbb{E}\{L(\delta(X))\}.</math>
So if δ(X) is some estimator of an unobserved parameter θ given a vector of observables X, and if T(X) is a sufficient statistic for θ, then an improved estimator, in the sense of having a smaller expected loss L, can be obtained by calculating
- <math>\delta_{1}(X) = \Bbb{E}_{\theta}\{\delta(X') \,|\, T(X')= T(X)\},</math>
the expectation value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed.
This result is known as the Rao-Blackwell theorem.
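As an illustration, consider estimating the success probability θ of a Bernoulli sample: a crude estimator δ(X) = X1 uses only the first observation, while conditioning on the sufficient statistic T(X) = X1 + ... + Xn yields δ1(X) = T(X)/n. The simulation below compares their mean squared-error losses; the values of θ, n, and the number of trials are illustrative choices.

```python
import random

random.seed(2)

theta, n, trials = 0.3, 10, 50_000

crude_losses, rb_losses = [], []
for _ in range(trials):
    xs = [1 if random.random() < theta else 0 for _ in range(n)]
    delta = xs[0]                 # crude estimator: first observation only
    delta1 = sum(xs) / n          # E[X_1 | sum of X_i] = sample mean
    crude_losses.append((delta - theta) ** 2)
    rb_losses.append((delta1 - theta) ** 2)

print(f"mean loss, delta   ~= {sum(crude_losses) / trials:.4f}")
print(f"mean loss, delta_1 ~= {sum(rb_losses) / trials:.4f}  (smaller)")
```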
References
- D. Chandler, Introduction to Modern Statistical Mechanics, Oxford University Press, 1987.
External links
- Jensen's inequality on MathWorld
- Jensen's inequality serves as the logo for the Mathematics department of Copenhagen University