Markov chain
In mathematics, a Markov chain, named after Andrey Markov, is a discrete-time stochastic process with the Markov property: given knowledge of the current state, the previous states are irrelevant for predicting subsequent states.
A Markov chain describes the states of a system at successive times. At each of these times the system may have changed from its previous state to another state, or stayed in the same state. The changes of state are called transitions. The Markov property means the system is memoryless: it does not "remember" the states it was in before, it only "knows" its present state, and hence bases its "decision" about which state to move to next purely on the present, not on the past.
Definition
A Markov chain is a sequence X1, X2, X3, ... of random variables with the Markov property: the conditional probability distribution of the next state Xn+1, given the present and past states, depends only on the present state Xn, i.e.:
- <math>\Pr(X_{n+1}=x|X_0=x_0, X_1=x_1,\ldots, X_n=x_n) = \Pr(X_{n+1}=x|X_n=x_n).\,</math>
The range of the variables, i.e., the set of their possible values, is called the state space, the value of Xn being the state of the process at time n.
There are also continuous-time Markov processes. Some authors reserve the word "chain" for the discrete-time case.
A simple way to visualise a specific type of Markov chain is through a finite state machine. If the chain is in state y at time n, then the probability that it moves to state x at time n + 1 depends only on the current state y and not on the time n. Hence at any time n, a finite (time-homogeneous) Markov chain can be characterized by a matrix of probabilities whose x, y element is given by <math> \Pr(X_{n+1}=x|X_n=y) \, </math> and is independent of the time index n. These kinds of discrete finite Markov chains can also be described by a directed graph, where each edge is labeled by the probability of going from the state at one end of the edge to the state at the other.
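For example, such a chain can be written down as a table of transition probabilities and simulated one step at a time. The Python sketch below uses a made-up two-state "weather" chain; the state names and probabilities are purely illustrative.

    import random

    # Hypothetical two-state chain; the transition probabilities are illustrative only.
    # transition[current][next] = Pr(X_{n+1}=next | X_n=current)
    transition = {
        "sunny": {"sunny": 0.9, "rainy": 0.1},
        "rainy": {"sunny": 0.5, "rainy": 0.5},
    }

    def step(current):
        """Draw the next state given only the current one (the Markov property)."""
        r = random.random()
        cumulative = 0.0
        for nxt, p in transition[current].items():
            cumulative += p
            if r < cumulative:
                return nxt
        return nxt  # guard against floating-point rounding

    # Simulate a short trajectory starting from "sunny".
    state = "sunny"
    path = [state]
    for _ in range(10):
        state = step(state)
        path.append(state)
    print(path)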
Andrey Markov produced the first results (1906) for these processes. A generalization to countably infinite state spaces was given by Kolmogorov (1936). Markov chains are related to Brownian motion and the ergodic hypothesis, two topics in physics which were important in the early years of the twentieth century, but Markov appears to have pursued this out of a mathematical motivation, namely the extension of the law of large numbers to dependent events.
Properties of Markov chains
A Markov chain is characterized by the conditional distribution
- <math> \Pr(X_{n+1}=x| X_n)\, </math>
which is called the transition probability of the process. This is sometimes called the "one-step" transition probability. The probability of a transition in two, three, or more steps is derived from the one-step transition probability and the Markov property:
- <math> \Pr(X_{n+2}=x|X_n) = \int \Pr(X_{n+2}=x,X_{n+1}=y|X_n)\,dy
= \int \Pr(X_{n+2}=x|X_{n+1}=y) \, \Pr(X_{n+1}=y|X_n) \, dy</math>
Likewise,
- <math> \Pr(X_{n+3}=x|X_n) = \int \int \Pr(X_{n+3}=x|X_{n+2}=y)\, \Pr(X_{n+2}=y|X_{n+1}=z) \, \Pr(X_{n+1}=z|X_n) \, dz \, dy</math>
These formulas generalize to arbitrary future times n + k by multiplying the transition probabilities and integrating k − 1 times.
The marginal distribution Pr(Xn = x) is the distribution over states at time n. The initial distribution is Pr(X0 = x). The evolution of the process through one time step is described by
- <math> \Pr(X_{n+1}=x) = \int \Pr(X_{n+1}=x|X_n=y)\,\Pr(X_n=y)\,dy </math>
This is a version of the Frobenius-Perron equation. There may exist one or more state distributions π such that
- <math> \pi(x) = \int \Pr(X=x|Y=y)\,\pi(y)\,dy</math>
where Y is just a convenient name for the variable of integration. Such a distribution π is called a stationary distribution or steady-state distribution, or sometimes a Gibbs state. A stationary distribution is an eigenfunction of the conditional distribution function, associated with the eigenvalue 1.
Whether there is a stationary distribution, and whether it is unique if it does exist, are determined by certain properties of the process. Irreducible means that every state is accessible from every other state. A state is periodic if the process can return to it only at multiples of some fixed period greater than one; aperiodic means that no state is periodic. Positive recurrent means that the expected return time is finite for every state. Sometimes the terms indecomposable, acyclic, and persistent are used as synonyms for "irreducible", "aperiodic", and "recurrent", respectively. When a Markov chain is not irreducible, its state space may be partitioned into a set of (irreducible) communicating classes, each of which may be classified as above. The problem of classification is an important one in the mathematical study of Markov chains and related stochastic processes.
If the Markov chain is positive recurrent, there exists a stationary distribution. If it is positive recurrent and irreducible, there exists a unique stationary distribution, and furthermore the process constructed by taking the stationary distribution as the initial distribution is ergodic. Then the average of a function f over samples of the Markov chain is equal to the average with respect to the stationary distribution,
- <math> \lim_{n\rightarrow\infty}\; \frac{1}{n} \; \sum_{k=0}^{n-1} f(X_k)
= \int f(x)\,\pi(x)\,dx </math>
In particular, this holds for f equal to the identity function. Thus the average of sample values over time is equal to the expected value of the stationary distribution.
Furthermore, the equivalence of averages also holds if f is the indicator function on some subset A of the state space.
- <math> \lim_{n\rightarrow\infty}\; \frac{1}{n} \; \sum_{k=0}^{n-1} \chi_A(X_k)
= \int_A \pi(x)\,dx = \mu_{\pi}(A) </math>
where μπ is the measure induced by π. This makes it possible to approximate the stationary distribution by a histogram or other density estimate of a sequence of samples.
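As a minimal numerical sketch of this idea for a small discrete chain (discrete state spaces are treated in the next section), one can simulate a long trajectory and compare the empirical state frequencies with the stationary distribution. The three-state transition matrix below is an arbitrary example.

    import numpy as np

    # Illustrative three-state chain; P[i, j] = Pr(X_{n+1}=j | X_n=i),
    # with arbitrary example values.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.3, 0.5]])

    rng = np.random.default_rng(seed=0)
    n_steps = 100_000
    counts = np.zeros(3)
    state = 0
    for _ in range(n_steps):
        state = rng.choice(3, p=P[state])
        counts[state] += 1

    # The empirical frequencies (a histogram of the samples) approximate
    # the stationary distribution, here read off from a high power of P
    # (see the discrete-state-space section below).
    print("empirical frequencies:  ", counts / n_steps)
    print("stationary distribution:", np.linalg.matrix_power(P, 100)[0])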
Markov chains in discrete state spaces
If the state space is finite, the transition probability distribution can be represented as a matrix, called the transition matrix, with the (i, j)'th element equal to
- <math>P_{ij} = \Pr(X_{n+1}=j\mid X_n=i) \,</math>
For a discrete state space, the integrations in the k-step transition probability are summations, and can be computed as the k'th power of the transition matrix. That is, if P is the one-step transition matrix, then Pk is the transition matrix for the k-step transition.
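For example, using the same illustrative three-state matrix as above, the three-step transition probabilities are simply the entries of the third matrix power:

    import numpy as np

    # P[i, j] = Pr(X_{n+1}=j | X_n=i); arbitrary example values.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.3, 0.5]])

    # The (i, j) entry of P^3 is Pr(X_{n+3}=j | X_n=i).
    print(np.linalg.matrix_power(P, 3))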
The stationary distribution is a vector which satisfies the equation
- <math> \pi^{T}\mathbf{P} = \pi^{T},</math>
where <math>\pi^{T}</math> is the transpose of <math>\pi</math>. In other words, the stationary distribution <math>\pi</math> is a left eigenvector of the transition matrix, associated with the eigenvalue 1.
For a general transition matrix P the stationary distribution need not be unique (and for chains on infinite state spaces it need not even exist). However, if the transition matrix P is irreducible and aperiodic, then there exists a unique stationary distribution <math>\pi</math>. In addition, Pk converges elementwise to a rank-one matrix in which each row is the (transpose of the) stationary distribution <math>\pi^T</math>, that is
- <math>\lim_{k\rightarrow\infty}\mathbf{P}^k=\mathbf{1}\pi^T,</math>
where <math>\mathbf{1}</math> is the column vector with all entries equal to 1. This is stated by the Perron-Frobenius theorem.
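Both facts can be checked numerically for the illustrative matrix used above: the stationary distribution is the (normalised) left eigenvector of P for eigenvalue 1, and every row of a high power of P is approximately that distribution. This is only a sketch using a library eigenvalue routine.

    import numpy as np

    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.3, 0.5]])

    # Left eigenvector of P for eigenvalue 1, normalised to sum to 1.
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
    pi = pi / pi.sum()
    print("stationary distribution:", pi)

    # Each row of P^k approaches pi as k grows (Perron-Frobenius).
    print(np.linalg.matrix_power(P, 50))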
This means that if we simulate or observe a random walk with transition matrix P, then the long-term probability of presence of the walker in a given state is independent from where the chain was started, and is dictated by the stationary distribution. The random walk "forgets" the past. In short, Markov chains are the "next thing" after memoryless processes (i.e., a sequence of independent identically distributed random variables).
A Markov chain is reversible if there exists a distribution <math>\pi</math> such that <math>\pi_i p_{ij} = \pi_j p_{ji}</math> for all states i and j (the detailed balance condition). For reversible Markov chains, <math>\pi</math> is always a stationary distribution.
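A small sketch of this for a three-state birth-death chain (the transition probabilities are arbitrary example values): a distribution satisfying detailed balance is built up state by state and then verified to be stationary.

    import numpy as np

    # Birth-death chain: only moves to neighbouring states are possible,
    # so it is reversible; the values are arbitrary.
    P = np.array([[0.7, 0.3, 0.0],
                  [0.2, 0.5, 0.3],
                  [0.0, 0.4, 0.6]])

    # Solve detailed balance along the chain: pi_{i+1}/pi_i = P[i, i+1]/P[i+1, i].
    pi = np.ones(3)
    for i in range(2):
        pi[i + 1] = pi[i] * P[i, i + 1] / P[i + 1, i]
    pi /= pi.sum()

    flow = pi[:, None] * P  # flow[i, j] = pi_i * p_ij
    print("detailed balance holds:", np.allclose(flow, flow.T))
    print("pi is stationary:      ", np.allclose(pi @ P, pi))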
A transition matrix which is positive (that is, every element of the matrix is positive) is irreducible and aperiodic. A matrix is a stochastic matrix if and only if it is the matrix of transition probabilities of some Markov chain.
The special case in which the transition probability does not depend even on the current state, so that successive states are independent, is known as a Bernoulli scheme. A Bernoulli scheme with only two possible states is known as a Bernoulli process.
Scientific applications
Markovian systems appear extensively in physics, particularly statistical mechanics, whenever probabilities are used to represent unknown or unmodelled details of the system, if it can be assumed that the dynamics are time-invariant, and that no relevant history need be considered which is not already included in the state description.
Markov chains can also be used to model various processes in queueing theory and statistics. Claude Shannon's famous 1948 paper A Mathematical Theory of Communication, which in a single stroke created the field of information theory, opens by introducing the concept of entropy through Markov modeling of the English language. Such idealised models can capture many of the statistical regularities of systems. Even without describing the full structure of the system perfectly, such signal models can make possible very effective data compression through entropy coding techniques such as arithmetic coding. They also allow effective state estimation and pattern recognition. The world's mobile telephone systems depend on the Viterbi algorithm for error correction, while hidden Markov models (in which the underlying Markov chain is not observed directly, but only through noisy or indirect measurements) are extensively used in speech recognition and also in bioinformatics, for instance for coding region/gene prediction.
The PageRank of a webpage as used by Google is defined by a Markov chain. It is the probability of being at page i under the stationary distribution of the following Markov chain on all (known) webpages. If N is the number of known webpages, and a page i has ki outgoing links, then the transition probability from i is (1-q)/ki + q/N to each page that i links to and q/N to each page it does not link to. The parameter q is taken to be about 0.15.
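As a rough sketch of this construction, the following code builds the matrix just described for a toy collection of four pages with made-up links and finds its stationary distribution by power iteration; the link structure and the number of iterations are illustrative choices, not Google's actual procedure.

    import numpy as np

    # Toy link structure: page index -> list of pages it links to (made up).
    links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 1, 2]}
    N = 4
    q = 0.15  # parameter q from the text

    # Transition probability (1-q)/k_i + q/N to linked pages, q/N otherwise.
    P = np.full((N, N), q / N)
    for i, out in links.items():
        for j in out:
            P[i, j] += (1 - q) / len(out)

    # PageRank is the stationary distribution; approximate it by power iteration.
    rank = np.full(N, 1.0 / N)
    for _ in range(100):
        rank = rank @ P
    print("PageRank:", rank)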
Markov chain methods have also become very important for generating sequences of random numbers to accurately reflect very complicated desired probability distributions - a process called Markov chain Monte Carlo or MCMC for short. In recent years this has revolutionised the practicability of Bayesian inference methods.
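A minimal sketch of the idea, using the Metropolis algorithm as one concrete MCMC method: states are proposed by a random walk and accepted or rejected in such a way that the chain's stationary distribution is a desired target density. The target here is an unnormalised standard normal and the proposal width is an arbitrary choice, both picked only for the example.

    import math
    import random

    def target(x):
        """Unnormalised target density (standard normal, for illustration)."""
        return math.exp(-0.5 * x * x)

    x = 0.0
    samples = []
    for _ in range(100_000):
        proposal = x + random.gauss(0.0, 1.0)   # symmetric random-walk proposal
        if random.random() < target(proposal) / target(x):
            x = proposal                        # accept; otherwise keep x
        samples.append(x)

    # The samples form a Markov chain whose stationary distribution is the target.
    print("sample mean (should be near 0):", sum(samples) / len(samples))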
Markov chains also have many applications in biological modelling, particularly for population processes, where the state of the chain describes the size or composition of a biological population (or of something analogous to one).
A recent application of Markov chains is in geostatistics: Markov chains are used in two- and three-dimensional stochastic simulations of discrete variables conditional on observed data. Such an application is called "Markov chain geostatistics", by analogy with kriging geostatistics. The Markov chain geostatistics method is still in development.
Markov chains can be used to model many games of chance. The children's games Chutes and Ladders and Candy Land, for example, are represented exactly by Markov chains. At each turn, the player starts in a given state (on a given square) and from there has fixed odds of moving to certain other states (squares).
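A toy version of such a game can be written down explicitly. The sketch below invents a five-square board on which a coin flip moves the player one or two squares forward and the last square is absorbing; powers of the transition matrix then give the probability of having finished within k turns.

    import numpy as np

    # Made-up board: 5 squares, a fair coin moves the player 1 or 2 squares,
    # square 4 is the winning (absorbing) state.
    n = 5
    P = np.zeros((n, n))
    for square in range(n - 1):
        for move in (1, 2):
            P[square, min(square + move, n - 1)] += 0.5
    P[n - 1, n - 1] = 1.0  # once finished, stay finished

    # Probability of having reached the last square within k turns, from square 0.
    for k in (1, 2, 3, 4):
        print(k, np.linalg.matrix_power(P, k)[0, n - 1])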
Markov parody generators
Markov processes can also be used to generate superficially "real-looking" text given a sample document: they are used in various pieces of recreational "parody generator" software (see dissociated press, Jeff Harrison, or Mark V Shaney). Markov chains have also been used in music composition.
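A bare-bones sketch of such a generator: estimate from a sample text which words follow each word, then walk the resulting chain. The sample sentence below is only a placeholder.

    import random
    from collections import defaultdict

    # First-order, word-level model built from a stand-in sample text.
    text = "the quick brown fox jumps over the lazy dog the quick fox sleeps"
    words = text.split()

    successors = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        successors[current].append(nxt)

    # Generate new text by repeatedly sampling a successor of the current word.
    word = random.choice(words)
    output = [word]
    for _ in range(12):
        choices = successors.get(word)
        if not choices:
            break
        word = random.choice(choices)
        output.append(word)
    print(" ".join(output))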
See also
- Hidden Markov model
- Examples of Markov chains
- Markov process
- Markov decision process
- Shift of finite type
- Mark V Shaney
- Phase-type distribution
References
- A.A. Markov. "Rasprostranenie zakona bol'shih chisel na velichiny, zavisyaschie drug ot druga". Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete, 2-ya seriya, tom 15, pp 135-156, 1906.
- A.A. Markov. "Extension of the limit theorems of probability theory to a sum of variables connected in a chain". reprinted in Appendix B of: R. Howard. Dynamic Probabilistic Systems, volume 1: Markov Chains. John Wiley and Sons, 1971.
- Leo Breiman. Probability. Original edition published by Addison-Wesley, 1968; reprinted by Society for Industrial and Applied Mathematics, 1992. ISBN 0-89871-296-3. (See Chapter 7.)
- J.L. Doob. Stochastic Processes. New York: John Wiley and Sons, 1953. ISBN 0-471-52369-0.
External links
- Generating Text (About generating random text using a Markov chain.)
- The World's Largest Matrix Computation (Google's PageRank as the stationary distribution of a random walk through the Web.)
- Dissociated Press in Emacs approximates a Markov process
- The Markov Bible (A Markov chain rewrite of the Bible.)