In much of statistical analysis it is assumed that observed data values represent independent samples from some underlying distribution whose form is the same for all samples (iid, or independently and identically distributed). However, there are many situations in which dependence is a more appropriate model. Examples range from population growth from one generation to the next, to the way in which genetic mutations take place over time, to pattern of spatial and temporal dependency, as for example occur as a disease spreads or pollutants disperse. Different kinds of dependency can be modeled in different ways, with one of the most well-known being as a Markov Process (named after the Russian mathematician, Andrei Markov). With this type of model the central notion is that the probability of an event or state of a system at time t is dependent only on its state at time t-1, and not on any previous states, i.e. the dependency relation is limited to one step in time, from some known initial state. This step-by-step version of a Markov Process is known as a Markov Chain. The Wikipedia page on Markov Chains provides a useful list of example application areas. In this topic we restrict our attention to discrete time, finite state Markov Chain, although there are a range of natural extensions to the concept, for example to continuous time and infinite states.

The initial state can be written as a row vector, p(1)=(p(s1), p(s2),p(s3),...,p(sn)), which lists all the possible states (often called the state space, S={s1,s2...sn}) and their values (relative proportions or probabilities) at the start time, t=1. So, for example, if there were three possible initial states, and the system starts in state 2, the row vector would be p(1)=(0 1 0). If at the start of the process 1/3 of events or entities were in each of the three states, we would have p(1)=(1/3 1/3 1/3).

If the probability of transition in a single time step from state A to state B is known or can be estimated, then we could write this as pAB. If furthermore, we have a finite number of states, say A, B and C, and we can obtain estimates for the transition probabilities between all pairs of states, including the transition 'no change', then we could write this set of probabilities as a transition matrix:

The rows of this matrix must add up to 1, since there are a fixed set of possible states. Such row-standardized matrices are called stochastic matrices. An example 3x3 stochastic matrix (after Kemeny et al., 1967, 1974 [KEM1],[KEM2]) is:

This particular matrix is presumed to represent changes in the weather from day to day, with the states, S, being A: Rain, B: Sun, and C: Snow. It is important to note that the transition matrix is assumed to be constant over time, otherwise known as stationary or time homogeneous. Because the changes in weather are to be modeled as a Markov process, the rule stated more formally is:

P(St+1=xt+1 | St=xt, St-1=xt-1,St-2=xt-2 ,...S1=x1) = P(St+1=xt+1 | St=xt)

i.e. the probability that the system will be in a particular state at time t+1 given that it was in a whole set of prior states, all the way back to a known initial state at time t=1, is simply a function of the immediately prior state, i.e. at time t. Sometimes this condition is referred to as a Markov Chain with a memory m=1. If, in this example, the initial state vector was p=(0 1 0), i.e. initially it was a sunny day, then the matrix operation: p(2)=p(1)T gives the probable weather on day 2:

Hence, starting with a sunny day, there is a 50:50 chance that the next day will be rainy or snowing, but no chance that it will be sunny. This process could be repeated, to obtain the expected state of the weather on day 3:

As is apparent, there is a chain of events here, and as noted above, a particular sequence or chain of this type is called a Markov Chain for obvious reasons. Indeed, Markov Chains can be represented as directed graphs (digraphs) where the vertices are the states and the edges are defined by the transition probabilities. Similarly, if there were only two states, for example Heads or Tails, and an initial pot of money ($100 say), then the state of the pot of money at time t+1, assuming a $1 bet on Heads was played each time, would be solely dependent on the amount of money left in the pot at time t, and the outcome of the coin toss at time t. If the outcome was Heads, the pot would increase and if Tails, would decrease. The particular sequence of Heads and Tails as time increments would result in a sequence or path that could be plotted over time, giving a 1D random walk.

The transition matrix, T, describing the day-to-day changes in the weather, has elements pij that describe the probability of transition from state i to state j. As we have noted, an interesting question is "what is the weather expected to be in 2 days from now?". Instead of undertaking the step-by-step process described above, we can examine the transition probability matrix more closely. If we consider row 1, we are looking at the transition possibility from Rain to Rain, Sun or Snow. For two days ahead, the probability of a transition from Rain to Rain, for example, is a combination of three alternative scenarios: Rain-Rain-Rain, Rain-Sun-Rain, and Rain-Snow-Rain, or algebraically the cross-product of row 1 of the matrix and column 1. Likewise, the probability of a transition from Rain to Sun over two days, and Rain to Snow over two days, can be computed as the cross product of row 1 and columns 2 and 3, respectively. The same logic applies for transitions from Sun and from Snow to each alternative state, and collectively these are simply the matrix power operation, T2. More specifically, if a stochastic matrix has dimensions r x r, representing r different possible states, the individual cell probabilities for a cell starting in state i will be in state j after n steps is simply the entry pij in the nth power of the transition matrix, Tn. These entries are often written as pij(n).

If we now apply this matrix power rule to the Rain, Sun, Snow example above, we have:

By the 6th power the matrix has reached an approximately fixed set of transition probabilities, ≈W, i.e. it has converged to an array consisting of identical row vectors w=[0.2 0.2 0.4]. Notice also that wT=w - this result is also unique, i.e. the stochastic vector w is the only such vector satisfying this equation and is known as the steady state vector.

Thus the future state of the process is completely determined by the transition probabilities and is independent of the starting state. Markov Chains of this type are known as regular. If a matrix of transition probabilities has no zero entries it will be regular. It may also be regular if it has one or more zeros, but some integer power of the matrix must have no zeros for it to be regular. In the example above, the true limit matrix, W, whose row entries add to 1, is reached in 6 steps to a very close approximation. With regular matrices a limiting matrix similar in structure to W, will always be reached as n→∞. Furthermore, the following relationship holds:

i.e. the future state of the system at time step n can be determined directly from the initial state times the matrix of transition probabilities raised to the nth power. This can be directly verified for the example above:

If a diagonal cell entry in a transition matrix is 1, any transition to that state can never leave it, and the state is described as absorbing. If it is possible to get to an absorbing state from every other state, the process is described as an Absorbing Markov Chain. In the example above, if state 2 (Sun) was an absorbing state, then eventually (after around 30 steps) the matrix powering process will converge as shown below:

More realistic examples of absorbing states are depletion (exhaustion) of a (natural) resource, death of a species, and a trading company becoming 'dormant' or wound up.

If we generalize the worked example above, we can define the absorbing transition matrix, T, as having r rows or states, with perhaps k>0 of these being absorbing states. so we have an r x r, matrix of which k rows contain a 1 in the diagonal. If we re-order the matrix so that all the absorbing states occupying the lower rows and the transition states occupying the upper rows, we can partition the matrix as follows:

Here 0 and I are the square zero matrix (all zeros) and square identity matrix (ones in the diagonal, zeros elsewhere), Q is the square sub-matrix of states that have non-absorbing transition probabilities with each other (i.e. between the various transitional states) and R is an r x k sub-matrix of transition probabilities from transitional states to absorbing states. This partitioned arrangement is sometimes referred to as the canonical form.

Now from basic matrix algebra we have:

or more generally, for an absorbing Markov Chain, in which Qn→0, we have:

where B=RN, and N=∑Qt-1, t=1...n. The entries in N represent the expected number of times the chain is in each state, rather as the sum of powered binary matrices (described in the section on matrix powers) identifies the number of alternative paths through a network. If we sum any row in N, we have the number of times a particular initial state (that row's state) transitions to other states, i.e. before being absorbed. Hence this equates to the number of steps, or time, before absorption. The entries in B adjust the values in N by the transition probabilities, R, giving the probability that each state will transition to a particular absorbing state.

A Markov Chain is called ergodic, or irreducible, if every state is reachable from every other state. If the transition matrix is considered as a digraph, this equates to a requirement that the graph permits paths from every node to every other node. Every regular chain is ergodic (ergodic chains are a superset of regular chains).

In the subsection above on absorbing chains, the matrix N arose as the sum of the powers of Qn. This matrix is known as the fundamental matrix of the process. It is useful for obtaining information about the properties of the process, such as determining the mean time to first reach a particular state from some other state (known as the mean first passage time).

For an absorbing chain it turns out that N can be obtained directly from Q, without performing the powering process, by inversion: N=(I-Q)-1. This results follows from the relation: N(I-Q)=I, and assuming N=∑Qt-1, t=1...n from above, we have (I-Q)∑Qt-1 for the left hand side (LHS) of this expression and the unit matrix, I, for the right hand side (RHS). Multiplying the LHS out, we have (I-Q)∑Qt-1 = I-Qt so if we multiply both sides by N, we have N(I-Qt)=∑Qt-1, but for large t Qt tends to 0, so the left hand side is simply N. Thus for an absorbing chain N=(I-Q)-1.

More generally, for any ergodic Markov Chain (and hence for all regular Markov Chains), we can obtain a similar result. However, since the limiting matrix of an ergodic or regular process does not tend to 0, unlike the case with powers of the Q matrix, a different model is required. One possibility is to look at a similar series, of the form: Z=I+(T-W)+(T2-W)+.... , where T is the transition matrix and W is the limiting matrix of T . We know that for large n, (Tn-W)→0, so this expansion has the form we are looking for. Somewhat surprisingly, it can be shown that (Tn-W)=(T-W)n (see Grinstead and Snell, 1997, p456-7 for the derivation [GRI1]) so the expression for Z can be re-written as Z=I+(T-W)+(T-W)2+....which is very like the series N=I+Q+Q2+.... which we have already shown is the same as N=(I-Q)-1. The series expansion in (T-W) is a convergent sequence, and is simply the expansion of: Z=(I-(T-W))-1 or Z=(I-T+W)-1. This latter expression is the fundamental matrix for ergodic chains. Note that the elements of Z will not necessarily be positive.

Mean first passage time and recurrence time

The first mean passage time of an ergodic Markov Chain (hence including any regular Markov Chain) is the expected time to move from one state, si, to another, sj. The matrix of first mean passage times is denoted M, with elements mij. The matrix M can be obtained directly from the fundamental matrix and the limiting matrix, W. The elements of M are simply:

(see Grinstead and Snell, 1997, p459-60 for the derivation, [GRI1]). For the example given at the start of this topic:

we have the symmetric fundamental matrix, Z:

and the first mean passage time matrix, M:

Thus the mean time to move from state 1 (Rain) to state 2 (Sun), for example, is 4 time units.

Central limit theorem for Markov Chains

Let the random variable Sj(n) represent the number of times an ergodic Markov Chain is in state j in the first n time periods. The expected value of Sj(n)→nwj as n→∞ and its variance is obtained from the limiting matrix and the fundamental matrix (Kemeny and Snell, 1967 [KEM1]) as:

If we take the standardized variable:

then the distribution of v is asymptotically Normal for any starting state.

References

[COL1] Collins L (1975) An Introduction to Markov Chains. CATMOG 1. Available from: http://qmrg.org.uk/catmog/

[GRI1] Grinstead C M, Snell J L (1997) Introduction to Probability, 2nd ed. AMS, 1997, Ch11. Available from: http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf

[IZQ1] Izquierdo L R, Izquierdo S S, Galán J M, Santos J I (2009) Techniques to Understand Computer Simulations: Markov Chain Analysis. Journal of Artificial Societies and Social Simulation, 12,1/6. Available online at: http://jasss.soc.surrey.ac.uk/12/1/6.html

[KEM1] Kemeny J G, Snell J L (1967) Finite Markov Chains. van Nostrand, Princeton, NJ

[KEM2] Kemeny J G, Snell J L, Thompson G L (1974) Introduction to Finite Mathematics. 3rd ed., Prentice-Hall, Englewood Cliffs, NJ

Wikipedia: Markov Chains: http://en.wikipedia.org/wiki/Markov_chain