
Classification of States

Recall that in the beginning, we described 4 types of problems/questions that we’re interested in solving:

  1. Short-term questions (e.g. distribution of $X_n$ for any fixed $n$) → can be solved using the Chapman-Kolmogorov equation
  2. Strategy-related questions (e.g. probability of being absorbed at a given absorption state, expected time to absorption, expected number of visits through a state, etc.) → solved using first step analysis
  3. Long-run performance (e.g. when $n \to \infty$, what can we claim about $X_n$?)
  4. Extensions

Now, we’re focusing on long-run performance questions.

Consider the gambler’s ruin example. We know that there are 2 absorption states: $0$ and $N$ (once we reach either of these states, we never leave, i.e., the process (effectively) stops and we’re done).

We’re interested in knowing whether $X_n$ always reaches $0$ or $N$. Or is there a probability that the gambler will continue playing for infinite time and never go broke or reach $N$ dollars? What about the case when there is only one absorbing state $0$ (i.e., the “greedy” gambler scenario, where he does not set any quitting rule for himself)?

More generally, we see that there are 2 “kinds” of states for the gambler’s ruin problem:

  • Absorbing states → once reached, never escape
  • “Non-absorbing” states → the process just “passes through” such states and does not “stop” when it reaches them.

Can we find “similar” kinds of states for other problems (so that our results are more generalizable)?

For example, in the stationary MC below, we see that states 5 and 6 are not absorbing states, but they form a “trap” for the walker (or the process).

[Figure: transition diagram in which states 5 and 6 form a closed trap]


Definitions

Accessibility

definition

For a stationary MC $\{X_n, n=0,1,2,\dots\}$ with transition probability matrix $P$, state $j$ is said to be accessible from state $i$, denoted by $i \to j$, if $P_{ij}^{(m)} > 0$ for some $m \geq 0$.

The meaning is quite simple: if there is a path of finite length with non-zero probability from $i$ to $j$, then we say that $j$ is accessible from $i$. When we say “path with non-zero probability”, it means that all the edges along the path are “possible”.

Note that the requirement is not on the one-step transition probability, i.e., there does not have to be a “direct” edge (path of length $1$) from $i$ to $j$.

If $P_{ij}^{(m)} > 0$, it means that there is at least one path of length $m$ with non-zero probability from $i$ to $j$. For example, if $P_{ij}^{(2)} > 0$, there is at least one path $i-k-j$ with non-zero probability (here, $-$ denotes one transition).

Equivalently, this means that starting from $i$, we can access $j$ at some finite step $n$ with non-zero probability.

Note: Unlike “distance”, accessibility only considers one direction. That is, just because $i \to j$, it does not imply $j \to i$. An easy example is when we set $i=1$ and $j=0$ in the gambler’s ruin - since $j=0$ is an absorbing state, it can never reach $i=1$. However, the gambler can reach $j=0$ from $i=1$ by simply losing the first round (which has non-zero probability $q$).
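To make this concrete, here’s a minimal sketch of an accessibility check (assuming `numpy`, and using as a hypothetical example the gambler’s ruin chain with $N = 3$): we simply search the directed graph whose edges are the non-zero entries of $P$.

```python
import numpy as np

def is_accessible(P, i, j):
    """Return True iff j is accessible from i, i.e. P_ij^(m) > 0 for some
    m >= 0, by searching the graph whose edges are the non-zero entries of P."""
    n = len(P)
    visited = {i}            # m = 0: every state is accessible from itself
    frontier = [i]
    while frontier:
        k = frontier.pop()
        for l in range(n):
            if P[k][l] > 0 and l not in visited:
                visited.add(l)
                frontier.append(l)
    return j in visited

# Hypothetical example: gambler's ruin with N = 3, p = q = 0.5 (states 0..3).
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
print(is_accessible(P, 1, 0))  # True: lose the first round
print(is_accessible(P, 0, 1))  # False: 0 is absorbing
```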

Communication

definition

If two states $i$ and $j$ are accessible from each other, i.e., $i \to j$ and $j \to i$, then they are said to communicate, denoted by $i \longleftrightarrow j$.

“Communicate” basically means that both states can “send information to each other”. In such a case, we regard $i$ and $j$ to be in the same “communication class”.

Imagine you put some “information” in state $i$; then that information can “flow” to all states that are accessible from $i$. Similarly, if there is “information” in 2 states $i$ and $j$ and they’re in the same communication class, this information can be exchanged between the states over time, i.e., they can communicate with each other.

note

From a computer science (or graph theory) perspective, we can say that the two nodes are “strongly connected”.

It is quite obvious that accessibility is transitive - i.e., if $i \to j$ and $j \to k$, then $i \to k$ (there is definitely a path with non-zero probability from $i$ to $k$ via $j$ - there might be other paths too, but at least there is this one).

Also, every state belongs to one and only one “communication class” (group of all the nodes that can communicate with each other): there is no overlapping, and no “leftover”. (think of this as the strongly connected components in graphs)

Communication classes form a partition of all the states → they are mutually disjoint and the union of all the classes must be the set of all possible states. (this is a direct consequence of the fact that the communication relationship is an equivalence relation, explained below).

So far it seems okay, but it’s still not clear why we care about communication classes. What’s the point of dividing the states by “communication”?

note

Communication is an equivalence relation, i.e.,

  1. Reflexive: $i \longleftrightarrow i$ ($P_{ii}^{(0)} = 1 > 0$ for any $i$: given that you’re at $i$, in $0$ steps you are sure to remain at $i$)
  2. Symmetric: $i \longleftrightarrow j \implies j \longleftrightarrow i$ (by definition)
  3. Transitive: $i \longleftrightarrow j \land j \longleftrightarrow k \implies i \longleftrightarrow k$ (by the Chapman-Kolmogorov equation)

Any 2 states in a communication class can communicate with each other, i.e., ALL states of a communication class can communicate with (and ONLY with) ALL other states of the same communication class.
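Since communication is exactly mutual accessibility, one naive but simple way to compute the classes is to intersect forward-reachability sets. A sketch, reusing the hypothetical matrix `P` from the previous snippet:

```python
def communication_classes(P):
    """Partition the states of a stationary MC into communication classes:
    i and j share a class iff each is accessible from the other."""
    n = len(P)
    reach = []                            # reach[i] = states accessible from i
    for i in range(n):
        seen, frontier = {i}, [i]
        while frontier:
            k = frontier.pop()
            for l in range(n):
                if P[k][l] > 0 and l not in seen:
                    seen.add(l)
                    frontier.append(l)
        reach.append(seen)
    classes, assigned = [], set()
    for i in range(n):
        if i not in assigned:
            cls = {j for j in reach[i] if i in reach[j]}  # mutual accessibility
            classes.append(cls)
            assigned |= cls
    return classes

print(communication_classes(P))  # for the chain above: [{0}, {1, 2}, {3}]
```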

A typical “form” of a Markov chain (without probabilities) looks something like this:

[Figure: states grouped into rectangles, one communication class per rectangle, with arrows flowing between the classes]

Here, each rectangle contains one class. No two classes can communicate with each other (otherwise they could be “merged” into a single class). And, we can see a “flow” between classes.

note

Intuitively, we can see that in the above diagram, all the “flow” will eventually be pushed out of the top-left class into all the other classes (because all the arrows point outward).

Let’s define another term now:

definition

Reducible Chain: An MC is irreducible if ALL the states communicate with one another (i.e., there is a single communication class). Otherwise, the chain is said to be reducible (more than one communication class).

Why “reducible”? Because instead of focusing on each state, we can “reduce” the MC to its communication classes and deal with the “flow” between classes first, and then look into each class (separately). In particular, we can reduce the dimension of the transition probability matrix while we're performing FSA if we're only interested in finding out which classes the process enters. This is possible only because the communication classes (by defn.) are “isolated” from each other in some sense.

Note that an irreducible MC can have finitely many states, or infinitely many states. More generally, a communication class can have a finite or infinite number of states.
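With the `communication_classes` helper sketched above, an irreducibility check is then just a class count:

```python
def is_irreducible(P):
    """An MC is irreducible iff it has exactly one communication class."""
    return len(communication_classes(P)) == 1

print(is_irreducible(P))  # False: the gambler's-ruin chain above is reducible
```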

Recurrent and Transient States

[Figure: 8-state transition diagram used for the observations below]

In the above example, we can (intuitively) make the following observations. In the long run,

  • The flow will be trapped in states 4/3/2 finally
  • If the flow starts from 1, then it will end in 2/3
  • If the flow starts from 7, it may return to 7 for some turns but finally end in 2
  • If the flow starts from 8, it may pass by 5/6/7 first but still end in 4/3/2 finally

In general, we see that the flow (or the process) will converge to some states (recurrent) finally. In the process, it may pass by some states (transient) but will leave them in the long run.

We want to describe this intuition more rigorously (and more quantitatively).

To do so, we need to define another term:

definition

Return Probability: For any state $i$, recall that the probability of starting from state $i$ and returning to $i$ at the $n$th transition is:

$$P_{ii}^{(n)} = P(X_n = i | X_0 = i)$$

According to the definition, $P_{ii}^{(0)} = 1$ and $P_{ii}^{(1)} = P_{ii}$.

The return probability $P_{ii}^{(n)}$ only constrains where $X_n$ is; it says nothing about the path taken: e.g. the walker can start from $i$ and stay there until the $n$th step, or leave $i$, pass by $i$ several times and come back at the $n$th step, or leave $i$, visit other states and return to $i$ at the $n$th step.

The probabilities of ALL these paths of length $n$ whose final state (and initial state) is $i$ sum up to $P_{ii}^{(n)}$.

note

If state $i$ is transient, then $P_{ii}^{(n)} \to 0$ as $n \to \infty$.

What this means is that “in the long run”, it becomes unlikely that the system is ever back at state $i$.

We can calculate $P_{ii}^{(n)}$ quite simply by raising the transition matrix to the $n$th power (using the Chapman-Kolmogorov equation). But the problem here is that we're more interested in finding out the behaviour as $n \to \infty$ and whether or not the process ever returns to the state. For this purpose, $P_{ii}^{(n)}$ is not very good because it cannot measure the probability that the process revisits $i$ in the long run (we can't just sum up all the $P_{ii}^{(n)}$'s under the limit $n \to \infty$ because they're not disjoint events).
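As a quick sketch (assuming `numpy`), here is the matrix-power computation of $P_{ii}^{(n)}$, again on the hypothetical gambler's-ruin chain with $N = 3$:

```python
import numpy as np

# Hypothetical example: gambler's ruin with N = 3, p = q = 0.5 (states 0..3).
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

def return_probability(P, i, n):
    """P_ii^(n) = the (i, i) entry of P^n (Chapman-Kolmogorov)."""
    return np.linalg.matrix_power(P, n)[i, i]

# For the interior state 1 the return probability decays towards 0:
print([round(return_probability(P, 1, n), 4) for n in (2, 4, 8, 16)])
```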

So, we can further narrow this down and define the “first return probability” → the probability that we return to the state for the first time at the $n$th step.

definition

First Return Probability: For any state $i$, define the probability that, starting from state $i$, the first return to $i$ occurs at the $n$th transition:

$$f_{ii}^{(n)} = P(X_1 \neq i, X_2 \neq i, \dots, X_{n-1} \neq i, X_n = i | X_0 = i)$$

Note that we define (since it is difficult to say what a “first return” in $0$ steps would mean, we simply set) $f_{ii}^{(0)} := 0$ (it is impossible to “return” to $i$ in $0$ steps because we already start from $i$).

Since it is the “first” return, $\sum_{n=1}^\infty f_{ii}^{(n)} \leq 1$ (all the paths that contribute to $f_{ii}^{(n)}$ for a given $n$ are unique to that $n$ → there is no overlap, and so the sum of all $f_{ii}^{(n)}$ over $n$ is simply the probability of the union of a disjoint set of paths (basically a subset of ALL possible paths), and so it must be $\leq 1$).

Naturally, $f_{ii}^{(n)} \leq P_{ii}^{(n)}$ (since $f_{ii}^{(n)}$ only includes a subset of the paths whose $n$th state is $i$, whereas $P_{ii}^{(n)}$ includes all such paths). Intuitively, $f_{ii}^{(n)}$ imposes a stricter constraint on the path itself (so the event for $f_{ii}^{(n)}$ is a subset of the event for $P_{ii}^{(n)}$), and so the first return probability is, of course, $\leq$ the return probability.

note

When does the equality hold? That is, when is $\sum_{n=1}^\infty f_{ii}^{(n)} = 1$?

Intuitively, this means that in the long run, the process always comes back to $i$ at least once. We are guaranteed that there is at least one revisit to $i$ in the future.

Connection between $f_{ii}^{(n)}$ and $P_{ii}^{(n)}$

What is the connection between $f_{ii}^{(n)}$ and $P_{ii}^{(n)}$?

Again, based on our (deep 😲) understanding of the terms themselves, we can “decompose” all the paths that contribute to $P_{ii}^{(n)}$ into (disjoint) sets depending on when they “first” return to $i$. That is, every path whose $n$th state is $i$ can be broken down into (the path from $i$ until the first return to $i$) + (the remaining path). For example, suppose we have 3 states - 1, 2, 3 - none of them absorbing, and we’re interested in finding $P_{11}^{(3)}$, i.e., the probability that we come back to state $1$ at step $3$ (we aren’t really interested in the actual probabilities here, just the paths that contribute):

[Figure: the length-3 paths from state 1 back to state 1, grouped by the step of the first return]

Let’s now “derive” this intuition mathematically. According to the law of total probability,

\begin{equation*} \begin{split} P_{ii}^{(n)} &= \sum_{k=1}^nP(\text{first arrives }i \text{ at step }k|X_0=i)\times P(X_n=i|X_k=i) \\ &= \sum_{k=1}^n f_{ii}^{(k)}P_{ii}^{(n-k)} \\ &= \sum_{k=0}^n f_{ii}^{(k)}P_{ii}^{(n-k)} \end{split} \end{equation*}

The last equality comes from $f_{ii}^{(0)} = 0$ (so adding an extra term doesn’t matter because it evaluates to $0$ anyway).

theorem

Theorem: Return Probability and First Return Probability

$$P_{ii}^{(n)} = \sum_{k=0}^n f_{ii}^{(k)}P_{ii}^{(n-k)}$$

Remarks:

  • Such a formula is called a “convolution”
  • We can find $P_{ii}^{(k)}$ for all values of $k$ between $1$ and $n$ by raising the transition probability matrix to the respective power (using the Chapman-Kolmogorov equation), i.e., $P_{ii}^{(k)} = (P^k)_{ii}$, the $(i,i)$ entry of $P^k$
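The convolution can also be inverted term by term: since $P_{ii}^{(0)} = 1$, we get $f_{ii}^{(n)} = P_{ii}^{(n)} - \sum_{k=1}^{n-1} f_{ii}^{(k)} P_{ii}^{(n-k)}$. A small sketch of this recursion (assuming `numpy`, same hypothetical example chain as before):

```python
import numpy as np

P = np.array([[1.0, 0.0, 0.0, 0.0],   # gambler's ruin with N = 3, as above
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

def first_return_probabilities(P, i, N):
    """Solve the convolution P_ii^(n) = sum_{k=1}^{n} f_ii^(k) P_ii^(n-k)
    for f_ii^(n), term by term, using P_ii^(0) = 1 and f_ii^(0) = 0."""
    p = [np.linalg.matrix_power(P, n)[i, i] for n in range(N + 1)]
    f = [0.0] * (N + 1)
    for n in range(1, N + 1):
        f[n] = p[n] - sum(f[k] * p[n - k] for k in range(1, n))
    return f[1:]

f = first_return_probabilities(P, 1, 50)
print(round(sum(f), 4))  # ~0.25 = f_11 < 1, so state 1 is transient (see below)
```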

Note: Recurrence $\not\implies P_{ii}^{(n)} \to 1$.

For example, consider the following Markov chain:

[Figure: 3-state cycle 1 → 2 → 3 → 1, each transition with probability 1]

Then, all of $1, 2, 3$ are recurrent states, and

$$P_{ii}^{(n)} = \begin{cases} 1, & n = 3k \\ 0, & n \neq 3k \end{cases}$$

In other words, there is no (convergent) “limit” as $n \to \infty$, and so we cannot conclude that “recurrent” necessarily implies $P_{ii}^{(n)} \to 1$ as $n \to \infty$.
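We can check this oscillation numerically (a sketch assuming `numpy`):

```python
import numpy as np

# The deterministic 3-cycle 1 -> 2 -> 3 -> 1:
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
for n in range(1, 7):
    # diag(C^n) is all ones when 3 | n and all zeros otherwise:
    print(n, np.diag(np.linalg.matrix_power(C, n)))
```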

Let’s also define

$$f_{ii} = \sum_{n=0}^\infty f_{ii}^{(n)} = \lim_{N \to \infty} \sum_{n=0}^N f_{ii}^{(n)}$$

In words, $f_{ii}$ denotes the probability of revisiting $i$ in the future (we cannot directly use the $P_{ii}^{(n)}$’s to express such a thing because the corresponding events are not disjoint, i.e., we would end up overcounting a lot of the paths).

Obviously, $f_{ii} \leq 1$ since it is a probability. (We’re making use of the fact that the events behind the $f_{ii}^{(n)}$’s are disjoint, i.e., for any $j \neq k$, the event for $f_{ii}^{(j)}$ is disjoint from the event for $f_{ii}^{(k)}$ → the “first revisit” of state $i$ on the same path cannot occur at both $j$ and $k$ → by definition, we’re only concerned with the first revisit. So, each path that revisits state $i$ contributes to exactly one $f_{ii}^{(n)}$ term, depending on when state $i$ was first revisited.)

With all this information in mind, we can (finally!) define what recurrent and transient states actually are.

definition

A state $i$ is said to be recurrent if $f_{ii} = 1$, and transient if $f_{ii} < 1$.

If we can guarantee that the process (or the “walker” 🤣) will revisit state $i$ (in finitely many steps!) for sure, then state $i$ is recurrent. Otherwise, the flow/process will eventually leave state $i$.

In the MC shown above (with 3 states), all of $1, 2, 3$ are recurrent states since we can guarantee that in exactly 3 steps, the process will come back to the current state, i.e., $f_{ii}^{(3)} = 1$ for all $i = 1, 2, 3$, and that is in fact the only non-zero term in the summation. So, $\sum_{n=0}^\infty f_{ii}^{(n)} = f_{ii}^{(3)} = 1$, and hence they’re all recurrent.

We can also think of recurrent states as those states which will contain the “flow” (process) in the long run.

To reiterate, recurrent state $\iff f_{ii} = 1$. The intuition is that the flow will come back again and again, and seems to be trapped.

We cannot use “$P_{ii}^{(n)} \to 1$ as $n \to \infty$” as the definition of “recurrent” because of the counter-example we saw earlier (3 states forming a cycle with all transition probabilities being $1$).

note

$f_{ii}$ is the total probability of revisiting state $i$ in the long run. $P_{ii}^{(n)}$ as $n \to \infty$ would be the probability of visiting state $i$ at a time very, very far into the future. They’re not the same. In fact, $P_{ii}^{(n)} \to 1$ intuitively feels like an “absorbing” state (we are guaranteed that the process will be at state $i$ after a very long time - a stable system). But recurrent $\neq$ absorbing. An absorbing state is always recurrent (since $f_{ii}^{(1)} = 1$, the sum evaluates to $1$), but a recurrent state need not be absorbing (consider the simplest case where we have $1 \longleftrightarrow 2$ with probability $1$: the process flip-flops between the 2 states, never being absorbed, but is still guaranteed to revisit the current state in exactly 2 steps).

Number of Revisits

Intuitively, we “feel” that since, probabilistically, the system will always revisit a recurrent state within a finite number of steps, as time goes to infinity, the recurrent state is visited an infinite number of times.

We can write this mathematically. Let $N_i$ be the number of times that the MC revisits state $i$:

$$N_i = \sum_{n=1}^\infty I(X_n = i)$$

Then “recurrent” means $N_i = \infty$ (think of what happens for an absorbing state: once we visit an absorbing state, we will always be trapped there, and so we count it an infinite number of times, once for each increment of the (discretized) time $n$).

On the contrary, transient means that there may be revisits, but only a limited number of them, i.e., $N_i < \infty$ (it is a finite number).

In conclusion, to show that this new definition of recurrent and transient is meaningful, we need to prove that it’s equivalent to the previous one. That is,

$$E[N_i|X_0=i] \begin{cases} = \infty, & f_{ii} = 1 \\ < \infty, & f_{ii} < 1 \end{cases}$$
theorem

Theorem: Number of Revisits

For a state $i$, consider the expected number of revisits to $i$, $E[N_i|X_0=i]$. Then:

  1. If $f_{ii} < 1$ (i.e., $i$ is transient), then $E[N_i|X_0=i] = \dfrac{f_{ii}}{1-f_{ii}}$
  2. If $f_{ii} = 1$ (i.e., $i$ is recurrent), then $E[N_i|X_0=i] = \infty$

If we can prove the above theorem, we can conclude that both definitions of “recurrent”/“transient” (using $f_{ii}$ and using $N_i$) are equivalent.

So, what are we waiting for? Let’s go ahead and prove this theorem.

Proof

$N_i \geq 1$ means that when we start from state $i$, there is at least one revisit. By our definition of $f_{ii}$, we can write it as:

$$P(N_i \geq 1|X_0=i) = \sum_{k=1}^\infty f_{ii}^{(k)} = f_{ii}$$

Both the LHS and RHS are equivalent ways of saying “the probability that we will revisit state $i$ at least once in the future”.

Now, consider the event $N_i \geq 2$ → there are at least 2 revisits to state $i$. Then, using the law of total probability, we can write it as:

\begin{equation*} \begin{split} P(N_i \geq 2 | X_0 = i) &= \sum_{k=1}^\infty P(N_i\geq 2|\text{first revisit happens at step }k)P(\text{first revisit happens at step }k) \\ &= \sum_{k=1}^\infty P(N_i\geq 2|X_k=i)f_{ii}^{(k)} \\ &= \sum_{k=1}^\infty P(N_i\geq 1|X_0=i)f_{ii}^{(k)} \quad \text{since we already revisited it once at step }k \\ &= P(N_i \geq 1|X_0=i)\sum_{k=1}^\infty f_{ii}^{(k)} = f_{ii}^2 \end{split} \end{equation*}

In general, we have $P(N_i \geq m | X_0=i) = f_{ii}^m$ (we can easily show this by induction, just as we’ve shown it for $m=2$ using $m=1$).

note

In the third step of the derivation above, we “shifted” the MC back by $k$ time units, since the probabilistic structure remains unchanged. In particular, the probability of revisiting state $i$ a second time, given that you’ve already revisited it once, is the same as the probability of revisiting state $i$ one more time (having already revisited it once in the past doesn’t change the probability of revisiting it again in the future). So, we “reset” the counter (after the visit at step $k$) and pretend as if we have to start over. It’s a neat trick to convert the $X_k$ to $X_0$ so that we can pull it out of the summation (since it no longer depends on $k$) and then only calculate the $f_{ii}^{(k)}$’s.

Now that we have the probabilities, we can calculate the expectation:

E[NiX0=i]=k=1kP(Ni=kX0=i)=k=1j=1kP(Ni=kX0=i)=j=1k=jP(Ni=kX0=i) Reorder=j=1P(NijX0=i)\begin{equation*} \begin{split} E[N_i|X_0=i] &= \sum_{k=1}^\infty kP(N_i =k|X_0=i) \\ &= \sum_{k=1}^\infty \sum_{j=1}^k P(N_i=k|X_0=i) \\ &= \sum_{j=1}^\infty \sum_{k=j}^\infty P(N_i=k|X_0=i) \text{ Reorder} \\ &= \sum_{j=1}^\infty P(N_i \geq j|X_0=i) \end{split} \end{equation*}

Here’s a simple illustration of how we “regroup” the terms in the expectation to fit the form that we need (it’s just a new way of thinking about the terms). Suppose $X$ only took 3 values - 1, 2, 3 - and we had to calculate $E[X]$. Then,

[Figure: the terms of $\sum_k k \cdot P(X=k)$ regrouped column-wise into the tail sums $P(X \geq 1) + P(X \geq 2) + P(X \geq 3)$]

Okay, now we have $E[N_i|X_0=i] = \sum_{j=1}^\infty P(N_i \geq j|X_0=i)$. But we already know that $P(N_i \geq j|X_0=i) = f_{ii}^j$. So,

$$E[N_i|X_0=i] = \sum_{j=1}^\infty f_{ii}^j = \frac{f_{ii}}{1 - f_{ii}}$$

The last equality holds only when $f_{ii} < 1$ (because only under that assumption can we apply the sum of the geometric series formula).

Consider the recurrent case, where $f_{ii} = 1$. All the previous analysis holds except the last equality: we have $P(N_i \geq m|X_0=i) = f_{ii}^m = 1$ for all values of $m$ (obviously, the series will not converge if all the terms are $1$ and we’re summing an infinite number of them). Then, $E[N_i|X_0=i] = \sum_{k=1}^\infty f_{ii}^k \to \infty$.

Hence, we’ve proved both cases (transient + recurrent) of the theorem. 🪦

Remarks on the theorem:

  1. This theorem provides the theoretical justification for the definition of recurrent/transient states.
  2. For transient states, the expected number of revisits increases as $f_{ii}$ increases. However, eventually, we will leave $i$ never to return, i.e., we will never revisit $i$ again in the long run.
  3. For recurrent states, the expected number of revisits is infinite, which means that the process revisits it again and again (forever) in the long run (there is no possibility of never visiting it in the future).
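Here’s a Monte Carlo sanity check of the theorem (a sketch using only the standard library’s `random`; the chain is the hypothetical gambler's-ruin example with $N = 3$ used earlier):

```python
import random

# Same hypothetical gambler's-ruin chain (N = 3), rows as lists:
P = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0]]

def simulate_revisits(P, i, n_steps=100, n_runs=10000, seed=0):
    """Monte Carlo estimate of E[N_i | X_0 = i]: run the chain for n_steps
    (long enough here, since the walk is quickly absorbed) and count revisits."""
    rng = random.Random(seed)
    states = range(len(P))
    total = 0
    for _ in range(n_runs):
        x = i
        for _ in range(n_steps):
            x = rng.choices(states, weights=P[x])[0]
            total += (x == i)
    return total / n_runs

# The only first-return path to state 1 is 1 -> 2 -> 1, so f_11 = 0.25 and
# the theorem predicts E[N_1 | X_0 = 1] = 0.25 / 0.75 = 1/3:
print(simulate_revisits(P, 1))  # ~0.33
```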

Coming Full Circle

Recall that we had begun our discussion with $P_{ii}^{(n)}$. Now, we wonder where it fits in the picture and how it relates to our result. That is, we want to relate $P_{ii}^{(n)}$ to our definitions of transient and recurrent states.

Consider the definition of $N_i$. We can express $N_i = \sum_{n=1}^\infty I(X_n=i)$ → the number of times we revisit state $i$ (and each time we visit $i$, it is captured by the indicator function turning “on”).

Then,

\begin{equation*} \begin{split} E[N_i|X_0=i] &= E \left [\sum_{n=1}^\infty I(X_n=i) | X_0=i \right] \\ &= \sum_{n=1}^\infty E[I(X_n=i)|X_0=i] \quad \text{(linearity of expectation)} \\ &= \sum_{n=1}^\infty P(X_n=i|X_0=i) \quad \text{(property of the indicator function)} \\ &= \sum_{n=1}^\infty P_{ii}^{(n)} \end{split} \end{equation*}

From the previous theorem, we know that $E[N_i|X_0=i]$ is finite for transient states and infinite for recurrent states. Then, we have:

  1. $\sum_{n=1}^\infty P_{ii}^{(n)}$ is finite for transient states
  2. $\sum_{n=1}^\infty P_{ii}^{(n)}$ is infinite for recurrent states

So, we can define recurrent and transient states using this definition too, as such:

note

State $i$ is recurrent if, and only if, $\sum_{n=1}^\infty P_{ii}^{(n)} = \infty$

What does the above theorem really mean?

  1. For recurrent states, starting from any $m > 0$, we have $\sum_{n=m}^\infty P_{ii}^{(n)} = \infty$: every tail of the series still diverges, so the chance of returning to $i$ never dies out, even after a very long time.
  2. For transient states, the revisiting probability $\sum_{n=m}^\infty P_{ii}^{(n)} \to 0$ as $m \to \infty$. This means that in the long run, the probability of coming back vanishes (and we can say that “all the flow eventually moves out of the state, never to return”).

Phew. To help recap, here’s a quick summary of everything we’ve discussed so far (all the equivalent ways of saying “transient” or “recurrent”):

$$\text{Recurrent} \iff f_{ii} = 1 \iff \sum_{n=1}^\infty P_{ii}^{(n)} = \infty \iff E[N_i|X_0=i] = \infty$$
$$\text{Transient} \iff f_{ii} < 1 \iff \sum_{n=1}^\infty P_{ii}^{(n)} < \infty \iff E[N_i|X_0=i] < \infty$$
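To see the dichotomy in the partial sums $\sum_{n=1}^N P_{ii}^{(n)}$, a short sketch (assuming `numpy`, same hypothetical example chain as before):

```python
import numpy as np

P = np.array([[1.0, 0.0, 0.0, 0.0],   # gambler's ruin with N = 3, as above
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

def partial_return_sum(P, i, N):
    """sum_{n=1}^{N} P_ii^(n): bounded for transient i, divergent for recurrent i."""
    Pn, total = np.eye(len(P)), 0.0
    for _ in range(N):
        Pn = Pn @ P
        total += Pn[i, i]
    return total

for N in (10, 100, 1000):
    # transient state 1: partial sums approach 1/3; recurrent state 0: they grow like N
    print(N, round(partial_return_sum(P, 1, N), 4), round(partial_return_sum(P, 0, N), 1))
```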

States in the Same Class

Until now, we’ve discussed what transient and recurrent states are. But how do they relate to the notion of communication classes we talked about at the beginning?

It turns out that there’s a pretty neat theorem for this.

note
  1. If a state $i$ is recurrent and $i \longleftrightarrow j$, then state $j$ is recurrent.

  2. If a state $i$ is transient and $i \longleftrightarrow j$, then state $j$ is transient.

Basically, it means that the recurrent/transient status can be seen as a status of the entire communication class (and not just of one state in that class).

So, it makes sense to say that “a communication class is recurrent (or transient)”. This means that all the states in this class are recurrent (or transient).

Side Note: An MC with finitely many states must have at least one recurrent class. Why? Since there are only finitely many states, as time goes to infinity, at least one state must be visited infinitely many times (if every state $i$ had a finite $E[N_i|X_0=i]$, then the sum over all of them would also be finite, but over infinite time the process cannot visit ALL of finitely many states only finitely often). Think: if all of them were transient, where would the flow go after a very long time? It has to be at some state (which means that even in the long run, that state is still being revisited, i.e., it is recurrent).

Proof

Let’s prove claims (1) and (2) above.

Since $i$ and $j$ can communicate, there must exist some integers $n$ and $m$ such that:

$$P_{ij}^{(n)} > 0, \quad P_{ji}^{(m)} > 0$$

where $P_{ij}^{(n)}$ denotes the $n$-step transition probability from $i$ to $j$. This can be visualized using the diagram below:

[Figure: a path $j \to i$ of length $m$ and a path $i \to j$ of length $n$, both with non-zero probability]

Consider any $v \geq 0$. Then,

$$P_{jj}^{(m+v+n)} \geq P_{ji}^{(m)}P_{ii}^{(v)}P_{ij}^{(n)}$$

The RHS of the above inequality is basically saying that we go from $j$ to $i$ (in $m$ steps), move around from $i$ to any other states until we return to $i$ in $v$ steps, and then finally take the path from $i$ to $j$ in $n$ steps. The LHS is the total probability of returning to $j$ in $m+v+n$ steps. Clearly, the RHS has the stricter constraint that the first $m$ steps lead to (and the last $n$ steps come from) state $i$. So, the event described by the RHS is a subset of the event described by the LHS (i.e., the LHS counts more paths; e.g., even if we stay at $j$ for all $m+v+n$ steps, that is a valid path for the LHS, but NOT the RHS). Hence, it’s an inequality ($\geq$), not an equality.

[Figure: a return path to $j$ decomposed as $j \to i$ ($m$ steps), $i \to i$ ($v$ steps), $i \to j$ ($n$ steps)]

We can sum the above inequality over $v = 0, 1, 2, \cdots$ on both sides (since it must hold for all $v \geq 0$). Then,

$$\sum_{v=0}^\infty P_{jj}^{(m+v+n)} \geq \sum_{v=0}^\infty P_{ji}^{(m)}P_{ii}^{(v)}P_{ij}^{(n)} = P_{ji}^{(m)}\left(\sum_{v=0}^\infty P_{ii}^{(v)}\right)P_{ij}^{(n)}$$

Also,

$$\sum_{w=0}^\infty P_{jj}^{(w)} \geq \sum_{v=0}^\infty P_{jj}^{(m+v+n)}$$

since the terms on the RHS are a subset of the terms on the LHS (the LHS also counts, e.g., returns from $j$ to $j$ of length $< m+n$).

So, combining the 2 equations, we have:

$$\sum_{w=0}^\infty P_{jj}^{(w)} \geq P_{ji}^{(m)}\left(\sum_{v=0}^\infty P_{ii}^{(v)}\right)P_{ij}^{(n)}$$

When state $i$ is recurrent, $\sum_{v=0}^\infty P_{ii}^{(v)} = \infty$ (by the characterization above). Then, since $P_{ji}^{(m)}$ and $P_{ij}^{(n)}$ are non-zero probabilities, we have $\sum_{w=0}^\infty P_{jj}^{(w)} = \infty$, which means that $j$ is recurrent too.

With a similar analysis (or simply the contrapositive), we can show that if $i$ is transient, then $j$ is transient too (and vice versa).


Reducible Chains

For reducible chains, there are multiple classes:

  • For a transient class, in the long run, the probability that the flow/process stays in this class (or ever returns back to this class after a very long time) is zero.
  • For a recurrent class: when there are finitely many states, there must be at least one recurrent class (otherwise, where would the process go after a long time?). If $C_k$ is a recurrent class, then $P_{ij} = 0$ for any $i \in C_k$ and $j \notin C_k$ (there are NO outgoing edges from a recurrent class. Why? Once at $j$, the process could never return to $i$, since otherwise $i$ and $j$ would communicate; so $f_{ii} \leq 1 - P(X_1=j|X_0=i) < 1$, which would mean that $i$ is transient, contradicting our assumption that $C_k$ is recurrent). So, $C_k$ is kind of like an absorbing class (but within the class itself, the process can move from one state to another, without ever leaving the class).

We can figure out the probability of “entering” each recurrent class (using first-step analysis), and since the process can never leave such a class again, this makes our calculations simpler. Think of this as “collapsing” all the “similar” states into one “group” and then dealing with the probability that the process moves from one group to another (instead of from one state to another). In most cases, we’ll have far fewer communication classes than states, so it makes our calculations in FSA easier.

Hence, given $X_0 = i$, we can find the probability that the process ends up in each class.

Example

Let’s work through an example to understand this better.

Suppose we’re interested in calculating the long-run performance of the following Markov chain (note that we’ll have to consider multiple initial states):

[Figure: 5-state Markov chain with recurrent classes $\{1, 2\}$ and $\{3\}$ and transient class $\{4, 5\}$]

Clearly, we have the following communication classes:

  • $\{1, 2\}$ → recurrent class (no outgoing edges)
  • $\{3\}$ → recurrent class (since $f_{33}^{(1)} = 1$)
  • $\{4, 5\}$ → transient class (since $f_{44} \leq 1 - P(X_1=3|X_0=4) = 0.6 < 1$ → this works because once we go from $4$ to $3$, we can never return back to $4$, even with infinite time)

Obviously, if we start from a recurrent class, the process will never leave the class. And that’s not very interesting.

What we’re more interested in finding out is the probability of being absorbed into class $\{1,2\}$ and the probability of being absorbed into class $\{3\}$ if we start from a transient state (either $4$ or $5$).

To solve this, we can create a “new” Markov chain with the states in the same absorbing class collapsed together to form a single “state”. We can’t collapse $4$ and $5$ together because they are the initial states we’re interested in (and the choice of initial state affects which recurrent class the process ends up in, in the long run), and so they’re not “equivalent”.

Let’s call the class with states $1$ and $2$ class $A$. Then, we know that (since there are only 2 absorbing classes in this example) the probability of being absorbed into class $A$ = $1 -$ the probability of being absorbed at state $3$. That is, finding one is equivalent to finding the other.

[Figure: collapsed chain with absorbing states $A$ and $3$ and transient states $4$ and $5$]

We’re only concerned with which class the process ends up going to (once it enters $A$ or $3$, it cannot escape). So, we can treat the time when the process enters an absorbing class as the stopping time (for now, we’re not concerned with what happens inside the absorbing class itself).

In other words, let $T = \min\{n : X_n \in \{A, 3\}\}$ be the stopping time.

Now, we can apply first step analysis on this “new” Markov chain.

Let us define $u_i = P(X_T = A|X_0=i)$ for $i = 4, 5$. According to FSA, we have:

\begin{equation*} \begin{split} u_A&=1 \\ u_3 &= 0 \\ u_4 &= 0.4u_3 + 0.4u_4 + 0.2u_5 \\ u_5 &= 0.2u_A + 0.2u_4 + 0.6u_5 \end{split} \end{equation*}

Solving it, we get: $u_4 = 0.2$, $u_5 = 0.6$.

This means that if we start from state $4$, there is a $0.2$ probability that we end up in class $A$ (and a $0.8$ probability that we end up at state $3$) in the long run; similarly, starting from state $5$, the probabilities are $0.6$ and $0.4$.

Intuitively, this should make sense because if we’re at $4$, we’re more likely to go to state $3$ than to state $5$, and the only way to reach class $A$ from state $4$ is through state $5$. So, the probability of being absorbed at state $3$ is greater than the probability of being absorbed into class $A$ (and also higher than the probability of the single step $4 \to 5$ itself 😲).
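For completeness, here’s the same first-step analysis done numerically, written in the standard $(I - Q)u = b$ form (a sketch assuming `numpy`; $Q$ collects the transition probabilities among the transient states $\{4, 5\}$ and $b$ the one-step probabilities into class $A$):

```python
import numpy as np

# Transitions among the transient states {4, 5} (rows/cols in that order):
Q = np.array([[0.4, 0.2],    # from 4: to 4, to 5
              [0.2, 0.6]])   # from 5: to 4, to 5
# One-step probabilities of jumping directly into class A:
b = np.array([0.0,           # from 4 (impossible)
              0.2])          # from 5
u4, u5 = np.linalg.solve(np.eye(2) - Q, b)
print(u4, u5)                # 0.2 and 0.6, matching the FSA solution above
```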

We can summarize all this as follows: when $n \to \infty$, we have:

$$P(X_n \in C_A|X_0=4) \to 0.2, \quad P_{43}^{(n)} \to 0.8, \quad P_{44}^{(n)} \to 0, \quad P_{45}^{(n)} \to 0$$

and,

$$P(X_n \in C_A|X_0=5) \to 0.6, \quad P_{53}^{(n)} \to 0.4, \quad P_{54}^{(n)} \to 0, \quad P_{55}^{(n)} \to 0$$
note

Even when considering states in the same transient class as the initial states, the long-run probabilities of entering the different absorbing classes will be different, so we cannot “merge” them (i.e., we cannot treat them as the “same”). In other words, the probability of entering each absorbing class depends on which transient state we start from, even if the initial states are from the same communication (transient) class. Hence, the probability of entering the different absorbing classes is a function of the initial state, NOT the initial class.

Gambler's Ruin with No Quitting

Imagine a greedy gambler who doesn't set any quitting rule for himself, i.e., he only stops playing when he goes broke. We're interested in the expected number of games he plays before going broke (or: does he play "forever" with non-zero probability?). (More formally, this is the gambler's ruin chain in the limit $N \to \infty$.)

Consider the symmetric case where $p = q = \frac{1}{2}$.

Clearly, the communication classes are: $\{0\}$ and $\{1, 2, \cdots\}$.

Since $0$ is an absorbing state, it is recurrent and $E[N_0] = \infty$ (and note that, starting from any finite state, there is a non-zero probability of being absorbed there).

Now, consider $f_{11}$. Clearly, $f_{11} \leq 1 - q < 1$ (with probability $q$ the gambler loses the first round and goes broke, and once broke, he can never recover and hence never revisit $1$). This means that the communication class $\{1, 2, \cdots\}$ is transient. Hence, $E[N_i] < \infty$ for all $i \geq 1$.

But does this mean that the gambler is certain to go broke? Can we argue that "since all other states are transient, they can only be visited a finite number of times (in expectation), and so, for a large enough $n$, the process has to leave these transient states and eventually the gambler will go broke"?

The answer is no. Why? Although we have finite $E[N_i]$ for all $i \geq 1$, we have an infinite number of states. So, the time the process can spend in this "transient" class can be infinite (we're summing an infinite series where each term itself is finite). Intuitively, the process can stay at every state for a short time and it will still have enough (infinitely many!) remaining states to visit in the transient class.

Hence, although only $0$ has $E[N_0] = \infty$, it is possible that $E[T] = \infty$ (where $T$ is the time until ruin).

Furthermore, when $p > q$, we have $P(\text{going broke}) < 1$ (since the process "drifts" away from zero and wanders off towards infinity in the long run). In other words, if the gambler is lucky (or is cheating), and he doesn't set any quitting rule for himself, then in the long run, we can expect him to have more and more money.

Note that when $p \leq q$, the probability of the gambler going broke (i.e., being ruined) is $1$ (you are guaranteed to go broke eventually). When $p > q$, the probability of going broke is $(\frac{q}{p})^k$ when starting with $k$ dollars (so $\frac{q}{p}$ when starting with $\$1$). A proof can be found here.
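A quick simulation sketch of the greedy gambler (standard library only; each run is truncated at a hypothetical `cap` of games, which slightly underestimates the ruin probability, but for $p > q$ the surviving paths drift upward, so the bias is tiny):

```python
import random

def ruin_probability(p, start=1, n_runs=20000, cap=2000, seed=0):
    """Estimate P(greedy gambler ever goes broke) starting with `start` dollars,
    betting 1 dollar per game with win probability p, truncated at `cap` games."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_runs):
        x = start
        for _ in range(cap):
            x += 1 if rng.random() < p else -1
            if x == 0:
                ruined += 1
                break
    return ruined / n_runs

print(ruin_probability(0.6))  # ~ q/p = 0.4/0.6 ~ 0.667 when starting with $1
```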

Summary of Gambler's Ruin with No Quitting Rule

The following results hold for any finite initial state $k \geq 1$:

  1. If $p = q$,
    • $P(\text{ruin}) = 1$
    • $E[T] = \infty$
  2. If $p > q$,
    • $P(\text{ruin}) = (\frac{q}{p})^k$
    • $E[T] = \infty$
  3. If $p < q$,
    • $P(\text{ruin}) = 1$
    • $E[T] < \infty$

In words, we can express the above results as such:

  • If you're playing a fair game, you're guaranteed to go broke eventually (and yet, in expectation, you'll have to play forever to go broke). These two results may feel "contradictory" and "counter-intuitive" but are nevertheless both true.
  • If you're playing a game where the odds are in your favor, you'll be playing an infinite number of games. Why? Because on average, you gain $p - (1-p) = 2p - 1 > 0$ dollars on every bet. So, in the long run, you're moving in the positive direction (away from the "broke" state, and towards "infinite money") and there's no absorbing state on that "end" (i.e., there is no end). Given infinite time, you'll be playing an infinite number of games (and making an infinite amount of money).
  • If the odds are against you, the expected loss per bet is $(1-p) - p = 1 - 2p$, and so, in expectation, you'll go broke within a finite number of steps (you have a finite amount of money and you lose a finite amount per bet on average, so you're moving towards the "broke" state in every round). More precisely, if you start with $k$ dollars, you're expected to go broke in $\frac{k}{1-2p}$ steps. And the probability of going broke is $1$.
info

For a more rigorous treatment, you can refer to this.