
Generalising First-Step Analysis

In the previous week, we were able to answer specific strategy-related questions about a specific instance of the gambler’s ruin problem. But we don’t want to perform the analysis from scratch each time we’re given a different example (i.e., different parameters or initial conditions): so, can we come up with a more general solution?

Recall that for all three questions, we have a quantity of interest, usually related to the stopping time $T$ and initial state $X_0=i$.

We can express all three quantities of interest (and more) in the general form:

$$a_i = E\left[\sum_{n=0}^T g(X_n)\,\Big|\,X_0=i\right]$$

where $a_i$ is our quantity of interest, and $g$ is a function from $S \to \mathbb{R}$.

note

Note that the probability $P(X_T=j|X_0=i)$, where $j$ is any absorbing state, is also a special kind of expectation: we can define $g$ to be the indicator function $I_j$, which is “on” (1) when $X_T$ takes the value $j$ and “off” (0) otherwise, i.e.,

$$P(X_T=j|X_0=i) = E\left[I\{X_T=j\}\,\Big|\,X_0=i\right]$$

We can also write it as the following (because we know that $T$ is the stopping time, and the indicator will always be $0$ for all $n < T$, since otherwise it would contradict the definition of $T$):

$$P(X_T=j|X_0=i) = E\left[\sum_{n=0}^T I\{X_n=j\}\,\Big|\,X_0=i\right]$$

The above expectation is taken over all possible values of $T$ (remember that $T$ itself is a random variable).

The above equations can be thought of as a sort of random sum: the number of terms to be summed is itself a random variable, and each term is also a random variable.
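Such a random sum can also be estimated by simulation, which is a useful sanity check on the exact answers derived below. A minimal sketch, assuming the gambler’s ruin chain from this example (states $0$ to $4$, win probability $1/3$, absorbing at $0$ and $4$); the helper name `simulate_a` is my own:

```python
import random

def simulate_a(i, g, p_up=1/3, absorbing={0, 4}, trials=200_000):
    """Monte Carlo estimate of a_i = E[ sum_{n=0}^T g(X_n) | X_0 = i ]."""
    total = 0.0
    for _ in range(trials):
        x, acc = i, g(i)              # the n = 0 term is always g(X_0)
        while x not in absorbing:     # T = first hitting time of {0, 4}
            x = x + 1 if random.random() < p_up else x - 1
            acc += g(x)               # accumulate g(X_n) for n = 1, ..., T
        total += acc
    return total / trials

# Example: expected number of visits to state 2 starting from X_0 = 3
est = simulate_a(3, lambda x: 1 if x == 2 else 0)
```

The estimate converges on the exact value that first-step analysis produces, but only slowly (at the usual $O(1/\sqrt{\text{trials}})$ Monte Carlo rate).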

So, if we’re interested in the quantity $a_i = E[\sum_{n=0}^T g(X_n)|X_0=i]$, we can use the following steps:

  1. Define $|S|$ terms $a_i = E[\sum_{n=0}^T g(X_n)|X_0=i]$, one for each $i \in S$ (not just the particular $a_i$ that we care about, but ALL of them → since we need all the other values to find any particular value, by solving a system of linear equations)

  2. Then, apply the law of total expectation to $a_i$, conditioning on $X_1$:

    $$a_i= \sum_{k \in S}\left( g(i) + E\left[\sum_{n=1}^T g(X_n)\,\Big|\,X_0=i, X_1=k \right]\right)P(X_1=k|X_0=i)$$

    That is, we look at all possible steps we can take in the first move and calculate accordingly → a one-step transition (we consider the transition $i \to k$, for every possible value of $k$, when trying to calculate $a_i$). Notice that once we reach state $k$, the time is $t=1$, so the inner sum runs as $\sum_{n=1}^T$.

    And we need to write this equation for $a_i$ for every value of $i$.

  3. We already know the value of $P(X_1=k|X_0=i) = p_{ik}$ from the one-step transition matrix.

  4. Consider the process $\{Y_n\}$ defined by $Y_n=X_{n+1}$. It is stochastically equivalent to $\{X_n\}$ (same probabilistic structure, differing only in the time at which it starts). Further, by the Markov property,

    $$E\left[\sum_{n=1}^T g(X_n)\,\Big|\,X_0=i,X_1=k\right] = E\left[\sum_{n=0}^{T_Y}g(Y_n)\,\Big|\,Y_0=k \right] = a_k$$
  5. Combining our results from steps (3) and (4) and inserting them into the equation in step (2), we get:

    $$a_i = \sum_{k \in S}(g(i) + a_k)\,p_{ik}$$

    Note that the above holds only when $i$ is not an absorbing state. When $i$ itself is an absorbing state, the stopping time is $T=0$, so we can simply write:

    $$a_i = g(i)$$
  6. We can set up a system of linear equations in $a_i$ for all values of $i \in S$, solve them, and obtain the result.
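The six steps above can be turned into a small exact solver. A sketch (not from the notes; the function names are my own) that builds the linear system $a_i = g(i) + \sum_k p_{ik} a_k$ over the transient states — using $\sum_k p_{ik} = 1$ to pull $g(i)$ out of the sum — and solves it by Gauss–Jordan elimination over exact rationals:

```python
from fractions import Fraction

def first_step_analysis(P, g, absorbing):
    """Return {i: a_i} where a_i = E[ sum_{n=0}^T g(X_n) | X_0 = i ].

    P is the one-step transition matrix (entries coercible to Fraction),
    g maps states to "rewards", and `absorbing` lists the absorbing states.
    """
    S = range(len(P))
    trans = [i for i in S if i not in absorbing]
    idx = {i: r for r, i in enumerate(trans)}
    n = len(trans)
    # Transient equations, moving the known a_k = g(k) terms to the right:
    #   a_i - sum_{k transient} p_ik a_k = g(i) + sum_{k absorbing} p_ik g(k)
    A = [[Fraction(int(i == k)) - Fraction(P[i][k]) for k in trans] for i in trans]
    b = [Fraction(g(i)) + sum(Fraction(P[i][k]) * g(k) for k in absorbing)
         for i in trans]
    # Gauss-Jordan elimination with exact arithmetic
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
                b[r] -= f * b[c]
    sol = {i: g(i) for i in absorbing}  # boundary cases: a_i = g(i)
    sol.update({i: b[idx[i]] / A[idx[i]][idx[i]] for i in trans})
    return sol

# Gambler's ruin from this example: win 1/3, lose 2/3, absorbing at 0 and 4
third = Fraction(1, 3)
P = [[1, 0, 0, 0, 0],
     [2 * third, 0, third, 0, 0],
     [0, 2 * third, 0, third, 0],
     [0, 0, 2 * third, 0, third],
     [0, 0, 0, 0, 1]]
# Expected number of visits to state 2, from each starting state
a = first_step_analysis(P, lambda x: 1 if x == 2 else 0, absorbing=[0, 4])
```

The same solver answers any of the questions in this section — only the function `g` changes.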

The above is the general procedure to follow when trying to answer any strategy-related question.

The only hard part is to define the function $g$ correctly in order to answer the question.

For example:

  • To find $P(X_T=0|X_0=3)$, we can use the indicator function $I_0$ that takes the value $1$ when its argument is $0$, and $0$ otherwise. In other words,
    $$P(X_T=0|X_0=3) = E\left[\sum_{n=0}^T I\{X_n=0\}\,\Big|\,X_0=3\right]$$
    So, $g(x) = 1$ when $x=0$, and $0$ otherwise.
  • To find $E[T|X_0=3]$, we can set $g(x)=1$ for all values of $x=0,1,2,3,4$. Why? Because each transition is of unit “cost” (in time). Notice that this will give us $T+1$ (since we’re summing $T+1$ terms, from $0$ to $T$), so we need to remember to subtract $1$ (that’s because we’re counting the initial state as also being a “cost” in terms of time, but the process actually starts at time $0$).
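The second bullet can be carried out by hand for this example’s chain (win probability $1/3$, boundaries at $0$ and $4$). A sketch with exact arithmetic, solving the small system by substitution; the variable names are my own:

```python
from fractions import Fraction as F

p, q = F(1, 3), F(2, 3)  # win / lose probabilities in this example
# b_i = E[ sum_{n=0}^T 1 | X_0 = i ] has T + 1 terms, so E[T | X_0 = i] = b_i - 1.
# Boundaries: b_0 = b_4 = 1 (the game stops at T = 0; only the n = 0 term survives).
# Interior:   b_i = 1 + p*b_{i+1} + q*b_{i-1}
# Express b_1 and b_3 in terms of b_2, then substitute into the b_2 equation:
#   b_1 = (1 + q) + p*b_2,   b_3 = (1 + p) + q*b_2
#   b_2 = 1 + p*b_3 + q*b_1 = C + k*b_2
k = p * q + q * p
C = 1 + p * (1 + p) + q * (1 + q)
b2 = C / (1 - k)
b3 = (1 + p) + q * b2
expected_T_from_3 = b3 - 1  # subtract 1 for the initial state -> 17/5 = 3.4
```

This matches the closed-form expected duration of gambler’s ruin with unequal win/lose probabilities, which is a good cross-check.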

Let’s try to use this procedure to answer a slightly more complicated question:

Q. What is the expected number of times that the gambler reaches state $2$ in his entire gambling journey if he starts with $X_0=3$?

We can write this as $E\left[\sum_{n=0}^T I\{X_n=2\}\,\Big|\,X_0=3\right]$

For this question, $g(x) = 1$ when $x=2$, and $0$ otherwise, i.e., $g(X_n) = I\{X_n=2\}$.

Then, we set up $w_i = \sum_{k=0}^4 (g(i)+w_k)\,p_{ik}$ for all values of $i$.

We get the following linear system:

$$
\begin{aligned}
w_0 &= 0 \\
w_1 &= \frac{1}{3}w_2 + \frac{2}{3}w_0 \\
w_2 &= 1 + \frac{1}{3}w_3 + \frac{2}{3}w_1 \\
w_3 &= \frac{1}{3}w_4 + \frac{2}{3}w_2 \\
w_4 &= 0
\end{aligned}
$$

We can easily solve the above system.
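Since $w_0 = w_4 = 0$, the system collapses to three unknowns and yields to substitution. A quick check with exact fractions (variable names mirror the $w_i$ above):

```python
from fractions import Fraction as F

# w_0 = w_4 = 0, so w_1 = (1/3) w_2 and w_3 = (2/3) w_2.
# Substituting both into the w_2 equation:
#   w_2 = 1 + (1/3) w_3 + (2/3) w_1 = 1 + (2/9) w_2 + (2/9) w_2
w2 = F(1) / (1 - F(4, 9))  # = 9/5
w1 = F(1, 3) * w2          # = 3/5
w3 = F(2, 3) * w2          # = 6/5 <- expected visits to 2 starting from X_0 = 3
```

So starting from $X_0=3$, the gambler visits state $2$ on average $6/5 = 1.2$ times.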

info

Notice that for the absorbing states, i.e., $0$ or $4$, the gambler stops the game at $t=0$ (he doesn’t even play any games). So, the number of times he passes through $2$ if he starts at $0$ or $4$ is $0$. These are the boundary cases (base cases).

Generally, we don’t ask for the expected number of visits to $0$ or $4$, because it equals the probability of being trapped in $0$ or $4$; this degenerates to the first question (about the probability of reaching a particular absorbing state). The only way the gambler reaches $0$ is if he goes broke, and the only way he reaches $4$ is if he wins the entire game and quits. The reason is that these are absorbing states: once the gambler reaches either of them, he cannot escape, and the game is over (we don’t count the same state infinitely many times even though he’s stuck there forever, because we just “end” the game there). That is, he visits $0$ exactly once if he goes broke, and $4$ exactly once if he wins the game. So, the expected number of visits to $0$ (or $4$) equals the probability of going broke (or winning the game).
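This claim is easy to verify with first-step analysis itself: taking $g = I\{x=0\}$, the quantities $u_i = E[\sum_{n=0}^T I\{X_n=0\}\mid X_0=i]$ have boundary values $u_0 = 1$, $u_4 = 0$ and interior equations $u_i = \frac{1}{3}u_{i+1} + \frac{2}{3}u_{i-1}$, which is exactly the ruin-probability system. A sketch by substitution, with exact fractions (variable names my own):

```python
from fractions import Fraction as F

# Boundaries: u_0 = 1, u_4 = 0. Interior: u_i = (1/3) u_{i+1} + (2/3) u_{i-1}.
# Express u_1 and u_3 in terms of u_2 and substitute into the u_2 equation:
#   u_1 = (1/3) u_2 + 2/3,  u_3 = (2/3) u_2
#   u_2 = (1/3) u_3 + (2/3) u_1 = (4/9) u_2 + 4/9
u2 = F(4, 9) / (1 - F(4, 9))  # = 4/5
u3 = F(2, 3) * u2             # = 8/15, the ruin probability from X_0 = 3
```

The value $u_3 = 8/15$ agrees with the classical gambler’s ruin probability for win probability $1/3$, confirming that the expected number of visits to an absorbing state is just the probability of ever reaching it.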