Generalising First-Step Analysis
In the previous week, we were able to answer specific strategy-related questions about a specific instance of the gambler’s ruin problem. But we don’t want to perform the analysis each time we’re given a different example (i.e., different parameters, or initial conditions): so, can we come up with a more general solution?
Recall that for all the 3 questions, we have a quantity of interest, usually related to the stopping time $T$ and the initial state $X_0 = i$.
We can express all the 3 quantities of interest (and more) to be of the general form:

$$w(i) = E\left[\sum_{t=0}^{T} g(X_t) \;\middle|\; X_0 = i\right]$$

where $w(i)$ is our quantity of interest, and $g$ is a function from the state space $S$ to $\mathbb{R}$.
Note that the probability $P(X_T = a \mid X_0 = i)$, where $a$ is any absorbing state, is also a special kind of expectation: we can define $g$ to be the indicator function which is “on” (1) when its argument takes the value $a$ and “off” (0) otherwise, i.e.,

$$g(x) = \mathbb{1}_{\{x = a\}} = \begin{cases} 1, & x = a \\ 0, & \text{otherwise} \end{cases}$$
We can also write it as the following (because we know that $T$ is the stopping time, and the indicator value will always be $0$ for all $t < T$, since $X_t = a$ for some $t < T$ would contradict the definition of $T$):

$$P(X_T = a \mid X_0 = i) = E\left[\mathbb{1}_{\{X_T = a\}} \;\middle|\; X_0 = i\right] = E\left[\sum_{t=0}^{T} \mathbb{1}_{\{X_t = a\}} \;\middle|\; X_0 = i\right]$$
The above expectation is taken over all possible values of $T$ (remember that $T$ itself is a random variable).
The above equations can be thought of as a sort of random sum: the number of terms to be summed, $T + 1$, is itself a random variable, and each of the terms $g(X_t)$ is also a random variable.
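As a sanity check, such a random sum can be estimated by straightforward simulation. A minimal sketch, assuming a gambler's-ruin chain on states $0,\dots,4$ with win probability `p` (these parameters are illustrative, not fixed by the text):

```python
import random

def estimate_w(i, g, p=0.5, absorbing=(0, 4), trials=100_000):
    """Monte Carlo estimate of w(i) = E[ sum_{t=0}^{T} g(X_t) | X_0 = i ]."""
    total = 0.0
    for _ in range(trials):
        x, acc = i, g(i)           # the t = 0 term is always included
        while x not in absorbing:  # T = first time an absorbing state is hit
            x += 1 if random.random() < p else -1
            acc += g(x)            # one more term of the random sum
        total += acc
    return total / trials

# g = indicator of state 4 turns w(i) into P(X_T = 4 | X_0 = i);
# for a fair game starting at 2 this probability is 2/4 = 0.5.
print(estimate_w(2, lambda x: 1.0 if x == 4 else 0.0))  # roughly 0.5
```

The number of loop iterations per trial is exactly the random $T$, so each accumulated `acc` is one realisation of the random sum.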
So, if we’re interested in the quantity $w(i)$, we can use the following steps:
1. Define terms $w(j)$ for each state $j$ (not just the particular $i$ that we care about, but ALL of them → since we need ALL of the rest of the values to find any particular value - by solving a system of linear equations)
2. Then, apply the law of total expectation to $w(i)$, conditional on the first step $X_1$:

$$w(i) = \sum_{j} P(X_1 = j \mid X_0 = i)\; E\left[\sum_{t=0}^{T} g(X_t) \;\middle|\; X_0 = i,\, X_1 = j\right]$$

That is, we look at all possible steps we can take in the first move and calculate accordingly → one-step transition (we consider the transition from $i$ to $j$, for every possible value of $j$, when trying to calculate $w(i)$). Notice that once we reach state $j$, the time is $t = 1$, so we can split off the $t = 0$ term and start the remaining sum from $t = 1$:

$$E\left[\sum_{t=0}^{T} g(X_t) \;\middle|\; X_0 = i,\, X_1 = j\right] = g(i) + E\left[\sum_{t=1}^{T} g(X_t) \;\middle|\; X_0 = i,\, X_1 = j\right]$$

And we need to perform this step (equation) to find $w(i)$ for every value of $i$.
3. We already know the value of $P(X_1 = j \mid X_0 = i) = p_{ij}$ from the one-step transition matrix.
4. Consider the process $(Y_t)$ defined by $Y_t = X_{t+1}$. It is stochastically equivalent (same probabilistic structure, only differing in the time at which they start) to $(X_t)$. Further, by the Markov property,

$$E\left[\sum_{t=1}^{T} g(X_t) \;\middle|\; X_0 = i,\, X_1 = j\right] = E\left[\sum_{t=0}^{T} g(X_t) \;\middle|\; X_0 = j\right] = w(j)$$
Combining our results from steps (3) and (4) and inserting them into the equation in step (2), we get (using $\sum_j p_{ij} = 1$):

$$w(i) = g(i) + \sum_{j} p_{ij}\, w(j)$$
Note that the above is true only when $i$ is not an absorbing state. When $i$ itself is an absorbing state, we can simply write it as follows (since the stopping time $T = 0$, the sum has only the $t = 0$ term):

$$w(i) = g(i)$$
We can set up a system of linear equations about $w(i)$ for all values of $i$, solve them, and obtain the result.
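The resulting system can be solved mechanically. A minimal NumPy sketch, assuming the states are numbered $0,\dots,n-1$ and `P` holds the one-step transition probabilities (the function name and the fair-chain example are illustrative):

```python
import numpy as np

def first_step_solve(P, g, absorbing):
    """Solve w(i) = g(i) + sum_j P[i, j] * w(j) for non-absorbing i,
    with the boundary condition w(a) = g(a) at each absorbing state a."""
    n = len(g)
    A = np.eye(n) - P            # rearrange the recursion into (I - P) w = g
    b = np.array(g, dtype=float)
    for a in absorbing:          # overwrite absorbing rows with w(a) = g(a)
        A[a] = 0.0
        A[a, a] = 1.0
        b[a] = g[a]
    return np.linalg.solve(A, b)

# Fair gambler's ruin on states 0..4: g = indicator of state 4 recovers
# the win probabilities w(i) = i/4.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5
print(first_step_solve(P, [0, 0, 0, 0, 1], absorbing=(0, 4)))  # ≈ [0, 0.25, 0.5, 0.75, 1]
```

Overwriting the absorbing rows encodes the base cases $w(a) = g(a)$ directly in the matrix, which also keeps $I - P$ from being singular.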
The above is the general procedure to follow when trying to answer any strategy related questions.
The only hard part is to define the function $g$ correctly in order to answer the question.
For example:
- To find $P(X_T = a \mid X_0 = i)$, we can use the indicator function that takes on the value $1$ when the argument is $a$, and $0$ otherwise. In other words, $g(x) = \mathbb{1}_{\{x = a\}}$: so, $g(x) = 1$ when $x = a$, and $0$ otherwise.
- To find $E[T \mid X_0 = i]$, we can set $g(x) = 1$ for all values of $x$. Why? Because each transition is of unit “cost” (in time). Notice that this will give us $E[T + 1 \mid X_0 = i]$ (since we’re summing $T + 1$ terms, from $t = 0$ to $t = T$), so we need to remember to subtract $1$ (that’s because we’re counting the initial state as also being a “cost” in terms of time, but actually the process starts at time $t = 0$).
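Both choices of $g$ can be checked numerically on a small chain. A sketch assuming a fair gambler's ruin on states $0,\dots,4$ (parameters chosen for illustration):

```python
import numpy as np

# Fair gambler's-ruin chain on states 0..4 (illustrative parameters).
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5

def solve(g):
    """w(i) = g(i) + sum_j P[i, j] w(j) on transient states, w(a) = g(a) else."""
    A, b = np.eye(5) - P, np.array(g, dtype=float)
    for a in (0, 4):
        A[a] = 0.0
        A[a, a] = 1.0
        b[a] = g[a]
    return np.linalg.solve(A, b)

# g = indicator of state 4  ->  w(i) = P(X_T = 4 | X_0 = i)
win = solve([0, 0, 0, 0, 1])
# g = 1 everywhere sums T + 1 terms, so subtract 1 to get E[T | X_0 = i]
duration = solve([1, 1, 1, 1, 1]) - 1
print(win)       # ≈ [0, 0.25, 0.5, 0.75, 1]
print(duration)  # ≈ [0, 3, 4, 3, 0], matching E[T] = i(4 - i) for a fair game
```

Note the `- 1` on the duration line: that is exactly the correction discussed above, since $g \equiv 1$ counts the initial state as well.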
Let’s try to use this procedure to answer a slightly more complicated question:
Q. What is the expected number of times that the gambler reaches a given non-absorbing state $k$ in his entire gambling journey if he starts with $X_0 = i$?
We can write this as

$$w(i) = E\left[\sum_{t=0}^{T} \mathbb{1}_{\{X_t = k\}} \;\middle|\; X_0 = i\right]$$

For this question, $g(x) = 1$ when $x = k$, and $0$ otherwise, i.e., $g(x) = \mathbb{1}_{\{x = k\}}$.
Then, we set up $w(i) = g(i) + \sum_j p_{ij}\, w(j)$ for all values of $i$.
We get the following linear system (writing $p$ for the probability that the gambler wins a round and $q = 1 - p$ for the probability that he loses):

$$\begin{aligned}
w(0) &= 0 \\
w(1) &= g(1) + q\,w(0) + p\,w(2) \\
w(2) &= g(2) + q\,w(1) + p\,w(3) \\
w(3) &= g(3) + q\,w(2) + p\,w(4) \\
w(4) &= 0
\end{aligned}$$

We can easily solve the above system.
Notice that for the absorbing states, i.e., $i = 0$ or $i = 4$, the gambler stops the game at $T = 0$ (he doesn't even play any games). So, the number of times he passes through $k$ if he starts at $0$ or $4$ is $0$. These are the boundary cases (base cases).
Generally, we don’t ask for the expected number of times of passing through $0$ or $4$, because it equals the probability of being trapped in $0$ or $4$ respectively. This degenerates to the first question (about the probability of reaching a particular absorbing state): the only way the gambler reaches $0$ is by going broke, and the only way he reaches $4$ is by winning the entire game and quitting. The reason is that these are absorbing states: once the gambler reaches either of them, he cannot escape, and the game is over (we don’t count the same state infinitely many times even though he’s stuck there forever, because we simply “end” the game there). That is, he visits $0$ exactly once if he goes broke, and $4$ exactly once if he wins the game. So, the expected number of visits to $0$ (or $4$) equals the probability of going broke (or winning the game).
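The same machinery answers the visit-count question numerically. A sketch assuming the fair chain on states $0,\dots,4$, and (purely as an illustrative choice of target) counting visits to state $2$:

```python
import numpy as np

# Fair gambler's-ruin chain on states 0..4; the target state 2 below is an
# illustrative choice (any non-absorbing state works the same way).
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5

g = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # g = indicator of the target state 2

A, b = np.eye(5) - P, g.copy()
for a in (0, 4):                # boundary cases: w(0) = g(0), w(4) = g(4)
    A[a] = 0.0
    A[a, a] = 1.0
    b[a] = g[a]

w = np.linalg.solve(A, b)       # w(i) = expected visits to state 2 from i
print(w)                        # ≈ [0, 1, 2, 1, 0]
```

The boundary rows implement exactly the base cases above: starting at an absorbing state, the gambler never passes through the target, so those entries are $0$.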