
Poisson Process

Until now, we have been considering only discrete-time Markov chains. Here, we relax that assumption and talk about continuous-time Markov chains.

Motivation

In the real world, we often have to count the number of occurrences of an event in a given time interval. For example:

  • The number of flight passengers who have arrived by time $t$
  • The number of vehicles that have passed a given location by time $t$

We may also use this “counting” process to make decisions. For example, if we want to determine whether we need to build more ICUs in a hospital, we can ask the following questions:

  • How many new patients will need an ICU by each time point $t$?
  • How long will each patient need the ICU for?

And using the above information, we can come to a conclusion.

In the above examples, the state space is $S = \{0, 1, 2, \cdots\}$ (discrete).

Let $X(t)$ denote the quantity of interest up to time $t$; then $t > 0$ is continuous (the time index).

We’re only going to look at a special case of the continuous-time, discrete-state process: the Poisson process.

The Poisson Distribution

We begin with a fixed $t = t_0$. In the time interval $(0, t_0)$, we are interested in how many events happen.

For such a problem, the Poisson distribution is a good approximation. Why? We can explain it as follows:

  • We can decompose $(0, t_0)$ into many small disjoint intervals. Then, on each interval $\left(\frac{i}{n}t_0, \frac{i+1}{n}t_0\right)$, the number of events follows $Bernoulli(p_n)$. Why? Because the intervals are so tiny that it is almost impossible for 2 events to happen in that time. The count can only be 0 or 1, which is precisely what a $Bernoulli$ distribution can model. And we write “$p_n$” because it clearly depends on $n$ → if $n$ is large (so we have smaller intervals), then $p_n$ is (naturally) smaller, and vice versa.
  • We can sum over all the small intervals, and then the total number of events follows $Binomial(n, p_n)$ (assuming that all intervals are independent, and $p_n$ is the same for every interval). When $n \to \infty$ and $p_n \to 0$ such that their product $np_n \to \lambda$, then $Poisson(\lambda)$ is a good approximation of $Binomial(n, p_n)$.
definition

By definition, a random variable $X$ is said to have a Poisson distribution with parameter $\lambda > 0$, denoted by $X \sim Poi(\lambda)$, if its probability mass function is given by:

$$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \cdots$$

We can show that $X \sim Poi(np_n)$ is a good approximation of $Y \sim Binomial(n, p_n)$ as follows:

  1. When $n \to \infty$, for a given $x$, using the definition of the binomial distribution we have:
$$P(Y=x) = \binom{n}{x} p_n^x (1-p_n)^{n-x} = \frac{n!}{(n-x)!\,x!} p_n^x (1-p_n)^{n-x}$$
  2. As $n \gg x$, both $n!$ and $(n-x)!$ are very large. We can use Stirling's approximation, which says that for large values of $n$: $n! \sim \sqrt{2\pi n}\,(n/e)^n$. The ratio $\sqrt{n-x}/\sqrt{n}$ is approximately $1$ for large enough $n$ (i.e., those factors cancel out). Hence,

    $$P(Y=x) \approx \frac{(\frac{n}{e})^n}{x!\,(\frac{n-x}{e})^{n-x}} p_n^x (1-p_n)^{n-x}$$
  3. Also, for large $n$, we have $\frac{n^n}{(n-x)^{n-x}} = n^x \left(1 - \frac{x}{n}\right)^{-(n-x)} \approx n^x e^{x}$, and this $e^{x}$ cancels the $e^{-x}$ coming from the ratio of the $e$-powers above, so $\frac{(\frac{n}{e})^n}{(\frac{n-x}{e})^{n-x}} \approx n^x$. Hence $P(Y=x) \approx \frac{n^x p_n^x (1-p_n)^{n-x}}{x!}$.

  4. At the same time, we recall the binomial expansion:

    $$(1-p)^{m} = 1 - mp + \binom{m}{2}p^2 - \binom{m}{3}p^3 + \cdots$$

    With $m = n - x$ and $p = p_n \to 0$ while $(n-x)p_n \approx np_n = \lambda$ stays fixed, the $k$th term tends to $\frac{(-\lambda)^k}{k!}$, so $(1-p_n)^{n-x} \to e^{-\lambda}$.

  5. Combining the above simplifications, and using the assumption that $np_n = \lambda$ is a moderate constant (so $n^x p_n^x = (np_n)^x = \lambda^x$), we can write:

    $$P(Y=x) \approx e^{-\lambda}\frac{\lambda^x}{x!}$$

    which is the PMF of the Poisson distribution. (A quick numerical check of this approximation follows.)
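
To see this convergence numerically, here is a minimal sketch (using numpy/scipy; the value $\lambda = 3$ and the grid of $n$ are arbitrary choices) comparing the $Binomial(n, \lambda/n)$ PMF with the $Poi(\lambda)$ PMF as $n$ grows:

```python
# Sanity check of the Poisson approximation to Binomial(n, p_n) with n * p_n = lambda.
# (Illustrative values; lambda = 3 and the choices of n are arbitrary.)
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0
for n in [10, 100, 1000, 10000]:
    p_n = lam / n
    k = np.arange(0, 15)
    max_diff = np.max(np.abs(binom.pmf(k, n, p_n) - poisson.pmf(k, lam)))
    print(f"n = {n:6d}: max |Binomial pmf - Poisson pmf| = {max_diff:.6f}")
```

The maximum pointwise gap should shrink roughly like $1/n$, in line with the intuition above.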

Some Remarks

  1. For the Poisson distribution, the mean is $\lambda$ and the variance is also $\lambda$ (wow!)

  2. The probability generating function (PGF) of the Poisson distribution is $\phi_X(t) = \exp[\lambda(t-1)]$. We can use the PGF to prove the properties below as well.

  3. If we have two independent Poisson variables, $X \sim Poi(\lambda_1)$ and $Y \sim Poi(\lambda_2)$, then $(X+Y) \sim Poi(\lambda_1 + \lambda_2)$. Moreover, this can be generalised to $n$ independent variables too.

    The above result can be understood quite intuitively. We can view this from two perspectives:

    • Variables measuring different events in the same time period → Say we have two independent Poisson random variables for requests received at a web server in a day: $X$ = number of requests from humans/day, $X \sim Poi(\lambda_1)$, and $Y$ = number of requests from bots/day, $Y \sim Poi(\lambda_2)$. Since the convolution (aka sum) of Poisson random variables is also Poisson, we know that the total number of requests $(X+Y)$ is also Poisson: $(X+Y) \sim Poi(\lambda_1 + \lambda_2)$.
    • Variables measuring the same event in different time periods → If $X \sim Poi(\lambda)$ is the number of events in a time interval of length $t$, then the number of (the same) events in an interval of length $nt$ is the sum of $n$ independent copies of $X$, and hence follows $Poi(n\lambda)$.
  4. If $X \sim Poi(\lambda)$ and $Z|X \sim Binomial(X, r)$, then $Z \sim Poi(\lambda r)$.

    Intuition: $X$ counts the number of events in the interval, and $r$ adds a “thinning” factor that determines whether we should really count each event or not. It is a two-stage process. Think of it as keeping only a fraction $r$ of the actual events $\iff$ every event that occurs has probability $r$ of being counted by $Z$.

    Example: say we want to count the number of buses passing a given location, $\lambda$ is the average number of vehicles per hour, and $r$ is the probability of a vehicle being a bus. Then we can count the total number of buses per hour by using $Z$ (as defined above using the two-stage process). Clearly, it is equivalent to defining a new variable $Y \sim Poi(\lambda r)$, where $\lambda r$ is the average number of buses crossing the location per hour. A small simulation check of this thinning property, and of the superposition property in (3), is sketched right after this list.

    Read a great explanation here 🎉
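
As a quick sanity check of remarks (3) and (4), here is a small Monte Carlo sketch (numpy assumed; the parameters $\lambda_1, \lambda_2, r$ and the sample size are arbitrary) using the mean = variance property of the Poisson:

```python
# Simulation check of remarks (3) and (4): superposition and thinning of Poisson RVs.
# (Illustrative parameters lam1, lam2, r are arbitrary.)
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
lam1, lam2, r = 2.0, 5.0, 0.3

X = rng.poisson(lam1, n)
Y = rng.poisson(lam2, n)

# (3) X + Y should behave like Poi(lam1 + lam2): mean and variance both near lam1 + lam2.
S = X + Y
print("superposition:", S.mean(), S.var(), "expected:", lam1 + lam2)

# (4) Z | X ~ Binomial(X, r) should behave like Poi(lam1 * r).
Z = rng.binomial(X, r)
print("thinning:     ", Z.mean(), Z.var(), "expected:", lam1 * r)
```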

Proof of (4) using PGF

We will show that $Z$ must be $Poi(\lambda p)$ (writing $p$ for the thinning probability $r$ above) by showing that its PGF is of that form.

  1. The PGF of $Z$ is (by the definition of a PGF, not using the fact that $Z$ is Poisson, since that is what we are trying to prove):

    $$\phi_Z(t) = E[E[t^Z|X]]$$

    Also, recall that the PGF of a binomial distribution $X \sim Bin(n,p)$ is $\phi_X(t) = (1-p+pt)^n$.

  2. Given $X$, $Z$ is a binomial variable, so we can write:

    $$\phi_Z(t) = E[(1-p+pt)^X]$$

    where the expectation is over $X$.

  3. Moreover, $\phi_X(t) = E[t^X]$. So, the RHS of the above equation is just $\phi_X(1-p+pt)$.

  4. Then,

    $$\phi_Z(t) = \phi_X(1-p+pt)$$
  5. Now, we know that $X \sim Poi(\lambda)$ and so $\phi_X(t) = \exp[\lambda(t-1)]$. Using this in the above equation, we get:

    $$\phi_Z(t) = \exp[\lambda(1-p+pt-1)] = \exp[\lambda p(t-1)]$$
  6. It is clear that $\exp[\lambda p(t-1)]$ is the PGF of a $Poi(\lambda p)$ variable. Hence, $Z$ is a $Poi(\lambda p)$ variable. QED (A quick Monte Carlo check of this identity follows.)
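
We can also check the PGF identity empirically: a small Monte Carlo sketch (arbitrary $\lambda$, $p$, and evaluation points $t$) estimating $E[t^Z]$ and comparing it with $\exp[\lambda p(t-1)]$:

```python
# Empirical check of the PGF argument: estimate E[t^Z] by Monte Carlo and
# compare with exp(lambda * p * (t - 1)). (lambda, p, and the t values are arbitrary.)
import numpy as np

rng = np.random.default_rng(1)
lam, p = 4.0, 0.25
X = rng.poisson(lam, 500_000)
Z = rng.binomial(X, p)

for t in [0.2, 0.5, 0.9, 1.5]:
    empirical = np.mean(t ** Z)              # Monte Carlo estimate of the PGF of Z at t
    theoretical = np.exp(lam * p * (t - 1))  # PGF of Poi(lambda * p) at t
    print(f"t = {t}: empirical {empirical:.4f} vs Poi(lambda*p) PGF {theoretical:.4f}")
```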

The Poisson Process

There are 3 ways we can choose to define the Poisson process, and we discuss all of them (for the sake of understanding it better).

Def 1: Defining from the Poisson Distribution

The Poisson distribution gives us the number of events in a fixed time interval $(0, t_0)$.

Now, we want to be able to find the number of events for any time point $t_0 > 0$.

We hope that our process satisfies the following:

  • At each time point, $X(t) = $ #events in $(0,t)$ follows a Poisson distribution.
  • On disjoint time intervals, the numbers of events are independent (so that their sum is still Poisson).
  • On different time intervals, the parameters are somehow related (so that they describe a “process”, and not just a collection of random variables).

This is precisely the definition of the Poisson process.

definition

💡 A Poisson process with rate, or intensity, $\lambda > 0$ is an integer-valued stochastic process $\{X(t) : t \geq 0\}$ for which

  1. For any time points $t_0 = 0 < t_1 < t_2 < \cdots < t_n$, the process increments

    $$X(t_1)-X(t_0),\ X(t_2)-X(t_1),\ \cdots,\ X(t_n)-X(t_{n-1})$$

    are independent random variables.

  2. For $s \geq 0$ and $t > 0$, the random variable $X(s+t) - X(s) \sim Poi(\lambda t)$

  3. $X(0) = 0$

Basically, the number of events on any time interval $(t_1, t_2)$ follows a $Poi(\lambda(t_2 - t_1))$ distribution. And this $\lambda$ is the same for all time intervals. That is, $\lambda$ is the common/shared rate/intensity parameter.

We define $\lambda$ to be the average number of events in one time unit (i.e., when $t = 1$, the count over an interval of unit length follows $Poi(\lambda)$ → $\lambda$ events on average in unit time).

We can interpret the common $\lambda$ as follows: the distribution of the number of events in a given time interval does not depend on the starting time, but only on the length of the interval itself! If we have two time intervals $[a,b]$ and $[c,d]$ such that $b-a = d-c = t$, then the distribution of the count on both intervals is $Poi(\lambda t)$.

In particular, we also have $X(t) \sim Poi(\lambda t)$ for every fixed time point $t$, because $X(t) = X(t) - X(0)$ for all $t > 0$.

It’s actually quite intuitive → the number of events in $(t_1, t_2)$ will be independent of that in $(t_2, t_3)$ because the intervals are disjoint.

Moreover, the average number of events in any time interval of length $t$ increases linearly with $t$, i.e., $E[X(t)] = \lambda t$ (and so does the variance).
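
A minimal simulation sketch of this definition (numpy assumed; the grid size, $\lambda$, and $T$ are arbitrary choices): build $X(T)$ by summing independent $Poi(\lambda\,\Delta t)$ increments over a time grid, and check that the mean and variance of $X(T)$ are both close to $\lambda T$:

```python
# A sketch of Definition 1: build X(T) from independent Poisson(lambda * dt) increments
# over a grid, then check E[X(T)] and Var[X(T)] against lambda * T.
import numpy as np

rng = np.random.default_rng(2)
lam, T, n_steps, n_paths = 2.0, 5.0, 500, 20_000
dt = T / n_steps

increments = rng.poisson(lam * dt, size=(n_paths, n_steps))
X_T = increments.sum(axis=1)   # X(T) for each simulated path

print("mean of X(T):", X_T.mean(), " expected:", lam * T)
print("var  of X(T):", X_T.var(),  " expected:", lam * T)
```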

Another corollary of the “common rate” property of a Poisson process is that the distribution remains unchanged when you shift the starting time by any amount. For example, you start measuring the number of vehicles passing a given location at 8am, and they follow a Poisson process (denoted $PP$) with parameter $\lambda$. Suppose another observer comes to the location at 10am and starts counting vehicles. Of course, his “process” will be numerically different, but it will follow the same probabilistic structure $PP(\lambda)$, because nothing has changed except the “starting time”, and the Poisson process does not depend on that (because of the common rate $\lambda$ for all time $t$).

This would not be true if the rate parameter changed over time → it is more realistic that you see more cars from 8am to 10am since people are going to work, then fewer cars from 10am to 5pm, and then more cars again from 5pm to 8pm since people are coming back from work. In such a scenario, $\lambda$ itself varies with $t$, and we do not have a Poisson process (and here, the starting time definitely matters). We may still have Poisson distributions on separate intervals (e.g. 8-10am, 10am-5pm, 5pm-8pm), but each interval enjoys its own $\lambda_t$.

Def 2: Building the process by small time intervals and the law of rare events

We now look at the Poisson process from another viewpoint (to find another intuitive explanation).

Recall that we care about the Poisson distribution mainly because it can be seen as an approximation of $Bin(n, p_n)$.

And we like $Bin(n, p_n)$ because we can decompose $(0, t_0)$ (for a fixed $t_0$) into $n$ disjoint time intervals, and then what happens in each time interval is independent of the other intervals. In each time interval, the number of events $\sim Bernoulli(p_n)$. This $p_n$ is very small (this is the assumption we make when we use the Poisson approximation: $p_n \to 0$, $n \to \infty$), so small that we do not need to consider the possibility of more than 1 event in that time interval. Such events are called rare events.

The Poisson process has a similar setup of rare events.

theorem

Law of Rare Events

Let $\epsilon_1, \epsilon_2, \cdots$ be independent Bernoulli random variables with $P(\epsilon_i = 1) = p_i$, and let $S_n = \epsilon_1 + \epsilon_2 + \cdots + \epsilon_n$. The exact probability for $S_n$ and the Poisson probability with $\lambda = p_1 + p_2 + \cdots + p_n$ differ by at most

$$\left|P(S_n=k) - \frac{e^{-\lambda}\lambda^k}{k!}\right| \leq \sum_{i=1}^n p_i^2$$

Remarks

  • Notice that the $p_i$'s can be different → each Bernoulli variable enjoys its own parameter $p_i$.
  • The above theorem assures us that our Poisson approximation is “good enough” when the $p_i$'s are uniformly small. For instance, when $p_1 = p_2 = \cdots = p_n = \lambda/n$, the right side reduces to $\lambda^2/n$, which is small when $n$ is large.
  • Basically, the law of rare events states that if the process satisfies the rarity condition (small probability of an event occurring on an interval), then it is close to the Poisson distribution. A small numerical check of the bound is sketched right after this list.
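
Here is a small numerical check of the bound with unequal $p_i$'s (the particular $p_i$ values are arbitrary small numbers; the exact PMF of $S_n$ is computed by convolving the Bernoulli PMFs):

```python
# The bound also covers unequal p_i's. A minimal check: compute the exact pmf of
# S_n by convolving Bernoulli pmfs, and compare with Poi(lambda), lambda = sum(p_i).
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
p = rng.uniform(0.0, 0.05, size=200)   # 200 small, unequal success probabilities
lam = p.sum()

pmf = np.array([1.0])                  # exact pmf of S_n, built by successive convolution
for pi in p:
    pmf = np.convolve(pmf, [1 - pi, pi])

k = np.arange(len(pmf))
gap = np.max(np.abs(pmf - poisson.pmf(k, lam)))
print("max pointwise gap:", gap, " bound sum(p_i^2):", np.sum(p**2))
```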

We can then define a Poisson process using the law of rare events as such:

Let $N((s,t])$ be a RV counting the number of events occurring in the interval $(s,t]$. Then, $N((s,t])$ is a Poisson point process of intensity $\lambda > 0$ if

  1. For any time points $t_0 = 0 < t_1 < t_2 < \cdots < t_n$, the process increments

    $$N((t_0,t_1]),\ N((t_1,t_2]),\ \cdots,\ N((t_{n-1},t_n])$$

    are independent RVs.

  2. There is a positive constant $\lambda$ for which the probability of at least one event happening in a time interval of length $h$ is

    $$P(N((t,t+h]) \geq 1) = \lambda h + o(h), \quad h \to 0$$

    where $o(h)$ is any function that goes to zero faster than $h$, i.e., $\lim_{h \to 0} \frac{o(h)}{h} = 0$. For example, $o(h) = 2h^2$ eventually becomes much smaller than $h$ as $h \to 0$.

    note

    Recall this is essentially the same definition (but reversed 😲) as little-o notation covered in CS3230 (there, we defined it to mean that $o(n)$ is a function that grows slower than $n$ as $n \to \infty$).

  3. The probability of 2 or more events occurring in an interval of length $h$ is $o(h)$ as $h \to 0$ (and this is called a rare event).

Basically, we can write it as such: as $h \to 0$,

$$P(N((t,t+h]) = k) = \begin{cases} 1 - \lambda h + o(h), & k = 0 \\ \lambda h + o(h), & k = 1 \\ o(h), & k \geq 2 \end{cases}$$

And in the limit $h \to 0$, we can see that the majority of the probability is concentrated at $k = 0$, i.e., it is very unlikely for any event to occur in such a small time interval.
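
A tiny numerical illustration (arbitrary $\lambda$ and a few small values of $h$): for an increment distributed as $Poi(\lambda h)$, $P(k=1)$ is close to $\lambda h$ while $P(k \geq 2)$ is of order $h^2$:

```python
# For a small interval of length h, the increment is Poi(lambda * h); check that
# P(k = 1) is close to lambda * h and P(k >= 2) is of smaller order. (lambda arbitrary.)
from scipy.stats import poisson

lam = 2.0
for h in [0.1, 0.01, 0.001]:
    mu = lam * h
    p1 = poisson.pmf(1, mu)            # probability of exactly one event
    p2_plus = 1 - poisson.cdf(1, mu)   # probability of two or more events
    print(f"h = {h}: P(k=1) = {p1:.6f} (lambda*h = {mu}),  P(k>=2) = {p2_plus:.8f}")
```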

This definition is more useful in practice: given a set of data, we can check fairly easily whether it satisfies the above rare-event properties. It is much harder to justify directly why you think the data follows a Poisson distribution (so the first definition is quite hard to apply to real-world data).

Def 3: Relationship Between Poisson and Exponential Distribution

So far we have introduced two definitions of the Poisson process.

But can we simulate a Poisson process using only the current results? It would be very difficult. How would we simulate “rare events”, or $Poi(\lambda t)$ for every $t > 0$? This is impossible because $t$ is continuous! (How would you “record” the path of the Poisson process for every value of $t$ on a computer when $t$ is continuous? That is the difficulty.)

We have to explore more properties of the process to propose a reasonable simulation algorithm.

Observe that for a Poisson process, to record the entire path (even over continuous $t$), it suffices to record only the values of $t$ at which an event occurs! 😲 Knowing when the events occur is enough to completely describe a Poisson process.

So, we only need the times at which the events occur in a Poisson process, and we are now interested in how to find these times. That is, how do we simulate when an event occurs? (Note that this is a fundamentally different question from sampling the Poisson process directly: previously, we would have to decompose “time” into many small intervals, simulate a Bernoulli RV for each interval, and see whether an event occurs or not. Now, instead of thinking from the perspective of every time interval, most of which contain no event anyway, we think from the perspective of the events themselves, and try to simulate when they will occur!)

definition

Let $X(t)$ be a Poisson process with rate $\lambda$. Let $W_n$ be the time of occurrence of the $n$-th event; it is called the waiting time of the $n$th event. We always set $W_0 = 0$.

The times between two occurrences, $S_n = W_n - W_{n-1}$, are called sojourn times. $S_n$ measures the duration that the Poisson process sojourns (temporarily stays) between the $(n-1)$th and $n$th events.


Then, knowing the sojourn times (or equivalently, the waiting times) is enough to completely specify the Poisson process.

It turns out that the waiting times follow a gamma distribution, and the sojourn times follow an exponential distribution!

theorem

💡 Theorem: Waiting Time

The waiting time $W_n$ has the gamma distribution whose PDF is:

$$f_{W_n}(t) = e^{-\lambda t}\frac{\lambda^n t^{n-1}}{(n-1)!}, \quad n = 1, 2, \cdots, \ t \geq 0$$

In particular, $W_1$, the time to the first event, is exponentially distributed with PDF:

$$f_{W_1}(t) = \lambda e^{-\lambda t}, \quad t \geq 0$$

The last equation should already give a hint: if the first waiting time follows an exponential distribution, then what happens if we “restart” the process (by simply “shifting” the time origin) every time an event occurs? Then $S_2$ of the original process becomes $W_1'$ of the new process, and this new process has the same probabilistic structure, so $W_1'$ also follows (the same!) exponential distribution. Hence, $S_2$ also follows the same exponential distribution. And so on.

We can keep “restarting” the process at every event occurrence, and it is clear that all the $S_i$'s are exponentially distributed with the same PDF (and they are independent, because we are literally “restarting” the process all over). Hence, $S_i \stackrel{i.i.d.}{\sim} Exp(\lambda)$.

Recall that the $Exp(\lambda)$ density is $f(t) = \lambda e^{-\lambda t},\ t \geq 0$. The mean of the exponential distribution is $1/\lambda$ and the variance is $1/\lambda^2$.

It’s intuitive to interpret the mean: the average sojourn time (aka the time between two consecutive events) is equal to the reciprocal of the average number of events per unit time. Example: if we expect to see 10 cars passing a location every hour, then the average sojourn time is 6 mins, i.e., we expect to see a car roughly every 6 mins (on average).

Before moving forward, let’s derive the PDF $f$ of the waiting time $W_1$ (that we have already stated above).

  1. Our goal is to find a PDF $f$ that gives us the probability density $f_{W_1}(s)$ for every $s > 0$ (recall that $s$ is continuous here).

  2. If the first event occurs exactly at time $s$, then $X(s-h) = 0$ and $X(s+h) = 1$ for any $h > 0$ (specifically, we can take the limit as $h \to 0^+$).

  3. So, instead of considering the PDF directly, we can find $P(W_1 < s)$ instead, i.e., the probability of the first event happening before time $s$. The event $W_1 < s$ is equivalent to $X(s) \geq 1$ (both mean that the first event has already happened before $s$), and hence $P(W_1 < s) = P(X(s) \geq 1)$.

  4. Since $X(t) \sim Poi(\lambda t)$, we can find $P(X(s) \geq 1)$ easily:

    $$P(X(s) \geq 1) = 1 - P(X(s) = 0) = 1 - \frac{e^{-\lambda s}(\lambda s)^0}{0!} = 1 - e^{-\lambda s}$$

    Hence, we have the cumulative distribution function (CDF) of $W_1$.

  5. By definition, the PDF is the derivative of the CDF. So,

    $$f(s) = \frac{d}{ds}\left(1 - e^{-\lambda s}\right) = \lambda e^{-\lambda s}$$

    and we get our desired result - the exponential distribution 🎉

We can do the same derivation for a general $W_n$ as well.

Remarks

  • Relationship between $W_1$ and $X(t)$: $P(X(t) \geq 1) = P(W_1 \leq t)$
  • The CDF of $X \sim Exp(\lambda)$ is given by:
    $$F(x) = \int_{-\infty}^x f(t)\,dt = \begin{cases} 1 - e^{-\lambda x}, & x \geq 0 \\ 0, & \text{otherwise} \end{cases}$$
  • $W_1 \sim Exp(\lambda)$, and exponential random variables are memoryless. This is also why we can arbitrarily start the Poisson process at any time. That is, we can start observing the process at any time, and $W_1$ always has the same distribution. This is NOT always intuitive:
    • Suppose buses arrive at a bus stop according to a Poisson process with an average of 3 buses every hour. You come to the bus stop at 10am and no bus arrives even until 10:20am. Your friend then comes to the bus stop at 10:20am. His waiting time still follows the same exponential distribution (with mean = 20 mins), and since you will have to wait the same amount of time as him (you both board the first bus that arrives), you will have waited 20 mins longer than him 😢 This is because in a Poisson process, disjoint time intervals are independent! (However, this assumption is rarely true in practice: people would get mad at SBS and SMRT if it were 😲)
    • Another example: if a lightbulb's lifetime follows an exponential distribution $Exp(\lambda)$ and it has already been glowing fine for $100$ hours, how much longer do we expect it to last before it dies? Another $1/\lambda$ hours. Starting at $t = 100$ (or in fact, any $t$), the average number of additional hours we expect it to glow is $1/\lambda$ → giving a total of $100 + 1/\lambda$ hours. Again, this kind of assumption is rarely true in practice with bulbs 😖
    • Formally, if $Y \sim Exp(\lambda)$, then $P(Y > s + t | Y > t) = P(Y > s)$ is called the memoryless property → the future of $Y$ is independent of the past.
    • In fact, the exponential RV is the only(!) continuous RV possessing the memoryless property. So, memorylessness implies the exponential distribution.
    • It is a good model for lifetimes/waiting times until the occurrence of some specific event.
  • The sum of independent exponential RVs follows a gamma distribution, so $W_k$ follows a gamma distribution when $k \geq 2$ (a quick numeric check of this follows below).
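
A quick numeric check of the gamma waiting time (numpy assumed; $k$ and $\lambda$ are arbitrary): summing $k$ i.i.d. $Exp(\lambda)$ sojourn times should give mean $k/\lambda$ and variance $k/\lambda^2$:

```python
# Sanity check that W_k (a sum of k i.i.d. Exp(lambda) sojourn times) has the
# gamma mean k/lambda and variance k/lambda^2. (k and lambda are arbitrary.)
import numpy as np

rng = np.random.default_rng(4)
lam, k, n = 3.0, 5, 200_000

S = rng.exponential(scale=1.0 / lam, size=(n, k))  # sojourn times S_1, ..., S_k
W_k = S.sum(axis=1)                                # waiting time of the k-th event

print("mean:", W_k.mean(), " expected:", k / lam)
print("var: ", W_k.var(),  " expected:", k / lam**2)
```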

Hence, the above theorem and properties give us another way to define the Poisson process using sojourn times, as follows:

Suppose we start with a sequence $\{S_n : n \geq 1\}$ of independent and identically distributed $Exp(\lambda)$ RVs, each with mean $1/\lambda$. Define a counting process by saying that the $n$th event occurs at time

$$W_n = S_1 + S_2 + \cdots + S_n$$

Then the resulting process will be a Poisson process with rate $\lambda$.

Using this perspective, we can easily simulate a Poisson process on the time interval $(0, T)$ as follows:

  1. Set $t := 0$, $I := 0$, and $W$ to be a vector initialized to $[0]$.
  2. Repeat
    1. Generate $S \sim Exp(\lambda)$
    2. Set $t = t + S$
    3. If $t > T$, then exit the algorithm
    4. Increment $I$
    5. Set $W_I = t$
  3. Return $W$ (the vector of waiting times)

$W$ contains the time at which every event happens. To draw the Poisson process, start with $X(0) = 0$ and increase $X(t)$ by $1$ at each time $t \in W$.
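
Here is one possible translation of the algorithm above into runnable Python (numpy assumed; the function name `simulate_poisson_process` is mine, not from the notes):

```python
# A sketch of the simulation algorithm above: generate Exp(lam) sojourn times and
# accumulate them into waiting times until the horizon T is exceeded.
import numpy as np

def simulate_poisson_process(lam, T, rng=None):
    """Return the waiting times of a Poisson process with rate lam on (0, T)."""
    rng = np.random.default_rng() if rng is None else rng
    t, W = 0.0, []                            # (we omit the leading W_0 = 0 entry of the pseudocode)
    while True:
        S = rng.exponential(scale=1.0 / lam)  # sojourn time ~ Exp(lam)
        t += S
        if t > T:
            return np.array(W)
        W.append(t)                           # record the waiting time of the next event

# Example: X(t) is the number of recorded waiting times that are <= t.
W = simulate_poisson_process(lam=2.0, T=10.0, rng=np.random.default_rng(5))
print("number of events in (0, 10]:", len(W), " (expected around lam*T = 20)")
print("X(3.5) =", np.searchsorted(W, 3.5, side="right"))
```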

What can we infer if we know that $\{X(t) : t \geq 0\}$ is a Poisson process with parameter $\lambda$?

  • At any time $t$, we know the distribution $X(t) \sim Poi(\lambda t)$.
  • $W_1 \sim Exp(\lambda)$ and $\forall n \geq 2$, $W_n$ follows a gamma distribution.

Now, if we put $X(t)$ and $W_n$ together:

  • What if we have already observed $X(t) = 1$, and we are interested in $W_1$?
  • What if we have already observed $X(t) = n$, and we are interested in the joint distribution of $W_1, \cdots, W_n$?

Conditional Distribution

Suppose we are told that exactly one event of a Poisson process has occurred by time $t$, i.e., $X(t) = 1$, and we are asked to determine the distribution of the time at which the event occurred.

Intuitively, we expect a “uniform” probability that the event occurred at any time in the interval $(0, t)$, because the process has a constant rate $\lambda$ throughout the interval. Let's prove this intuition more rigorously.

Let $f_{W_1}$ be the PDF of $W_1$.

Since we are given that $X(t) = 1$, obviously $\forall s > t,\ f_{W_1}(s | X(t) = 1) = 0$ → the first event must have happened in the interval $(0, t]$ and cannot have occurred after $t$.

note

It’s generally easier to solve problems of the kind “what happens in the future, given that you know what happened in the past” → use Bayes' theorem, if necessary, to convert problems to this form and then solve using conditional probability.

We can use (the mixed form of) Bayes' theorem now, and only consider $s \leq t$:

$$f_{W_1}(s|X(t)=1) = \frac{f_{W_1}(s)\,P(X(t)=1|W_1=s)}{P(X(t)=1)}$$

Now, we know that $W_1 \sim Exp(\lambda)$. Then, $f_{W_1}(s) = \lambda e^{-\lambda s}$.

The term $P(X(t)=1|W_1=s)$ is just the probability that there are no events between $s$ and $t$ (which is the same as $P(X(t-s)=0)$ too! → using the constant-rate property to shift the time interval as necessary). And we know that $X(t-s) \sim Poi(\lambda(t-s))$. That is,

$$P(X(t)=1|W_1=s) = \frac{e^{-\lambda(t-s)}(\lambda(t-s))^0}{0!} = e^{-\lambda(t-s)}$$

And notice that in the denominator, $X(t) = 1$ is possible if, and only if, the first event happened at some time $s \leq t$ and then no further events happened in $(s, t]$. Applying the law of total probability (continuous form):

$$P(X(t)=1) = \int_0^t f_{W_1}(s)\,P(X(t)=1|W_1=s)\,ds$$

Using the actual distributions, we get:

$$P(X(t)=1) = \int_0^t (\lambda e^{-\lambda s})\, e^{-\lambda(t-s)}\,ds = \lambda \int_0^t e^{-\lambda t}\,ds$$

Solving the above, we get:

$$P(X(t)=1) = \lambda t e^{-\lambda t}$$

Substituting everything we have so far into the Bayes formula above, we get:

$$f_{W_1}(s|X(t)=1) = \frac{(\lambda e^{-\lambda s})(e^{-\lambda(t-s)})}{\lambda t e^{-\lambda t}} = \frac{1}{t}$$

for $0 < s \leq t$, which is a uniform distribution over the interval $(0, t]$ (as intuitively expected).

It means that, given $X(t) = 1$, this one occurrence could have happened at any time point in $(0, t]$ with equal probability (density).
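
A small simulation sketch of this result (numpy assumed; $\lambda$, $t$, and the number of paths are arbitrary): among simulated paths with exactly one event in $(0, t]$, the event time should have the mean and variance of $Unif(0, t)$:

```python
# Simulation check: among paths with exactly one event in (0, t], the event time
# should look Uniform(0, t). (Here t = 1 and lambda = 0.8 are arbitrary.)
import numpy as np

rng = np.random.default_rng(6)
lam, t, n_paths = 0.8, 1.0, 200_000

first = rng.exponential(scale=1.0 / lam, size=n_paths)            # W_1 for each path
second = first + rng.exponential(scale=1.0 / lam, size=n_paths)   # W_2 for each path

# X(t) = 1 exactly when W_1 <= t < W_2
cond = first[(first <= t) & (second > t)]
print("conditional mean:", cond.mean(), " (Uniform(0,1) mean = 0.5)")
print("conditional var: ", cond.var(),  " (Uniform(0,1) variance is about 0.0833)")
```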

If we consider the generalized case $X(t) = n$, and we want to know the joint distribution of $W_1, W_2, \cdots, W_n$, then we can use the same approach as before (exercise for the reader).

The final result is:

$$f(w_1, w_2, \cdots, w_n | X(t) = n) = \frac{n!}{t^n}$$

where $0 \leq w_1 \leq w_2 \leq \cdots \leq w_n \leq t$.

It is basically the joint distribution of $n$ independent $Unif(0,t)$ random variables (with an ordering restriction). Why? Because, given that $X(t) = n$ exactly, we can just generate $n$ uniform random variables (since each event is equally likely to happen at any time), without considering the ordering of the events yet. Once we have the realizations, we can order them in ascending order and assign $w_1, w_2, \cdots, w_n$ to them accordingly.

Each of the uniform RVs has density $1/t$, so the joint density (without “reordering”) is $1/t^n$. But there are $n!$ permutations of these orderings, and we consider all of them to be equivalent because we only consider the events in sorted order. That is, $n!$ permutations of the realizations of these uniform random variables all map to the same “event” $w_1, \cdots, w_n$, because we order all of them the same way.

Hence, under the constraint $w_1 \leq w_2 \leq \cdots \leq w_n$, the density is $n!/t^n$ at every point.

Notice that we can simply count $n!$ permutations without worrying about two uniform random variables taking exactly the same value. Why? Because that is “impossible”: two continuous random variables never take the (exact) same value, as that event has probability zero. (Read a good explanation here 🎉)

Moreover, when our quantity of interest does not depend on the actual ordering, we can simply use the above result instead of the order statistics below. Example: we are interested in $E[\prod_i W_i]$ or $E[\sum_i W_i]$ → these clearly do not change if we reorder the variables. So, we can simply treat each $W_i$ as a uniform random variable and solve without relying on order statistics.

Waiting Times

Now, we consider the waiting time to the $k$th event (in order!).

Note that the joint distribution is the same as that of $n$ independent uniform RVs (up to ordering). Hence, the PDF of $W_k$ is the same as that of the $k$th order statistic (the $k$th smallest value among the realizations of the $n$ RVs).

important

Given that $X(t) = n$, the $n$ arrival/waiting times $W_1, W_2, \cdots, W_n$ have the same distribution as the order statistics corresponding to $n$ independent random variables uniformly distributed on the interval $(0, t)$, which is

$$f_k(x) = \frac{n!}{(n-k)!(k-1)!}\,\frac{1}{t}\left(\frac{x}{t}\right)^{k-1}\left(\frac{t-x}{t}\right)^{n-k}$$

It’s important to remember that the waiting times $W_1, \cdots, W_n$ are NOT independent of each other! Clearly, knowing one gives you information (an upper/lower bound) on the others.

However, if we are not concerned about the order of the waiting times, i.e., we are interested in a quantity that remains unchanged no matter what the order of the $W_i$'s is, and we are given $X(t) = n$, then instead of considering the $W_i$'s, we can consider INDEPENDENT uniform random variables on $(0, t]$ and compute the quantity of interest using those instead. Why does this matter? Because now they are independent! So we can even find their variance easily (since the covariance terms vanish). We cannot directly do this for the $W_i$'s.

Why is the above true? Because the quantity derived from the independent uniform random variables has the same distribution as that derived from the $W_i$'s. Each individual $U_i$ is obviously not identically distributed with $W_i$, because the $W_i$'s are ordered but the $U_i$'s are not. A small simulation sketch of this order-statistics representation follows.
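
Finally, a small simulation sketch of the representation (numpy assumed; $\lambda$, $t$, and $n$ are arbitrary): conditional on $X(t) = n$, the mean of $W_k$ should match the mean of the $k$th order statistic of $n$ independent $Unif(0, t)$ variables, which is $kt/(n+1)$:

```python
# Simulation sketch of the order-statistics representation: conditional on X(t) = n,
# the waiting times look like sorted Uniform(0, t) draws. (lam, t, n are arbitrary.)
import numpy as np

rng = np.random.default_rng(7)
lam, t, n = 2.0, 3.0, 4

# Collect waiting times from Poisson-process paths that have exactly n events in (0, t].
samples = []
while len(samples) < 10_000:
    W = np.cumsum(rng.exponential(scale=1.0 / lam, size=n + 1))
    if W[n - 1] <= t < W[n]:          # exactly n events in (0, t]
        samples.append(W[:n])
samples = np.array(samples)

# Compare the mean of each W_k with the mean of the k-th order statistic of
# n Uniform(0, t) variables, which is k * t / (n + 1).
for k in range(1, n + 1):
    print(f"E[W_{k} | X(t)={n}] is about {samples[:, k - 1].mean():.3f}, "
          f"order-statistic mean k*t/(n+1) = {k * t / (n + 1):.3f}")
```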