[MUSIC] In this lecture, we will formalize the computation of transient probabilities. Now, one-step probabilities, where do you find them? In the P matrix, of course, because the P matrix gives us the probability to move from state s to state s prime in one time instance. Now, DTMCs have yet another nice property: they are time-homogeneous. What does that mean? It means that the probability to move from state s to state s prime is independent of the time instance at which we move. So if we express this in terms of random variables, it doesn't matter whether we ask for the probability to be in state s prime at time point n, given that we were in state s at time point n minus 1, or whether we ask the same for time points 0 and 1.

You know that P is a stochastic matrix. This means that P has eigenvalue 1, and that all eigenvalues have absolute value at most 1. It also means that P to the power of n is still a stochastic matrix, which makes the computation of the transient probabilities much easier. So now, what if we want to jump n times? Well, we can express the probability of being in state s prime at step n when we are in state s at a step m that is smaller than n, and this is formalized in this equation. And we see that here, too, the concept of time homogeneity is applied: the probability depends only on the difference, so it is the probability to move from state s to state s prime in n minus m steps.

Now, what we want to compute is the probability distribution at time n. This is p_s(n) or, written as a random variable, the probability that the random variable at time point n takes the value s. And this is expressed by the so-called Chapman–Kolmogorov equations. What do we do? We sum over the complete state space, over all states s prime in S, and we multiply the probability of being in state s prime at time point 0 with the probability to move from state s prime to state s in n steps. And this is, again, the n-step probability. Of course, I can also write this down in matrix-vector notation.
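As a small sketch of what was just said, the n-step transition probabilities are simply the n-th power of P, and that power is again a stochastic matrix. Here is a minimal check in Python, using the two-state web-server matrix that appears later in this lecture:

```python
import numpy as np

# The two-state web-server matrix from this lecture.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# n-step transition probabilities: the n-th matrix power of P.
n = 3
Pn = np.linalg.matrix_power(P, n)

# P^n is still a stochastic matrix: every row sums to 1.
print(Pn)
print(Pn.sum(axis=1))  # -> [1. 1.]
```

Entry (s, s') of `Pn` is the probability to move from state s to state s prime in exactly n steps, regardless of the time instance at which we start, thanks to time homogeneity.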
And then the transient probability distribution at time point n equals the initial probability distribution p(0) times the P matrix to the power of n, of course given that we have this initial distribution. And p(n) is then called the n-step transient state probability vector.

Now, let's take a look at the evolution of an example DTMC. We have six states, and at the beginning the probability mass is nicely accumulated in the upper-left state. Now we let one epoch pass, and you see that the probability mass distributes to the states at the bottom, here and there, according to the probabilities on the edges. Now we do one more epoch, and you see that again the probability mass from the middle upper state moves to the left and down. One more, and you see again that the probability mass distribution changes. The question now is, of course: what happens if I keep going? Does this have an end? Do the n-step transient state probability vectors converge? Under all circumstances? When? When not? How quickly? Is there a limiting distribution?

Now, how would a limiting distribution look? A limiting distribution doesn't change anymore. So if we keep multiplying the transient distribution with the P matrix and nothing changes, then we have reached the limiting distribution. The limiting distribution is given by taking the limit of the step number to infinity, and it doesn't always exist. But if it exists, we can compute it by solving v times (P minus I) equals 0, where I is the identity matrix, which has ones on the diagonal and zeros everywhere else. And then we need something more, and that is the so-called normalization equation. The normalization equation tells us that the sum of the entries of the v vector equals 1. Okay, now I have a question for you. Why do we need this normalization equation? What do you think? Yes, correct. The equations are linearly dependent and hence have an infinite number of solutions.
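The iteration p(n) = p(0) times P to the power of n can be sketched in a few lines. The six-state example is not fully given in the transcript, so the two-state web-server matrix from this lecture stands in for it here, with all probability mass initially in state 0:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# All probability mass starts in state 0.
p = np.array([1.0, 0.0])

# One epoch at a time: p(n) = p(n-1) @ P. Watch the distribution settle.
for n in range(50):
    p = p @ P

print(p)  # very close to the limiting distribution
```

For this matrix the iteration converges quickly; whether and how fast such an iteration converges in general is exactly the question the lecture raises.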
So now we need this normalization equation to pick out the one solution that corresponds to a probability distribution.

Let's use this on the example of the web server. First of all, we have to build the system of linear equations that corresponds to our P matrix. I have provided the P matrix here, and we have to subtract the identity matrix. I'm sure you can do this on your own, but I'll do it for you just once: 0.9 minus 1 is minus 0.1, and the second entry doesn't change. 0.4 doesn't change, and 0.6 minus 1 is minus 0.4. Now we have to perform this vector-matrix multiplication to derive the linear equations. What do I get? Minus 0.1 times v0 plus 0.4 times v1 equals 0. And 0.1 times v0 minus 0.4 times v1 also equals 0. And we need the normalization equation, which states that v0 plus v1 equals 1.

So now, let me clean this up for you. What we need to do is solve this system of linear equations. You can try this on your own, or you can watch me do it once. I've copied the system of linear equations, and what I recommend is to pick one of the variables, v0 or v1, and express the other one in terms of the one you've picked. I use v0 here, so I express v1 in terms of v0, and this yields v1 equals 1/4 times v0. Now I can use the normalization equation, sum them both up, and easily derive that v0 equals 4/5, so it is 0.8. And that I can use to come up with the value for v1, which is 0.2.

Now, this is a very easy example with only two states. I'm sure that you can repeat this exercise with more difficult models. And if you want some inspiration, I would recommend you look at this book: Probability, Markov Chains, Queues, and Simulation by William Stewart, also for the more formal explanations, which I cannot give in this screencast. [MUSIC]
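The same computation can be done numerically: since the two balance equations from v(P minus I) = 0 are linearly dependent, one of them is replaced by the normalization equation, and the resulting square system is solved. A minimal sketch for the web-server matrix:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# v (P - I) = 0, written as A v = 0 with A = (P - I) transposed.
A = (P - np.eye(2)).T

# The balance equations are linearly dependent, so overwrite the last
# one with the normalization equation v0 + v1 = 1.
A[-1, :] = 1.0
b = np.array([0.0, 1.0])

v = np.linalg.solve(A, b)
print(v)  # -> [0.8 0.2]
```

This reproduces the hand computation: v0 = 0.8 and v1 = 0.2, and the same recipe scales to the more difficult models mentioned at the end of the lecture.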