So far, you've calculated the transition and emission probabilities for the Markov chain and the hidden Markov model. Given a part-of-speech tag, these probabilities let you easily select the most likely next part-of-speech tag or the most probable word: you simply look up the correct entry in the corresponding row of the transition or emission matrix. But what if you're given a sentence, like "Why not learn something?" What is the most likely sequence of part-of-speech tags for that sentence, given your model? This sequence can be computed using the Viterbi algorithm.

You're about to see lots of formulas, all based on the matrices representing our hidden Markov model, but the Viterbi algorithm is actually a graph algorithm. Picturing the problem we want to solve on the graph will make it much easier to understand the formulas and the algorithm.

Let's look at this toy model and the sentence "I love to learn," with a leading start token. You want to find the sequence of hidden states, or part-of-speech tags, that has the highest probability for this sentence. Be aware that the word "love" can be emitted by both the noun state, NN, and the verb state, VB.

Starting from the initial state, you select the most probable next hidden state, which here is the O state, because the word "I" cannot be emitted from any other state in this toy model. This involves the transition probability shown in green, 0.3, and the emission probability shown in orange, 0.5. The joint probability of observing the word "I" while transitioning through the O state is 0.15, which you calculate by multiplying the transition probability, 0.3, by the emission probability, 0.5.

Now there are two ways of having observed the word "love": by traversing the hidden state NN or the hidden state VB. The transition probabilities to the two hidden states are the same, but the emission probability for the word "love" is higher from the VB state, so you should choose that path, with a combined probability of 0.25. Next, you return to the O state, as there is no other hidden state with a non-zero emission probability for the word "to." The combined probability here is 0.08. Finally, you return to the VB state, as there is no other option in this toy model. This step has a combined probability of 0.1. The total probability is the product of the probabilities of all the single steps you've chosen, which here is 0.0003.

The Viterbi algorithm actually computes several such paths at the same time in order to find the most likely sequence of hidden states. It uses the matrix representation of the hidden Markov model, and it can be split into three main steps: the initialization step, the forward pass, and the backward pass. Given your transition and emission probabilities, you first populate and then use two auxiliary matrices, C and D. The matrix C holds the intermediate optimal probabilities, and the matrix D holds the indices of the visited states as you traverse the model graph to find the most likely sequence of part-of-speech tags for the given sequence of words, w_1 all the way to w_K. These two matrices have n rows, where n is the number of part-of-speech tags, or hidden states, in our model, and K columns, where K is the number of words in the given sequence.
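To make those dimensions concrete, here is a minimal sketch of how the two auxiliary matrices might be allocated for the toy example. It assumes NumPy, and the tag set {O, NN, VB} and the variable names are illustrative rather than taken from the course materials.

```python
import numpy as np

tags = ["O", "NN", "VB"]              # n hidden states (part-of-speech tags)
words = ["I", "love", "to", "learn"]  # K observed words

n, K = len(tags), len(words)

# C[i, j]: highest probability of any tag sequence ending in tag i
#          after emitting the first j+1 words (intermediate optimal probabilities).
C = np.zeros((n, K))

# D[i, j]: index of the previous tag on that best path, so the backward pass
#          can recover the sequence of visited states.
D = np.zeros((n, K), dtype=int)
```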
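And as a sanity check on the single-path walk-through above, the following sketch multiplies transition and emission probabilities step by step along the path start, O, VB, O, VB. Only the 0.3 and 0.5 of the first step are stated explicitly here; the remaining pairs are placeholder values chosen so that each step's product matches the quoted 0.15, 0.25, 0.08, and 0.1.

```python
# (tag, transition probability, emission probability) for each word,
# following the path start -> O -> VB -> O -> VB.
steps = [
    ("O",  0.3, 0.5),   # emits "I":     0.3 * 0.5 = 0.15
    ("VB", 0.5, 0.5),   # emits "love":  placeholder pair, product 0.25
    ("O",  0.2, 0.4),   # emits "to":    placeholder pair, product 0.08
    ("VB", 0.5, 0.2),   # emits "learn": placeholder pair, product 0.10
]

path_probability = 1.0
for tag, trans_p, emit_p in steps:
    path_probability *= trans_p * emit_p

print(round(path_probability, 6))  # 0.0003
```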