Given a Hidden Markov Model and a sequence of observations, the Viterbi Algorithm determines the most probable sequence of hidden states very efficiently. In speech recognition, for instance, it is used in the language model to find the most probable word from a sequence of phonemes.


1 Motivation


In many applications of Hidden Markov Models, it is of interest to find the most probable state sequence for a given observation sequence. A naive approach would be to compute the probability of every possible state sequence and then to choose the most probable one. However, for a model with $N$ states and an observation sequence of length $T$, there are $N^T$ such sequences, so the computational cost of the naive approach grows exponentially with $T$. Consequently, in real life it is only feasible for very short sequences. In contrast to the naive approach, the Viterbi Algorithm is also suitable for long sequences, since its computational cost grows only linearly in $T$. The basic idea of this algorithm is to find the most probable state sequence iteratively: for each state $s_j$ at time $t$, the most probable path ending in $s_j$ is determined using only the most probable paths ending in each state $s_i$ at time $t-1$ and the transition probability from state $s_i$ to state $s_j$.
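To make the difference in cost concrete, consider for illustration a model with $N = 3$ states and an observation sequence of length $T = 10$ (values chosen arbitrarily, not taken from the text):

```latex
% Naive enumeration: one joint probability per candidate state sequence
N^T = 3^{10} = 59049 \text{ candidate sequences}

% Viterbi: at each of the T steps, each of the N states compares N predecessors
T \cdot N^2 = 10 \cdot 3^2 = 90 \text{ elementary comparisons}
```

Even for this small model, enumerating every sequence is already several hundred times more expensive than the iterative scheme, and the gap widens exponentially as $T$ grows.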


2 Problem Definition


Before providing more information about the concept of the Viterbi Algorithm, the general problem is outlined in this section. Let us assume that a Hidden Markov Model $\lambda = (A, B, \pi)$ is given. Then let $N$ be the number of states $s_1, \ldots, s_N$ in the model and $M$ the number of discrete observation symbols $o_1, \ldots, o_M$. At time $t$, the observation $O_t$ can be made and it is assumed that the Hidden Markov Model is in state $q_t$. Furthermore, $a_{ij}$ is defined as the transition probability from state $s_i$ to state $s_j$,

$$a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i),$$

$b_j(k)$ is defined as the observation probability distribution,

$$b_j(k) = P(O_t = o_k \mid q_t = s_j),$$

and $\pi_i$ is the initial state distribution,

$$\pi_i = P(q_1 = s_i).$$

The goal of the Viterbi Algorithm is to find the optimal state sequence $Q^* = (q_1^*, \ldots, q_T^*)$ such that its probability $P(Q^* \mid O, \lambda)$ is maximal given an observation sequence $O = (O_1, \ldots, O_T)$. Recall that $q_t$ is defined as the state at time $t$.
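These quantities map directly onto simple data structures. The sketch below shows one possible layout for a toy model with $N = 2$ states and $M = 2$ observation symbols; all numbers are arbitrary illustrative values, not taken from this text:

```python
# A toy HMM in the notation above (illustrative values only).

states = ["s1", "s2"]    # N = 2 states
symbols = ["o1", "o2"]   # M = 2 observation symbols

# a[i][j] = P(q_{t+1} = s_j | q_t = s_i): each row is a distribution
a = [[0.7, 0.3],
     [0.4, 0.6]]

# b[j][k] = P(O_t = o_k | q_t = s_j): each row is a distribution
b = [[0.9, 0.1],
     [0.2, 0.8]]

# pi[i] = P(q_1 = s_i)
pi = [0.5, 0.5]

# Sanity checks: every probability distribution must sum to one.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in a)
assert all(abs(sum(row) - 1.0) < 1e-12 for row in b)
assert abs(sum(pi) - 1.0) < 1e-12
```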



3 Basic Idea


The main idea of the Viterbi Algorithm is that, for every time $t$ and every state $s_j$, the most probable path ending in $s_j$ is determined using only the most probable paths at time $t-1$ and the transition probabilities $a_{ij}$. For simplicity, let us define the variable $\delta_t(i)$, which gives the probability of the most probable state sequence ending in state $s_i$ at time $t$. Mathematically, $\delta_t(i)$ is defined as

$$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = s_i, O_1, \ldots, O_t \mid \lambda),$$

where $q_t = s_i$ stands for state $s_i$ at time $t$. The advantage of this definition is that $\delta_t(i)$ can be calculated iteratively:

$$\delta_1(i) = \pi_i \, b_i(O_1), \qquad \delta_{t+1}(j) = \Big[ \max_{1 \le i \le N} \delta_t(i) \, a_{ij} \Big] \, b_j(O_{t+1}).$$

The probability $P^*$ of the desired optimal state sequence $Q^*$ is equal to $\max_{1 \le i \le N} \delta_T(i)$. However, $\delta_t(i)$ alone cannot give any information about the optimal state sequence itself. Hence, a new variable $\psi_t(j)$ is introduced, which stores the most probable predecessor state at time $t-1$ for a given state $s_j$ at time $t$:

$$\psi_t(j) = \operatorname*{arg\,max}_{1 \le i \le N} \, \delta_{t-1}(i) \, a_{ij}.$$

Finally, the optimal state sequence $Q^*$ can be determined by backtracking:

$$q_T^* = \operatorname*{arg\,max}_{1 \le i \le N} \delta_T(i), \qquad q_t^* = \psi_{t+1}(q_{t+1}^*) \quad \text{for } t = T-1, \ldots, 1.$$


4 The Algorithm



Summarizing the ideas of the last section, the Viterbi Algorithm for finding the optimal state sequence reads:

  • Initialization:

    $$\delta_1(i) = \pi_i \, b_i(O_1), \qquad \psi_1(i) = 0, \qquad 1 \le i \le N$$

  • Recursion: for $2 \le t \le T$ and $1 \le j \le N$

    $$\delta_t(j) = \Big[ \max_{1 \le i \le N} \delta_{t-1}(i) \, a_{ij} \Big] \, b_j(O_t), \qquad \psi_t(j) = \operatorname*{arg\,max}_{1 \le i \le N} \, \delta_{t-1}(i) \, a_{ij}$$

  • Termination:

    $$P^* = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \operatorname*{arg\,max}_{1 \le i \le N} \delta_T(i),$$

    followed by backtracking: $q_t^* = \psi_{t+1}(q_{t+1}^*)$ for $t = T-1, \ldots, 1$.

The input of the algorithm is a given Hidden Markov Model $\lambda = (A, B, \pi)$ and an observation sequence $O = (O_1, \ldots, O_T)$. The output is the most probable state sequence $Q^* = (q_1^*, \ldots, q_T^*)$.
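The three steps above translate directly into code. The following is a minimal sketch in Python, with parameter names chosen to mirror the notation of this text ($a$, $b$, $\pi$); states and observations are represented by integer indices, and the demo model at the bottom uses arbitrary illustrative probabilities:

```python
def viterbi(obs, a, b, pi):
    """Return (P*, optimal state index sequence) for an observation index sequence."""
    n = len(pi)    # number of states N
    T = len(obs)   # length of the observation sequence

    # Initialization: delta_1(i) = pi_i * b_i(O_1), psi_1(i) = 0
    delta = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    psi = [[0] * n]

    # Recursion: delta_t(j) = max_i[delta_{t-1}(i) * a_ij] * b_j(O_t)
    for t in range(1, T):
        d_t, p_t = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t - 1][i] * a[i][j])
            p_t.append(best_i)                                   # psi_t(j)
            d_t.append(delta[t - 1][best_i] * a[best_i][j] * b[j][obs[t]])
        delta.append(d_t)
        psi.append(p_t)

    # Termination: P* = max_i delta_T(i), then backtrack via psi
    q_T = max(range(n), key=lambda i: delta[T - 1][i])
    p_star = delta[T - 1][q_T]
    path = [q_T]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return p_star, path

# Demo on a toy two-state model (illustrative numbers only)
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
p, q = viterbi([0, 1, 1], a, b, pi)   # p ≈ 0.05184, q == [0, 1, 1]
```

Note that for long sequences the products of probabilities underflow quickly; practical implementations therefore usually work with log-probabilities and replace the products by sums.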


5 A Small Example


The goal of the following example is to determine the most probable word given an observed sequence of four phonemes. For this, a Hidden Markov Model with three states ($A$, $n$, $e$) is used. It is given in the figure below. For initialization, it is assumed that the three states are initially equally probable, i.e. $\pi_A = \pi_n = \pi_e = \frac{1}{3}$.


[Figure: Hidden Markov Model with three states ($A$, $n$, $e$)]


5.1 Initialization


For initialization, $\delta_1(i)$ and $\psi_1(i)$ are determined first, using the first observation $O_1$.



5.2 Recursion

Now $\delta_t(i)$ and $\psi_t(i)$ are determined iteratively for $t = 2, 3, 4$.

time step $t = 2$:




time step $t = 3$:





time step $t = 4$:


5.3 Termination

 By backtracking, the most probable state sequence can be determined:


Hence, the most probable state sequence $Q^*$ is $(A, n, n, e)$, which corresponds to the word "Anne".
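The example can also be reproduced in code. Since the figure's actual transition and emission probabilities are not reproduced in this text, the numbers below are assumed illustrative values; only the state set ($A$, $n$, $e$), the uniform initial distribution, and the resulting decoding come from the example:

```python
# Viterbi decoding on the three-state model (A, n, e).
# All transition/emission probabilities are ASSUMED for illustration.

states = ["A", "n", "e"]
phonemes = ["a", "n", "e"]              # observation symbols

pi = [1 / 3, 1 / 3, 1 / 3]              # uniform, as in the text
a = [[0.1, 0.8, 0.1],                   # assumed transitions (rows/cols: A, n, e)
     [0.1, 0.5, 0.4],
     [0.2, 0.1, 0.7]]
b = [[0.8, 0.1, 0.1],                   # assumed emissions (cols: a, n, e)
     [0.1, 0.8, 0.1],
     [0.2, 0.1, 0.7]]

obs = [0, 1, 1, 2]                      # assumed observed phonemes (a, n, n, e)

# Viterbi recursion with backtracking, as in Section 4.
N, T = len(states), len(obs)
delta = [[pi[i] * b[i][obs[0]] for i in range(N)]]
psi = [[0] * N]
for t in range(1, T):
    d_t, p_t = [], []
    for j in range(N):
        best = max(range(N), key=lambda i: delta[t - 1][i] * a[i][j])
        p_t.append(best)
        d_t.append(delta[t - 1][best] * a[best][j] * b[j][obs[t]])
    delta.append(d_t)
    psi.append(p_t)

q = [max(range(N), key=lambda i: delta[T - 1][i])]
for t in range(T - 1, 0, -1):
    q.append(psi[t][q[-1]])
q.reverse()

decoded = [states[i] for i in q]
print(decoded)                          # → ['A', 'n', 'n', 'e']
```

With these assumed parameters the decoder recovers the state sequence $(A, n, n, e)$, i.e. the word "Anne", matching the result of the worked example.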

