**Given a Hidden Markov Model and a sequence of observations, the Viterbi Algorithm efficiently determines the most probable sequence of hidden states. In speech recognition, for instance, it is used to find the most probable word for a given sequence of phonemes.**

# 1 Motivation

In many applications of Hidden Markov Models, it is of interest to find the most probable state sequence for a given observation sequence. A naive approach would be to compute the probability of every possible state combination and then choose the most probable one. However, for a model with $N$ states and an observation sequence of length $T$, there are $N^T$ such combinations, so the computational costs of the naive approach grow exponentially with the sequence length. Consequently, in practice it is only feasible to determine the most probable state sequence this way for small $T$. In contrast to the naive approach, the Viterbi Algorithm is also suitable for longer sequences since its computational costs grow only linearly with $T$. The basic idea of this algorithm is to find the most probable state sequence iteratively: for each state $S_j$ at time $t$, the most probable path is determined depending only on the most probable paths to the states $S_i$ at time $t-1$ and the transition probabilities from $S_i$ to $S_j$.

# 2 Problem Definition

Before providing more information about the concept of the Viterbi Algorithm, the general problem is outlined in this section. Let us assume that a Hidden Markov Model $\lambda$ is given. Then let $N$ be the number of states $S_1, S_2, \dots, S_N$ in the model and $M$ the number of discrete observation symbols $v_1, v_2, \dots, v_M$. At time $t$, the observation $O_t$ can be made and it is assumed that the Hidden Markov Model is in state $q_t$. Furthermore, $a_{ij}$ is defined as the transition probability from state $S_i$ to state $S_j$

$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i),$$

$b_j(k)$ is defined as the observation probability distribution

$$b_j(k) = P(O_t = v_k \mid q_t = S_j),$$

and $\pi_i$ is the initial state distribution

$$\pi_i = P(q_1 = S_i).$$

The goal of the Viterbi Algorithm is to find the optimal state sequence $Q^* = (q_1^*, q_2^*, \dots, q_T^*)$ such that its probability is maximal given an observation sequence $O = (O_1, O_2, \dots, O_T)$. Recall that $q_t$ is defined as the state at time $t$.
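For concreteness, the model parameters can be stored as arrays. The following is a minimal sketch with made-up numbers for a hypothetical model with $N = 2$ states and $M = 3$ observation symbols (the values are purely illustrative, not taken from this article):

```python
import numpy as np

# a[i, j] : transition probability from state i to state j
# b[j, k] : probability of observing symbol k while in state j
# pi[i]   : initial state distribution
a = np.array([[0.7, 0.3],
              [0.4, 0.6]])
b = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])

# Each row of a and b, as well as pi, must sum to one.
assert np.allclose(a.sum(axis=1), 1.0)
assert np.allclose(b.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```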

# 3 Basic Idea

The main idea of the Viterbi Algorithm is that at every time $t$ and state $S_j$, the most probable path to $S_j$ is determined in dependence of the most probable paths at time $t-1$ and the transition probabilities $a_{ij}$. For simplicity, let us define the variable $\delta_t(i)$, which gives the probability of the optimal state sequence from time $1$ to state $S_i$ at time $t$. Mathematically, $\delta_t(i)$ is defined as:

$$\delta_t(i) = \max_{q_1, \dots, q_{t-1}} P(q_1, \dots, q_{t-1},\, q_t = S_i,\, O_1, \dots, O_t \mid \lambda),$$

where $\delta_1(i) = \pi_i \, b_i(O_1)$ and $q_t = S_i$ stands for state $S_i$ at time $t$. The advantage of this definition is that $\delta_t(i)$ can be calculated iteratively by

$$\delta_{t+1}(j) = \left[ \max_{1 \le i \le N} \delta_t(i)\, a_{ij} \right] b_j(O_{t+1}).$$

The probability of the desired optimal state sequence is equal to $P^* = \max_{1 \le i \le N} \delta_T(i)$. However, $\delta_t(i)$ alone cannot give any information about the optimal state sequence itself. Hence, a new variable $\psi_t(j)$ is introduced, which delivers the most probable predecessor state at time $t-1$ for a given state $S_j$ at time $t$:

$$\psi_t(j) = \operatorname*{argmax}_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right].$$

Finally, the optimal state sequence can be determined by backtracking:

$$q_T^* = \operatorname*{argmax}_{1 \le i \le N} \delta_T(i), \qquad q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \dots, 1.$$
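As a small worked instance of the recursion (written out for a hypothetical model with only $N = 2$ states, purely for illustration), the step from $t = 1$ to $t = 2$ for state $S_1$ reads:

$$\delta_2(1) = \max\{\delta_1(1)\,a_{11},\; \delta_1(2)\,a_{21}\}\; b_1(O_2), \qquad \psi_2(1) = \operatorname*{argmax}_{i \in \{1,2\}} \delta_1(i)\, a_{i1}.$$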

# 4 The Algorithm

Summarizing the ideas of the last section, the Viterbi Algorithm for finding the optimal state sequence reads:

**Initialization:**

$$\delta_1(i) = \pi_i \, b_i(O_1), \qquad \psi_1(i) = 0, \qquad 1 \le i \le N$$

**Recursion:** for $t = 2, 3, \dots, T$ and $1 \le j \le N$

$$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(O_t)$$

$$\psi_t(j) = \operatorname*{argmax}_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right]$$

**Termination:**

$$P^* = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \operatorname*{argmax}_{1 \le i \le N} \delta_T(i)$$

**Backtracking:**

$$q_t^* = \psi_{t+1}(q_{t+1}^*), \qquad t = T-1, T-2, \dots, 1$$

The input of the algorithm is a given Hidden Markov Model with its parameters $a_{ij}$, $b_j(k)$ and $\pi_i$, and an observation sequence $O = (O_1, \dots, O_T)$. The output is the most probable state sequence $Q^* = (q_1^*, \dots, q_T^*)$.
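The steps above can be sketched in code. The following is a straightforward implementation of the algorithm; the example parameters at the bottom are made up for illustration only:

```python
import numpy as np

def viterbi(a, b, pi, observations):
    """Return the most probable state sequence and its probability.

    a  : (N, N) transition matrix, a[i, j] = P(q_{t+1}=j | q_t=i)
    b  : (N, M) observation matrix, b[j, k] = P(O_t=k | q_t=j)
    pi : (N,)   initial state distribution
    observations : sequence of observation indices O_1 ... O_T
    """
    N, T = len(pi), len(observations)
    delta = np.zeros((T, N))            # delta[t, j]: prob. of best path ending in j at t
    psi = np.zeros((T, N), dtype=int)   # psi[t, j]: best predecessor of state j at time t

    # Initialization
    delta[0] = pi * b[:, observations[0]]

    # Recursion
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * a[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * b[j, observations[t]]

    # Termination and backtracking
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()

# Usage with a hypothetical two-state model:
a = np.array([[0.7, 0.3],
              [0.4, 0.6]])
b = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
path, p = viterbi(a, b, pi, [0, 0, 1])   # path → [0, 0, 1]
```

Note that for long sequences the products of probabilities underflow quickly; a common remedy is to carry out the same recursion with log-probabilities and sums instead of products.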

# 5 A Small Example

The goal of the following example is to determine the most probable word for a given phoneme sequence of length $T = 4$. For this, a Hidden Markov Model with three states ($S_1$, $S_2$, $S_3$) is used. It is given in the figure below. For initialization, it is assumed that the three states are initially equally probable, i.e. $\pi_1 = \pi_2 = \pi_3 = \frac{1}{3}$.

## 5.1 Initialization

For initialization, $\delta_1(1)$, $\delta_1(2)$ and $\delta_1(3)$ are determined first. Hereby, it is assumed that observation $O_1$ occurs.

## 5.2 Recursion

Now $\delta_t(j)$ and $\psi_t(j)$ are determined iteratively for $t = 2, 3, 4$.

**time step t = 2:**

**time step t = 3:**

**time step t = 4:**

## 5.3 Termination

By backtracking, the most probable state sequence can be determined:

Hence, the most probable state sequence is the one which corresponds to the word "*Anne*".
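The concrete transition and observation probabilities of this example come from the figure and are not reproduced here. The following sketch therefore runs the same recursion and backtracking with made-up probabilities for a three-state model and four observations, just to illustrate the mechanics:

```python
import numpy as np

# Hypothetical parameters; the actual values belong to the figure above.
a = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.2, 0.7],
              [0.2, 0.1, 0.7]])
b = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]])
pi = np.full(3, 1.0 / 3.0)      # equally probable initial states
obs = [0, 1, 2, 3]              # observation indices O_1 ... O_4

T, N = len(obs), len(pi)
delta = np.zeros((T, N))
psi = np.zeros((T, N), dtype=int)
delta[0] = pi * b[:, obs[0]]                      # initialization
for t in range(1, T):                             # recursion
    for j in range(N):
        scores = delta[t - 1] * a[:, j]
        psi[t, j] = np.argmax(scores)
        delta[t, j] = scores[psi[t, j]] * b[j, obs[t]]

path = [int(np.argmax(delta[-1]))]                # termination
for t in range(T - 1, 0, -1):                     # backtracking
    path.insert(0, int(psi[t, path[0]]))
print(path)   # → [0, 1, 2, 2]
```

With these assumed values, the backtracked sequence visits state 0, then 1, then stays in 2, mirroring how a phoneme model tends to move through its states in order.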
