1. Stochastic processes

A stochastic process is defined as a sequence of random variables $x_1,&space;x_2,&space;...$. The random variable $x_n$ is called state of the process at time $n$. We assume a discrete time process here, i.e. the state is observed only at these time instances. The simplest example of such a process is a sequence of independent observations:

Sequence of independent observations [1]

Assuming all observations as independent would however mean loosing all time dependent information. To overcome this, we regard a process where the current state depends on the states observed before. To fully describe such a process, it is necessary to define the probability distribution for the initial state $p(x_1)$ as well as for all following states $n=2,3,...$ the conditional distribution $p(x_n|x_1,...,x_{n-1})$

Dependent observations [1] (altered)

As we can see from the figure, with a growing number of states we quickly get confusingly many dependencies. A model where a state depends on all the states visited before is therefore not feasible in a practical system.

2. Markov models

One special type of a stochastic process is the so called Markov chain. It has the convenient property that the conditional probability of the current state $n$ depends only on the previous state:

$p(x_n|x_1,...,x_{n-1})&space;=&space;p(x_n|x_{n-1})$

As we can see, the model stays simple even when the number of states is increased.

Markov model [1]

Let's assume we want to predict the weather with a Markov model. Our model should have two states: "rainy" and "sunny". It is known that after a rainy day the next day will be sunny with a probability of 30%. After a sunshine day, the next day will bring rain with a 40% chance. The state diagram for this example is shown below.

Based on the weather of the current day, we can now make a prediction of the weather on the following day.

3. Hidden markov models

In the previous example, the weather and the state respectively can be observed directly. For many applications however, this is not the case. We thus extend our model to a so called Hidden markov model (HMM). The model consists of an underlying Markov process with states $z_1,z_2,...$ which cannot be observed directly. What can be observed are the emissions $x_1,x_2,...$. The emission $x_n$ is a probabilistic function of the state $z_n$. One important assumption inherent with Hidden markov models is that the current state contains all information about previous observations.

Hidden markov model [1]

Let's go back to our weather example from before and extend it to an HMM. There is a prisoner locked into a dungeon without windows. He cannot see if it's a rainy or a sunny day. The only thing that he can observe is whether the shoes of the guards are clean or dirty. He knows that if it's raining, the guards' shoes are dirty in 90% of the cases, but when it's sunny they are dirty only with a 60% chance. The state space diagram now looks as follows:

The prisoner can exploit this knowledge and make predictions about the weather outside, just by looking at the guards' shoes. Leaving the prison again and entering the outside world, one can find many applications for HMMs. They are often used as a model for temporal pattern recognition applications. In this context they are frequently applied for speech recognition tasks.

References

[1] Bishop, Christopher M. Pattern recognition and machine learning. Vol. 4. No. 4. New York: springer, 2006.

[2] DeGroot, Morris H., et al. Probability and statistics. Vol. 2. Reading, MA: Addison-Wesley, 1986.

[3] Rabiner, Lawrence. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989): 257-286.