# 1. Normal distribution

The basis for Gaussian Mixture Models is the Normal or Gaussian distribution with the probability density function (pdf)

$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$

with mean $\mu$ and variance $\sigma^2$. Plotting this one-dimensional function with $\mu = 0$ and $\sigma = 1$ gives the familiar bell-shaped Gauss curve.
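The formula can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `normal_pdf` and the grid of evaluation points are illustrative choices, not part of the original text.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a 1-D normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)

# Evaluate the standard normal (mu = 0, sigma = 1) on a coarse grid from -3 to 3
for i in range(-6, 7):
    x = i * 0.5
    print(f"{x:5.1f}  {normal_pdf(x):.4f}")
```

The printed values rise towards $x = \mu$ and fall off symmetrically on both sides, which is the bell shape described above.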

# 2. Multivariate normal distribution

This distribution can be extended to the multidimensional case. Assume an $n$-dimensional random variable $X = [X_1, X_2, \ldots, X_n]$ with Gaussian distribution $X \sim {\cal N}(\mu, C)$.

The pdf is now given by

$p(x) = \frac{1}{{\sqrt{2\pi}}^{\,n} \sqrt{\det C}} \exp\left( -\frac{1}{2} (x - \mu)^T C^{-1} (x - \mu) \right)$

where $\mu = [E[X_1], E[X_2], \ldots, E[X_n]]$ is the mean vector and $C = [\operatorname{Cov}[X_i, X_j]]$, $i, j = 1, 2, \ldots, n$, is the covariance matrix.
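As a rough sketch of how this pdf could be evaluated without external libraries, the code below uses a Cholesky factorization $C = LL^T$, so that $\sqrt{\det C}$ is the product of the diagonal of $L$ and the quadratic form follows from one forward substitution. The helper names `cholesky` and `mvn_pdf` are hypothetical, and $C$ is assumed to be symmetric positive definite.

```python
import math

def cholesky(C):
    """Lower-triangular L with L L^T = C, for a symmetric positive definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("C must be positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def mvn_pdf(x, mu, C):
    """Density of an n-dimensional Gaussian N(mu, C) at the point x."""
    n = len(mu)
    L = cholesky(C)
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Solve L y = (x - mu); then (x - mu)^T C^{-1} (x - mu) = y^T y
    y = []
    for i in range(n):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y)
    sqrt_det = math.prod(L[i][i] for i in range(n))  # equals sqrt(det C)
    return math.exp(-0.5 * quad) / (math.sqrt(2.0 * math.pi) ** n * sqrt_det)

# Example values (chosen for illustration): zero mean, diagonal covariance
mu = [0.0, 0.0]
C = [[0.25, 0.0], [0.0, 2.5]]
print(f"density at the mean: {mvn_pdf([0.0, 0.0], mu, C):.4f}")
```

Factorizing $C$ avoids forming $C^{-1}$ explicitly, which is usually the more stable choice numerically. In practice a library routine would normally be used instead of this hand-written version.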

We now consider a two-dimensional Gaussian distribution with

$\mu = [0, 0]$

$C = \begin{bmatrix} 0.25 & 0 \\ 0 & 2.5 \end{bmatrix}$.

The diagonal elements of $C$ are the variances of the individual random variables; the off-diagonal elements are the covariances between them. Here the covariances are zero, and since the variables are jointly Gaussian, the two random variables are independent.

The distribution, or more precisely its pdf, can be visualized in different ways. The first plot shows how the 2-d distribution is formed by two 1-d marginal distributions. The second plot shows the same distribution as a surface: the floor is spanned by the two dimensions of the distribution and the height gives the probability density. Instead of plotting the 3-d shape of the Gauss bell, one can also "cut" horizontally through the bell at several heights and plot the contour lines. The resulting ellipses are lines of equal probability density.
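The independence can also be checked by hand. Writing $x = [x_1, x_2]$ (notation introduced here for illustration), the diagonal covariance gives $\det C = 0.25 \cdot 2.5 = 0.625$ and $C^{-1} = \operatorname{diag}(4, 0.4)$, so the general pdf becomes

$p(x_1, x_2) = \frac{1}{2\pi\sqrt{0.625}} \exp\left( -\frac{1}{2}\left( 4x_1^2 + 0.4x_2^2 \right) \right) = \frac{1}{\sqrt{2\pi \cdot 0.25}} \exp\left( -\frac{x_1^2}{2 \cdot 0.25} \right) \cdot \frac{1}{\sqrt{2\pi \cdot 2.5}} \exp\left( -\frac{x_2^2}{2 \cdot 2.5} \right)$

which is the product of two 1-d normal pdfs with variances $0.25$ and $2.5$. Setting the exponent to a constant, $4x_1^2 + 0.4x_2^2 = k$, describes an axis-aligned ellipse whose extent along $x_2$ is $\sqrt{2.5 / 0.25} = \sqrt{10} \approx 3.16$ times its extent along $x_1$.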

# 3. Gaussian mixture models

However, the multivariate Gaussian distribution still has clear limitations when it comes to modelling real data, which does not necessarily follow a single bell-shaped curve. A much more flexible approach is to use a superposition of several Gaussian distributions, each weighted by a mixing coefficient $c_m$. This is called a mixture model, and the Gaussian densities ${\cal N}(\mu_m, C_m)$ are called components. Each component has its own mean $\mu_m$ and covariance $C_m$. The pdf for $M$ components is given by

$p(x) = \sum\limits_{m = 1}^{M} c_m \, {\cal N}(\mu_m, C_m) = \sum\limits_{m = 1}^{M} \frac{c_m}{{\sqrt{2\pi}}^{\,n} \sqrt{\det C_m}} \exp\left( -\frac{1}{2} (x - \mu_m)^T C_m^{-1} (x - \mu_m) \right)$.

The mixing coefficients are non-negative and sum to one, so that $p(x)$ is again a valid pdf:

$\sum\limits_{m = 1}^{M} c_m = 1$

By increasing the number of components and adjusting the parameters of each component, a Gaussian mixture model (GMM) can approximate almost any continuous density with high accuracy.
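To make the weighted sum concrete, here is a small sketch that evaluates a mixture density and checks numerically that it integrates to roughly one. It is restricted to the 1-d case for brevity; the function names and the example parameters are made up for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """1-D normal density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(x, weights, means, variances):
    """Mixture density: sum over components of c_m * N(x; mu_m, var_m)."""
    if any(c < 0.0 for c in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing coefficients must be non-negative and sum to one")
    return sum(c * normal_pdf(x, m, v)
               for c, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture
weights = [0.3, 0.7]
means = [-2.0, 1.5]
variances = [0.5, 1.0]

# Riemann sum over [-12, 12]; the tails outside this range are negligible here
step = 0.01
area = sum(gmm_pdf(-12.0 + i * step, weights, means, variances)
           for i in range(2401)) * step
print(f"approximate integral of the mixture pdf: {area:.4f}")
```

Because each component integrates to one, the constraint on the coefficients is exactly what keeps the total area at one.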

# 4. Fitting with sample data

To illustrate the advantage of GMMs over a single multivariate Gaussian, two-dimensional data with two distinct peaks has been generated (first plot). We now want to find a model for this bimodal data set. The second figure shows a single multivariate Gaussian distribution fitted to the data. The fitted distribution clearly does not represent the original data well.
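The effect can be reproduced with synthetic data. The sketch below does not use the data behind the plots; it generates its own hypothetical bimodal sample (cluster centres, spread and sample size are arbitrary assumptions) and fits a single Gaussian by computing the sample mean and covariance.

```python
import random

random.seed(0)

# Hypothetical bimodal 2-D sample: two well-separated clusters of 500 points each
centers = [(-3.0, -3.0), (3.0, 3.0)]
data = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5))
        for cx, cy in centers for _ in range(500)]

def fit_gaussian(points):
    """Maximum-likelihood mean vector and covariance matrix of 2-D points."""
    n = len(points)
    mu = [sum(p[d] for p in points) / n for d in range(2)]
    C = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
          for j in range(2)] for i in range(2)]
    return mu, C

mu, C = fit_gaussian(data)
print("mean:", mu)
print("covariance:", C)
```

With this setup the fitted mean falls between the two clusters, in a region that contains almost no data, and the variances are inflated to cover both peaks. That is the mismatch a single Gaussian cannot avoid on bimodal data.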

Now we try the same with a Gaussian mixture model. A common approach is to split the data into clusters first, with each cluster representing one component of the mixture model. The clustering can be done with the k-means algorithm. After the data has been separated into clusters, each cluster is represented by a multivariate Gaussian distribution, and the parameters of the mixture are then fitted to the data using a technique called Expectation-Maximization (EM).
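The two stages can be sketched as follows. This is a simplified illustration, not the procedure used for the figures: it works on 1-d data to stay short, uses hypothetical function names and synthetic data, and omits safeguards such as log-space computations that a production implementation would need.

```python
import math
import random

def normal_pdf(x, mu, var):
    """1-D normal density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def kmeans_1d(data, k, iters=20):
    """Plain k-means (Lloyd's algorithm); returns one cluster index per point."""
    srt = sorted(data)
    # Start the centres at evenly spaced quantiles (an arbitrary choice)
    centers = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in data]
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def fit_gmm_1d(data, k, iters=50, min_var=1e-6):
    """Fit a 1-D GMM: k-means for the initial split, then EM iterations."""
    n = len(data)
    labels = kmeans_1d(data, k)
    weights, means, variances = [], [], []
    for j in range(k):
        # Initialise component j from the points in cluster j
        members = [x for x, lab in zip(data, labels) if lab == j]
        if not members:
            raise ValueError("k-means produced an empty cluster")
        m = sum(members) / len(members)
        weights.append(len(members) / n)
        means.append(m)
        variances.append(max(sum((x - m) ** 2 for x in members) / len(members), min_var))
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsing component
    return weights, means, variances

# Synthetic bimodal sample (assumed values): 400 points near -3, 600 points near 3
random.seed(1)
data = ([random.gauss(-3.0, 0.7) for _ in range(400)]
        + [random.gauss(3.0, 1.0) for _ in range(600)])
weights, means, variances = fit_gmm_1d(data, k=2)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

In this sketch k-means only supplies the starting values. The EM iterations then replace the hard cluster assignments with soft responsibilities, so points between the clusters contribute to both components.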

Our sample data splits into two clusters. The last figure shows a GMM fitted to the data. In contrast to the single multivariate Gaussian, the Gaussian mixture model represents the data very well. In speech recognition tasks, GMMs are frequently used to model the acoustic feature vectors. The article "Gaussian mixture models for speech recognition" deals with this application.
