Keywords: maximum likelihood, Gaussian mixtures
Gaussian mixtures were discussed in ‘Mixtures of Gaussians’. Once we have training data and a hypothesis, the next step is to estimate the parameters of the model. Both kinds of quantities we defined for a mixture of Gaussians need to be estimated: the parameters $\boldsymbol{\pi}, \boldsymbol{\mu}, \Sigma$
and the latent variables $\mathbf{z}_n$.
When we investigated the linear model in ‘Maximum Likelihood Estimation’, we assumed the prediction of the linear model had a Gaussian distribution, and maximum likelihood was then used to derive the parameters of the model. So we could try employing maximum likelihood again for the Gaussian mixture task and see whether it works as well.
First, we should set up the notation that will be used in the following analysis:
- Input data $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ with $\mathbf{x}_n \in \mathbb{R}^D$ for $n = 1, \dots, N$, assumed to be i.i.d. Rearranging them in a matrix:
$$X = \begin{pmatrix} \mathbf{x}_1^T \\ \vdots \\ \mathbf{x}_N^T \end{pmatrix}\tag{3}$$
- Latent variables $\mathbf{z}_n$, the auxiliary random variable associated with $\mathbf{x}_n$, for $n = 1, \dots, N$. And corresponding to matrix (3), the matrix of latent variables is
$$Z = \begin{pmatrix} \mathbf{z}_1^T \\ \vdots \\ \mathbf{z}_N^T \end{pmatrix}$$
Once we have these two matrices, based on equation (1) in ‘Mixtures of Gaussians’,
$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)$$
the log-likelihood function is given by:
$$\ln p(X \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \Sigma_k) \right\}$$
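As a sanity check, this log-likelihood can be evaluated numerically. A minimal NumPy sketch, assuming spherical covariances $\Sigma_k = \sigma_k^2 I$ (the simplification adopted later in this post) and made-up parameter values:

```python
import numpy as np

def log_likelihood(X, pi, mu, sigma):
    """Log-likelihood of a Gaussian mixture with spherical covariances.

    X:     (N, D) data matrix          pi:    (K,) mixing coefficients
    mu:    (K, D) component means      sigma: (K,) component std deviations
    """
    N, D = X.shape
    # (N, K) squared distances from every point to every component mean
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # (N, K) spherical Gaussian densities N(x_n | mu_k, sigma_k^2 I)
    dens = np.exp(-0.5 * sq / sigma**2) / (2 * np.pi * sigma**2) ** (D / 2)
    # sum over components k inside the log, sum over points n outside
    return np.log(dens @ pi).sum()

# toy example with hypothetical parameters
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
pi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = np.array([1.0, 1.0])
print(log_likelihood(X, pi, mu, sigma))
```

Note that the sum over $k$ is taken *before* the logarithm, which is exactly the structural difficulty discussed below.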
This looks different from the single Gaussian model, where the logarithm acts directly on the Gaussian density, an exponential function, so the log and the exponent cancel. Here, the summation inside the logarithm makes the problem hard to solve in closed form.
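To see the difficulty concretely, follow Bishop's derivation: differentiating the log-likelihood with respect to $\boldsymbol{\mu}_k$ and setting it to zero gives

$$0 = \sum_{n=1}^{N} \underbrace{\frac{\pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \Sigma_j)}}_{\gamma(z_{nk})} \Sigma_k^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_k)
\quad\Rightarrow\quad
\boldsymbol{\mu}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n}{\sum_{n=1}^{N} \gamma(z_{nk})}$$

This is not a closed-form solution: the responsibilities $\gamma(z_{nk})$ on the right-hand side themselves depend on $\boldsymbol{\mu}_k$, so the equations are coupled and cannot be solved directly.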
Moreover, the assignment of parameters to components is arbitrary: for a $K$-component mixture, any of the $K!$ permutations of the component labels gives an equivalent solution. Which one we obtain has no effect on our model.
The other differences from the single Gaussian model are the possible singularity of the covariance matrix and the constraints $0 \leq \pi_k \leq 1$, $\sum_{k=1}^{K} \pi_k = 1$ on the mixing coefficients.
Since $\Sigma_k^{-1}$ appears in the Gaussian density, the covariance matrix must be invertible. So in the following discussion we assume all the covariance matrices are invertible, and for simplicity we take $\Sigma_k = \sigma_k^2 I$, where $I$ is the identity matrix.
When a point in the sample happens to coincide with a mean, $\mathbf{x}_n = \boldsymbol{\mu}_j$, the Gaussian density of that component at this point is:
$$\mathcal{N}(\mathbf{x}_n \mid \mathbf{x}_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{D/2}\,\sigma_j^{D}}$$
When the variance $\sigma_j \to 0$, this term goes to infinity, the log-likelihood is unbounded above, and the whole algorithm fails.
This problem does not exist in a single Gaussian model: if its variance collapses onto one data point, the factors contributed by all the other points go to zero exponentially fast, driving the likelihood to zero rather than infinity. In a mixture, the other components keep those factors away from zero, so a single collapsing component is enough to make the likelihood diverge.
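The collapse is easy to reproduce numerically: place one component's mean exactly on a data point and shrink its variance. A small sketch with a hypothetical two-component mixture in one dimension:

```python
import numpy as np

def gmm_loglik_1d(x, pi, mu, sigma):
    """Log-likelihood of a 1-D Gaussian mixture."""
    # (N, K) component densities N(x_n | mu_k, sigma_k^2)
    dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return np.log(dens @ pi).sum()

x = np.array([-1.0, 0.0, 1.0, 2.0])   # toy sample
pi = np.array([0.5, 0.5])
mu = np.array([0.0, 1.5])             # first mean sits exactly on the point x = 0.0
# shrink the first component's standard deviation toward zero
for s in [1.0, 0.1, 0.01, 0.001]:
    sigma = np.array([s, 1.0])
    print(s, gmm_loglik_1d(x, pi, mu, sigma))
```

As $\sigma_1$ shrinks, the log-likelihood grows without bound, while the second component keeps the densities of the remaining points nonzero.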
Maximum likelihood alone is therefore not well suited to fitting a Gaussian mixture model. We introduce the EM algorithm, which addresses these difficulties, in the next post.
Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006. ↩︎