Mixtures of Gaussians

Keywords: mixtures of Gaussians

A Formal Introduction to Mixtures of Gaussians[1]

We introduced mixture distributions in the post ‘An Introduction to Mixture Models’, where the example was a two-component Gaussian mixture. In this post, we discuss Gaussian mixtures more formally; this formulation also serves to motivate the expectation-maximization (EM) algorithm.

A Gaussian mixture distribution can be written as:

$$p(\boldsymbol{x})= \sum_{k=1}^{K}\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)\tag{1}$$

where $\sum_{k=1}^K \pi_k = 1$ and $0\leq \pi_k\leq 1$.
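
As a concrete illustration, the sketch below evaluates equation (1) for a small $K=2$ mixture in Python. It assumes numpy and scipy are available, and the mixing coefficients, means, and covariances are illustrative values, not taken from the post.

```python
# A minimal sketch of evaluating the mixture density in equation (1).
# The parameter values below are illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.3, 0.7])                            # mixing coefficients, sum to 1
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]    # component means mu_k
Sigma = [np.eye(2), 2.0 * np.eye(2)]                 # component covariances Sigma_k

def mixture_density(x):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k), equation (1)."""
    return sum(pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(pi)))

print(mixture_density(np.array([1.0, 1.0])))
```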
We then introduce a latent random variable (vector) $\boldsymbol{z}$, whose components satisfy:

$$z_k\in\{0,1\}\tag{2}$$

where $\boldsymbol{z}$ uses a 1-of-$K$ representation: exactly one component equals $1$ and all others are $0$. To build the joint distribution $p(\boldsymbol{x},\boldsymbol{z})$, we first need $p(\boldsymbol{z})$ and $p(\boldsymbol{x}|\boldsymbol{z})$. For the distribution of $\boldsymbol{z}$, we define:

$$p(z_k=1)=\pi_k\tag{3}$$

which is a sensible choice, since $\{\pi_k\}$ for $k=1,\cdots,K$ satisfies the requirements of a probability distribution. For the entire vector $\boldsymbol{z}$, equation (3) can be written as:

$$p(\boldsymbol{z}) = \prod_{k=1}^K \pi_k^{z_k}\tag{4}$$

From the definition of $p(\boldsymbol{z})$ we can also specify the conditional distribution of $\boldsymbol{x}$ given $\boldsymbol{z}$. Under the condition $z_k=1$, we have:

$$p(\boldsymbol{x}|z_k=1)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)\tag{5}$$

and then we can write the vector form of the conditional distribution:

$$p(\boldsymbol{x}|\boldsymbol{z})=\prod_{k=1}^{K}\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)^{z_k}\tag{6}$$

Once we have both the distribution of $\boldsymbol{z}$, $p(\boldsymbol{z})$, and the conditional distribution of $\boldsymbol{x}$ given $\boldsymbol{z}$, $p(\boldsymbol{x}|\boldsymbol{z})$, we can build the joint distribution by the product rule:

$$p(\boldsymbol{x},\boldsymbol{z}) = p(\boldsymbol{z})\, p(\boldsymbol{x}|\boldsymbol{z})\tag{7}$$
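
Equation (7) also tells us how to sample from the model by ancestral sampling: draw $\boldsymbol{z}$ from $p(\boldsymbol{z})$, then draw $\boldsymbol{x}$ from the Gaussian that $\boldsymbol{z}$ selects. The sketch below assumes numpy and uses illustrative parameter values.

```python
# A minimal sketch of ancestral sampling from the joint p(x, z) of equation (7):
# draw the one-hot latent z from p(z), equation (4), then draw x from the
# selected Gaussian, equation (6). Parameter values are illustrative.
import numpy as np

pi = np.array([0.3, 0.7])                            # mixing coefficients
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]    # component means
Sigma = [np.eye(2), 2.0 * np.eye(2)]                 # component covariances

rng = np.random.default_rng(0)
k = rng.choice(len(pi), p=pi)                        # z_k = 1 with probability pi_k
z = np.eye(len(pi))[k]                               # 1-of-K latent vector
x = rng.multivariate_normal(mu[k], Sigma[k])         # x drawn from N(mu_k, Sigma_k)
print(z, x)
```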

However, what we are ultimately interested in is the marginal distribution of $\boldsymbol{x}$, which we recover by summing out $\boldsymbol{z}$:

$$p(\boldsymbol{x}) = \sum_{\boldsymbol{z}}p(\boldsymbol{x},\boldsymbol{z}) = \sum_{\boldsymbol{z}}p(\boldsymbol{z})\, p(\boldsymbol{x}|\boldsymbol{z})\tag{8}$$

where the sum runs over every possible value of the random vector $\boldsymbol{z}$, i.e. over the $K$ one-hot vectors. Substituting equations (4) and (6) into equation (8) recovers equation (1).
This is how latent variables construct a Gaussian mixture, and this form makes the distribution of a mixture model easy to analyze.
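
The small check below (assuming numpy and scipy, with illustrative parameters) enumerates the $K$ one-hot values of $\boldsymbol{z}$ and verifies numerically that equation (8) reproduces the mixture density of equation (1).

```python
# A small numerical check of equation (8): enumerate the K one-hot values of z,
# form p(z) via equation (4) and p(x|z) via equation (6), and sum their products.
# The result should match the direct mixture density of equation (1).
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.3, 0.7])                            # illustrative parameters
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), 2.0 * np.eye(2)]
x = np.array([1.0, 1.0])
K = len(pi)

direct = sum(pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
             for k in range(K))                      # equation (1)

marginal = 0.0
for z in np.eye(K):                                  # the K possible one-hot vectors
    p_z = np.prod(pi ** z)                           # equation (4)
    p_x_given_z = np.prod([multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k]) ** z[k]
                           for k in range(K)])       # equation (6)
    marginal += p_z * p_x_given_z                    # equation (8)

assert np.isclose(direct, marginal)
```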

‘Responsibility’ of Gaussian Mixtures

Bayes' theorem gives us the posterior. Using equation (7), the posterior probability of the latent variable $\boldsymbol{z}$ can be calculated as:

$$p(z_k=1|\boldsymbol{x})=\frac{p(z_k=1)\,p(\boldsymbol{x}|z_k=1)}{\sum_{j=1}^K p(z_j=1)\,p(\boldsymbol{x}|z_j=1)}\tag{9}$$

Substituting equations (3) and (5) into equation (9), we get:

$$p(z_k=1|\boldsymbol{x})=\frac{\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)}{\sum_{j=1}^K \pi_j\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_j,\Sigma_j)}\tag{10}$$

The quantity $p(z_k=1|\boldsymbol{x})$ is also called the responsibility, and is denoted:

$$\gamma(z_k)=p(z_k=1|\boldsymbol{x})\tag{11}$$
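
A minimal sketch of computing responsibilities with equation (10), again assuming numpy and scipy and using illustrative parameters:

```python
# Responsibilities gamma(z_k) = p(z_k = 1 | x) from equation (10):
# each component's share of the density at x, normalized to sum to 1.
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.3, 0.7])                            # illustrative parameters
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), 2.0 * np.eye(2)]

def responsibilities(x):
    weighted = np.array([pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
                         for k in range(len(pi))])   # pi_k N(x | mu_k, Sigma_k)
    return weighted / weighted.sum()                 # equation (10); sums to 1 over k

print(responsibilities(np.array([1.0, 1.0])))
```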

References


  1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.