Bayesian Model Averaging (BMA) and Combining Models

Keywords: Bayesian model averaging, BMA and combining models

Bayesian Model Averaging (BMA)[1]

Bayesian model averaging (BMA) is another widely used method that looks very much like a combining model. However, the difference between BMA and combining models is significant.

Bayesian model averaging treats the models (hypotheses) $h=1,2,\cdots,H$ as the values of a random variable with prior probability $p(h)$. The marginal distribution over the data $X$ is then:

$$p(X)=\sum_{h=1}^{H}p(X|h)p(h)$$

BMA is used to select the model (hypothesis) that explains the data best, via Bayes' theorem. As the size of $X$ grows, the posterior probability

$$p(h|X)=\frac{p(X|h)p(h)}{\sum_{i=1}^{H}p(X|i)p(i)}$$

becomes sharper, and we end up with a good hypothesis.
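As a rough sketch in Python (the candidate hypotheses, their parameters, and the data below are all made up for illustration), the posterior $p(h|X)$ can be computed directly from Bayes' theorem, and it visibly concentrates on a single hypothesis as the dataset grows:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical candidate hypotheses: unit-variance Gaussians with different
# means (made up for illustration; any set of models h = 1, ..., H would do).
means = [0.0, 1.0, 2.0]
prior = np.array([1 / 3, 1 / 3, 1 / 3])            # p(h)

rng = np.random.default_rng(0)
X_full = rng.normal(loc=1.0, scale=1.0, size=200)  # data really comes from mean 1.0

for n in [5, 20, 200]:
    X = X_full[:n]
    # log p(X | h) for each hypothesis, assuming i.i.d. observations
    log_lik = np.array([norm.logpdf(X, loc=m, scale=1.0).sum() for m in means])
    # Bayes' theorem: p(h | X) is proportional to p(X | h) p(h); use log space for stability
    log_post = log_lik + np.log(prior)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    print(n, np.round(post, 3))   # the posterior concentrates on one hypothesis as n grows
```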

Mixture of Gaussians (Combining Models)

In the post ‘Mixtures of Gaussians’, we have seen how a mixture of Gaussians works. The joint distribution of the input data $\boldsymbol{x}$ and the latent variable $\boldsymbol{z}$ is:

$$p(\boldsymbol{x},\boldsymbol{z})$$

and the marginal distribution of $\boldsymbol{x}$ is

$$p(\boldsymbol{x})=\sum_{\boldsymbol{z}}p(\boldsymbol{x},\boldsymbol{z})$$

For the mixture of Gaussians:

$$p(\boldsymbol{x})=\sum_{k=1}^{K}\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)$$
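A minimal sketch of evaluating this density (the mixing coefficients and component parameters below are assumptions chosen for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component mixture in two dimensions (parameters made up).
pi = np.array([0.4, 0.6])                           # mixing coefficients pi_k
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # component means mu_k
Sigmas = [np.eye(2), 0.5 * np.eye(2)]               # component covariances Sigma_k

def mixture_density(x):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(p_k * multivariate_normal.pdf(x, mean=mu_k, cov=Sigma_k)
               for p_k, mu_k, Sigma_k in zip(pi, mus, Sigmas))

print(mixture_density(np.array([1.0, 1.0])))
```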

the latent variable $\boldsymbol{z}$ is designed so that:

$$p(z_k=1)=\pi_k$$

for $k=1,2,\cdots,K$, where $z_k\in\{0,1\}$ and $\boldsymbol{z}$ is a 1-of-$K$ representation.
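The generative view of this design can be sketched as follows (again with made-up parameters): first draw the 1-of-$K$ latent variable $\boldsymbol{z}$ with $p(z_k=1)=\pi_k$, then draw $\boldsymbol{x}$ from the selected Gaussian component alone:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters (made up for illustration).
pi = np.array([0.4, 0.6])                  # p(z_k = 1) = pi_k
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])

def sample_point():
    # Draw z as a 1-of-K vector: exactly one z_k equals 1.
    k = rng.choice(len(pi), p=pi)
    z = np.zeros(len(pi))
    z[k] = 1
    # Only the selected component k generates this data point.
    x = rng.multivariate_normal(mus[k], Sigmas[k])
    return z, x

for _ in range(3):
    z, x = sample_point()
    print(z, np.round(x, 2))
```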

This mixture of Gaussians is therefore a kind of combining model. Each time, only one component $k$ is selected (since $\boldsymbol{z}$ is a 1-of-$K$ representation). An example of a mixture of Gaussians and its overall density curve looks like:

And the latent variable $\boldsymbol{z}$ separates the whole distribution into several Gaussian components:

This is the simplest kind of combining model, where each expert is a Gaussian. During the voting, only the single component selected by $\boldsymbol{z}$ makes the final decision.

Distinction

A combining-model method contains several models and makes its prediction by voting or some other combination rule, so different data points may be handled by different component models. Bayesian model averaging, in contrast, assumes that the whole dataset is generated by one single model, and the posterior over hypotheses is used to select that model from the candidates.
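To make the contrast concrete, here is a minimal sketch under made-up one-dimensional Gaussian candidates: BMA draws one hypothesis for the entire dataset, while the combining (mixture) model draws a new latent assignment for every data point.

```python
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.0, 1.0, 2.0])        # hypothetical 1-D Gaussian hypotheses/components
prior = np.array([1 / 3, 1 / 3, 1 / 3])

# BMA view: a single hypothesis h is drawn once, and the WHOLE dataset
# is generated by that one model.
h = rng.choice(len(means), p=prior)
X_bma = rng.normal(loc=means[h], scale=1.0, size=100)

# Combining-model (mixture) view: a fresh latent z is drawn for EACH point,
# so different points can come from different components.
ks = rng.choice(len(means), p=prior, size=100)
X_mix = rng.normal(loc=means[ks], scale=1.0)
```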

References


  1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ↩︎