Bayesian Model Averaging (BMA) and Combining Models

Keywords: Bayesian model averaging, BMA, combining models

Bayesian Model Averaging (BMA)[1]

Bayesian model averaging (BMA) is another widely used method that looks very similar to a combining model. However, the difference between BMA and combining models is significant.

In Bayesian model averaging, the random variable is the model (hypothesis) $h = 1, 2, \cdots, H$ with prior probability $p(h)$. The marginal distribution over the data $X$ is then:

$$p(X) = \sum_{h=1}^{H} p(X|h)\, p(h)$$
And BMA is used to select the model (hypothesis) that explains the data best through Bayes' theorem. As the size of $X$ grows, the posterior probability

$$p(h|X) \propto p(X|h)\, p(h)$$

becomes sharper, concentrating its mass on a single hypothesis. We then obtain a good hypothesis.
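To make this sharpening concrete, here is a minimal sketch in a toy coin-flip setting. The two candidate hypotheses `h1`, `h2` and their head probabilities are assumptions for illustration, not from the text; the posterior over them is computed exactly via Bayes' theorem:

```python
import math

# Two hypothetical candidate hypotheses for a coin's head probability.
hypotheses = {"h1": 0.5, "h2": 0.8}
prior = {"h1": 0.5, "h2": 0.5}

def posterior(heads, tosses):
    """p(h|X) proportional to p(X|h) p(h), with a Bernoulli likelihood."""
    log_joint = {
        h: math.log(prior[h])
        + heads * math.log(theta)
        + (tosses - heads) * math.log(1.0 - theta)
        for h, theta in hypotheses.items()
    }
    m = max(log_joint.values())  # subtract max for numerical stability
    unnorm = {h: math.exp(v - m) for h, v in log_joint.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Same 80% head rate, ten times more data: the posterior sharpens on h2.
small = posterior(8, 10)
large = posterior(80, 100)
```

With 10 tosses the posterior still hedges between the two hypotheses; with 100 it is almost entirely concentrated on `h2`.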

Mixture of Gaussians (Combining Models)

In the post ‘Mixtures of Gaussians’, we have seen how a mixture of Gaussians works. The joint distribution of the input data $\boldsymbol{x}$ and the latent variable $\boldsymbol{z}$ is:

$$p(\boldsymbol{x}, \boldsymbol{z}) = p(\boldsymbol{z})\, p(\boldsymbol{x}|\boldsymbol{z})$$

and the marginal distribution of $\boldsymbol{x}$ is

$$p(\boldsymbol{x}) = \sum_{\boldsymbol{z}} p(\boldsymbol{z})\, p(\boldsymbol{x}|\boldsymbol{z})$$
For the mixture of Gaussians:

$$p(\boldsymbol{x}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

the latent variable $\boldsymbol{z}$ is designed so that:

$$p(z_k = 1) = \pi_k$$

for $k = 1, 2, \cdots, K$, where $z_k \in \{0, 1\}$ and $\boldsymbol{z}$ is a 1-of-$K$ representation.

Then this mixture of Gaussians is a kind of combining model. Each time a data point is generated, only one component $k$ is selected (because $\boldsymbol{z}$ is a 1-of-$K$ representation). An example of a mixture of Gaussians and its overall density curve:

And the latent variable $\boldsymbol{z}$ separates the whole distribution into several Gaussian distributions:

This is the simplest form of combining models, in which each expert is a Gaussian model. During the voting, only the one model selected by $\boldsymbol{z}$ makes the final decision.
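The generative process above can be sketched directly: draw a 1-of-$K$ latent choice $\boldsymbol{z}$ with $p(z_k = 1) = \pi_k$, then sample from the selected Gaussian. The three components' weights, means, and standard deviations below are illustrative assumptions:

```python
import random

random.seed(0)  # reproducible sketch

# Hypothetical 1-D mixture of three Gaussians (all numbers illustrative).
pis = [0.5, 0.3, 0.2]
mus = [-2.0, 0.0, 3.0]
sigmas = [0.5, 1.0, 0.8]

def sample():
    """Draw z (a 1-of-K choice) with p(z_k = 1) = pi_k, then x ~ N(mu_k, sigma_k^2)."""
    k = random.choices(range(len(pis)), weights=pis)[0]
    return k, random.gauss(mus[k], sigmas[k])

draws = [sample() for _ in range(10_000)]
# Empirical share of component 0 should be close to pi_0 = 0.5.
share_k0 = sum(1 for k, _ in draws if k == 0) / len(draws)
```

Note that every sample is produced by exactly one Gaussian expert; the components never blend within a single draw, which is the "only one model votes" behavior described above.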


A combining-model method contains several models and predicts by voting or other rules, so more than one model contributes to each prediction. Bayesian model averaging, by contrast, assumes the whole dataset was generated by a single model and uses the data to select one hypothesis from several candidates.
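The BMA side of this contrast can also be sketched in code: the predictive density for a new point is a single average over whole-dataset hypotheses, weighted by their posterior. The weights and per-hypothesis parameters below are assumed values standing in for a computed $p(h|X)$:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Assumed posterior model weights p(h|X) and per-hypothesis (mean, std).
post = {"h1": 0.9, "h2": 0.1}
models = {"h1": (0.0, 1.0), "h2": (5.0, 2.0)}

def bma_predictive(x):
    """p(x|X) = sum_h p(x|h) p(h|X): one average over models of the whole dataset."""
    return sum(post[h] * normal_pdf(x, mu, s) for h, (mu, s) in models.items())
```

As the posterior sharpens with more data, these weights approach a 1-of-$H$ vector and BMA collapses onto a single model, whereas a mixture of Gaussians keeps using all of its components, one per data point.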


  1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ↩︎