# Bayesian Model Averaging (BMA) and Combining Models

**Keywords:** Bayesian model averaging, BMA and combining models

## Bayesian Model Averaging (BMA)^{[1]}

Bayesian model averaging (BMA) is another widely used method that looks very similar to a combining model. However, the difference between BMA and combining models is significant.

In Bayesian model averaging, the random variable in Bayes' theorem ranges over models (hypotheses) $h=1,2,\cdots,H$ with prior probabilities $p(h)$. The marginal distribution over the data $X$ is then:

$p(X)=\sum_{h=1}^{H}p(X|h)p(h)$

BMA is used to select the model (hypothesis) that explains the data best through Bayes' theorem. As the size of $X$ grows, the posterior probability

$p(h|X)=\frac{p(X|h)p(h)}{\sum_{i=1}^{H}p(X|i)p(i)}$

becomes sharper, concentrating on a single good hypothesis.
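The sharpening of the posterior can be seen in a small simulation. The sketch below is a hypothetical example (the coin-flip models, the candidate biases, and all variable names are my own illustration, not from the text): three candidate Bernoulli models compete to explain the same data set, and the posterior $p(h|X)$ concentrates on one of them as more data arrive.

```python
import numpy as np

# Hypothetical setup: model h_k claims P(heads) = theta_k for a coin.
# BMA puts a prior p(h) over the candidate models and computes the
# posterior p(h | X) = p(X | h) p(h) / sum_i p(X | i) p(i).
thetas = np.array([0.3, 0.5, 0.7])   # the three candidate models
prior = np.array([1/3, 1/3, 1/3])    # uniform prior p(h)

rng = np.random.default_rng(0)
X = rng.random(500) < 0.7            # data actually generated with bias 0.7

def posterior(X, thetas, prior):
    # log p(X | h) for a Bernoulli model, summed over the observations
    heads = X.sum()
    tails = len(X) - heads
    log_lik = heads * np.log(thetas) + tails * np.log(1 - thetas)
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max()       # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

print(posterior(X[:10], thetas, prior))  # posterior after 10 observations
print(posterior(X, thetas, prior))       # posterior after all 500 observations
```

With only a handful of flips the posterior is still spread over the candidates; with the full data set it concentrates almost entirely on the model that generated the data, which is exactly the "sharpening" described above.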

## Mixture of Gaussians (Combining Models)

In the post ‘Mixtures of Gaussians’, we saw how a mixture of Gaussians works. The joint distribution of the input data $\boldsymbol{x}$ and the latent variable $\boldsymbol{z}$ is:

$p(\boldsymbol{x},\boldsymbol{z})$

and the marginal distribution of $\boldsymbol{x}$ is

$p(\boldsymbol{x})=\sum_{\boldsymbol{z}}p(\boldsymbol{x},\boldsymbol{z})$

For the mixture of Gaussians:

$p(\boldsymbol{x})=\sum_{k=1}^{K}\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)$

the latent variable $\boldsymbol{z}$ is designed so that:

$p(z_k=1) = \pi_k$

for $k=1,2,\cdots,K$, where $\boldsymbol{z}$ uses a $1$-of-$K$ representation: each $z_k\in\{0,1\}$ and exactly one $z_k$ equals $1$.

This mixture of Gaussians is therefore a kind of combining model. Each time, only one component $k$ is selected (since $\boldsymbol{z}$ is a $1$-of-$K$ representation). The figure below shows an example mixture of Gaussians together with its overall density curve:

And the latent variable $\boldsymbol{z}$ separates the whole distribution into its individual Gaussian components:

This is the simplest form of combining models, where each expert is a single Gaussian. During the "voting", only the one model selected by $\boldsymbol{z}$ makes the final decision.
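This selection mechanism is easy to see in ancestral sampling. The sketch below is a minimal 1-D illustration (the mixing coefficients, means, and standard deviations are assumed values of my own choosing): first draw $\boldsymbol{z}$ from the categorical distribution $p(z_k=1)=\pi_k$, then draw $\boldsymbol{x}$ from whichever Gaussian component $\boldsymbol{z}$ selected.

```python
import numpy as np

rng = np.random.default_rng(1)
pis = np.array([0.5, 0.3, 0.2])      # mixing coefficients pi_k (assumed)
mus = np.array([-2.0, 0.0, 3.0])     # component means (1-D for simplicity)
sigmas = np.array([0.5, 1.0, 0.8])   # component standard deviations

def sample(n):
    # Step 1: the 1-of-K latent choice -- one component index per sample,
    # drawn with probabilities p(z_k = 1) = pi_k.
    ks = rng.choice(len(pis), size=n, p=pis)
    # Step 2: each x is drawn only from the single Gaussian its z selected.
    xs = rng.normal(mus[ks], sigmas[ks])
    return xs, ks

xs, ks = sample(10_000)
# The empirical component frequencies should approach the pi_k.
print(np.bincount(ks) / len(ks))
```

Because each sample consults exactly one expert, the fraction of samples produced by component $k$ converges to $\pi_k$, matching the marginal $p(\boldsymbol{x})=\sum_k\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)$ given earlier.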

## Distinction

A combining-model method contains several models and predicts by voting or some other combination rule, so every prediction involves all of the models. Bayesian model averaging, by contrast, assumes the whole data set was generated by a single one of the candidate hypotheses, and is used to infer which one.

## References

Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006. ↩︎