# Committees

**Keywords:** committees

## Analysis of Committees^{[1]}

The committee is a natural idea for combining several models (or, more precisely, for combining the outputs of several models). For example, we can combine all the models by averaging their predictions:

$y_{COM}(x)=\frac{1}{M}\sum_{m=1}^My_m(x)\tag{1}$
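As a concrete illustration, Eq. (1) is just an element-wise mean over the model outputs. A minimal sketch (the predictions below are made-up numbers):

```python
import numpy as np

# Hypothetical predictions y_m(x) from M = 3 models on the same inputs x.
y_models = np.array([
    [1.0, 2.0, 3.0],   # model 1
    [1.5, 2.5, 3.5],   # model 2
    [0.5, 1.5, 2.5],   # model 3
])

# Committee prediction: the plain average over the M models, Eq. (1).
y_com = y_models.mean(axis=0)
print(y_com)  # [1. 2. 3.]
```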

However, we want to analyze whether this averaged prediction is actually better than a single one of the models.

To compare the committee with a single model, we first need a criterion by which to judge which is better. Assume that the true generator of the training data $x$ is:

$h(x)\tag{2}$

So the prediction of the $m$th model, for $m=1,2,\cdots,M$, can be represented as:

$y_m(x) = h(x) +\epsilon_m(x)\tag{3}$

where $\epsilon_m(x)$ is the error of the $m$th model. Then the average sum-of-squares error is a natural criterion. For a single model it is:

$\mathbb{E}_x[(y_m(x)-h(x))^2] = \mathbb{E}_x[\epsilon_m(x)^2] \tag{4}$

where $\mathbb{E}_x[\cdot]$ denotes the frequentist expectation over $x$. To make the criterion more concrete, we consider the average of this error over the $M$ models:

$E_{AV} = \frac{1}{M}\sum_{m=1}^M\mathbb{E}_x[\epsilon_m(x)^2]\tag{5}$

On the other hand, by equations (1) and (3), the committee has the error:

$\begin{aligned} E_{COM}&=\mathbb{E}_x[(\frac{1}{M}\sum_{m=1}^My_m(x)-h(x))^2] \\ &=\mathbb{E}_x[\{\frac{1}{M}\sum_{m=1}^M\epsilon_m(x)\}^2] \end{aligned} \tag{6}$

Now we assume that the random variables $\epsilon_m(x)$ for $m=1,2,\cdots,M$ have **zero mean** and are **uncorrelated**, so that:

$\begin{aligned} \mathbb{E}_x[\epsilon_m(x)]&=0 &\\ \mathbb{E}_x[\epsilon_m(x)\epsilon_l(x)]&=0,&m\neq l \end{aligned} \tag{7}$

Substituting Eq. (7) into Eq. (6), we get:

$E_{COM}=\frac{1}{M^2}\sum_{m=1}^M\mathbb{E}_x[\epsilon_m(x)^2]\tag{8}$

Combining equations (5) and (8):

$E_{COM}=\frac{1}{M}E_{AV}\tag{9}$
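The factor of $1/M$ can be checked numerically: simulate zero-mean, uncorrelated errors as in Eq. (7) and compare the two criteria. A minimal sketch (the Gaussian errors and sample sizes are arbitrary choices for the simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 10, 200_000  # M models, N sample points x

# Zero-mean, uncorrelated errors eps_m(x): one row per model, as in Eq. (7).
eps = rng.normal(loc=0.0, scale=1.0, size=(M, N))

E_av = np.mean(eps**2)                  # Eq. (5): average single-model error
E_com = np.mean(eps.mean(axis=0) ** 2)  # Eq. (6): committee error

# E_com comes out close to E_av / M, matching Eq. (9).
print(E_com, E_av / M)
```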

**All the mathematics above rests on the assumption that the errors of the individual models are uncorrelated.** In practice, however, the errors are usually highly correlated, so the reduction in error is generally much smaller than the factor of $M$ suggested by Eq. (9). Nevertheless, the relation:

$E_{COM}\leq E_{AV}\tag{10}$

always holds, since squaring is convex and Jensen's inequality gives $\big(\frac{1}{M}\sum_{m=1}^M\epsilon_m(x)\big)^2\leq\frac{1}{M}\sum_{m=1}^M\epsilon_m(x)^2$. Boosting was later developed to make such combinations of models substantially more powerful.
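The inequality (10) can also be observed numerically even when the errors are strongly correlated, in which case the improvement from averaging is small. A sketch with a made-up shared error component:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 5, 100_000  # M models, N sample points x

# Highly correlated errors: a shared component plus small individual noise.
shared = rng.normal(size=N)
eps = shared + 0.3 * rng.normal(size=(M, N))

E_av = np.mean(eps**2)                  # Eq. (5)
E_com = np.mean(eps.mean(axis=0) ** 2)  # Eq. (6)

# Eq. (10) still holds, but averaging removes only the small
# uncorrelated part of the error, so the gap is modest.
print(E_com <= E_av)  # True
```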

## References

Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006. ↩︎