Keywords: combining models
The mixture of Gaussians was discussed in the post ‘Mixtures of Gaussians’. It serves not only to introduce the EM algorithm but also contains a strategy for improving model performance. All the models we have studied so far, apart from neural networks, are single-distribution models. This is like inviting one expert who is very good at a problem and simply doing what that expert says. However, if the problem is so hard that no single expert can handle it alone, it is natural to think of inviting more experts. This suggests a new way to improve performance: combine multiple models rather than only improving a single one.
A naive idea is to let several models vote equally, which means averaging the predictions of all the models. However, different models have different abilities, so giving every model an equal vote is not always a good idea. Boosting and other methods were introduced to address this.
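To make the difference between an equal vote and an unequal one concrete, here is a minimal sketch. The three "expert" functions and the weights are made up for illustration, and the weighted average is only a stand-in for the general idea of trusting some models more than others; it is not AdaBoost or any other specific method.

```python
from statistics import mean

# Three hypothetical "experts": each maps an input x to a prediction.
# They are toy functions invented for this sketch, not trained models.
def model_a(x):
    return 2.0 * x

def model_b(x):
    return 2.0 * x + 0.3

def model_c(x):
    return 2.0 * x - 0.6

models = [model_a, model_b, model_c]

def committee_average(models, x):
    """Equal vote: every model contributes with the same weight."""
    return mean(m(x) for m in models)

def committee_weighted(models, weights, x):
    """Unequal vote: weights are assumed non-negative and are normalised here."""
    total = sum(weights)
    return sum(w * m(x) for w, m in zip(weights, models)) / total

equal = committee_average(models, 1.0)
# Assumed weights: model_a is trusted three times as much as the others.
weighted = committee_weighted(models, [3.0, 1.0, 1.0], 1.0)
```

With equal weights the committee output is pulled toward the weaker models just as much as toward the stronger one. With the assumed weights, the prediction moves closer to `model_a`.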
In some combining methods, such as AdaBoost (boosting), the bootstrap, bagging, etc., the input data has the same distribution as the training set. In other methods, however, the training set is cut into several subsets whose distributions differ from that of the original training set. The decision tree is one such method. A decision tree is a sequence of binary decisions, and it can be used for both regression and classification tasks.
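The contrast between resampling the whole training set and cutting it into subsets can be sketched as follows. The data values, the split thresholds, and the helper names are all hypothetical. A real decision tree would learn its thresholds from the data, whereas here they are written by hand.

```python
import random

# Toy 1-D training set of (x, t) pairs; the values are made up for illustration.
data = [(0.1, 1.0), (0.4, 1.2), (0.6, 3.1), (0.9, 2.9), (1.3, 5.0), (1.7, 5.2)]

def bootstrap_sample(data, rng):
    """Bagging-style resample: draw len(data) points with replacement
    from the whole training set."""
    return [rng.choice(data) for _ in data]

def tree_region(x):
    """A hand-written tree: each test is a binary choice on x.
    The thresholds 0.5 and 1.0 are assumed, not learned."""
    if x < 0.5:
        return "R1"
    if x < 1.0:
        return "R2"
    return "R3"

def partition(data):
    """Cut the training set into disjoint subsets, one per leaf region."""
    regions = {}
    for x, t in data:
        regions.setdefault(tree_region(x), []).append((x, t))
    return regions

def tree_predict(x, regions):
    """Regression prediction: mean target of the training points in x's region."""
    targets = [t for _, t in regions[tree_region(x)]]
    return sum(targets) / len(targets)

rng = random.Random(0)  # fixed seed so the resample is reproducible
resample = bootstrap_sample(data, rng)
regions = partition(data)
prediction = tree_predict(0.7, regions)
```

Every point in `resample` comes from the full training set, while each entry of `regions` only ever sees the points that fall on one side of the binary tests. For classification, the leaf would return a majority class instead of a mean.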
We will briefly discuss:
in the following posts.
Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.