
Ensemble Learning

  • ali@fuzzywireless.com
  • Mar 3, 2022
  • 2 min read

Machine learning results can be improved by combining several models, an approach known as ensemble learning (Smolyakov, 2017). Because it improves prediction accuracy over any single model, ensemble learning has won several prestigious machine learning competitions, such as the Netflix Prize, the KDD Cup 2009, and Kaggle competitions. The improvement in prediction accuracy comes from two sources: reducing variance, which is the goal of bagging, and reducing bias, which is the goal of boosting. Bagging reduces variance by averaging multiple estimates in regression and by voting in classification (Smolyakov, 2017). Boosting builds an ensemble by training models sequentially, increasing the weights of instances that the previous model got wrong (Demir, n.d.). Stacking combines multiple classification or regression models through a meta-classifier or meta-regressor that is trained on the outputs of the base learners (Smolyakov, 2017).
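As a concrete illustration of these three techniques, here is a minimal sketch using scikit-learn; the dataset, base learners, and hyperparameters are illustrative assumptions on my part, not choices taken from the cited sources.

```python
# Minimal sketch of bagging, boosting, and stacking with scikit-learn.
# The dataset and all hyperparameters below are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier)

X, y = load_breast_cancer(return_X_y=True)

# Bagging: trees fit on bootstrap samples, combined by voting (reduces variance).
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Boosting: models fit sequentially, re-weighting misclassified instances
# after each round (reduces bias).
boosting = AdaBoostClassifier(n_estimators=50)

# Stacking: a meta-classifier trained on the outputs of the base learners.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression())

for name, model in [("bagging", bagging), ("boosting", boosting),
                    ("stacking", stacking)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```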


Ensemble algorithms fall into two groups: sequential and parallel. In sequential ensemble methods, the base learners are generated one after another, exploiting the dependence between them; AdaBoost (adaptive boosting) is an example. In parallel methods, the base learners are generated independently and in parallel; random forest is an example. If a single type of base learning algorithm is used, the ensemble is referred to as homogeneous; if multiple types of base learners are used, it is referred to as heterogeneous (Smolyakov, 2017). However, ensemble methods are computationally expensive and harder to interpret than single models (Ghani, 2000). Random forest is a supervised machine learning algorithm that builds many randomized decision trees and combines their results to improve the overall prediction (Donges, 2018). Unlike a single decision tree, a random forest is far less prone to overfitting, because each tree is trained on a random subset of the features (2018).
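To make the overfitting point concrete, the sketch below (again assuming scikit-learn, with an illustrative dataset and settings) compares a single decision tree against a random forest under cross-validation; the forest typically scores higher because averaging many de-correlated trees reduces variance.

```python
# Minimal sketch: single decision tree vs. random forest.
# Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single fully grown tree: low bias but high variance (prone to overfitting).
tree = DecisionTreeClassifier(random_state=0)

# Random forest: trees trained in parallel on bootstrap samples, each split
# considering only a random subset of features (max_features), combined by vote.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)

for name, model in [("single tree", tree), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```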


References:

Demir, N. (n.d.). Ensemble methods: Elegant techniques to produce improved machine learning results. Retrieved from https://www.toptal.com/machine-learning/ensemble-methods-machine-learning

Donges, N. (2018). The random forest algorithm. Retrieved from https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd

Ghani, R. (2000). Ensemble classification methods. Retrieved from http://www.cs.cmu.edu/~rayid/talks/Ensemble%20Classification%20Methods.ppt

Smolyakov, V. (2017). Ensemble learning to improve machine learning results. Retrieved from https://blog.statsbot.co/ensemble-learning-d1dcd548e936




