What Are Bagging and Boosting in Machine Learning?

Pushpak Kalokhe
7 min read · May 2, 2022

Similarities and differences between them.

In the world of machine learning, ensemble learning methods are among the most popular topics to learn.

In interviews for data scientist roles, the difference between bagging and boosting is one of the most frequently asked questions.

So in this article, we are going to learn about different kinds of ensemble methods. In particular, we will focus on the bagging and boosting approaches.

Concepts I will be covering in this article:

  1. What is Ensemble Learning?
  2. Weak Learners Vs Strong Learners
  3. What Is Bagging?
  4. What is Boosting?
  5. Bagging Vs Boosting Comparison
  6. Conclusion

What is Ensemble Learning?

In machine learning, instead of building only a single model to predict the target, how about combining multiple models to predict it? This is the main idea behind ensemble learning.

In ensemble learning we build multiple machine learning models from the training data. We will discuss how the same training data can be used to build various models in later sections of this article.

What advantage will we get with ensemble learning?

This is the first question that comes to mind.

Let’s pause for a second to think about what advantage we get by building multiple models. With a single-model approach, if the model suffers from high bias or high variance, we are limited by it. Even though we have methods to handle high bias or high variance, if the final model still faces either of these issues there is little we can do.

Whereas, if we build multiple models, we can reduce these issues by combining them: averaging the outputs of several models smooths out high variance, and ensemble methods can also help with high bias.

To build multiple models we are going to use the same training data.

If we use the same training data, won’t all the resulting models be the same?

But this is not the case.

We will learn how to build different models using the same training dataset; each model will be unique. We will split the available training data into multiple smaller datasets, but while creating these datasets we have to preserve some key properties. We will talk more about this in the bootstrapping section of this article.

For now, just remember that to build multiple models we split the available training data into smaller datasets, and we build one model per smaller dataset.

In short:

Ensemble learning means that instead of building a single model for prediction, we build multiple machine learning models, called weak learners. A combination of all the weak learners makes a strong learner, which generalizes well enough to predict all the target classes with a decent amount of accuracy.

Different Ensemble Methods

We said we will build multiple models, but how will these models differ from one another? There are two possibilities:

  • All the models are built using the same machine learning algorithm
  • All the models are built using different machine learning algorithms

Based on the above-mentioned criteria the ensemble methods are of two types.

  1. Homogeneous ensemble methods

     a) Bagging

     b) Boosting

  2. Heterogeneous ensemble methods

     a) Stacking

Let’s understand these methods individually.

Homogeneous Ensemble Method

The first possibility is to build multiple models with the same machine learning algorithm, using the same available training data. Don’t worry: even though we use the same training data and the same algorithm, all the models will still be different. I will explain this in the next section.

These individual models are called weak learners.

Just keep in mind, in the homogeneous ensemble methods all the individual models are built using the same machine learning algorithm.

For example, if the individual model is a decision tree, then a good example of such an ensemble method is the random forest.

Both bagging and boosting belong to the homogeneous ensemble method.

Heterogeneous Ensemble Method

The second possibility for building multiple models is building different machine learning models. Each model will be different but uses the same training data.

Here also the individual models are called weak learners. The stacking method falls under the heterogeneous ensemble methods.
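
As a quick illustration of the idea (a sketch of my own, not something from the original article), scikit-learn’s StackingClassifier combines different algorithms trained on the same data; the dataset and the choice of base models below are arbitrary assumptions for demonstration.

```python
# A minimal sketch of a heterogeneous ensemble (stacking) with scikit-learn.
# The dataset and base models are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Different machine learning algorithms (the weak learners) are trained on the
# same data, and a meta-model learns how to combine their predictions.
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```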

For now, let’s focus only on homogeneous methods.

Weak Learners Vs Strong Learners

In both homogeneous and heterogeneous ensemble methods, we said the individual models are called weak learners. In homogeneous ensemble methods these weak learners are built using the same machine learning algorithm, whereas in heterogeneous ensemble methods they are built using different machine learning algorithms.

So what do these weak learners do? Why are they so important for understanding any ensemble methods?

A weak learner is built like any other machine learning model, but unlike a strong model it does not try to generalize across all possible target cases. A weak learner only tries to predict a subset of the target cases, or a single target class, accurately.

Confusing right?

Let’s understand this with an example. Before that, we need to understand bootstrapping. Once we learn about bootstrapping, we will take an example to understand the weak learner and strong learner methodology in more detail.

Bootstrapping

(Image source: https://hudsonthames.org/bagging-in-financial-machine-learning-sequential-bootstrapping-python/)

For each model we need to take a sample of the data, but we have to be careful while creating these samples. If we sample carelessly, a single sample may end up with only one target class, or the target class distribution may not match the original data, which will hurt model performance.

To overcome this we need a smarter way to create these samples, known as bootstrap sampling.

Bootstrapping is a statistical method for creating samples that retain the properties of the actual dataset. The individual samples of data are called bootstrap samples.

Each sample is an approximation of the actual data, and these individual samples have to capture its underlying complexity. All data points in a bootstrap sample are drawn randomly with replacement.
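
To make this concrete, here is a minimal sketch of drawing bootstrap samples with NumPy; the toy data array and the number of samples are made up purely for illustration.

```python
# A minimal sketch of bootstrapping with NumPy.
import numpy as np

rng = np.random.default_rng(seed=42)
data = np.arange(10)      # stand-in for the rows of the actual training data
n_bootstrap_samples = 3   # one bootstrap sample per weak learner

bootstrap_samples = []
for _ in range(n_bootstrap_samples):
    # Draw row indices with replacement: some rows repeat, others are left out,
    # but each sample has the same size as the original data.
    indices = rng.integers(low=0, high=len(data), size=len(data))
    bootstrap_samples.append(data[indices])

for i, sample in enumerate(bootstrap_samples):
    print(f"Bootstrap sample {i + 1}: {sample}")
```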

Weak Learners

(Image source: https://www.sciencedirect.com/topics/engineering/adaboost)

Let’s understand weak learners with the help of the figure linked above.

Weak learners are the individual models used to predict the target outcome, but they are not optimal models. In other words, they are not generalized enough to predict accurately for all the target classes and all the expected cases.

They will focus on predicting accurately only for a few cases.

Combining all the weak learners makes a strong model, which is generalized and optimized well enough to accurately predict all the target classes.

So how do these strong learners work?

Strong Learning

As I said, a combination of all the weak learners builds a strong model. But how are these individual models trained, and how do they make predictions?

The bagging and boosting methods differ in the way the individual models (weak learners) are trained.

What Is Bagging?

In the bagging method, all the individual models are built in parallel, and each individual model is different from the others. All the observations in a bootstrap sample are treated equally; in other words, every observation carries equal weight. Because of this, the bagging method is also called bootstrap aggregating.

As a first step, using the bootstrapping method we create N samples from the dataset. Then we select the algorithm we want to use.

Suppose we selected a decision tree; then each bootstrap sample is used to build one decision tree model. Don’t forget that all the decision trees are built in parallel.

Once the training phase is completed, to predict the target outcome we pass the new observation to all N decision trees. Each decision tree predicts one target outcome, and the final prediction is selected by majority voting.

Suppose we build 10 decision tree models and the target is binary, taking the value 1 or 0. Each decision tree predicts 1 or 0. If 8 out of the 10 trees predict 1 and 2 trees predict 0, then by majority voting the final predicted class is 1.

In short, the class predicted by the majority of the models becomes the final prediction; this is known as majority voting.
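
That vote count is simple to express in code; here is a tiny sketch using the ten hypothetical tree predictions from the example above.

```python
# Majority voting over ten hypothetical tree predictions (8 trees vote for class 1).
from collections import Counter

tree_predictions = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
final_class, votes = Counter(tree_predictions).most_common(1)[0]
print(f"Final predicted class: {final_class} ({votes} of {len(tree_predictions)} votes)")
```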

Bagging can be used for both classification and regression problems. For classification, we use the majority voting approach for the final prediction, whereas for regression we take the average of the values predicted by the individual models.
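
As a concrete sketch of this scheme, scikit-learn’s BaggingClassifier and BaggingRegressor implement bootstrap sampling plus voting or averaging; the datasets and hyperparameters below are illustrative assumptions, and the `estimator` parameter name assumes scikit-learn 1.2 or newer.

```python
# A sketch of bagging with scikit-learn: decision trees as the weak learners,
# majority voting for classification and averaging for regression.
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: the final class is chosen by majority voting.
Xc, yc = load_breast_cancer(return_X_y=True)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, random_state=42)
bag_clf = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the weak learner
    n_estimators=10,                     # N bootstrap samples -> N trees
    bootstrap=True,                      # sample with replacement
    random_state=42,
)
bag_clf.fit(Xc_train, yc_train)
print("Bagging classification accuracy:", bag_clf.score(Xc_test, yc_test))

# Regression: the final value is the average of the individual predictions.
Xr, yr = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
bag_reg = BaggingRegressor(
    estimator=DecisionTreeRegressor(),
    n_estimators=10,
    random_state=42,
)
bag_reg.fit(Xr_train, yr_train)
print("Bagging regression R^2:", bag_reg.score(Xr_test, yr_test))
```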

Pros —

  • Bagging helps in reducing overfitting, since we aggregate the outputs of all the models by majority voting (or averaging for regression).

Cons —

  • For regression problems, the predicted value may not be optimal: if any one model deviates a lot, it still pulls the final output, which is simply the average over all the models.

The random forest algorithm falls under the Bagging method.
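
For completeness, here is a minimal random forest sketch on an illustrative dataset (again just an assumption for demonstration), since a random forest is essentially bagged decision trees with extra feature randomness at each split.

```python
# Random forest: bagging of decision trees plus random feature selection per split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Random forest accuracy:", forest.score(X_test, y_test))
```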

