How To Train A Random Forest With XGBoost

Are you looking to train a Random Forest using XGBoost for classification or regression tasks but aren’t sure where to start? In this tutorial, I will first briefly explain the mechanisms behind XGBoost and Random Forest and highlight their differences. Then, I’ll guide you through a step-by-step process of training an XGBoost Random Forest for both classification and regression tasks using a real-world dataset. By the end of this tutorial, you’ll be well-equipped to tackle your own projects with confidence and expertise....

September 29, 2023 · 6 min · Mario Filho
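As a rough illustration of how an XGBoost random forest differs from ordinary boosting, XGBoost's documentation describes a random-forest mode. In this mode, many trees are grown in parallel within a single boosting round, with row and column subsampling and a learning rate of 1. The dictionary below is a sketch of such a configuration. The specific values (100 trees, 0.8 subsampling, depth 5) are illustrative assumptions to tune, not settings taken from the post.

```python
# A sketch of XGBoost parameters for random-forest-style training.
# The values are illustrative assumptions, not tuned recommendations.
rf_params = {
    "objective": "binary:logistic",  # binary classification; e.g. "reg:squarederror" for regression
    "num_parallel_tree": 100,        # number of trees grown in the single round (the "forest")
    "subsample": 0.8,                # fraction of rows sampled for each tree, similar to bagging
    "colsample_bynode": 0.8,         # fraction of features sampled at each split, as in random forests
    "learning_rate": 1.0,            # no shrinkage, as the docs recommend for random-forest mode
    "max_depth": 5,
}

# One round only: more rounds would boost several forests on top of each other.
num_boost_round = 1

# With xgboost installed and a DMatrix named dtrain, training would look like:
# booster = xgboost.train(rf_params, dtrain, num_boost_round=num_boost_round)
```

The scikit-learn wrapper also provides `XGBRFClassifier` and `XGBRFRegressor`, which preset this random-forest behaviour.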

How To Use XGBoost For Learning To Rank In Python

So, you’ve heard about the power of XGBoost for Learning to Rank (LTR) tasks and want to harness it, right? You couldn’t have landed in a better place! XGBoost is a go-to tool for many LTR applications, from predicting click-through rates and powering search engines to enhancing recommender systems. I can vouch for its effectiveness, having used it to build models for ranking freelancers on Upwork. In this tutorial, we’ll unlock the potential of XGBoost for your LTR tasks....

September 5, 2023 · 9 min · Mario Filho

How To Handle Imbalanced Data In XGBoost Using scale_pos_weight In Python

In machine learning, we often come across datasets where one class has significantly more observations than the other. This is known as imbalanced data. For instance, in a dataset of credit card transactions, the number of fraudulent transactions (positive class) is usually much smaller than the number of legitimate transactions (negative class). This is also an example of a binary classification task, which is a common type of machine learning problem....

August 30, 2023 · 7 min · Mario Filho
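XGBoost's parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical starting value for `scale_pos_weight`. The sketch below computes that ratio from a list of 0/1 labels. The helper name `compute_scale_pos_weight` and the toy labels are hypothetical, chosen only for illustration.

```python
def compute_scale_pos_weight(labels):
    """Return the ratio of negative (0) to positive (1) labels.

    This is the typical starting value for scale_pos_weight suggested in
    XGBoost's documentation; it is a heuristic and is usually worth tuning.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("labels contain no positive examples")
    return negatives / positives


# Hypothetical toy labels: 2 fraudulent (1) vs 8 legitimate (0) transactions.
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
weight = compute_scale_pos_weight(labels)
print(weight)  # 4.0
```

The resulting value can then be passed as `XGBClassifier(scale_pos_weight=weight)`, or as the `"scale_pos_weight"` entry of the params dictionary given to `xgboost.train`.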

How To Use XGBoost For Multi-Output Regression In Python

Multi-output regression is a machine learning task where we need to predict multiple outputs from a single set of inputs. Imagine you’re a financial analyst at an investment firm. Your job is to predict the future performance of various stocks to guide investment decisions. For each stock, you want to predict several outputs such as the expected return, the volatility (risk), and the correlation with other stocks or market indices....

August 25, 2023 · 7 min · Mario Filho

How To Use XGBoost For Multiclass Classification In Python

Multiclass classification is a machine learning task where the output can belong to more than two classes. In other words, it can sort data into multiple categories. For example, a piece of fruit can be classified as an ‘apple’, ‘banana’, or ‘cherry’. Or, a car can be classified as a ‘sedan’, ‘SUV’, or ‘truck’. Just like binary classification, we can use a variety of algorithms to classify the data points into these multiple categories....

May 9, 2024 · 9 min · Mario Filho

How To Use XGBoost For Binary Classification In Python

Binary classification is a type of machine learning task where the output is a binary outcome, i.e., it belongs to one of two classes. For example, an email can be classified as either ‘spam’ or ‘not spam’, or a tumor can be ‘malignant’ or ‘benign’. When you have more than two classes, it’s called multiclass classification. We can use various algorithms to classify the data points. These algorithms include logistic regression, decision trees, random forest, support vector machines, and gradient boosting algorithms like XGBoost....

May 11, 2024 · 9 min · Mario Filho

How To Save and Load Your XGBoost Model in Python

You’ve spent countless hours researching, tweaking, and training the perfect XGBoost model. Your model is performing exceptionally well and you’re ready to celebrate. But wait, now you need to deploy it, and suddenly, you’re faced with a problem. How do you save your model for future use? Don’t worry, there’s a simple solution to this! In this article, I will walk you through how to save and load your XGBoost models....

August 18, 2023 · 6 min · Mario Filho

How To Use XGBoost For Regression In Python (Tutorial)

Are you struggling to get your regression models to perform well? Perhaps you’ve tried several algorithms, tuned your parameters, and even collected more data, but your model’s predictions are still off. You might be feeling frustrated and unsure of what to do next. Don’t worry, you’re not alone. Many machine learning practitioners face the same challenge. In this tutorial, I’m going to introduce you to XGBoost, a powerful machine learning algorithm that’s been winning competitions and helping companies make accurate predictions....

August 17, 2023 · 10 min · Mario Filho

How To Deal With Categorical Variables in XGBoost

Working with categorical data in machine learning can be a bit of a headache, especially when using algorithms like XGBoost. XGBoost, despite being a powerful and efficient gradient boosting library, is designed to work with numeric data. This means that you need to find a way to transform categorical data into a format that XGBoost can understand. This can be a time-consuming and complex process, especially if you’re dealing with a large number of categorical variables or categories....

August 1, 2023 · 5 min · Mario Filho
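One common way to turn a categorical column into numbers is one-hot encoding, where each category gets its own 0/1 indicator column. The pure-Python sketch below shows the idea. The function name `one_hot_encode` is hypothetical, and in practice a library encoder would usually be used instead.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector.

    Categories are sorted so the column order is deterministic; the same
    category list must be reused at prediction time so columns line up.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # switch on the indicator for this row's category
        rows.append(row)
    return categories, rows


categories, encoded = one_hot_encode(["banana", "apple", "cherry", "banana"])
print(categories)  # ['apple', 'banana', 'cherry']
print(encoded)     # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

This sketch raises a `KeyError` for categories it has never seen, so unseen or rare categories need an explicit policy. It also adds one column per category, which is why high-cardinality variables make this approach costly.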

How to Get Feature Importance in XGBoost in Python

You’ve chosen XGBoost as your algorithm, and now you’re wondering: “How do I figure out which features are the most important in my model?” That’s what ‘feature importance’ is all about. It’s a way of finding out which features in your data are doing the heavy lifting when it comes to your model’s predictions. Understanding which features are important can help you interpret your model better. Maybe you’ll find a feature you didn’t expect to be important....

July 18, 2023 · 6 min · Mario Filho

XGBoost Hyperparameter Tuning With Optuna (Kaggle Grandmaster Guide)

Trying to find the right hyperparameters for XGBoost can feel like searching for a needle in a haystack. Trust me, I’ve been there. XGBoost was crucial to winning at least two of the Kaggle competitions I participated in. By the end of this tutorial, you’ll be equipped with the exact same techniques I used to optimize my models and achieve those top rankings. Let’s get started! Installing XGBoost And Optuna: Installing XGBoost is easy, just run:...

April 10, 2023 · 8 min · Mario Filho

Multiple Time Series Forecasting With XGBoost In Python

Forecasting multiple time series can be a daunting task, especially when dealing with large amounts of data. However, XGBoost is a powerful gradient boosting algorithm that has been shown to perform exceptionally well in time series forecasting tasks. In combination with MLForecast, which is a scalable and easy-to-use time series forecasting library, we can make the process of training an XGBoost model for multiple time series forecasting a breeze. Let’s dive into the step-by-step process of preparing our data, defining our XGBoost model, and training it using MLForecast in Python....

February 28, 2023 · 14 min · Mario Filho

Does XGBoost Need Feature Scaling Or Normalization?

If you are using XGBoost with decision trees as your base model, you don’t need to worry about scaling or normalizing your features. Decision trees are not sensitive to the scale of the features. In practice, I have seen very minor differences in score from scaling features for decision trees, but these are due to numerical computing implementations and are not significant. If you are using XGBoost with linear models as base models, it is a good idea to scale or normalize the features....

December 30, 2022 · 7 min · Mario Filho
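To see why trees ignore feature scale, note that a split only asks whether a value falls above or below a threshold. Any strictly increasing transformation, such as min-max scaling, keeps the sort order of the values, so the best split separates exactly the same rows. The sketch below checks this with a single-feature regression stump that minimizes squared error. It is a simplified stand-in for how a tree picks one split, not XGBoost's actual split-finding code. The helper names and data are hypothetical.

```python
def _sse(vals):
    """Sum of squared errors around the mean."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def best_split_partition(xs, ys):
    """Return the row indices sent left by the SSE-minimizing threshold on one feature."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        # No threshold can separate tied values, so skip those positions.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        sse = _sse([ys[i] for i in left]) + _sse([ys[i] for i in right])
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left


def min_max_scale(xs):
    """Rescale values to [0, 1]; strictly increasing, so order is preserved."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


# Hypothetical data with a feature spanning several orders of magnitude.
xs = [1.0, 50.0, 3.0, 1000.0, 7.0, 200.0]
ys = [1.0, 5.0, 1.2, 9.0, 1.1, 5.5]

raw = best_split_partition(xs, ys)
scaled = best_split_partition(min_max_scale(xs), ys)
print(raw == scaled)  # True
```

A linear base learner behaves differently, because its coefficients multiply the raw feature values directly.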

Can Gradient Boosting Learn Simple Arithmetic?

During a technical meeting a few weeks ago, we had a discussion about feature interactions, and how far we have to go with them so that we can capture possible relationships with our targets. Should we create (and select) arithmetic interactions between our features? This is not something we usually learn in machine learning courses, but it’s a very important topic to better understand how our models work. A few years ago I remember visiting a website that showed how different models approximated these simple operations....

January 20, 2020 · 4 min · Mario Filho