Trying to find the right hyperparameters for XGBoost can feel like searching for a needle in a haystack.
Trust me, I’ve been there. XGBoost was crucial to winning at least two of the Kaggle competitions I participated in.
By the end of this tutorial, you’ll be equipped with the exact same techniques I used to optimize my models and achieve those top rankings.
Let’s get started!
Installing XGBoost And Optuna
Installing XGBoost is easy; just run:
pip install xgboost
Or, if you are using Anaconda, run:
conda install -c conda-forge py-xgboost
Installing Optuna is also easy; just run:
pip install optuna
Optuna uses a smart technique called Bayesian optimization to find the best hyperparameters for your model.
Bayesian optimization is like a treasure hunter using an advanced metal detector to find hidden gold, instead of just digging random holes (random search) or going through the entire area with a shovel (grid search).
The best part? You only need a few lines of code to make it work!
Using Bayesian optimization is a no-brainer.
It’s usually at least as good as random search and grid search, and often better, when you have more than two or three hyperparameters to adjust.
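To give you a feel for how little code that is, here’s a minimal sketch of an Optuna study on a toy function (not the wine model we’ll tune later):

import optuna

# Toy objective: Optuna searches for the x that minimizes (x - 2)^2
def toy_objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(toy_objective, n_trials=20)
print(study.best_params)  # the best x found, close to 2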
I will use the Red Wine Quality dataset from UCI.
It’s a real dataset with chemical properties of red wines where the goal is to predict the quality of the wine.
import pandas as pd
from sklearn.model_selection import train_test_split
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(url, delimiter=";")
X = data.drop("quality", axis=1)
y = data["quality"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
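A quick way to check what we just loaded is to print the first few rows, which is essentially what the table below shows:

# Preview the first five rows of the dataset
print(data.head())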
| fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7.4 | 0.7 | 0 | 1.9 | 0.076 | 11 | 34 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 7.8 | 0.88 | 0 | 2.6 | 0.098 | 25 | 67 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15 | 54 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17 | 60 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 7.4 | 0.7 | 0 | 1.9 | 0.076 | 11 | 34 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
It can be treated as a regression or a classification problem, but I will treat it as a regression.
The tuning process for XGBoost is the same for both cases.
Which XGBoost Hyperparameters Should I Tune?
There are only 6 hyperparameters you really need to focus on when tuning XGBoost.
First up is the number of trees you’ll be training, also known as n_estimators.
The more trees you have, the more stable and accurate your predictions tend to be, but each extra tree also adds training and prediction time.
So, how many trees should you pick?
Well, it depends on your model’s purpose.
If your model needs to deliver results quickly (e.g., high-frequency trading, ad click prediction), you might want to limit the number of trees to around 200.
However, if your model runs once a week (e.g., sales forecasting) and has more time to make its predictions, you could consider using up to 5,000 trees.
As a general guideline, start by fixing the number of trees and then concentrate on tuning the learning_rate.
This regulates how much each tree contributes to the final prediction.
The more trees you have, the smaller the learning rate should be.
The recommended range for the learning rate is between 0.001 and 0.1.
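To make that trade-off concrete, here’s a rough sketch with placeholder values (these numbers are illustrative, not tuned):

import xgboost as xgb

# Low-latency setup: fewer trees, larger learning rate (hypothetical values)
fast_model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1)

# Weekly batch setup: many trees, smaller learning rate (hypothetical values)
batch_model = xgb.XGBRegressor(n_estimators=5000, learning_rate=0.01)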
Next, let’s talk about max_depth.
This decides the complexity of each tree in your model. It refers to the maximum depth that a tree can grow to.
A deeper tree means more decision paths and potentially capturing more complex patterns in the data.
However, deeper trees are also more likely to overfit the training data.
I like to tune it in the range of 1 to 10.
The subsample hyperparameter controls how much of the data is used to build each tree in your model.
It is a fraction that ranges from 0 to 1, representing the proportion of the dataset to be randomly selected for training each tree.
By using only a portion of the data for each tree, the model can benefit from diversity and reduce the correlation between the trees, which may help combat overfitting.
The range I use for subsample is between 0.05 and 1.
colsample_bytree is another hyperparameter to consider when tuning your XGBoost model.
It determines the proportion of features to be considered for each tree.
This value ranges from 0 to 1, where a value of 1 means that all features will be considered for every tree, and a lower value indicates that only a subset of features will be randomly chosen before building each tree.
This method is also known as Random Subspace.
The subsample and colsample_bytree hyperparameters in XGBoost make it somewhat similar to Random Forests, as the latter samples rows (with replacement) and columns for each tree.
Despite these hyperparameters making XGBoost and Random Forests seem similar, they are different algorithms.
XGBoost is a gradient boosting method, while Random Forests is a bagging method, which means they learn from the data in different ways.
Lastly, the min_child_weight hyperparameter sets the minimum sum of instance weights that must be present in a child node in each tree.
For regression with squared error loss, this effectively corresponds to the minimum number of observations that must be present in each node.
This parameter helps control the complexity of the model and prevents overfitting.
This doesn’t mean you should set min_child_weight to a high value, as that will make your model less flexible and more prone to underfitting.
I like to keep it in the range of 1 to 20.
Code For Tuning XGBoost Hyperparameters With Optuna
Now that you’re familiar with the essential hyperparameters for tuning, let’s explore the code to optimize them using Optuna.
First, we’ll define the objective function, which Optuna will aim to optimize.
import xgboost as xgb
from sklearn.metrics import mean_squared_error
import optuna
def objective(trial):
    # Search space: the five hyperparameters we tune, plus fixed settings
    params = {
        "objective": "reg:squarederror",
        "n_estimators": 1000,
        "verbosity": 0,
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.1, log=True),
        "max_depth": trial.suggest_int("max_depth", 1, 10),
        "subsample": trial.suggest_float("subsample", 0.05, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.05, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
    }

    # Train on the training split and score on the held-out validation split
    model = xgb.XGBRegressor(**params)
    model.fit(X_train, y_train, verbose=False)
    predictions = model.predict(X_val)
    rmse = mean_squared_error(y_val, predictions, squared=False)
    return rmse
Our primary objective is to minimize the root mean squared error (RMSE) on the validation set.
The params dictionary within the objective function holds the XGBoost hyperparameters to be fine-tuned by Optuna.
The trial.suggest_* methods define the search space for each hyperparameter according to the values I suggested above.
For example, learning_rate is searched within a logarithmic scale from 1e-3 to 0.1, and max_depth is searched within an integer range from 1 to 10.
The log scale is applied to the learning rate to test more values closer to 0.001, as smaller learning rates paired with a high number of trees generally yield more stable models.
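To see why the log scale matters, here’s a small illustration (using numpy directly, just to show how uniform and log-uniform sampling spread values differently across the same range):

import numpy as np

rng = np.random.default_rng(42)

# Uniform sampling: most draws land between 0.01 and 0.1
uniform_draws = rng.uniform(1e-3, 0.1, size=1000)

# Log-uniform sampling: draws are spread evenly across orders of magnitude
log_uniform_draws = 10 ** rng.uniform(np.log10(1e-3), np.log10(0.1), size=1000)

print((uniform_draws < 0.01).mean())      # roughly 0.09
print((log_uniform_draws < 0.01).mean())  # roughly 0.5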
Once the model is trained, it generates predictions on the validation set and calculates the RMSE.
To execute the optimization, we create a study object and pass the objective function to the optimize method.
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=30)
The direction parameter specifies whether we want to minimize or maximize the objective function.
Since a lower RMSE indicates a better model, our aim is to minimize it.
If you’re tuning a classification model with accuracy or AUROC, set the direction to maximize.
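As a rough sketch of what that looks like (assuming, hypothetically, that you binarize quality into “good” versus “not good” wines and score with accuracy):

import xgboost as xgb
import optuna
from sklearn.metrics import accuracy_score

# Hypothetical binary target: 1 for "good" wines (quality >= 7), 0 otherwise
y_train_clf = (y_train >= 7).astype(int)
y_val_clf = (y_val >= 7).astype(int)

def objective_clf(trial):
    params = {
        "objective": "binary:logistic",
        "n_estimators": 1000,
        "verbosity": 0,
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.1, log=True),
        "max_depth": trial.suggest_int("max_depth", 1, 10),
        "subsample": trial.suggest_float("subsample", 0.05, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.05, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
    }
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train_clf, verbose=False)
    return accuracy_score(y_val_clf, model.predict(X_val))

study_clf = optuna.create_study(direction="maximize")  # higher accuracy is better
study_clf.optimize(objective_clf, n_trials=30)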
The n_trials parameter defines the number of times the model will be trained with different hyperparameter values.
In practice, about 30 trials are usually sufficient to find a solid set of hyperparameters.
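If you’d rather cap wall-clock time than the number of trials, the optimize method also accepts a timeout in seconds; when you pass both, the search stops at whichever limit is reached first:

# Stop after 30 trials or 10 minutes, whichever comes first
study.optimize(objective, n_trials=30, timeout=600)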
During the optimization, Optuna will display the best hyperparameters discovered thus far, along with the RMSE score.
[I 2023-04-09 10:37:16,565] A new study created in memory with name: no-name-8560b24e-3a91-48de-8449-c32977dd4c2b
[I 2023-04-09 10:37:49,060] Trial 0 finished with value: 0.5532276610157627 and parameters: {'learning_rate': 0.010958651631216162, 'max_depth': 7, 'subsample': 0.5354179500853077, 'colsample_bytree': 0.7165765694026183, 'min_child_weight': 12}. Best is trial 0 with value: 0.5532276610157627.
[I 2023-04-09 10:37:53,790] Trial 1 finished with value: 0.6019606330122452 and parameters: {'learning_rate': 0.017141615266207846, 'max_depth': 2, 'subsample': 0.5707056227081352, 'colsample_bytree': 0.31105599394671785, 'min_child_weight': 6}. Best is trial 0 with value: 0.5532276610157627.
[I 2023-04-09 10:37:58,257] Trial 2 finished with value: 0.5745420558557753 and parameters: {'learning_rate': 0.05475472895519491, 'max_depth': 3, 'subsample': 0.8723964423668571, 'colsample_bytree': 0.5917021002963235, 'min_child_weight': 18}. Best is trial 0 with value: 0.5532276610157627.
Once the optimization is complete, we can display the best hyperparameters and the RMSE score.
print('Best hyperparameters:', study.best_params)
print('Best RMSE:', study.best_value)
Best hyperparameters: {'learning_rate': 0.01742253012219986, 'max_depth': 9, 'subsample': 0.6685330381926933, 'colsample_bytree': 0.669833311520857, 'min_child_weight': 14}
Best RMSE: 0.5404657574376048
Now you can use these values in your XGBoost model moving forward!
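For example, you could plug them straight into a final model (a quick sketch; remember to include the fixed settings that weren’t part of the search, such as n_estimators):

# Train a final model with the fixed settings plus the tuned hyperparameters
final_model = xgb.XGBRegressor(
    objective="reg:squarederror",
    n_estimators=1000,
    verbosity=0,
    **study.best_params,
)
final_model.fit(X_train, y_train)
print(final_model.predict(X_val)[:5])  # sanity-check a few predictions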
An additional tip: if most of the best trials utilize a specific hyperparameter near the minimum or maximum value, consider expanding the search space for that hyperparameter.
For example, if most of the best trials use learning_rate close to 0.001, you should probably restart the optimization with the search space trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True).
Using this method, you’ll obtain an excellent set of hyperparameters, allowing you to concentrate on other tasks with a greater impact on model performance, such as feature engineering.
This approach has consistently proven successful in my Kaggle competitions and daily work!