How To Use Optuna to Tune LightGBM Hyperparameters

As a Kaggle Grandmaster, I absolutely love working with LightGBM, a fantastic machine learning library that’s become one of my go-to tools. I always focus on tuning the model’s hyperparameters before diving into feature engineering. Think of it like cooking up the perfect dish. You want to make sure you’ve got the right ingredients and their quantities before you start experimenting with new flavors. By fine-tuning your hyperparameters first, you’ll squeeze every last drop of performance from your model on the data you already have....

April 7, 2023 · 9 min · Mario Filho

Time Series Anomaly Detection in Python

Discovering outliers, unusual patterns or events in your time series data has never been easier! In this tutorial, I’ll walk you through a step-by-step guide on how to detect anomalies in time series data using Python. You won’t have to worry about missing sudden changes in your data or trying to keep up with patterns that change over time. I’ll use website impressions data from Google Search Console as an example, but the techniques I cover will work for any time series data....

September 28, 2023 · 10 min · Mario Filho

Do Neural Networks Need Feature Scaling Or Normalization?

In short, feature scaling or normalization is not strictly required for neural networks, but it is highly recommended. Scaling or normalizing the input features can be the difference between a neural network that converges in a few iterations and one that takes hundreds of iterations to converge or even fails to converge at all. The optimization process may become slower because the gradients in the direction of the larger-scale features will be significantly larger than the gradients in the direction of the smaller-scale features....

April 4, 2023 · 8 min · Mario Filho

Does SVM Need Feature Scaling Or Normalization?

In Support Vector Machines (SVM), feature scaling or normalization is not strictly required, but it is highly recommended, as it can significantly improve model performance and convergence speed. SVM tries to find the optimal hyperplane that separates the data points of different classes with the maximum margin. If the features are on different scales, the hyperplane will be heavily influenced by the features with larger values, potentially leading to suboptimal results....

April 1, 2023 · 5 min · Mario Filho

How To Get Feature Importance In Logistic Regression

Are you looking to make sense of your logistic regression model and determine which features are truly important in predicting your target variable? It can be quite frustrating trying to understand which features are driving your model’s predictions, especially when you have a large number of them. Not to mention, the presence of correlated features can make the task even more challenging. In this tutorial, I’ll walk you through different methods for assessing feature importance in both binary and multiclass logistic regression models....

May 9, 2024 · 10 min · Mario Filho

How To Get Feature Importance in Random Forests

Interpreting and identifying crucial features in machine learning models can be a tough nut to crack, especially when dealing with black-box models. In this tutorial, we will dive deep into understanding global and local feature importance in Random Forests. We will explore several techniques and tools to analyze and interpret these importances, making our models more transparent and reliable. Let’s use the Red Wine Quality dataset from the UCI Machine Learning Repository to learn the techniques....

March 30, 2023 · 12 min · Mario Filho

Does Linear Regression Require Feature Scaling?

In linear regression, feature scaling is not strictly required but can be beneficial in certain situations. When using gradient descent-based optimization algorithms, feature scaling can help speed up convergence and improve model performance. However, when employing a closed-form solution like the normal equation, feature scaling is not necessary, as the algorithm naturally handles features with different scales. In this tutorial, we will explore the impact of feature scaling on linear regression’s performance using the Red Wine dataset as an example....

March 28, 2023 · 5 min · Mario Filho
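
As a rough illustration of the closed-form point in the linear regression excerpt above, here is a minimal sketch in plain Python. It uses the one-feature least-squares formulas (a special case of the normal equation) on made-up numbers that are not from the post; the helper names `fit_ols` and `predict` are hypothetical. Multiplying the feature by 1000 rescales the slope, but the predictions stay the same up to floating-point error.

```python
import math

def fit_ols(xs, ys):
    """Closed-form least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(xs, slope, intercept):
    return [slope * x + intercept for x in xs]

# Made-up toy data (an assumption for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

# Same feature expressed on a scale 1000 times larger.
xs_scaled = [x * 1000.0 for x in xs]

slope_raw, intercept_raw = fit_ols(xs, ys)
slope_scaled, intercept_scaled = fit_ols(xs_scaled, ys)

preds_raw = predict(xs, slope_raw, intercept_raw)
preds_scaled = predict(xs_scaled, slope_scaled, intercept_scaled)

# The slope absorbs the change of units; the fitted values do not move.
same_predictions = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(preds_raw, preds_scaled)
)
print(slope_raw, slope_scaled, same_predictions)
```

The same argument does not carry over to gradient descent, where the step size interacts with the scale of each feature.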

Does Logistic Regression Require Feature Scaling?

To put it simply, feature scaling is not required for logistic regression, but it can be beneficial in a number of scenarios. It helps improve the convergence of gradient-based optimization algorithms and ensures that regularization techniques, like L1 and L2, are applied uniformly across all features. Another advantage to scaling is that it can help with the interpretation of the model coefficients. In this tutorial, we will explore the impact of feature scaling on logistic regression’s performance using the Red Wine dataset as an example....

March 27, 2023 · 6 min · Mario Filho

Is Feature Scaling Required for the KNN Algorithm?

To put it simply, yes, feature scaling is crucial for the KNN algorithm, as it helps in preventing features with larger magnitudes from dominating the distance calculations. Feature scaling is an essential step in the data preprocessing pipeline, especially for distance-based algorithms like the KNN. In this tutorial, we will explore the impact of feature scaling on the algorithm’s performance using the Red Wine dataset as an example. Why Feature Scaling Is Important For KNN: Distance-based algorithms, such as the KNN, calculate the distance between data points to determine their similarity....

March 25, 2023 · 5 min · Mario Filho
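
To make the distance argument in the KNN excerpt above concrete, here is a small sketch in plain Python. The four points are invented for illustration and are not the Red Wine data; `min_max_scale` and `nearest_index` are hypothetical helpers, and min-max scaling is just one possible choice of scaler. With raw values, the second feature (in the hundreds) decides the nearest neighbor; after scaling, both features contribute.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(rows):
    """Rescale every column to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]

def nearest_index(rows, query_idx):
    """Index of the row closest to rows[query_idx], excluding itself."""
    query = rows[query_idx]
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: euclidean(rows[i], query))

# Invented toy rows: (small-scale feature, large-scale feature).
points = [
    (0.30, 100.0),  # index 0: the query point
    (0.32, 120.0),  # index 1: almost identical on the first feature
    (0.90, 105.0),  # index 2: very different on the first feature
    (0.10, 200.0),  # index 3: far away on the second feature
]

raw_neighbor = nearest_index(points, 0)
scaled_neighbor = nearest_index(min_max_scale(points), 0)

print("nearest neighbor on raw features:", raw_neighbor)
print("nearest neighbor on scaled features:", scaled_neighbor)
```

On the raw values the neighbor is picked almost entirely by the large-scale feature, so the answer changes once both columns share the same range.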

Do Decision Trees Need Feature Scaling Or Normalization?

In general, no. Decision trees are not sensitive to feature scaling because their splits don’t change with any monotonic transformation. Normalization is not necessary either, but it can change your results because it’s not monotonic, as we’ll see later. That said, the numerical implementation of a specific library may make your decision tree predictions change if you don’t scale or normalize your data. This is usually a very small change that you don’t need to worry about, but it’s good to know if you find yourself in a situation where you need to explain why your predictions are different....

March 24, 2023 · 5 min · Mario Filho
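
The monotonic-transformation claim in the decision tree excerpt above can be checked with a toy decision stump. This is only a sketch in plain Python on invented data, not a full tree and not any library's implementation; `best_split_left_side` is a hypothetical helper that tries every threshold on one feature and keeps the one with the fewest misclassified samples. Applying a strictly increasing transform (a logarithm here) changes the threshold value but not which samples end up on each side of the split.

```python
import math
from collections import Counter

def minority_count(indices, labels):
    """Samples in this group that do not belong to its majority class."""
    counts = Counter(labels[i] for i in indices)
    return len(indices) - max(counts.values())

def best_split_left_side(values, labels):
    """Try every threshold between sorted values; return the indices sent left."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_errors, best_left = None, None
    for k in range(1, len(order)):
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold can separate identical values
        left, right = order[:k], order[k:]
        errors = minority_count(left, labels) + minority_count(right, labels)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(left)
    return best_left

# Invented toy feature with a very wide range, plus binary labels.
values = [1.0, 3.0, 10.0, 200.0, 5000.0, 80000.0]
labels = [0, 0, 0, 1, 1, 1]

# A strictly increasing transform preserves the ordering of the samples.
log_values = [math.log(v) for v in values]

left_raw = best_split_left_side(values, labels)
left_log = best_split_left_side(log_values, labels)

print(sorted(left_raw), sorted(left_log))
```

Because the stump only looks at the ordering of the values, any transform that preserves that ordering leads to the same partition.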