Do Neural Networks Need Feature Scaling Or Normalization?

In short, feature scaling or normalization is not strictly required for neural networks, but it is highly recommended. Scaling or normalizing the input features can be the difference between a neural network that converges in a few iterations and one that takes hundreds of iterations to converge or even fails to converge at all. The optimization process may become slower because the gradients in the direction of the larger-scale features will be significantly larger than the gradients in the direction of the smaller-scale features....
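
The gradient imbalance mentioned in this excerpt can be illustrated with a single linear neuron trained with squared error. This is only a minimal sketch: the feature values, the target, and the helper name `gradients` are made up for illustration and do not come from the post.

```python
# Hypothetical example: one linear neuron, squared-error loss, two features
# on very different scales (all values are made up for illustration).

def gradients(x, y, w):
    """Gradient of 0.5 * (w . x - y)^2 with respect to each weight."""
    error = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [error * xi for xi in x]

x_raw = [0.5, 2000.0]   # second feature is thousands of times larger
x_scaled = [0.5, 0.2]   # same sample after bringing both features to a similar range
w = [0.0, 0.0]          # initial weights
y = 1.0                 # target

g_raw = gradients(x_raw, y, w)
g_scaled = gradients(x_scaled, y, w)

print(g_raw)     # the second component dominates the update
print(g_scaled)  # components are of comparable size
```

With the raw inputs, a single learning rate has to suit both a tiny and a huge gradient component, which is one way to see why convergence can slow down or fail.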

April 4, 2023 · 8 min · Mario Filho

Does SVM Need Feature Scaling Or Normalization?

In Support Vector Machines (SVM), feature scaling or normalization is not strictly required, but it is highly recommended, as it can significantly improve model performance and convergence speed. SVM tries to find the optimal hyperplane that separates the data points of different classes with the maximum margin. If the features are on different scales, the hyperplane will be heavily influenced by the features with larger values, potentially leading to suboptimal results....

April 1, 2023 · 5 min · Mario Filho

How To Get Feature Importance In Logistic Regression

Are you looking to make sense of your logistic regression model and determine which features are truly important in predicting your target variable? It can be quite frustrating trying to understand which features are driving your model’s predictions, especially when you have a large number of them. Not to mention, the presence of correlated features can make the task even more challenging. In this tutorial, I’ll walk you through different methods for assessing feature importance in both binary and multiclass logistic regression models....

May 9, 2024 · 10 min · Mario Filho

How To Get Feature Importance in Random Forests

Interpreting and identifying crucial features in machine learning models can be a tough nut to crack, especially when dealing with black-box models. In this tutorial, we will dive deep into understanding global and local feature importance in Random Forests. We will explore several techniques and tools to analyze and interpret these importances, making our models more transparent and reliable. Let’s use the Red Wine Quality dataset from the UCI Machine Learning Repository to learn the techniques....

March 30, 2023 · 12 min · Mario Filho

Does Linear Regression Require Feature Scaling?

In linear regression, feature scaling is not strictly required but can be beneficial in certain situations. When using gradient descent-based optimization algorithms, feature scaling can help speed up convergence and improve model performance. However, when employing a closed-form solution like the normal equation, feature scaling is not necessary, as the algorithm naturally handles features with different scales. In this tutorial, we will explore the impact of feature scaling on linear regression’s performance using the Red Wine dataset as an example....

March 28, 2023 · 5 min · Mario Filho

Does Logistic Regression Require Feature Scaling?

To put it simply, feature scaling is not required for logistic regression, but it can be beneficial in a number of scenarios. It helps improve the convergence of gradient-based optimization algorithms and ensures that regularization techniques, like L1 and L2, are applied uniformly across all features. Another advantage to scaling is that it can help with the interpretation of the model coefficients. In this tutorial, we will explore the impact of feature scaling on logistic regression’s performance using the Red Wine dataset as an example....

March 27, 2023 · 6 min · Mario Filho

Is Feature Scaling Required for the KNN Algorithm?

To put it simply, yes, feature scaling is crucial for the KNN algorithm, as it helps in preventing features with larger magnitudes from dominating the distance calculations. Feature scaling is an essential step in the data preprocessing pipeline, especially for distance-based algorithms like the KNN. In this tutorial, we will explore the impact of feature scaling on the algorithm’s performance using the Red Wine dataset as an example.

Why Feature Scaling Is Important For KNN

Distance-based algorithms, such as the KNN, calculate the distance between data points to determine their similarity....
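
To make the point about larger magnitudes dominating the distance concrete, here is a small sketch using Euclidean distance. The two features (income and age), their values, the assumed min/max ranges, and the helper names are all hypothetical and are not taken from the tutorial, which uses the Red Wine dataset.

```python
import math

def min_max_scale(point, mins, maxs):
    """Rescale each feature to [0, 1] using assumed feature ranges."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(point, mins, maxs)]

def nearest(query, candidates):
    """Name of the candidate with the smallest Euclidean distance to query."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))

# Hypothetical samples: [income, age]
query = [50000.0, 25.0]
candidates = {
    "a": [50500.0, 60.0],  # close in income, very different age
    "b": [52000.0, 26.0],  # slightly further in income, almost the same age
}

# Assumed feature ranges used for scaling (made up for this sketch)
mins, maxs = [20000.0, 18.0], [120000.0, 78.0]

nearest_raw = nearest(query, candidates)
nearest_scaled = nearest(
    min_max_scale(query, mins, maxs),
    {name: min_max_scale(p, mins, maxs) for name, p in candidates.items()},
)

print(nearest_raw, nearest_scaled)
```

On the raw data the income column decides the neighbor almost by itself; after scaling, the age difference counts too and the nearest neighbor changes.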

March 25, 2023 · 5 min · Mario Filho

Do Decision Trees Need Feature Scaling Or Normalization?

In general, no. Decision trees are not sensitive to feature scaling because their splits don’t change with any monotonic transformation. Normalization is not necessary either, but it can change your results because it’s not monotonic, as we’ll see later. That said, the numerical implementation of a specific library may make your decision tree predictions change if you don’t scale or normalize your data. This is usually a very small change that you don’t need to worry about, but it’s good to know if you find yourself in a situation where you need to explain why your predictions are different....
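
The claim that splits don’t change under a monotonic transformation can be checked with a toy threshold search. This is a simplified sketch, not how any particular library builds trees: `best_split` is a hypothetical helper that scores splits by misclassification count on a single feature, and the data is made up.

```python
import math

def best_split(xs, ys):
    """Return the indices sent to the left branch by the best threshold split.

    Splits are scored by the number of misclassified samples when each side
    predicts its majority class (binary labels 0/1).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_left, best_errors = None, None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        errors = 0
        for side in (left, right):
            ones = sum(ys[i] for i in side)
            errors += min(ones, len(side) - ones)
        if best_errors is None or errors < best_errors:
            best_left, best_errors = frozenset(left), errors
    return best_left

xs = [3.0, 1000.0, 20.0, 150.0, 7.0]   # made-up feature values
ys = [0, 1, 0, 1, 0]                   # made-up binary labels

left_raw = best_split(xs, ys)
left_log = best_split([math.log(x) for x in xs], ys)  # strictly increasing transform

print(left_raw == left_log)
```

Because the logarithm preserves the ordering of the values, every candidate split groups the same samples, so the chosen partition is identical.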

March 24, 2023 · 5 min · Mario Filho

Time Series Clustering In Python With Scikit-Learn

Clustering is an unsupervised learning technique that can help you uncover hidden patterns in your time series data. Scikit-learn has a wide range of clustering algorithms, including K-means, DBSCAN, and Agglomerative Clustering. In this tutorial, we’ll explore how to use K-means with different transformations to cluster time series data. Using the right data transformations can help you get your desired results faster than just trying different clustering algorithms over the same data....

March 17, 2023 · 11 min · Mario Filho

8 Ways To Calculate Correlation Between Two Time Series In Python

Analyzing correlations is a critical step in understanding complex data relationships. It’s a fast way to find how similar two time series are. Python offers a wide range of libraries that make calculating correlations between two time series a breeze. In this tutorial, we’ll explore some of the most popular libraries for correlation analysis, including NumPy, Pandas, Scipy, Polars, CuPy, CuDF, PyTorch, and Dask. Let’s get started!

Correlation Between Two Time Series Using NumPy

NumPy is the most popular Python library for numerical computing....

March 15, 2023 · 5 min · Mario Filho

Hierarchical Time Series Forecasting with Python

Hierarchical forecasting is a method of forecasting time series data where the data is divided into multiple levels of aggregation. The levels can be thought of as a tree-like structure, where each level represents a different aggregation of the data. For example, the top level might represent total sales for a company, while the next level down might represent sales for each region, and the level below that might represent sales for each store within each region....
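
The tree-like structure described here can be sketched by summing store-level figures up to regions and then to the company total. This only illustrates the aggregation levels, not a forecasting or reconciliation method; the region names, store names, and numbers are made up.

```python
# Hypothetical bottom level: sales per (region, store)
store_sales = {
    ("North", "Store A"): 120,
    ("North", "Store B"): 80,
    ("South", "Store C"): 150,
}

# Middle level: aggregate stores into regions
region_sales = {}
for (region, _store), value in store_sales.items():
    region_sales[region] = region_sales.get(region, 0) + value

# Top level: aggregate regions into the company total
total_sales = sum(region_sales.values())

print(region_sales)
print(total_sales)
```

Each level is a different aggregation of the same underlying data, which is why forecasts made at different levels are expected to add up consistently.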

March 13, 2023 · 11 min · Mario Filho

Deseasonalizing Time Series Data With Python

Time series data can be a valuable tool for predicting trends and making informed business decisions. However, it can be difficult to analyze due to seasonal patterns and other fluctuations that can obscure underlying trends. That’s where deseasonalizing comes in, allowing you to isolate trends and make more accurate predictions. In this tutorial, we’ll explore two different approaches to deseasonalize time series data in Python: additive models and multiplicative models....

March 10, 2023 · 8 min · Mario Filho

Detrending Time Series Data With Python

In this tutorial, we will explore various detrending models using two popular Python libraries - statsmodels and scipy. While there are several detrending methods, we will focus on four models: We will start with a constant model from the scipy library, which assumes that the trend of the time series is a straight horizontal line. Then we move to a model that captures a linear trend in the data. After that, we will explore a quadratic model, using the statsmodels library....

March 10, 2023 · 6 min · Mario Filho

How To Measure Time Series Similarity in Python

In this tutorial, we’ll explore some practical techniques to measure the similarity between time series data in Python using the most popular distance measures. To make sure that the results are not affected by noise or irrelevant factors, we’ll apply techniques such as scaling, detrending, and smoothing. Once the data is preprocessed, we can use simple distance measures like Pearson correlation and Euclidean distance to measure the similarity of two aligned time series....

March 8, 2023 · 8 min · Mario Filho

Multi-Step Time Series Forecasting In Python

In this tutorial, I will explain two (and a half) methods to generate multi-step forecasts using time series data. They are the recursive or autoregressive method, the direct method, and a variant of the direct method with a single model.

Preparing the Data

We will use a dataset containing information about yearly rice production.

import pandas as pd
import numpy as np
import os

data = pd.read_csv(os.path.join(path, 'rice production across different countries from 1961 to 2021....

March 9, 2023 · 8 min · Mario Filho

Differencing Time Series In Python With Pandas, Numpy, and Polars

When working with time series data, differencing is a common technique used to make the data stationary. Stationary data is important because it allows us to apply statistical models that assume constant parameters (like the mean and standard deviation) over time, and this can improve the accuracy of our predictions. Let’s see how we can easily perform differencing in Python using Pandas, Numpy, and Polars.

First-order Differencing

First-order differencing involves subtracting the previous value from each value in the time series....
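
As a minimal sketch, first-order differencing computes the change from one observation to the next, that is, the current value minus the previous one. The example below uses plain Python lists and a hypothetical helper `first_difference` with made-up numbers; pandas (`Series.diff`) and NumPy (`numpy.diff`) provide equivalent operations.

```python
def first_difference(values):
    """Return current minus previous for each consecutive pair of values."""
    return [curr - prev for prev, curr in zip(values, values[1:])]

series = [10, 12, 15, 14, 18]   # made-up observations
diffs = first_difference(series)

print(diffs)  # one element shorter than the input
```

The result has one fewer element than the input because the first observation has no previous value to subtract.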

March 4, 2023 · 8 min · Mario Filho

Gaussian Process For Time Series Forecasting In Python

In this article, we will explore the use of Gaussian Processes for time series forecasting in Python, specifically using the GluonTS library. GluonTS is an open-source toolkit for building and evaluating state-of-the-art time series models. One of the key benefits of using Gaussian Processes for time series forecasting is that they can provide probabilistic predictions. Instead of just predicting a point estimate for the next value in the time series, GPs can provide a distribution over possible values, allowing us to quantify our uncertainty....

March 3, 2023 · 11 min · Mario Filho

Kalman Filter for Time Series Forecasting in Python

The Kalman Filter is a state-space model that estimates the state of a dynamic system based on a series of noisy observations. It uses a feedback mechanism called the Kalman gain to adjust the weight given to predicted and observed values based on their relative uncertainties. It has been widely used in various fields such as finance, aerospace, and robotics. In this tutorial, you will learn how to easily use the Kalman Filter for time series forecasting in Python....
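
The role of the Kalman gain can be sketched with a one-dimensional filter that tracks a slowly changing level. This is an illustrative implementation under simple assumptions (random-walk state, scalar observations), not the approach used in the tutorial; the function name `kalman_1d` and its parameters are hypothetical.

```python
def kalman_1d(observations, process_var, obs_var, init_state=0.0, init_var=1.0):
    """Filter a sequence of noisy scalar observations of a random-walk level."""
    state, var = init_state, init_var
    estimates = []
    for z in observations:
        # Predict: the level is assumed to persist, but uncertainty grows
        var = var + process_var
        # Kalman gain: close to 1 when the prediction is uncertain relative
        # to the observation, close to 0 when the observation is noisier
        gain = var / (var + obs_var)
        # Update: move the estimate toward the observation by the gain
        state = state + gain * (z - state)
        var = (1 - gain) * var
        estimates.append(state)
    return estimates

estimates = kalman_1d([5.0] * 10, process_var=0.01, obs_var=1.0)
print(estimates[0], estimates[-1])
```

When the observation noise is large relative to the predicted uncertainty, the gain is small and the estimate moves slowly; when observations are very precise, the estimate follows them almost exactly.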

April 27, 2023 · 13 min · Mario Filho

Multiple Time Series Forecasting With LightGBM In Python

Today, we’re going to explore multiple time series forecasting with LightGBM in Python. If you’re not already familiar, LightGBM is a powerful open-source gradient boosting framework that’s designed for efficiency and high performance. It’s a great tool for tackling large datasets and can help you create accurate predictions in a flash. When combined with the MLForecast library, it becomes a versatile and scalable solution for multiple time series forecasting. Let’s dive into the step-by-step process of preparing our data, defining our LightGBM model, and training it using MLForecast in Python....

February 28, 2023 · 10 min · Mario Filho

Multiple Time Series Forecasting With XGBoost In Python

Forecasting multiple time series can be a daunting task, especially when dealing with large amounts of data. However, XGBoost is a powerful gradient boosting algorithm that has been shown to perform exceptionally well in time series forecasting tasks. In combination with MLForecast, which is a scalable and easy-to-use time series forecasting library, we can make the process of training an XGBoost model for multiple time series forecasting a breeze. Let’s dive into the step-by-step process of preparing our data, defining our XGBoost model, and training it using MLForecast in Python....

February 28, 2023 · 14 min · Mario Filho