machine learning

Multivariate Time Series Forecasting in Python

In this article, we’ll explore how to use scikit-learn with mlforecast to train multivariate time series models in Python. Instead of wasting time and making mistakes in manual data preparation, let’s use the mlforecast library. It has tools that transform our raw time series data into the correct format for training and prediction with scikit-learn. It computes the main features we want when modeling time series, such as aggregations over sliding windows, lags, differences, etc....
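mlforecast builds lag, difference and sliding-window features for you. To show what those features look like, here is a minimal sketch that computes a few of them by hand with pandas. This is not mlforecast's API. The column names unique_id / ds / y and the toy values are assumptions made for illustration. The key detail is that every feature is computed within each series, so values never leak from one series into another.

```python
import pandas as pd

# Toy long-format data: one row per (series, timestamp).
# Column names and values are hypothetical, chosen for illustration.
df = pd.DataFrame({
    "unique_id": ["A"] * 5 + ["B"] * 5,
    "ds": list(pd.date_range("2024-01-01", periods=5, freq="D")) * 2,
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})
df = df.sort_values(["unique_id", "ds"])
g = df.groupby("unique_id")["y"]

# Lag feature: previous value of the same series
df["lag1"] = g.shift(1)
# First difference within each series
df["diff1"] = g.diff(1)
# Mean of the 2 previous values; shifting first keeps y_t out of its own feature
df["rolling_mean_lag1_2"] = g.transform(lambda s: s.shift(1).rolling(2).mean())

print(df)
```

Shifting before rolling matters: without it, the window would include the value the model is trying to predict.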

Multiple Time Series Forecasting With Holt-Winters In Python

In today’s article, we’re going to explore the ins and outs of training a Holt-Winters model for forecasting multiple time series in Python. Holt-Winters is a very popular forecasting algorithm that can capture seasonality and trends in time series data through exponential smoothing. I’ll use StatsForecast, a scalable and easy-to-use Python library that can help you train a Holt-Winters model quickly and efficiently. You don’t need to be a programming wizard to get started with this library, and it can save you hours of coding time....
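To make "capturing seasonality and trends through exponential smoothing" concrete, here is a minimal sketch of the additive Holt-Winters recursions in plain Python. It is not StatsForecast's implementation. Here the smoothing parameters alpha, beta and gamma are fixed by hand, whereas a library would estimate them. Initializing level, trend and seasonal indices from the first two seasons is one simple choice among several.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters with fixed smoothing parameters; returns h forecasts."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")
    # Initial level: mean of the first season
    level = sum(y[:m]) / m
    # Initial trend: change in seasonal means, spread over one season
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    # Initial seasonal indices: deviations from the first-season mean
    season = [y[i] - level for i in range(m)]

    for t, value in enumerate(y):
        s = season[t % m]  # seasonal estimate from one season ago
        prev_level = level
        # Level: deseasonalized observation blended with the previous prediction
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        # Trend: smoothed change in level
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal index: smoothed deviation of the observation from the level
        season[t % m] = gamma * (value - level) + (1 - gamma) * s

    n = len(y)
    # Forecast k steps ahead: level + k * trend + latest index for that phase
    return [level + k * trend + season[(n + k - 1) % m] for k in range(1, h + 1)]


# A purely seasonal series with period 4 is forecast back exactly
print(holt_winters_additive([1.0, 2.0, 3.0, 4.0] * 4, m=4, alpha=0.5, beta=0.1, gamma=0.3, h=4))
```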

Multiple Time Series Forecasting with DeepAR in Python

In this post, we will learn how to use DeepAR to forecast multiple time series using GluonTS in Python. DeepAR is a deep learning algorithm based on recurrent neural networks designed specifically for time series forecasting. It works by learning a model based on all the time series data, instead of creating a separate model for each one. In my experience, this often works better than creating a separate model for each time series....

Multiple Time Series Forecasting with Prophet in Python

In this blog post, I will walk you through a complete example of how to use Prophet for multiple time series forecasting. Prophet, developed by Facebook (Meta), is an alternative to popular univariate time series models like ARIMA and is claimed to be better suited to business use cases. I will cover everything from installing Prophet to saving a trained model, and along the way, I will explain each step of the process in detail....

Volatility Forecasting In Python

In this blog post, we will explore how to use Python to forecast volatility with three methods: Naive, the popular GARCH, and machine learning with scikit-learn. Volatility here is the standard deviation of the returns of a financial instrument. I will give you starting points to kickstart your own research. Installing ARCH and mlforecast: first we need to install the required packages. You can do it with pip (pip install arch, then pip install mlforecast) or with conda:...
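Since volatility is defined here as the standard deviation of returns, a minimal sketch of realized volatility and a naive forecast might look like this. The prices are made up. The use of log returns, the 5-period window, and the rule "next volatility equals the last observed volatility" are all assumptions for illustration. The excerpt does not say exactly how the post defines its Naive method.

```python
import numpy as np

# Hypothetical daily closing prices (illustration only, not market data)
prices = np.array([100.0, 101.0, 99.5, 100.5, 102.0, 101.0, 103.0, 102.5])

# Log returns: r_t = log(P_t / P_{t-1})
returns = np.diff(np.log(prices))

window = 5  # assumed rolling window length
# Realized volatility: sample standard deviation of returns in each rolling window
realized_vol = np.array([
    returns[i - window:i].std(ddof=1) for i in range(window, len(returns) + 1)
])

# Naive forecast (one possible definition): tomorrow looks like the latest window
naive_forecast = realized_vol[-1]
print(realized_vol, naive_forecast)
```

A GARCH or machine learning model is only worth its complexity if it beats a baseline like this one.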

Bayesian Time Series Forecasting in Python with Orbit

In this article, you will learn how to use Orbit, a Python library for Bayesian time series forecasting. Orbit is a very straightforward library developed at Uber that offers an interface to train Bayesian exponential smoothing models implemented via the probabilistic programming languages Stan and Pyro. This is a practical guide: the goal here is not to go into the math behind the models, but rather to show how you can use Orbit in practice to forecast time series data using Bayesian models....

Multiple Time Series Forecasting with Temporal Convolutional Networks (TCN) in Python

In this article you will learn an easy, fast, step-by-step way to use Convolutional Neural Networks for multiple time series forecasting in Python. We will use the NeuralForecast library, which implements the Temporal Convolutional Network (TCN) architecture. Temporal Convolutional Network (TCN): this architecture is a variant of the Convolutional Neural Network (CNN) adapted to sequence data such as time series. Its core building block, the dilated causal convolution, was first presented in WaveNet (source: WaveNet: A Generative Model for Raw Audio)...

Naive Time Series Forecasting in Python

What Is Naive Forecasting? Whenever you start a time series forecasting project, you should start with a naive model. A naive model is a very simple rule that you use to generate predictions for the future. It’s easy to implement and it gives you a baseline to compare your more complex models against. Here you will learn how to use the StatsForecast library, which provides the most popular naive models for time series forecasting in Python....
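As an illustration of how simple these rules are, here is a plain-Python sketch of two common naive models: repeating the last value, and repeating the last observed season. This is not StatsForecast code. It only shows the logic that such baseline models implement. The function names are hypothetical.

```python
def naive_forecast(y, h):
    """Repeat the last observed value h times."""
    return [y[-1]] * h


def seasonal_naive_forecast(y, h, season_length):
    """Repeat the values observed one season earlier, cycling as needed."""
    if season_length > len(y):
        raise ValueError("need at least one full season of history")
    last_season = y[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


print(naive_forecast([3, 5, 7], h=2))
print(seasonal_naive_forecast([1, 2, 3, 4, 5, 6], h=4, season_length=3))
```

The seasonal variant is usually the fairer baseline for data with a strong weekly or yearly cycle, because the plain naive model ignores the cycle entirely.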

Multiple Time Series Forecasting with ARIMA in Python

ARIMA is one of the most popular univariate statistical models used for time series forecasting. Here you will learn how to use the StatsForecast library, which provides a fast, scalable and easy-to-use interface for us to train ARIMA models in Python. To understand ARIMA, let’s take an example of sales forecasting. Suppose a retail store has historical sales data for the past 12 months. To make a sales forecast for the next 3 months, we can fit an ARIMA model to this data....

Multiple Time Series Forecasting With LSTM In Python

You’ve probably heard about LSTMs, and might be curious about how they can help you with multiple time series forecasting. As machine learning practitioners, we come across various forecasting tasks, and choosing the right model can sometimes be a challenge. LSTMs have gained attention for their ability to handle long-term dependencies in sequential data, making them a promising choice for time series problems. By the end of this tutorial, you’ll have a deeper understanding of LSTMs and be better prepared to use them effectively for multiple time series forecasting projects....

Multiple Time Series Forecasting With Scikit-learn

Forecasting time series is a very common task in the daily life of a data scientist, yet it is surprisingly little covered in beginner machine learning courses. It can mean predicting future demand for a product, city traffic, or even the weather. With accurate time series forecasts, companies can adjust their production strategies, inventory management, resource allocation and other key decisions, leading to significant cost reductions and increased efficiency. Forecasts also allow companies to be proactive rather than reactive, anticipating market trends and adjusting their strategies accordingly....

Does XGBoost Need Feature Scaling Or Normalization?

If you are using XGBoost with decision trees as your base model, you don’t need to worry about scaling or normalizing your features. Decision trees are not sensitive to the scale of the features, because each split depends only on the ordering of a feature’s values. In practice, I have seen very minor differences in score from scaling features for decision trees, but these are due to numerical implementation details and are not significant in practice. If you are using XGBoost with linear models as base models, it is a good idea to scale or normalize the features....
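A quick way to check the claim about trees is to fit the same tree on raw and standardized features and compare predictions. This sketch uses scikit-learn's DecisionTreeRegressor as a stand-in for XGBoost's tree booster, on deterministic toy data. XGBoost's own tree construction, which may bin feature values, is not exercised here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Deterministic toy data: one feature, nonlinear target
X = np.arange(50, dtype=float).reshape(-1, 1)
y = np.sin(X[:, 0] / 5.0)

# Standardization is a monotonic (affine, positive-scale) transform
X_scaled = StandardScaler().fit_transform(X)

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
tree_scaled = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_scaled, y)

# Thresholds differ, but the sample ordering is the same, so both trees
# split the training rows into identical leaves
pred_raw = tree_raw.predict(X)
pred_scaled = tree_scaled.predict(X_scaled)
print(np.max(np.abs(pred_raw - pred_scaled)))
```

A linear base model has no such invariance: its coefficients and regularization penalty depend directly on feature scale.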

Adstock in Marketing Mix Modeling

What Is Adstock in Marketing Mix Modeling? Adstock is a way to account for the fact that the impact of an advertisement on consumer behavior may not be immediate, but may instead build up and linger over time. In other words, the model needs to reflect that changes in how people feel about a product or brand after seeing an ad can last even after the ad is no longer being shown....
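A common way to model this carryover is geometric adstock, where each period keeps a fixed fraction of the previous period's adstock. The sketch below is one standard form, not necessarily the transformation the post uses. The decay value is a hypothetical parameter that would normally be tuned or estimated.

```python
def geometric_adstock(spend, decay):
    """Geometric adstock: a_t = x_t + decay * a_{t-1}."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    adstock = []
    carryover = 0.0
    for x in spend:
        # Current spend plus the decayed effect of everything before it
        carryover = x + decay * carryover
        adstock.append(carryover)
    return adstock


# A single burst of spend keeps having an effect after the ad stops
print(geometric_adstock([100, 0, 0, 0], decay=0.5))
# Sustained spend builds up over time
print(geometric_adstock([100, 100, 100], decay=0.5))
```

The transformed series, not raw spend, is what goes into the regression, so the model can credit sales in later weeks to earlier ads.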

How To Do Time Series Cross-Validation In Python

One can’t simply use a random train-test split when building a machine learning model for time series. Doing so would not only allow the model to learn from future data but also give you an overoptimistic (and wrong) performance evaluation. In real-life projects, you always have a time component to deal with. Changes can happen in nanoseconds or over centuries, but they happen, and you are interested in predicting what will come next....
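One way to respect time order is an expanding-window split, where every fold trains on the past and validates on the period right after it. Here is a minimal sketch using scikit-learn's TimeSeriesSplit on 12 time-ordered rows. The post may use a different scheme, and n_splits and test_size are arbitrary choices here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 time-ordered observations (e.g. months), oldest first
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3, test_size=2)
folds = []
for train_idx, test_idx in tscv.split(X):
    # Every training index precedes every test index: no peeking into the future
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print("train:", train_idx, "test:", test_idx)
```

Each fold's training window grows while the validation window moves forward, which mimics retraining a model as new data arrives.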

How I Used Machine Learning To Get 20,000 Real Followers On Twitter

Writing this article will very likely get my Twitter account banned, as I am publicly pleading guilty to breaking the terms of service. Anyway, it’s worth it. To learn more about how conversion modeling works end-to-end, I threw machine learning at a well-known black hat strategy to grow a following on Twitter. It grew my account from about 900 followers to 20,000 in about 6 months. Here’s how it went…...

The Secret To Build The Best Machine Learning Models? Iterate Fast

There’s ongoing anxiety in our field born from the perceived need to “catch up” with the shiny new state-of-the-art technology. And yes, it’s good to always be learning new methods to solve problems with machine learning. But I believe one thing is more important than knowing everything: knowing how to find possible solutions and iterate fast. This is what I mean by “iterate fast”: having the ability to quickly test different solutions and find the one that works best for your problem....

Overview Of My 3rd Place Solution To The Criteo ECML-PKDD 22 Challenge

A few posts ago, I wrote about a competition sponsored by Criteo that I entered to learn more about the out-of-distribution robustness of machine learning models. It turns out I got 3rd place and a prize! In any competition, you need to write a report on your solution to claim the prize, so I decided to post it here too. Enjoy! Background: the competition had a dataset with about 40 categorical features “aggregated from traces in computational advertising” and a binary target....

8 Tips To Get People To Trust Your Machine Learning Models

This is one of those things you only find out after getting a job as a data scientist. Imagine that someone simply showed up with an app and told you that if you follow what it recommends, you will reach your goals in life. Interviewing for jobs? Take the job the app ranks highest. Looking for a relationship? Date the person the app says is your best match. Sick?...

The 4-Step Framework I Use To Build Powerful Machine Learning Ensembles

A lot of people find machine learning ensembles very interesting. This is probably because they offer an “easy” way to improve the performance of machine learning solutions. The place where you will see a lot of ensembles is Kaggle competitions, but you don’t need to be a Top Kaggler to know how to build a good ensemble for your project. I have spent a lot of time building and thinking about ensembles and here I will tell you my “4-step ensemble framework”....

Can We Solve Distribution Shift With Clever Training In Machine Learning?

One of the biggest problems we have when using machine learning in practice is distribution shift. A distribution shift occurs when the distribution of the data the model sees in production starts to look different than the data used to train it. A simple example that broke a lot of models was COVID. The quarantine simply changed how people behaved and the historical data became less representative. Another good example is credit card fraud....