Hierarchical forecasting is a method of forecasting time series data where the data is divided into multiple levels of aggregation.

The levels can be thought of as a tree-like structure, where each level represents a different aggregation of the data.

For example, the top level might represent total sales for a company, while the next level down might represent sales for each region, and the level below that might represent sales for each store within each region.

In some cases we want to create accurate forecasts for each level of the hierarchy, while ensuring that the forecasts at each level are consistent with the forecasts at the levels above and below.

This is where the concept of reconciling comes in.

Reconciliation is the process of combining forecasts from different levels of the hierarchy in a way that ensures consistency between the forecasts.

Reconciling forecasts is important for a number of reasons.

First, it can improve the accuracy of the forecasts by incorporating information from multiple levels of the hierarchy.

For example, if the forecasts for each store within a region are reconciled to the forecast for the region as a whole, the resulting forecast for the region is likely to be more accurate than if each store forecast was considered independently.

Second, reconciling forecasts can help to avoid inconsistencies and errors that can arise when forecasting each level independently.

For example, if the forecasts for each store within a region do not add up to the forecast for the region as a whole, this can lead to confusion and make it difficult to make decisions based on the forecasts.

In this tutorial, I will walk you through the steps of installing and using HierarchicalForecast, a fast and scalable Python library that will help you easily reconcile forecasts from multiple hierarchical levels.

Installing HierarchicalForecast

To install HierarchicalForecast, simply run the following command in your terminal:

pip install hierarchicalforecast

Or use conda:

conda install -c conda-forge hierarchicalforecast

HierarchicalForecast is responsible for the reconciliation part, after you have trained forecasting models for each level of the hierarchy and generated predictions.

To train models and generate predictions, you can use any method you like, such as scikit-learn, ARIMA, LSTM, or even naive methods.

I will use a simple Holt-Winters model.

Splitting Time Series Data Into Hierarchical Levels

I will use data from the LACity.org website available on Kaggle.

This dataset contains the number of user sessions for the City of Los Angeles website for each day from 2014 to 2019.

Sessions are periods of time when a user is actively using the website.

Usually they are measured by looking at user activity and have a timeout after 30 minutes since the last user action.

For example, right now you are in a session on my website. If you leave the page and come back after 2 hours, you will be in a new session.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import os

data = pd.read_csv(os.path.join(path, 'lacity.org-website-traffic.csv'), 
                   parse_dates=['Date']).loc[:, ['Date', 'Device Category', 'Browser', 'Sessions']]
data['Website'] = 'lacity.org'
selected = ['Chrome', 'Firefox', 'Internet Explorer', 'Opera', 'Safari',
       'Android Browser', 'Safari (in-app)']
data = data.loc[data['Browser'].isin(selected), :]
data = data.rename(columns={'Date': 'ds', 'Sessions': 'y'})

This is what the raw data looks like:

Date Device Category Browser Sessions Website
2014-01-01 00:00:00 desktop Chrome 934 lacity.org
2014-01-01 00:00:00 desktop Firefox 761 lacity.org
2014-01-01 00:00:00 desktop Internet Explorer 1107 lacity.org
2014-01-01 00:00:00 desktop Opera 35 lacity.org
2014-01-01 00:00:00 desktop Safari 554 lacity.org

We have the following levels: Website, Device Category and Browser.

The Website level is the top level, then we have Device Category and Browser as the next levels.

For example, you are visiting this website from Desktop or Mobile, and you are likely using one of Chrome, Firefox, Edge, Opera or Safari.

HierarchicalForecast and it’s sister libraries expect the data to have 3 columns:

  • ds: the timestamp of the observation
  • y: the value of the observation
  • unique_id: a unique identifier for each time series

So I renamed the original Date column to ds and the Sessions column to y.

The original dataset didn’t have the Website column, so I added it manually. Although it’s the same for all rows, it’s still required so we can evaluate the aggregated forecasts later.

This is the data after the preprocessing:

ds Device Category Browser y Website
2014-01-01 00:00:00 desktop Chrome 934 lacity.org
2014-01-01 00:00:00 desktop Firefox 761 lacity.org
2014-01-01 00:00:00 desktop Internet Explorer 1107 lacity.org
2014-01-01 00:00:00 desktop Opera 35 lacity.org
2014-01-01 00:00:00 desktop Safari 554 lacity.org

As always, I will split the data into train and validation sets in a simple time series validation scheme.

train = data.loc[data['ds'] < '2019-01-01']
valid = data.loc[(data['ds'] >= '2019-01-01') & (data['ds'] < '2019-02-01')]
h = valid['ds'].nunique()

Our validation data has 31 days, so our forecast horizon h is 31.

Now let’s make a list of the columns that represent the different levels of the hierarchy.

spec = [['Website'],
        ['Website', 'Device Category'],
        ['Website', 'Device Category', 'Browser']]

We will use this list to split the data into different levels with the aggregate function.

from hierarchicalforecast.utils import aggregate

train_agg, S_train, tags = aggregate(train, spec)
valid_agg, _, _ = aggregate(valid, spec)

We need to pass two arguments to it: the original dataframe and the list of lists of columns that represent the different levels of the hierarchy.

I like to do it separately for the train and validation sets to avoid data leakage.

The function returns 3 objects:

The first (train_agg or valid_agg) is a huge dataframe with the original data split into different levels and stacked on top of each other.

This is a random sample of rows from the train_agg dataframe:

unique_id ds y
lacity.org/mobile/Opera 2015-08-03 00:00:00 11546
lacity.org/tablet/Android Browser 2016-05-01 00:00:00 0
lacity.org/mobile/Android Browser 2018-06-11 00:00:00 0
lacity.org/desktop/Safari 2016-09-21 00:00:00 139579
lacity.org/tablet/Opera 2018-06-21 00:00:00 0
lacity.org/mobile/Internet Explorer 2014-05-28 00:00:00 0
lacity.org/desktop/Opera 2015-11-23 00:00:00 0
lacity.org/desktop 2015-01-17 00:00:00 833387
lacity.org 2018-06-04 00:00:00 2.84359e+06
lacity.org/desktop/Opera 2016-08-05 00:00:00 0

Notice that aggregate removed the Device Category and Browser columns, and replaced them with the unique_id column.

In Nixtla’s forecasting libraries (which HierarchicalForecast is part of), the unique_id column is used to identify each separate time series, as told above.

original web traffic time series

Some of the time series are very seasonal and kind of easy to forecast, but others are more challenging.

The second object (S_train) is a summing matrix that indicates which series are aggregated into which other series.

lacity.org/desktop/Chrome lacity.org/desktop/Firefox lacity.org/desktop/Internet Explorer lacity.org/desktop/Opera lacity.org/desktop/Safari
lacity.org 1 1 1 1 1
lacity.org/desktop 1 1 1 1 1
lacity.org/mobile 0 0 0 0 0
lacity.org/tablet 0 0 0 0 0
lacity.org/desktop/Chrome 1 0 0 0 0

For example, the first row indicates that the lacity.org series is the sum of all the other series.

If we do the dot product of a row with the values of the original series indicated in the columns, we get the value of the aggregated series.

tags is a dictionary that maps the original categories to the unique_id values.

{...,
    'Website/Device Category': array(['lacity.org/desktop', 'lacity.org/mobile', 'lacity.org/tablet'],
       dtype=object), ...}

Now we are ready to train the models.

Training Forecasting Models For Each Hierarchical Level

I will use the StatsForecast library to train our Holt-Winters models.

from statsforecast import StatsForecast
from statsforecast.models import HoltWinters
model = StatsForecast(models=[HoltWinters(season_length=7, error_type='A')], 
                              freq='D', n_jobs=-1)
model.fit(train_agg)

This code will train a separate Holt-Winters model for each level of the hierarchy (each unique_id we have in the train_agg dataframe).

For more details on StatsForecast and Holt-Winters, please refer to the links above.

Like I said, you can use any forecasting library you want, as long as you can generate forecasts for each level of the hierarchy.

We use the forecast method to generate the necessary data.

p = model.forecast(h=h, fitted=True)
p_fitted = model.forecast_fitted_values()

The argument h is the forecast horizon, and fitted=True indicates that we want to generate forecasts for both the training and validation periods.

Remember to do this if you are using a different forecasting library.

The forecast_fitted_values method returns the fitted values for the training period.

This is what p looks like:

unique_id ds HoltWinters
lacity.org 2019-01-01 00:00:00 1.67954e+06
lacity.org 2019-01-02 00:00:00 1.81115e+06
lacity.org 2019-01-03 00:00:00 1.52177e+06
lacity.org 2019-01-04 00:00:00 1.13103e+06
lacity.org 2019-01-05 00:00:00 943.062

And this is what p_fitted looks like:

unique_id ds y HoltWinters
lacity.org 2014-01-01 00:00:00 1.66275e+06 2.28172e+06
lacity.org 2014-01-02 00:00:00 3.14611e+06 1.64666e+06
lacity.org 2014-01-03 00:00:00 2.85295e+06 2.20869e+06
lacity.org 2014-01-04 00:00:00 1.74434e+06 1.2439e+06
lacity.org 2014-01-05 00:00:00 1.93406e+06 1.61387e+06

It’s the format expected by HierarchicalForecast.

Reconciling Forecasts From Different Hierarchical Levels

Now that we have trained forecasting models for each level of the hierarchy, we need to reconcile the forecasts to ensure that they are consistent with each other.

From simple to complex, there are five different reconciliation methods that I will cover in this post and tell you how to select the best one for your problem.

Bottom-Up Reconciliation

The bottom-up method starts by forecasting the time series at the lowest level of the hierarchy (e.g., at the browser level), and then aggregates the forecasts up the hierarchy to higher levels (e.g., to the website level).

Top-Down Reconciliation

The top-down method starts by forecasting the time series at the highest level of the hierarchy (e.g., the website), and then disaggregates the forecasts down the hierarchy to lower levels (e.g., to the device and browser levels).

MinTrace, Optimal Combination and Empirical Risk Minimization

The MinTrace, Optimal Combination and Empirical Risk Minimization try to go further by explicitly minimizing the errors of the reconciled forecasting through optimization.

You can find more details and papers about them in the links above.

Selecting The Best Forecast Reconciliation Method

As we have a train and validation period, we can use the validation period to select the best reconciliation method just like any other hyperparameter.

from hierarchicalforecast.methods import BottomUp, TopDown, MinTrace, ERM, OptimalCombination
from hierarchicalforecast.core import HierarchicalReconciliation

reconcilers = [BottomUp(), 
               TopDown(method='forecast_proportions'),
               TopDown(method='average_proportions'),
               TopDown(method='proportion_averages'),
               MinTrace(method='ols', nonnegative=True),
               MinTrace(method='wls_struct', nonnegative=True),
               MinTrace(method='wls_var', nonnegative=True),
               MinTrace(method='mint_shrink', nonnegative=True), 
               MinTrace(method='mint_cov', nonnegative=True),
               OptimalCombination(method='ols', nonnegative=True), 
               OptimalCombination(method='wls_struct', nonnegative=True),
               ERM(method='closed'),
               ERM(method='reg'),
               ERM(method='reg_bu'),
              ]

rec_model = HierarchicalReconciliation(reconcilers=reconcilers)

p_rec = rec_model.reconcile(Y_hat_df=p, Y_df=p_fitted, S=S_train, tags=tags)

HierarchicalForecast makes it very easy to train and compare different reconciliation methods.

We first need a list of the methods we want to try.

In this case I took the five we discussed above, with the allowed parameters plus an indicator that we want to use non-negative forecasts when possible.

Then we create a HierarchicalReconciliation object and pass the list of reconciliation methods to the reconcilers argument.

Finally, we use the reconcile method to reconcile the forecasts.

This method requires the following arguments:

  • Y_hat_df: the forecasts generated by the forecasting models for each level of the hierarchy in the validation period.
  • Y_df: the predictions for the observations in the training period.
  • S: the summing matrix
  • tags: the mapping of each level of the hierarchy to the original categories.

This runs very fast, and it will return a dataframe with the forecasts for each reconciliation method.

unique_id ds HoltWinters HoltWinters/BottomUp HoltWinters/TopDown_method-forecast_proportions HoltWinters/TopDown_method-average_proportions
lacity.org 2019-01-01 00:00:00 1.67954e+06 1.63374e+06 1.67954e+06 1.67954e+06
lacity.org 2019-01-02 00:00:00 1.81115e+06 1.63167e+06 1.81115e+06 1.81115e+06
lacity.org 2019-01-03 00:00:00 1.52177e+06 1.43171e+06 1.52177e+06 1.52177e+06
lacity.org 2019-01-04 00:00:00 1.13103e+06 1.13859e+06 1.13103e+06 1.13103e+06
lacity.org 2019-01-05 00:00:00 943.062 141860 943.062 943.062

I limited the columns here, but there is a column with the forecast of each reconciliation method.

I merged the forecasts with the actual values to compute the errors.

p_rec_ = p_rec.merge(valid_agg, on=['ds', 'unique_id'], how='left')
p_rec_['y'] = p_rec_['y'].fillna(0)
p_rec_ = p_rec_.reset_index(drop=False)

from sklearn.metrics import mean_squared_error

rmse = dict()
for model_ in p_rec.columns[1:]:
    rmse_ = mean_squared_error(p_rec_['y'].values, p_rec_[model_].values, squared=False)/1e3
    # get only the model name
    model__ = model_.split('/')[-1]
    rmse[model__] = rmse_
pd.DataFrame(rmse, index=['RMSE']).T.sort_values('RMSE')

You can use any metric you want, but I will use the RMSE as an example.

After computing the RMSE for each reconciliation method, I sorted it from the lowest to the highest.

(Values are divided by 1,000 to make it easier to read.)

RMSE
MinTrace_method-mint_shrink_nonnegative-True 210.327
MinTrace_method-wls_var_nonnegative-True 211.071
MinTrace_method-wls_struct_nonnegative-True 211.715
OptimalCombination_method-wls_struct_nonnegative-True 211.715
MinTrace_method-ols_nonnegative-True 215.173

For this dataset, the MinTrace method with the mint_shrink parameter gave us the lowest error.

As always, evaluate more than one metric and use your domain knowledge as a prior to help you select the best reconciliation method.

These are some of the forecasts for a few series:

forecasts for a few series after reconciling with the MinTrace method

In blue we have the actual values, in orange the forecasts from the Base model, which is the Holt-Winters over the raw series and in green the forecasts after reconciling with the selected MinTrace method.