What Is Adstock in Marketing Mix Modeling?

Adstock in marketing mix modeling is a way to take into account that the impact of an advertisement on consumer behavior may not be immediate, but rather may build over time.

In other words, we need to adjust our model to the fact that changes in how people feel about a product or brand after seeing an ad can last even after the ad is no longer being shown.

In practice, it means we need to transform the inputs of our model to make it understand that previous exposures to the ad can affect the target variable in the current period.

For example, if you are using impressions as the input data, it means the number of impressions you will feed the model today will be the number of real impressions shown today plus an additional quantity to account for the impressions that were shown in the last few days.

Why Is Adstock Important In Marketing Mix Modeling?

Think about a company that is trying to decide between running a display ad campaign on a popular website or a social media ad campaign for a new product.

To help make this decision, the company uses marketing mix modeling with adstock to compare the impact of the two types of campaigns on sales.

The model shows that the display ads have a bigger initial impact on sales, with a peak in the first week after the campaign starts.

However, the impact of the display ads decreases more quickly than the impact of the social media ads, which have a slower, more sustained increase in sales over the course of the campaign.

Based on this information, the company decides to invest more in the social media ad campaign, as it is likely to have a longer-lasting impact on sales.

It also decides to run the social media ad campaign for a longer period of time to fully capitalize on the sustained increase in sales.

How To Calculate Geometric Adstock Model in Python?

Let’s use NumPy to calculate and understand the Geometric Adstock Model, which is a simple decay model that is widely used in marketing mix modeling.

The first thing we do is import the numpy library and define a variable theta, which represents the decay parameter. This value should be between 0 and 1.

import numpy as np

# Define the decay parameter, "theta"
theta = 0.9

Then, we load the impressions or ad spend data (input features). In this case I am creating a simple array to represent it.

impressions = np.array([100, 100, 300, 400, 500, 300])

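Next, we create an empty array adstock with the same shape and type as the impressions array, using np.empty_like, so we can store the adstocked values for each period. Because impressions holds integers, adstock will too, so any fractional part of an adstocked value is dropped when it is stored.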

adstock = np.empty_like(impressions)

After that, we set the first value of the adstock array to the first value of the input data.

adstock[0] = impressions[0]

Then we loop through the remaining impressions, starting from index 1, since we already assigned the first adstock value.

for i in range(1, len(impressions)):
    # Calculate the adstock value for the current period
    adstock[i] = impressions[i] + theta * adstock[i-1]

Inside the loop, we use the geometric adstock formula adstock[i] = impressions[i] + theta * adstock[i-1] to calculate the adstock value for each period.

The results are in the table below:

Day | Raw Impressions | Geometric Adstocked Impressions
1 | 100 | 100
2 | 100 | 190
3 | 300 | 471
4 | 400 | 823
5 | 500 | 1240
6 | 300 | 1416

As you can see, on every day after the first, the adstocked value is higher than the raw value. That is exactly what we want, since we are accounting for the delayed impact of the ad when building a marketing mix model.

Taking day 2 as an example, with theta=0.9, we assume that having 100 impressions yesterday and 100 today is the equivalent of having 190 impressions today.

This value of theta will not suit every channel or dataset, so play around with it to find the value that best fits your use case.
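To make it easier to try several values, the steps above can be wrapped in a small function. This is only a minimal sketch: the name geometric_adstock is an assumption for illustration, the input is assumed to be non-empty, and the result is stored as floats, so fractional values are kept instead of being truncated as they are in the integer array above.

```python
import numpy as np

def geometric_adstock(x, theta):
    """Geometric adstock: adstock[i] = x[i] + theta * adstock[i-1]."""
    x = np.asarray(x, dtype=float)  # floats keep the fractional part
    adstock = np.empty_like(x)
    adstock[0] = x[0]  # assumes x has at least one value
    for i in range(1, len(x)):
        adstock[i] = x[i] + theta * adstock[i - 1]
    return adstock

impressions = [100, 100, 300, 400, 500, 300]

# Compare how much carry-over a few candidate thetas produce
for theta in (0.3, 0.6, 0.9):
    print(theta, np.round(geometric_adstock(impressions, theta), 1))
```

The lower the theta, the closer the adstocked series stays to the raw impressions.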

How To Calculate Weibull Adstock Model in Python?

The Weibull Adstock Model is becoming popular due to being implemented in Meta’s Robyn.

It’s claimed to be more flexible and thus a better fit to modern media activity.

Robyn’s implementation is in R, so I rewrote part of it in Python to understand what’s going on.

Let’s dive in!

import numpy as np
from scipy.stats import weibull_min

def adstock_weibull(x, shape, scale):
    x = np.array(x)
    windlen = len(x)

We need numpy and the weibull_min function from the scipy.stats library.

Then, we define the function adstock_weibull, which takes three required arguments: x, the array with impressions, plus shape and scale, which parameterize the Weibull distribution.

Robyn’s original function is more advanced and considers different window lengths to apply the adstock, but here I simplified to a window equal to the full length of the data to make it easier to understand.

    x_bin = np.arange(1, windlen+1)
    scaleTrans = round(np.quantile(x_bin, scale))

Here, x_bin is an array that goes from 1 to the length of the data, indicating the positions of the data points in the sequence.

In this array we find the position at the quantile indicated by the scale argument and store it as scaleTrans.

    if shape == 0 or scale == 0:
        x_decayed = x
        thetaVecCum = thetaVec = np.zeros(windlen)
    else:
        thetaVec = np.concatenate(([1], 1-weibull_min.cdf(x_bin[:-1], shape, scale=scaleTrans)))
        thetaVecCum = np.cumprod(thetaVec)

If either shape or scale is zero, the function will return all zeros.

If not, we generate thetaVec and thetaVecCum based on the cumulative distribution function of the Weibull distribution.

The values of thetaVecCum are used to discount the impact of the ad over time.

    x_decay = np.zeros((len(x), windlen))
    for i, (x_val, x_pos) in enumerate(zip(x, x_bin)):
        x_decay[i, x_pos - 1:] = x_val * thetaVecCum[:windlen - x_pos + 1]

    x_decayed = np.sum(x_decay, axis=0)
    return x_decayed

In this loop we apply the decay to the data by multiplying the impressions by the lagged theta factors in the cumulative theta vector and then summing the results.

This is what the decayed values look like before the sum:

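Shown on | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6
Day 1 | 100 | 60.6531 | 29.9061 | 12.5791 | 4.6276 | 1.51286
Day 2 | 0 | 100 | 60.6531 | 29.9061 | 12.5791 | 4.6276
Day 3 | 0 | 0 | 300 | 181.959 | 89.7184 | 37.7373
Day 4 | 0 | 0 | 0 | 400 | 242.612 | 119.625
Day 5 | 0 | 0 | 0 | 0 | 500 | 303.265
Day 6 | 0 | 0 | 0 | 0 | 0 | 300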

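As you can see, each row follows one day's impressions as they decay over the following days, and each column collects the impressions shown on that day plus the decayed values of past impressions. Summing a column gives the adstocked value for that day.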
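For convenience, the fragments above can be put together into one function. Treat this as a sketch rather than Robyn's implementation: the snake_case names and the tuple return value are choices made here, and it uses the closed-form Weibull survival function exp(-(t / scale) ** shape) in NumPy, which is mathematically the same quantity as 1 - weibull_min.cdf(...).

```python
import numpy as np

def adstock_weibull(x, shape, scale):
    """Simplified Weibull CDF adstock over a window equal to len(x)."""
    x = np.asarray(x, dtype=float)
    windlen = len(x)
    x_bin = np.arange(1, windlen + 1)
    # Translate the scale quantile into a position within the window
    scale_trans = round(float(np.quantile(x_bin, scale)))

    if shape == 0 or scale == 0:
        # Zero decay factors, so the result is all zeros (as in the walkthrough)
        theta_vec_cum = np.zeros(windlen)
    else:
        # Weibull survival function: 1 - CDF = exp(-(t / scale) ** shape)
        survival = np.exp(-((x_bin[:-1] / scale_trans) ** shape))
        theta_vec_cum = np.cumprod(np.concatenate(([1.0], survival)))

    # Row i holds day i's impressions decaying over the following days
    x_decay = np.zeros((windlen, windlen))
    for i, x_val in enumerate(x):
        x_decay[i, i:] = x_val * theta_vec_cum[: windlen - i]

    # Column sums give the adstocked value for each day
    return x_decay.sum(axis=0), theta_vec_cum

impressions = [100, 100, 300, 400, 500, 300]
adstocked, decay_factors = adstock_weibull(impressions, shape=0.5, scale=0.5)
print(np.round(adstocked, 3))
print(np.round(decay_factors, 6))
```

Returning the cumulative decay factors alongside the adstocked series makes it easy to inspect how quickly a given shape and scale let the effect fade.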

To make it very clear, in the table below I included the Geometric Adstock and the Weibull Adstock with scale=0.5 and shape=0.5 for the same data so we can compare them.

The Decay Factor column shows the value of Weibull’s theta factor for each day after the original day of the impression.

For these parameters, we always multiply the same-day impressions by 1, then multiply them by 0.606531 before adding them to the next day's impressions, by 0.299061 for the day after that, and so on.

Day | Raw Impressions | Geometric Adstocked Impressions | Weibull Adstocked Impressions | Decay Factor
1 | 100 | 100 | 100 | 1
2 | 100 | 190 | 160.653 | 0.606531
3 | 300 | 471 | 390.559 | 0.299061
4 | 400 | 823 | 624.444 | 0.125791
5 | 500 | 1240 | 849.537 | 0.046276
6 | 300 | 1416 | 766.768 | 0.0151286

You can see that Day 2 has its own 100 impressions plus Day 1's 100 impressions multiplied by the decay factor of 0.606531.

Day 3 has its own 300 impressions, plus Day 2's 100 impressions multiplied by the decay factor of 0.606531, plus Day 1's 100 impressions multiplied by the decay factor of 0.299061.

You get the idea.

With these parameters, the Weibull Adstock assumes that past ads have less of an effect on the following days than the Geometric Adstock with theta=0.9 does. By the sixth day the decay factor is about 0.015, so the effect has practically disappeared.

Just as with the Geometric model, you need to choose the right parameters for your data.

Which Adstock Models Are Available In Robyn?

Robyn is Meta’s open-source implementation of marketing mix models that is becoming very popular.

According to Robyn’s documentation, you can choose between:

Geometric Adstock Model

This is a simple model that only requires a single parameter, called “theta,” to describe the decay of an ad’s impact over time.

The theta parameter represents the fraction of the ad’s initial impact that is carried over to the next time period.

This model is easy to understand and fast to run, but it may not be as accurate as more complex models, particularly for digital media.

Weibull Adstock Model

In contrast, this is a more flexible model and can better fit modern media activity, but it requires more time to run and it may be more difficult to explain to non-technical stakeholders.

Both models can be used to understand the impact of advertising on consumer behavior and to optimize the timing and frequency of ads to maximize their impact.
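One way to build intuition for the geometric theta described above is to convert it into a half-life, meaning the number of periods it takes for the carried-over effect to fall to half of its starting value. Since the effect after n periods is theta ** n, solving theta ** n = 0.5 gives n = log(0.5) / log(theta). The helper below is a sketch for illustration only; geometric_half_life is a hypothetical name, not a Robyn function.

```python
import math

def geometric_half_life(theta):
    """Periods until a geometric carry-over falls to half its initial value."""
    if not 0 < theta < 1:
        raise ValueError("theta must be strictly between 0 and 1")
    return math.log(0.5) / math.log(theta)

for theta in (0.3, 0.6, 0.9):
    print(f"theta={theta}: half-life of about {geometric_half_life(theta):.1f} periods")
```

A half-life is often easier to discuss with non-technical stakeholders than a raw decay parameter.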

Which Adstock Models Are Available In LightweightMMM?

LightweightMMM is Google’s open-source implementation of marketing mix models.

According to LightweightMMM’s documentation, you can choose between:

Adstock

This applies an infinite lag in a simple way and is similar to the Geometric adstock model from Robyn.

Hill Function Adstock

Very similar to the one above, it takes the simple decay model and applies a sigmoid function to account for diminishing returns.

Carryover

This model applies a causal convolution, which you can think of as a fancy way to do a weighted average of the past values of the impressions.
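To make the last two ideas more concrete, here is a generic NumPy sketch of a Hill-type saturation curve and of a causal convolution. These are not LightweightMMM's functions or parameter names: hill_saturation, causal_convolution, half_saturation, slope, and the example weights are all assumptions made for illustration.

```python
import numpy as np

def hill_saturation(x, half_saturation, slope):
    """Generic Hill curve: 0 at x=0, 0.5 at x=half_saturation, approaching 1."""
    x = np.asarray(x, dtype=float)
    return x**slope / (half_saturation**slope + x**slope)

def causal_convolution(x, weights):
    """Weighted sum of current and past values; weights[0] applies to today."""
    x = np.asarray(x, dtype=float)
    # Truncating the full convolution keeps only terms that look backwards in time
    return np.convolve(x, weights)[: len(x)]

impressions = [100, 100, 300, 400, 500, 300]

# Hypothetical weights: 50% of the effect today, 30% tomorrow, 20% the day after
carried_over = causal_convolution(impressions, [0.5, 0.3, 0.2])
print(np.round(carried_over, 1))

# Diminishing returns: doubling the input less than doubles the response
print(np.round(hill_saturation([100, 200, 400], half_saturation=200, slope=2), 3))
```

Because the weights can peak after the first day, a convolution like this can describe a delayed effect, which a pure geometric decay cannot.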

How To Choose The Best Adstock Model?

The best adstock model for a particular marketing mix model will depend on the specific product or service being advertised, the medium through which the advertisement is delivered, and other factors.

You need to try out different models and compare their fit to the data in order to determine the best model for a given situation.

Ideally, split the data into a training set, which will be used to build the model, and a validation set, which will be used to assess the model’s performance.

Make sure the validation data comes from a time after the training data, otherwise you will be using the future to predict the past.

For example, if you have data from January 1 to March 30, take January 1 until February 28 to train and March 1 to the end to validate.
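The split from this example can be sketched with the standard library alone. The year 2023 and the variable names are assumptions for illustration; in practice you would split your actual rows of data by their dates in the same way.

```python
from datetime import date, timedelta

# Hypothetical daily dates; the year 2023 is an assumption for illustration
start, end = date(2023, 1, 1), date(2023, 3, 30)
days = [start + timedelta(days=n) for n in range((end - start).days + 1)]

cutoff = date(2023, 3, 1)  # first day of the validation period
train_days = [d for d in days if d < cutoff]
valid_days = [d for d in days if d >= cutoff]

print(len(train_days), train_days[0], train_days[-1])
print(len(valid_days), valid_days[0], valid_days[-1])
```

Splitting on a cutoff date, rather than sampling rows at random, is what keeps every validation day after every training day.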

It is important to evaluate an adstock model in an out-of-sample way because it allows you to assess the model’s capacity to capture real underlying patterns instead of just noise.

When you build a model using a training data set and then test it on the same data set, there is a risk that the model will overfit the data, meaning it will perform well on the training data but poorly on new, unseen data.

This is because the model may have learned patterns in the training data that do not generalize to new situations. They only exist in the training data by chance.

By evaluating the model on an out-of-sample validation set, you can get a better idea of how well the model will perform in the future.

The ultimate goal of building a model is to make accurate predictions on new data.

If the model performs poorly on the validation set, it is likely to perform poorly on new data as well, so you may need to adjust the model or try a different model.

On the other hand, if the model performs well on the validation set, it is more likely to perform well on new data as well, giving you more confidence in the model’s ability to make accurate predictions.
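As a rough sketch of this workflow, the code below compares a few candidate theta values for the geometric adstock by fitting a simple linear regression on the training period and scoring it on the later validation period. Everything here is an assumption for illustration: the data is synthetic (generated with a theta of 0.6 plus noise), the model is a one-variable regression rather than a full marketing mix model, and the function names are hypothetical.

```python
import numpy as np

def geometric_adstock(x, theta):
    """Geometric adstock: out[i] = x[i] + theta * out[i-1]."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = x[i] + theta * out[i - 1]
    return out

def validation_mse(impressions, sales, theta, n_train):
    """Fit sales ~ a + b * adstock on the first n_train rows, score on the rest."""
    adstocked = geometric_adstock(impressions, theta)
    X = np.column_stack([np.ones_like(adstocked), adstocked])
    coef, *_ = np.linalg.lstsq(X[:n_train], sales[:n_train], rcond=None)
    residuals = sales[n_train:] - X[n_train:] @ coef
    return float(np.mean(residuals**2))

# Synthetic data: 90 days of impressions and sales built with theta = 0.6
rng = np.random.default_rng(0)
impressions = rng.uniform(0, 500, size=90)
sales = 50 + 0.2 * geometric_adstock(impressions, 0.6) + rng.normal(0, 5, size=90)

# Train on the first 60 days, validate on the last 30
candidates = [0.1, 0.3, 0.6, 0.9]
scores = {t: validation_mse(impressions, sales, t, n_train=60) for t in candidates}
best_theta = min(scores, key=scores.get)

for t, mse in scores.items():
    print(f"theta={t}: validation MSE = {mse:.1f}")
print("best theta:", best_theta)
```

The same loop works for comparing different adstock models, not just different parameters: swap in another transformation and keep the training and validation periods fixed.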