What I Learned Watching 7 Hours of Meta's Marketing Mix Modeling Summits

I found about 7 hours of video from 3 Meta summits on Marketing Mix Modeling, recorded in 2021.

Meta invited experts and companies that used marketing mix models to improve their ROI on the platform.

I watched all of it, and here are my learnings.

Granularity Is King

We need to divide and conquer our datasets.

Most of the presentations shared the importance of looking at the data deeper than only at the channel level.

Ekimetrics shared a study where they used 12 different dimensions as channels (aka features, inputs, columns) in their models.

For example, they took data from Meta campaigns and set the model channels to be the Campaign Objective. Then they found which objective provided the highest ROI.

You can do this with other dimensions like Frequency, Creative Format, Placement, Funnel Level, Audiences, etc.

Even specific creatives and hooks, if you have enough data.

I like this idea. My only concern is that by testing many things we are sure to find many false positives. So keep this in mind and don’t go crazy with the idea.
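To put a number on that concern, here is a minimal sketch of how quickly false positives pile up. It assumes the comparisons are independent and each is tested at the same significance level, which is a simplification; the function names are mine, not from the summits.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall rate at or below alpha."""
    return alpha / n_tests


for n in (1, 5, 12, 50):
    rate = familywise_error_rate(n)
    print(f"{n:>2} comparisons: {rate:.1%} chance of at least one false positive")
```

With a dozen comparisons, the chance of at least one spurious "winner" is already close to a coin flip, so a stricter per-test threshold (Bonferroni is the simplest option) or a follow-up experiment is worth the effort.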

If You Have Daily Data, Prefer It Over Weekly Data

This is not always possible, but a dataset with more rows will help your model be more robust.

They talked about using panel-structured data. It’s a method I have used very successfully when forecasting time series.

The idea is to split and “stack” data from different subdivisions.

For example, instead of a single time series, create datasets for each geo and append them together. If you have 5 geos and 200 days of data, now instead of 200 rows you have 1000 rows.

As long as they are not highly correlated, we can expand this idea to data from multiple websites, brands, and product categories.

It will help the model capture global and local effects. Just remember to add an extra column with an indicator for the subdivision you picked to help the model understand their specific baselines.
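Here is a minimal sketch of the stacking step with pandas. The geo names, column names, and random numbers are placeholders I made up for illustration; with real data you would load one frame per geo instead of generating it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
geos = ["north", "south", "east", "west", "central"]  # hypothetical geos
dates = pd.date_range("2021-01-01", periods=200, freq="D")

# One frame per geo, each with its own 200 daily rows (synthetic values).
frames = [
    pd.DataFrame({
        "date": dates,
        "geo": geo,
        "spend": rng.uniform(50, 150, size=len(dates)),
        "sales": rng.uniform(500, 1500, size=len(dates)),
    })
    for geo in geos
]

# Stack the geos: 5 geos x 200 days = 1000 rows.
panel = pd.concat(frames, ignore_index=True)

# Indicator columns so the model can learn a separate baseline per geo.
# drop_first avoids a redundant column when the model has an intercept.
panel_encoded = pd.get_dummies(panel, columns=["geo"], drop_first=True)

print(panel.shape, panel_encoded.shape)
```

Keeping the `geo` indicator as dummy columns is only one option; a hierarchical model would treat the geo baselines as partially pooled instead.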

Prefer Short Modeling Periods And More Recent Data

Most speakers recommended using at most 1 year of data. A single year cannot capture yearly seasonality well, though, so adapt this to your specific case.

An untested idea we can borrow from machine learning is weighting recent samples more heavily during model training.

This would make the model work harder to get the most recent patterns right, while not ignoring the old patterns completely.
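A minimal sketch of the idea, using exponentially decaying weights and a plain weighted least squares fit in NumPy. The half-life, the synthetic data, and the function names are my assumptions; this is not how Robyn or LightweightMMM fit their models.

```python
import numpy as np


def recency_weights(n_samples: int, half_life: float) -> np.ndarray:
    """Weights that halve every `half_life` periods; the last row gets 1.0."""
    age = np.arange(n_samples - 1, -1, -1)  # oldest row has the largest age
    return 0.5 ** (age / half_life)


def weighted_least_squares(X, y, weights):
    """Minimize sum_i w_i * (y_i - X_i @ b)^2 by scaling rows by sqrt(w_i)."""
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


# Synthetic series where the channel effect doubles halfway through.
rng = np.random.default_rng(0)
n = 200
spend = rng.uniform(100, 500, size=n)
true_effect = np.where(np.arange(n) < 100, 1.0, 2.0)
sales = 50 + true_effect * spend + rng.normal(0, 10, size=n)
X = np.column_stack([np.ones(n), spend])

w = recency_weights(n, half_life=30)
coef_unweighted = weighted_least_squares(X, sales, np.ones(n))
coef_weighted = weighted_least_squares(X, sales, w)

print("unweighted slope:", coef_unweighted[1])
print("recency-weighted slope:", coef_weighted[1])
```

The weighted fit should land closer to the recent effect of 2.0, while the old rows still contribute a little. Many scikit-learn estimators accept a `sample_weight` argument in `fit`, which is an easier route if you already use that library.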

Models Work Better When There Is High Variability In The Inputs

For example: if one day in a channel you had 500 impressions, then 700, then 300, it’s easier to model the true effect of the channel than if you had 500, 510, 495.
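One way to see why, sketched under the assumption of a simple one-channel linear regression with constant noise: the standard error of the estimated slope is the noise standard deviation divided by the square root of the summed squared deviations of the input. The noise level below is an arbitrary placeholder.

```python
import numpy as np


def slope_standard_error(x, noise_sd: float) -> float:
    """Standard error of the OLS slope in a one-variable regression."""
    x = np.asarray(x, dtype=float)
    return noise_sd / np.sqrt(np.sum((x - x.mean()) ** 2))


high_variability = [500, 700, 300]
low_variability = [500, 510, 495]
noise_sd = 20.0  # hypothetical noise level in the target

se_high = slope_standard_error(high_variability, noise_sd)
se_low = slope_standard_error(low_variability, noise_sd)

print("standard error, high variability:", se_high)
print("standard error, low variability: ", se_low)
print("ratio:", se_low / se_high)
```

For these two series, the low-variability input gives a slope estimate about 26 times less precise, whatever the noise level is.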

It makes me wonder if it would make sense to adapt “dropout” to the data collection process.

In machine learning, this is a technique where we turn off parts of a model during training to help avoid overfitting.

For campaigns, it would mean randomly turning off a percentage of campaigns each day.

The problem is: this would certainly hurt the internal platform learning algorithms and we would need to convince people that it’s a good idea. So I don’t think it’s feasible.

Randomly reducing budgets for a limited period to inject a bit of variability is doable. Up to 20% changes in the budget don’t seem to throw off campaign performance on the largest ad platforms.
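A minimal sketch of what that could look like. The campaign names and budgets are made up, and the 20% cap is just the rule of thumb from the paragraph above, not a platform guarantee.

```python
import numpy as np


def reduce_budgets_randomly(budgets, max_reduction=0.20, seed=None):
    """Scale each budget by a random factor between (1 - max_reduction) and 1."""
    if not 0 <= max_reduction < 1:
        raise ValueError("max_reduction must be in [0, 1)")
    rng = np.random.default_rng(seed)
    factors = rng.uniform(1 - max_reduction, 1.0, size=len(budgets))
    return {
        name: round(amount * float(factor), 2)
        for (name, amount), factor in zip(budgets.items(), factors)
    }


# Hypothetical daily budgets per campaign.
daily_budgets = {"prospecting": 100.0, "retargeting": 40.0, "brand": 25.0}
print(reduce_budgets_randomly(daily_budgets, max_reduction=0.20, seed=7))
```

Logging the factor applied each day matters as much as applying it, because the model needs the resulting spend series to benefit from the extra variability.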

Validation With Experiments

As suggested by a commenter on my marketing mix modeling case study post on LinkedIn, comparing the results of the marketing mix model to on-platform attributed conversions is not ideal.

Validating the findings with lift studies (like randomized controlled trials or geo lift studies) is the way to go.

Let’s say your model is suggesting that advertising on Audience Network gives you a great ROI. How do you test it?

You can use Meta’s internal experiment tool, available in Ads Manager, to run a study splitting users into a group that will see Audience Network ads and another that won’t.

In a scenario where we can’t reliably identify unique users (like using browser cookies when someone can use multiple devices), geo lift studies seem better.

Here you assign non-overlapping geographical regions as treatment and control and compute the results accordingly.

We just need to make sure the regions across buckets are comparable.

Let’s say we are running electoral campaigns in the US.

The assignment needs to take into account the historical voting biases of states. A random assignment with match constraints can help.
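One simple version of "random assignment with match constraints" is matched-pair randomization: sort regions by the covariate you care about, pair up neighbors, and flip a coin inside each pair. The sketch below assumes a single covariate, and the regions and shares are invented for illustration.

```python
import random


def matched_pair_assignment(covariate_by_region, seed=None):
    """Pair regions with similar covariate values, then randomize within pairs."""
    rng = random.Random(seed)
    ordered = sorted(covariate_by_region, key=covariate_by_region.get)
    assignment = {}
    # Walk through neighbors two at a time; an odd leftover stays unassigned.
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)
        assignment[pair[0]] = "treatment"
        assignment[pair[1]] = "control"
    return assignment


# Hypothetical historical vote share for one party, per region.
historical_share = {
    "region_1": 0.62, "region_2": 0.41, "region_3": 0.58,
    "region_4": 0.44, "region_5": 0.51, "region_6": 0.49,
}

assignment = matched_pair_assignment(historical_share, seed=42)
for region in sorted(assignment):
    print(region, historical_share[region], assignment[region])
```

Each bucket ends up with one region from every pair, so the treatment and control groups have similar historical leanings while the assignment inside each pair is still random.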

A wonderful book to learn more about online experiments is “Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing”.

Marketing Mix Models vs Multi-touch Attribution

These are the two most powerful solutions we have to optimize our campaigns today.

If we are serious about maximizing ROI, we need both.

Marketing mix models give us a macro vision to help with our strategy, tactical budget allocation, and new experiment hypotheses.

Multi-touch attribution helps with the micro vision of day-to-day campaign management.

Lots of tutorials teach us to run campaigns for a few days, look at the attributed conversions, and optimize accordingly. This works up to a point. Marketing mix models can help us go beyond it.

What should you do when the results from MTA conflict with MMM? Run an experiment to validate.

These tools are best used as part of an iterative, virtuous cycle of improvement.

MMMs can help in cases where we simply don’t have attribution data.

For example, I can’t put a Meta pixel on Amazon and run campaigns promoting my book, but I can run a marketing mix model to estimate how well each campaign is doing by using data from the sales report.

Modeling Tips

In my previous article, I looked at the Media Effects (Beta) plot. But looking at the ROI plot guides us better.

We should be careful though and always validate with experiments.

My plot shows Audience Network and Messenger campaigns giving me the highest ROI potential, but these are the placements where I have very little spend, so their ROI estimates are highly uncertain and probably inflated.

My current rule of thumb: use ROI plots to get ideas for experiments and use predictions with an optimizer to find the optimal budget allocation. This makes it even more important to have a train/validation split when building the model instead of relying only on in-sample metrics.
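For time series, the split should respect time order: train on the earlier rows and validate on the most recent ones. Below is a minimal sketch with synthetic data and a plain linear fit standing in for the marketing mix model; the 80/20 split and all names are my assumptions.

```python
import numpy as np


def time_based_split(n_rows: int, validation_fraction: float = 0.2):
    """Return index arrays: earlier rows for training, latest rows for validation."""
    n_val = int(round(n_rows * validation_fraction))
    split = n_rows - n_val
    return np.arange(split), np.arange(split, n_rows)


def rmse(actual, predicted) -> float:
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic daily data: two channels and a noisy sales target.
rng = np.random.default_rng(0)
n = 200
spend = rng.uniform(100, 500, size=(n, 2))
sales = 300 + 1.5 * spend[:, 0] + 0.5 * spend[:, 1] + rng.normal(0, 40, size=n)

train_idx, val_idx = time_based_split(n, validation_fraction=0.2)
X = np.column_stack([np.ones(n), spend])
coef, *_ = np.linalg.lstsq(X[train_idx], sales[train_idx], rcond=None)

train_rmse = rmse(sales[train_idx], X[train_idx] @ coef)
val_rmse = rmse(sales[val_idx], X[val_idx] @ coef)
print("in-sample RMSE:", train_rmse)
print("validation RMSE:", val_rmse)
```

A validation error much worse than the in-sample error is a sign that the budget allocation suggested by the optimizer should not be trusted yet.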

Always remember to bring common sense (as a strong prior) to your analysis.

One presentation suggested using non-sales data as a target. This means using micro/proxy conversions like time on site, organic searches, etc.

This is a bit dangerous if your goal is to get more sales.

Although the number of people initiating checkout is correlated with purchases, anyone who has run a conversion campaign with the “Initiate Checkout” objective knows how well the algorithm can find people who go to checkout but never buy.

I prefer a hybrid approach in low conversion environments (like mine): setting a monetary value to micro-conversions and treating the target as a reward instead of binary sales.
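A minimal sketch of the reward target. The event names and the monetary value attached to each one are placeholders; in practice you would estimate them from your own funnel, for example as the purchase value times the historical probability of going from that step to a purchase.

```python
def daily_reward(event_counts, event_values):
    """Sum of (count x monetary value) over all tracked events for one day."""
    return sum(
        event_values.get(event, 0.0) * count
        for event, count in event_counts.items()
    )


# Hypothetical values per event, in the account currency.
event_values = {"purchase": 30.0, "initiate_checkout": 3.0, "add_to_cart": 1.0}

# Hypothetical event counts for one day.
one_day = {"purchase": 2, "initiate_checkout": 5, "add_to_cart": 10}

print(daily_reward(one_day, event_values))  # 2*30 + 5*3 + 10*1 = 85.0
```

The model then uses this reward as the target column instead of the raw count of sales, which gives it some signal on days with zero purchases.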

More about this to come in a future article where I experiment with it in my Google Ads account.

Marketing Mix Modeling Is Experiencing a Renaissance

The marketing mix models we are talking about are not the standard “TV, Radio, Digital” regressions from the 1970s.

We can learn a lot from them, but we are talking about a renaissance.

We have an abundance of very granular data available in almost real time, plus open-source tools like Robyn and LightweightMMM.

As long as you have reliable data, you can build one, no matter the size of your business.

Methods Borrowed From Machine Learning Are Making Them Even More Powerful

One example: how do we choose between implementations like Robyn and LightweightMMM? Instead of choosing, we can combine them and get a more robust solution.

Ensembles (combinations) of models are very popular and highly effective in machine learning. I bet they are the same here.
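The simplest ensemble is an average of the predictions. The sketch below does not call Robyn or LightweightMMM; it assumes you have already exported each model's predictions for the same dates, and the numbers are made up.

```python
import numpy as np


def ensemble_average(predictions, weights=None):
    """Average predictions from several models, optionally weighted per model."""
    stacked = np.vstack(predictions)  # shape: (n_models, n_periods)
    return np.average(stacked, axis=0, weights=weights)


# Hypothetical sales predictions for the same 4 days from two models.
model_a = [100.0, 200.0, 150.0, 180.0]
model_b = [120.0, 180.0, 170.0, 160.0]

print(ensemble_average([model_a, model_b]))
print(ensemble_average([model_a, model_b], weights=[3, 1]))
```

The weights could come from each model's error on a validation period, so the model that predicts better out of sample gets more say.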

With time we will see traditional machine learning models entering the arena when we need to improve the predictive accuracy at the cost of interpretability.

“Always-On” Models

The expert panels shared a complaint about depending on third-party companies that take a long time to send results (making them useless as a decision tool) and are not transparent about the methodology.

This is changing with “always-on” models via SaaS: connecting APIs to the modeling platform to have almost real-time data and models. A straightforward data engineering task worth thousands of dollars.

Next Steps and Sources

I recommend you watch at least one of the summits.

It helped confirm many hypotheses and insights I had when creating the marketing mix model of the previous article and gave me a lot of material to read and keep learning.

You can find them here.

Another interesting summit, but from Google, is here.

And here is a video demonstrating panel-structured data modeling in a demand forecasting example, which can easily be adapted to media data.
