Have you ever been excited to dive into marketing mix modeling (MMM), for a project or just to learn, only to get stuck trying to find quality datasets?

You’re not alone!

It can be a real pain trying to sift through endless data sources to find the perfect one for your needs.

Whether you’re a student working on a project, a professional testing a new concept, or someone eager to master MMM techniques, I’ve got your back.

As someone who has faced the same challenges and found success, I’ve put together a list of high-quality marketing mix modeling datasets that will help you kickstart your journey.

I selected these datasets for their relevance, accessibility, and variety, so you can spend your time modeling instead of searching.

While some basic knowledge of marketing mix models and data manipulation is required, you don’t need to be an expert to use these resources.

This comprehensive list has something for everyone, regardless of your level of expertise in MMM.

So, buckle up and get ready to explore the world of marketing mix modeling datasets that will help you learn, grow, and produce stunning results in your projects.

What Does A Dataset For Marketing Mix Modeling Look Like?

You’ll primarily work with two types of data: the key performance indicator (KPI) and the marketing channels.

Let’s look at what each one is and how it fits into an MMM dataset.

Key Performance Indicators (Target Variable)

The key performance indicator is the target variable for any MMM project, as it represents the goal you’re trying to achieve or improve.

This could be anything related to the success of your product or service, such as sales revenue, unit sales, or leads.

Your main objective in using MMM is to understand how different marketing channels impact your KPI and optimize your marketing spend accordingly.

Marketing Channels (Input Variables)

Marketing channels, also known as input variables, are the different avenues you use to promote and advertise your product or service.

Examples of marketing channels include television ads, social media, radio, billboards, email marketing, and online display ads.

But how exactly do you represent the activity of each marketing channel in the dataset?

The input variables can take various formats depending on the nature and source of the marketing activities.

The most common format for modern MMMs is impressions, which are the number of times an ad or promotional content has been shown to the audience.

This format is widely used in digital marketing, where it’s relatively easy to track the number of ad views.

You can easily find this data in your Google Ads account, Meta Ads Manager or other similar platforms.

Budget or spend is another common format for input variables in an MMM dataset, as it directly measures the investment in a marketing channel.

This format can be used across various marketing channels, such as television, radio, print, and digital advertising.

In an MMM context, budget or spend indicates the amount of money allocated to different marketing channels and helps evaluate the ROI for each channel.

A third, less common format is clicks or actions, which capture users’ engagement with marketing content.
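
To make the layout concrete, here is a minimal sketch that splits a table into channel inputs and a KPI. It uses the first rows of the TikTok, Facebook And Google Ads dataset listed later in this article; the names `KPI_COLUMN`, `TIME_COLUMN`, `X`, and `y` are my own choices for illustration, not part of any dataset or library.

```python
import csv
import io

# First rows of the TikTok, Facebook And Google Ads dataset from this article,
# written as CSV in memory so the example runs without downloading anything.
RAW = """Date,TikTok,Facebook,Google Ads,Sales
1/7/2018,13528.1,0,0,9779.8
1/14/2018,0,5349.65,2218.93,13245.2
1/21/2018,0,4235.86,2046.96,12022.7
"""

KPI_COLUMN = "Sales"   # target variable (assumed name, taken from the header)
TIME_COLUMN = "Date"   # identifies the period; not a marketing input

rows = list(csv.DictReader(io.StringIO(RAW)))

# Every column that is neither the period nor the KPI is a marketing channel.
channel_columns = [c for c in rows[0] if c not in (KPI_COLUMN, TIME_COLUMN)]

# X holds one list of channel values per period, y the KPI for the same period.
X = [[float(row[c]) for c in channel_columns] for row in rows]
y = [float(row[KPI_COLUMN]) for row in rows]

print(channel_columns)
print(X[0], y[0])
```

Whatever format the inputs come in (impressions, spend, or clicks), the shape stays the same: one row per period, one column per channel, and one column for the KPI.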

Now that you know what a dataset for marketing mix modeling looks like, let’s go through some examples.

Marketing Mix Modeling Datasets With Real Data

These datasets are either explicitly declared as real data by the publisher or look like it based on my experience.

TikTok, Facebook And Google Ads Dataset

Date       TikTok    Facebook  Google Ads  Sales
1/7/2018   13528.1   0         0           9779.8
1/14/2018  0         5349.65   2218.93     13245.2
1/21/2018  0         4235.86   2046.96     12022.7
1/28/2018  0         3562.21   0           8846.95
2/4/2018   0         0         2187.29     9797.07

Although the publisher didn’t specify, this looks like real weekly spend data from the popular digital channels TikTok, Facebook, and Google Ads.

It’s the first of the datasets that have a date column, which can help us model seasonality.

This is very good, as it’s closer to the challenges you’ll face when building an MMM with your own data.
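
As a rough sketch of how a date column can feed seasonality into a model, the snippet below turns the five sample dates into calendar features. The sine/cosine pair is one common way to encode a yearly cycle, not something the dataset’s publisher prescribes, and the function name `seasonality_features` is hypothetical.

```python
import math
from datetime import datetime

# Dates from the sample rows above, in M/D/YYYY format.
dates = ["1/7/2018", "1/14/2018", "1/21/2018", "1/28/2018", "2/4/2018"]


def seasonality_features(date_text):
    """Return simple calendar features for one weekly observation."""
    day = datetime.strptime(date_text, "%m/%d/%Y")
    week = day.isocalendar()[1]      # ISO week number, 1 to 53
    angle = 2 * math.pi * week / 52  # approximate position in the yearly cycle
    return {
        "month": day.month,
        "week_of_year": week,
        # A sine/cosine pair lets a linear model fit a smooth yearly pattern.
        "sin_year": math.sin(angle),
        "cos_year": math.cos(angle),
    }


features = [seasonality_features(d) for d in dates]
for date_text, feature in zip(dates, features):
    print(date_text, feature["month"], feature["week_of_year"])
```

These extra columns sit next to the channel inputs, so the model can separate a seasonal lift in sales from the effect of the ads.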

YouTube, Facebook And Newspaper Spend Dataset

youtube  facebook  newspaper  sales
84.72    19.2      48.96      12.6
351.48   33.96     51.84      25.68
135.48   20.88     46.32      14.28
116.64   1.8       36         11.52
318.72   24        0.36       20.88

According to the publisher, this dataset has the spend (in hundreds of USD) of small-scale startups on different marketing channels.

We don’t have a date column, but we can assume the data is ordered by time.

It’s a tab-delimited dataset, so remember to specify the delimiter when loading it.
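
Here is a small sketch of what specifying the delimiter looks like with Python’s built-in `csv` module. The in-memory string stands in for the downloaded file, and I’m assuming the file’s header matches the column names in the sample above. If you use pandas, the equivalent is passing `sep="\t"` to `read_csv`.

```python
import csv
import io

# The sample rows above, written with tab separators as a stand-in for the file.
RAW = (
    "youtube\tfacebook\tnewspaper\tsales\n"
    "84.72\t19.2\t48.96\t12.6\n"
    "351.48\t33.96\t51.84\t25.68\n"
    "135.48\t20.88\t46.32\t14.28\n"
    "116.64\t1.8\t36\t11.52\n"
    "318.72\t24\t0.36\t20.88\n"
)

# Without delimiter="\t" the reader would treat each line as a single field.
reader = csv.DictReader(io.StringIO(RAW), delimiter="\t")

# Convert every value from text to a number.
rows = [{name: float(value) for name, value in row.items()} for row in reader]

print(rows[0])
```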

Division-Level Marketing Spend Dataset

Division  Calendar_Week  Paid_Views  Organic_Views  Google_Impressions  Email_Impressions  Facebook_Impressions  Affiliate_Impressions  Overall_Views  Sales
A         1/6/2018       392         422            408                 349895             73580                 12072                  682            59417
A         1/13/2018      787         904            110                 506270             11804                 9499                   853            56806
A         1/20/2018      81          970            742                 430042             52232                 17048                  759            48715
A         1/27/2018      25          575            65                  417746             78640                 10207                  942            72047
A         2/3/2018       565         284            295                 408506             40561                 5834                   658            56235

This is a real weekly dataset from a large company about the media efforts of multiple Divisions in different digital channels.

It has data in multiple formats, including organic and paid impressions, YouTube views, and email, which makes it a more challenging dataset to work with.

Still, it’s very close to the data you’ll find in real-life problems.

You can model the sales by Division or aggregate everything to get a more general model.
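
The two options can be sketched as follows, using two columns from the sample rows above to keep things short. Only Division A appears in the sample, so here the aggregate equals Division A’s figures; with the full file, each week’s total would sum over all divisions. Summing is my assumption for how to aggregate counts like impressions and sales.

```python
from collections import defaultdict

# Sample rows above, reduced to two numeric columns for brevity.
rows = [
    {"Division": "A", "Calendar_Week": "1/6/2018", "Facebook_Impressions": 73580, "Sales": 59417},
    {"Division": "A", "Calendar_Week": "1/13/2018", "Facebook_Impressions": 11804, "Sales": 56806},
    {"Division": "A", "Calendar_Week": "1/20/2018", "Facebook_Impressions": 52232, "Sales": 48715},
    {"Division": "A", "Calendar_Week": "1/27/2018", "Facebook_Impressions": 78640, "Sales": 72047},
    {"Division": "A", "Calendar_Week": "2/3/2018", "Facebook_Impressions": 40561, "Sales": 56235},
]
NUMERIC_COLUMNS = ["Facebook_Impressions", "Sales"]

# Option 1: one list of rows per division, to fit a separate model for each.
by_division = defaultdict(list)
for row in rows:
    by_division[row["Division"]].append(row)

# Option 2: sum each numeric column across divisions for every week,
# to fit a single, more general model.
totals = defaultdict(lambda: dict.fromkeys(NUMERIC_COLUMNS, 0))
for row in rows:
    for column in NUMERIC_COLUMNS:
        totals[row["Calendar_Week"]][column] += row[column]

print(sorted(by_division))
print(totals["1/6/2018"])
```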

Simulated Marketing Mix Modeling Datasets

These datasets are explicitly declared as simulated data by the publisher or look like it based on my experience.

WeChat And Weibo Spend Dataset

wechat  weibo  others  sales
304.4   93.6   294.4   9.7
1011.9  34.4   398.4   16.7
1091.1  32.8   295.2   17.3
85.5    173.6  403.2   7
1047    302.4  553.6   22.1

This is a very simple dataset with digital marketing channels (WeChat and Weibo) rather than traditional ones.

The publisher didn’t specify what the input values are, but they look like spend.

Advertising Sales Dataset

   TV Ad Budget ($)  Radio Ad Budget ($)  Newspaper Ad Budget ($)  Sales ($)
1  230.1             37.8                 69.2                     22.1
2  44.5              39.3                 45.1                     10.4
3  17.2              45.9                 69.3                     9.3
4  151.5             41.3                 58.5                     18.5
5  180.8             10.8                 58.4                     12.9

This is a classic dataset that’s been used in many MMM projects.

The first column is an unnamed row index. The dataset has 3 input variables (TV, radio, and newspaper) and 1 output variable (sales).

Each row represents one period: the budget spent on each marketing channel and the sales revenue generated in that period.

For example, it can be the weekly budget spent on the channels and the weekly sales revenue generated.

You can easily load it in Excel or Google Sheets and start working with it right away.
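
If you prefer code to a spreadsheet, the only thing to watch for is the row index, which should not end up as a model input. Below is a minimal sketch with Python’s `csv` module; I’m assuming the file is comma-separated with an empty name for the index column, which may differ from the copy you download. In pandas, `read_csv(..., index_col=0)` handles the same case.

```python
import csv
import io

# The sample rows above as CSV (assumed layout: unnamed first column = row index).
RAW = """,TV Ad Budget ($),Radio Ad Budget ($),Newspaper Ad Budget ($),Sales ($)
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
"""

reader = csv.reader(io.StringIO(RAW))

# Drop the first field of the header and of every record: it is only the index.
header = next(reader)[1:]
rows = [dict(zip(header, map(float, record[1:]))) for record in reader]

print(header)
print(rows[0])
```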

Traditional And Digital Media Dataset

Time    tv_sponsorships  tv_cricket  tv_RON  radio  NPP    Magazines  OOH  Social   Programmatic  Display_Rest  Search   Native  sales
1/1/01  119.652          66.729      43.719  37.8   55.36  13.84      35   41.8782  5             33.5026       26.802   5       22100
1/2/01  23.14            12.905      8.455   39.3   36.08  9.02       35   8.099    5             6.4792        5.18336  6       10400
1/3/01  8.944            4.988       3.268   45.9   55.44  13.86      35   3.1304   5             2.50432       2.00346  7       9300
1/4/01  78.78            43.935      28.785  41.3   46.8   11.7       35   27.573   5             22.0584       17.6467  5       18500
1/5/01  94.016           52.432      34.352  10.8   46.72  11.68      35   32.9056  5             26.3245       21.0596  7       12900

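At first glance this dataset looks real, but I can’t confirm it. In the sample rows, the three TV columns add up to the TV budget in the Advertising Sales dataset above (119.652 + 66.729 + 43.719 = 230.1), the radio values are identical, and sales are the same figures multiplied by 1,000, so it may have been derived from that dataset.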

It has 12 input variables (traditional and digital media channels), a date column and 1 output variable (sales).

Each column seems to represent the budget spent on a marketing channel in a month and the sales revenue generated in the same period.
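
With twelve inputs, one option is to roll related sub-channels up into broader groups before modeling. The sketch below does that for the TV and print columns of the first two sample rows. The grouping in `GROUPS` is my own assumption (including treating NPP as a print channel), not something the publisher defines. In these rows the totals happen to match the TV and newspaper budgets in the first two rows of the Advertising Sales dataset.

```python
import math

# First two sample rows above, limited to the TV and print columns.
detailed = [
    {"tv_sponsorships": 119.652, "tv_cricket": 66.729, "tv_RON": 43.719,
     "NPP": 55.36, "Magazines": 13.84},
    {"tv_sponsorships": 23.14, "tv_cricket": 12.905, "tv_RON": 8.455,
     "NPP": 36.08, "Magazines": 9.02},
]

# Hypothetical grouping of sub-channels into broader channels.
GROUPS = {
    "tv_total": ["tv_sponsorships", "tv_cricket", "tv_RON"],
    "print_total": ["NPP", "Magazines"],
}

rolled_up = [
    {group: sum(row[column] for column in columns) for group, columns in GROUPS.items()}
    for row in detailed
]

# TV and newspaper budgets from the first two rows of the Advertising Sales dataset.
advertising = [{"tv": 230.1, "newspaper": 69.2}, {"tv": 44.5, "newspaper": 45.1}]

for totals, reference in zip(rolled_up, advertising):
    same_tv = math.isclose(totals["tv_total"], reference["tv"], abs_tol=1e-9)
    same_print = math.isclose(totals["print_total"], reference["newspaper"], abs_tol=1e-9)
    print(round(totals["tv_total"], 3), round(totals["print_total"], 3), same_tv, same_print)
```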