Most Popular Articles

👉 How I Used Machine Learning To Get 20,000 Real Followers On Twitter

👉 Is Competing On Kaggle Worth It? Ponderings of a Kaggle Grandmaster

👉 Are Marketing Mix Models Useful? I Spent My Own Money To Find Out

Check the latest below ⬇️

3 Essential Methods To Do Time Series Validation In Machine Learning

One can’t simply use a random train-test split when building a machine learning model for time series. Doing so would not only let the model learn from future data but also show you an overoptimistic (and wrong) performance evaluation. In real-life projects, you always have a time component to deal with. Changes can happen in nanoseconds or centuries, but they happen and you are interested in predicting what will come next....

September 21, 2022 · 4 min
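
To illustrate the point in the excerpt above, here is a minimal sketch of a time-based split, written as an illustration and not taken from the article itself. The function name `time_based_split`, the cutoff date, and the toy daily series are all hypothetical. The idea is only that every training row comes before every test row, so the model never sees the future.

```python
from datetime import date, timedelta


def time_based_split(rows, cutoff):
    """Split (timestamp, value) rows so that training data strictly precedes the cutoff."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


# Hypothetical daily series: 10 days of made-up values.
start = date(2022, 1, 1)
rows = [(start + timedelta(days=i), float(i)) for i in range(10)]

train, test = time_based_split(rows, cutoff=date(2022, 1, 8))

# Every training timestamp is earlier than every test timestamp.
assert max(t for t, _ in train) < min(t for t, _ in test)
print(len(train), len(test))
```

A random split over the same rows would mix later days into the training set, which is the leakage the excerpt warns about.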

Learning How a Computer Works: Nand2Tetris First Course Review

This course is great for people who (like me) are self-taught developers and don’t have a traditional computer science background where you learn all the details of computer architecture. Although I have read about it, the fact that computers take electricity and transform it into software was still a bit mysterious to me. After taking this course, it’s not mysterious anymore, but fascinating! It’s almost like magic, but it’s all about physics and math....

August 15, 2022 · 9 min

How I Used Machine Learning To Get 20,000 Real Followers On Twitter

Writing this article will very likely get my Twitter account banned, as I am publicly pleading guilty to breaking the terms of service. Anyway, it’s worth it. To learn more about how conversion modeling works end-to-end, I threw machine learning at a well-known black hat strategy to grow a following on Twitter. It grew my account from about 900 followers to 20,000 in about 6 months. Here’s how it went…...

July 24, 2022 · 9 min

Implementing Uber's Marketing Mix Model With Orbit

There’s a very interesting marketing mix modeling approach published by Uber’s data science team that uses coefficients that vary over time to estimate a media channel’s effects. The modeling approach is called Bayesian Time-Varying Coefficients (BTVC) and it’s available on Orbit, their forecasting package, as Kernel Time-Varying Regression. Instead of getting a single coefficient to understand each media effect, we can see how the effect varied through time with confidence intervals....

July 5, 2022 · 9 min

How To Create A Marketing Mix Model With LightweightMMM

The future of advertising attribution is modeled, predicted, estimated, or whatever other word you want. One of the coolest tools (although still under early development) we have to model the impact of advertising campaigns on revenue is LightweightMMM, an implementation of Bayesian marketing mix models developed by Google. I talked about using this tool with my course sales data and really liked the results. I was surprised by the number of people who are trying to solve the same problem in their companies....

June 27, 2022 · 10 min

The Secret To Build The Best Machine Learning Models? Iterate Fast

There’s ongoing anxiety in our area born from the perceived need to “catch up” with the new shiny state-of-the-art technology. And yes, it’s good to always be learning new methods to solve problems with machine learning. But I believe one thing is more important than knowing everything: knowing how to find possible solutions and iterate fast. This is what I mean by “iterate fast”: having the ability to quickly test different solutions and find the one that works best for your problem....

June 20, 2022 · 4 min

Overview Of My 3rd Place Solution To The Criteo ECML-PKDD 22 Challenge

A few posts ago, I wrote about a competition sponsored by Criteo that I decided to participate in to learn more about the out-of-distribution robustness of machine learning models. It turns out I got 3rd place and a prize! In any competition, you need to write a report on your solution to claim the prize, so I decided to post it here too. Enjoy! Background: The competition had a dataset with about 40 categorical features “aggregated from traces in computational advertising” and a binary target....

June 8, 2022 · 4 min

8 Tips To Get People To Trust Your Machine Learning Models

This is one of those things you only find out after getting a job as a data scientist. Imagine that someone simply showed up with an app and told you that if you follow what this app recommends, you will reach your goals in life. Interviewing for jobs? Take the job the app ranks highest. Looking for a relationship? Date the person the app shows as the best match. Sick?...

June 6, 2022 · 9 min

The 4-Step Framework I Use To Build Powerful Machine Learning Ensembles

A lot of people find machine learning ensembles very interesting. This is probably because they offer an “easy” way to improve the performance of machine learning solutions. The place where you will see a lot of ensembles is Kaggle competitions, but you don’t need to be a Top Kaggler to know how to build a good ensemble for your project. I have spent a lot of time building and thinking about ensembles and here I will tell you my “4-step ensemble framework”....

May 30, 2022 · 7 min

Can We Solve Distribution Shift With Clever Training In Machine Learning?

One of the biggest problems we have when using machine learning in practice is distribution shift. A distribution shift occurs when the distribution of the data the model sees in production starts to look different from the data used to train it. A simple example that broke a lot of models was COVID. The quarantine simply changed how people behaved and the historical data became less representative. Another good example is credit card fraud....

May 27, 2022 · 12 min
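
As a rough illustration of the definition in the excerpt above, the sketch below compares one feature between training data and production data. It is not the method from the article. The function `standardized_mean_shift` and the numbers are hypothetical, and a difference in means is only one simple signal of shift among many.

```python
from statistics import mean, stdev


def standardized_mean_shift(train_values, prod_values):
    """Difference in means, measured in units of the training standard deviation."""
    return (mean(prod_values) - mean(train_values)) / stdev(train_values)


# Made-up values of a single feature before and after a change in behavior.
train_values = [10.0, 12.0, 11.0, 13.0, 9.0]
prod_values = [15.0, 17.0, 16.0, 18.0, 14.0]

shift = standardized_mean_shift(train_values, prod_values)
print(f"production mean moved {shift:.2f} training standard deviations")
```

A value near zero suggests the feature looks similar in both periods. A large value suggests the historical data may have become less representative.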