Overview Of My 3rd Place Solution To The Criteo ECML-PKDD 22 Challenge

A few posts ago, I wrote about a competition sponsored by Criteo that I entered to learn more about the out-of-distribution robustness of machine learning models. It turns out I got 3rd place and a prize! In any competition, you need to write a report on your solution to claim the prize, so I decided to post it here too. Enjoy! Background: The competition had a dataset with about 40 categorical features “aggregated from traces in computational advertising” and a binary target....

June 8, 2022 · 4 min · Mario Filho

Can We Solve Distribution Shift With Clever Training In Machine Learning?

One of the biggest problems we have when using machine learning in practice is distribution shift. A distribution shift occurs when the distribution of the data the model sees in production starts to differ from the distribution of the data used to train it. A simple example that broke a lot of models was COVID. The quarantine changed how people behaved, and the historical data became less representative. Another good example is credit card fraud....

May 27, 2022 · 12 min · Mario Filho

Are Kaggle Competitions Worth It? Ponderings of a Kaggle Grandmaster

I would not have a data science career without Kaggle. So if you are looking for a blog post bashing Kaggle, this is not the place. Competing on Kaggle is worth it! It will teach you a lot about machine learning and give you a project to talk about in interviews, which can even help you get a job! That said, I am not a radical who thinks Kaggle is the ultimate thing that everyone must do in order to become a data scientist....

April 5, 2023 · 15 min · Mario Filho