How Writing Journals Helped Me Become a Better Data Scientist

I have been using daily machine learning development journals for the past 5 years.

These are simple diaries that help me organize my thoughts, come up with new ideas and track my progress.

I have found them to be an invaluable tool for improving as a data scientist, so I decided to share my experience in case you find it useful too. 🙂

What is a Daily Machine Learning Journal?

A daily machine learning journal is a document where each day after you finish your work you write about:

what you did
why you did it
what you expected
what the results were

The main purpose is to help you clarify your thoughts so you can have new ideas to try.

How I Write A Daily Machine Learning Journal

You don’t need to write a novel every day. A few sentences are enough.

I like to have a separate journal for each of my projects.

Here’s how I would answer the questions above in the first entry of a new project:

What I Did and Why

Date: 2097-06-02

Today, I started working on a machine learning model to predict whether a customer will pay for a subscription after their trial ends.

Based on our meetings with the product team, we want to estimate the number of new users to help forecast revenue and growth.

What I Did and Expected

I started by researching blogs, papers, and Kaggle competitions on the topic and making a list of datasets people used so I can see if we have the same information in our tables.

What Were The Results

It seems engagement features, like how many times the user logged in, are very important.

(List of features I find interesting)

People usually develop these systems to make an intervention (like sending an e-mail) to increase the likelihood of the user paying for a subscription.

It’s a route we can take in the future. For that, we will need to run experiments or at least gather data from previous marketing actions.

As you see, the boundaries between each question are not very clear.

In a real journal, I don’t even write the questions as titles like I did here. I just want a document where I can dump my raw thoughts.

This is not supposed to replace the clean, formal documentation of the solution, so don’t worry about that.

The idea is to have a place where you can be messy and just write whatever comes to your mind without thinking too much.

If more people are working on the same project, the journal should live inside the project’s repository so anyone working on it can easily find it and add their own entries.
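If you want to lower the friction of starting an entry, the setup above can be scripted. This is only a minimal sketch under a few assumptions: the journal is a plain-text file called journal.md at the root of the project’s repository, each entry starts with a "Date:" line like the example above, and the four questions are written out as prompts. The file name, the function name start_entry, and the layout are hypothetical choices for illustration, not a standard.

```python
from datetime import date
from pathlib import Path

# The four questions from the start of this post, used as prompts.
PROMPTS = (
    "What I did",
    "Why I did it",
    "What I expected",
    "What the results were",
)


def start_entry(repo_dir, day=None, filename="journal.md"):
    """Append a dated, empty entry to the project's journal and return its path.

    The file name and the entry layout are illustrative assumptions.
    """
    day = day or date.today()
    journal = Path(repo_dir) / filename
    lines = [f"Date: {day.isoformat()}", ""]
    for prompt in PROMPTS:
        lines += [f"{prompt}:", ""]
    # Append mode keeps every earlier entry (yours or a teammate's) intact.
    with journal.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return journal
```

Calling start_entry(".") from the repository root adds today’s entry to the end of the file. You then open it in your editor and write as much or as little as you like under each prompt, or delete the prompts entirely.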

An Entry From Further In The Project

This would be an entry when I am already testing models:

Today, I tested 3 different models: xgboost, logistic regression, and random forest.

As this is tabular data, I expect to see tree-based ensembles working better than neural networks. In any case, it’s worth trying neural networks.

For the random forest, I increased the number of trees to 2000 and played a bit with the min_samples_leaf to see if regularization helped. I got a 0.68 ROC AUC.

For the logistic regression, I played with the regularization parameter and got a 0.62 ROC AUC.

For the xgboost, I ran Bayesian optimization with Weights and Biases. I fixed the number of trees to 1000 and tuned the learning rate, min_child_weight, subsample, colsample_bytree, and max_depth. I got a 0.70 ROC AUC.

So far, the best model is xgboost (as expected).

Tomorrow I will see if I can get off-the-shelf TabNet to work or I will train my own neural network. I don’t expect it to work better than xgboost, but you never know.

In this entry, I am simply dumping my thoughts and reporting results, with barely any structure. That’s perfectly fine.

You don’t need to worry about writing perfect prose or including all the details of your work.
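If your scores already live in a notebook or script, you may prefer to generate the results lines instead of retyping them. The sketch below is one possible way to do that. The function name summarize_results and the output format are hypothetical, and the numbers are simply the ones from the example entry above.

```python
def summarize_results(scores):
    """Return journal-ready lines ranking models by ROC AUC, best first."""
    if not scores:
        return ""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    lines = [f"{name}: {auc:.2f} ROC AUC" for name, auc in ranked]
    best_name, best_auc = ranked[0]
    lines.append(f"Best so far: {best_name} ({best_auc:.2f})")
    return "\n".join(lines)


# Scores copied from the example entry above.
scores = {"random forest": 0.68, "logistic regression": 0.62, "xgboost": 0.70}
print(summarize_results(scores))
```

The printed text can be pasted straight into the day’s entry. The interesting part of the journal is still what you write around it: what you expected and what you plan to try next.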

How Long Should You Write For?

I recommend setting a timer for 10-15 minutes and just writing until it goes off.

There were days when I just wrote for 2 minutes and called it a day, but as the goal is to help you dig into your thoughts, spending more time staring at the screen and thinking about your work can help you get better insights.

Benefits I Found About Writing Machine Learning Journals

Reflective Practice Brings Clarity

There is something about the act of writing that makes your brain look at your work from a different perspective.

When you are just thinking about your work, it’s easy to get lost in the details and forget the big picture.

But when you have to explain what you did in writing, you need to focus on the most important points and leave out the noise.

This process of condensing your thoughts into writing forces you to reflect on what you are doing and why you are doing it, which can help bring clarity to your work.

If you want hard science about it, take a look at the resources on MIT OpenCourseWare’s Reflective Practice course.

Feel You Are Making Progress

A big downside of working with software is that we don’t get the same concrete sense of progress as a woodworker building a table.

He can see the table taking shape as he works, but with code, it’s easy to feel like you are just spinning your wheels and not getting anywhere.

Writing about your work can help with that.

It’s easy to look back at what you wrote a week ago and see how much progress you have made.

It’s a great way to get a sense of accomplishment, even on days when it feels like you are not getting anywhere.

Don’t Get Too Caught Up In The Structure

The questions I listed above are to help you kickstart the writing process. The blank page can be daunting sometimes.

If you feel like you can write without it, go ahead.

Don’t worry about having to follow it all the time. The goal is to get you thinking about your work and document what you did.

It Becomes Very Easy To Write The Final Official Documentation

After the project is complete, and you need to write the official documentation, instead of trying to remember everything you need, it’s much easier to look back at your journal and have most of it already laid out for you.

You can quickly go through your journal, pick out the most important points, and expand on them to create the documentation.

The final documentation becomes a very condensed version of your journal.

You Can Write A Blog

At the end of a project, take the best parts of your journal, polish them, remove any information you can’t share, and publish the result as a blog post.

There are full articles written about the benefits of writing a blog, but the main ones are that it can help establish you as an expert, make you a better writer, and show that you know how to communicate.

There is nothing that a tech recruiter likes more than finding a technical person who can also communicate their thoughts well.

It’s a rare combination and very profitable for those who can do it.

If you don’t want to make it public, that’s fine too. Having a written record of your thoughts is useful by itself.

Another Example Of A Machine Learning Development Journal

I recently saw that Meta’s OPT-175B developers published a full logbook of their development.

The logbook contains their initial design goals, decisions, and struggles to train such a huge model successfully.

I love to learn from other people’s thought processes.

Technically, having access to this raw version of their development journey helped me learn about mistakes I should avoid if I ever have to train a huge model.

Personally, looking only at the final results of a project can give the impression that everything went smoothly. But of course, that’s not the case.

It reminds me of the first time I teamed up with Kaggle competition winners and saw there was no magic or divine inspiration, just a lot of hard work and persistence.

Seeing that other data scientists struggle with the same issues I do helps me feel more connected to the community and motivates me to keep going.

Type docs.new And Start Writing Today

The only way to find out if journaling works for you is to try it.

Open a Google Doc and set a goal to write one sentence each day for the next 30 days.

Yes, just a single sentence.

You will see how on most days you will end up writing more.
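If you want to check how the 30 days are going, a few lines of code can count the days you actually wrote. This is a rough sketch that assumes you have the journal as plain text (for example, exported from your document) and that every entry starts with a line such as "Date: 2097-06-02", as in the example entry earlier. The function name days_written is hypothetical.

```python
import re
from datetime import date, timedelta

# Matches entry headers such as "Date: 2097-06-02" on a line of their own.
DATE_LINE = re.compile(r"^Date: (\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def days_written(journal_text, start, length=30):
    """Count distinct days with an entry in the `length`-day window from `start`."""
    end = start + timedelta(days=length)
    found = {date.fromisoformat(m) for m in DATE_LINE.findall(journal_text)}
    return sum(1 for d in found if start <= d < end)
```

Two entries on the same day count once, so the number tells you on how many days you wrote, not how many entries you made.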

After you do it for 30 days, feel free to send me a message sharing your experience.

Happy journaling! 🎉

PS: Stoics Loved Journaling

This part has nothing to do with machine learning or data science. So feel free to skip it.

If you love Stoicism like me, you know that one of the things the Stoics recommend is to keep a daily journal.

The Meditations of Marcus Aurelius, which is one of the most popular Stoic books, is literally Marcus’s diary.

All prominent Stoics talk about the importance of maintaining a journal and examining our day. Here is Epictetus:

Do not go to bed until you have gone over what you did during the day. Ask yourself: “Where did I go wrong? What did I do wrong? And what still needs to be done?” Review your deeds from beginning to end, and then criticize yourself for your bad actions, but rejoice in the good ones.

I would not recommend criticizing yourself for bad actions, but just take note of them and try to do better next time.

Rejoicing in the good actions is a great way to end the day on a positive note and it will encourage you to do more good tomorrow.
