This is one of those things you only find out after getting a job as a data scientist.
Imagine that someone showed up with an app and told you that if you follow what it recommends, you will reach your goals in life.
Interviewing for jobs? Take the job the app ranks highest. Looking for a relationship? Date the person the app says is the best match. Sick? Just take the medicine the app recommends.
Scary, right? You would need to know a lot about how that app works before you would feel comfortable using it for something important in your life.
The same goes for machine learning models. If you can’t explain to a decision-maker how your model works, they won’t trust it.
Remember that you spent hours and hours carefully building your solution, but the people who are going to use it didn’t. It’s your job to explain why they can trust that this model will help them achieve their goals.
This is important both for models that people actively interact with and for those that run in the background making automated decisions.
Here is what works for me.
Make The Outputs Useful And Easy To Understand
This is the most important part.
How can you make the predictions of your model easily accessible in the most useful way for the user?
People must look at your solution and easily grasp how they can use it to improve their lives.
It can mean having a web app where they can input some information and get a prediction, receiving a daily report with predictions, or something else entirely.
I once watched a presentation by a Netflix data science team about an app they had created, where studio executives could input data about a movie, such as its script, and get predictions about how successful it would be.
Can you remember a time you gave up using some software after you saw how hard it would be to start?
It happens all the time to me.
I look at a new Python library and get excited about how it can help me in my work, but then I see that the documentation is terrible, or that there are no clear examples of how to use it.
The effort to make it work doesn’t seem worth the benefit when you are uncertain that this tool can help you.
There is a great book called “Don’t Make Me Think” about web usability. I like the title because it’s exactly what users (including you and me) want.
Even if your app will be used by the smartest 1% of humanity, they are in a rush, with a thousand things on their minds.
The last thing they want is to spend hours and hours figuring out a system they don’t even know will really help them.
It’s very important to talk with your users and make sure you deeply understand the ideal final version of your solution so you can get close to it.
Focus On What They Will Gain
At the end of the day, what decision-makers care about is whether or not your machine learning model can help them achieve their goals.
Keep the conversation focused on how your model can be used to improve key metrics like revenue, engagement, or whatever else is important to the business.
It’s important to be able to explain the technical details of your model to other data scientists.
But when you’re talking to decision-makers, you need to focus on the business goals and leave out the technical jargon.
They don’t need to know about random forest classifiers and support vector machines. They just need to know that your model can help them improve their business.
Show That They Won’t Lose What They Have
I have seen this mostly in big corporations, where you have to convince multiple departments before you can deploy a solution.
Team A wants to gain something by using your model, but it will affect Team B’s metrics.
Think about Team B’s perspective: they have nothing to gain but they can lose a lot if you hurt their metrics.
The best thing you can do is talk with Team B to deeply understand their concerns and see if there are ways to address them.
Most of the time this is possible, and it is even easier when higher-ups are sponsoring your project. Still, I remember one situation where I gathered every piece of evidence you can think of that my model was not responsible for their metrics going down, and I still got the blame.
It happens, move on.
Use Simple Language
When you’re explaining your machine learning model to someone, it’s important to use language that they will understand.
Don’t try to impress them with technical jargon. Instead, focus on communicating the key points in a way that is clear and concise.
Answer their questions even if they may look trivial to you. Don’t be a machine learning snob.
I found that having the patience to educate people about machine learning is one of the skills with the highest return on investment you can develop.
One of the best compliments I ever received after completing a job was that I was “down-to-earth”, unlike other data scientists they had worked with.
Enough of the stereotype of the technical person that can’t explain things simply!
Use Analogies
Whenever possible, try to find an analogy to help explain your machine learning model.
If you’re trying to explain a decision tree, you could say that it’s “like asking a series of Yes/No questions to arrive at a decision.”
Analogies are helpful because they provide a framework that people can use to understand a concept that may be new to them.
And once they have that framework in place, it becomes much easier to grasp the details.
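To make the decision tree analogy tangible, here is a minimal sketch of a "tree" written as plain Yes/No questions. The questions, field names, and the 50% rule are all made up for illustration; a real tree would learn its own questions from data.

```python
def approve_loan(applicant):
    """Toy decision tree: each `if` is one Yes/No question.

    The questions and thresholds are hypothetical, not learned from data.
    """
    # Question 1: did the applicant miss a payment in the last year?
    if applicant["missed_payment_last_year"]:
        return "deny"
    # Question 2: is the loan larger than half of their yearly income?
    if applicant["loan_amount"] > 0.5 * applicant["yearly_income"]:
        return "deny"
    # Both answers were "No", so we reach the "approve" leaf.
    return "approve"


applicant = {
    "missed_payment_last_year": False,
    "loan_amount": 10_000,
    "yearly_income": 60_000,
}
print(approve_loan(applicant))
```

When I show something like this to a non-technical audience, I usually read the questions out loud instead of showing the code.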
Use Concrete Examples
Another helpful way to explain your machine learning model is to use concrete examples.
Concrete examples make it easier for people to understand abstract concepts.
Take a few cases from your data set and walk through how your model would make a prediction for each one.
It is even better if the examples you pick have predictions that match what domain experts consider common sense.
For instance, if you’re building a classification model to predict whether or not someone will default on a loan, show examples of the model automatically denying a loan that would be an easy denial by a human loan officer.
This helps to ground the conversation in something concrete that people can understand and visualize.
It also allows you to point out any potential concerns or edge cases that decision-makers should be aware of.
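One way to prepare such a walkthrough is to print, for each example, how much every input pushed the prediction up or down. The sketch below assumes a toy linear scoring model for the loan example; the feature names, weights, and threshold are invented for illustration and do not come from a real credit model.

```python
# Hypothetical weights of a toy linear scoring model (higher score = riskier).
WEIGHTS = {"missed_payments": 2.0, "debt_to_income": 3.0, "years_employed": -0.5}
THRESHOLD = 2.0  # hypothetical cutoff: scores above this are denied


def walk_through(name, case):
    """Return the decision and a plain-language breakdown of one prediction."""
    lines = [f"Case: {name}"]
    score = 0.0
    for feature, weight in WEIGHTS.items():
        contribution = weight * case[feature]
        score += contribution
        lines.append(f"  {feature} = {case[feature]} adds {contribution:+.2f}")
    decision = "deny" if score > THRESHOLD else "approve"
    lines.append(f"  total score {score:.2f} -> {decision}")
    return decision, "\n".join(lines)


cases = {
    "easy denial": {"missed_payments": 4, "debt_to_income": 0.9, "years_employed": 1},
    "easy approval": {"missed_payments": 0, "debt_to_income": 0.1, "years_employed": 10},
}

for name, case in cases.items():
    decision, report = walk_through(name, case)
    print(report)
```

For models that are not linear, the same idea still applies, but you would need a different way of attributing the prediction to each input.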
Use Visualizations
Whenever possible, use visualizations to help explain your machine learning models.
This is especially important when you’re dealing with high-dimensional data, as it can be difficult for people to understand what’s going on without being able to see it.
There are many different ways to visualize data, so find the one that makes the most sense for the data you’re working with and the point you’re trying to make.
Explain The Limitations Of Your Model
No machine learning model is perfect, and it’s important to be upfront about the limitations of your solution.
This will help people understand that your model is only one part of the decision-making process and that human expertise is still required.
It will also make it more likely that they will trust your model when you do get it right, as they will understand that it’s not just a black box.
I once heard from a VP that “if the model is making any wrong predictions, it means the data scientist didn’t do his job well”.
Poor guy! I wonder what hyped version of data science he was sold.
Every machine learning model will make wrong predictions sometimes - that’s just the way it is.
Be upfront about any limitations in your data, and explain how these limitations might impact your results.
For example, if you are trying to predict if a job candidate will be a good employee, you have to remember that you only know the performance of people who actually got hired.
You will never know how well someone who was rejected during the interviews would have done (a counterfactual), so your model only ever learns from the hired group, which biases it.
And the company will not start hiring people randomly just so you can have a more well-rounded dataset!
Make sure these kinds of limitations are clear to the decision-makers that will be using your model, so they can interpret the results accordingly.
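If you want to show this hiring limitation with numbers, a small simulation can help. The sketch below is purely synthetic: the interview scores, the performance formula, and the hiring cutoff are assumptions I made up, not data from any company. It only illustrates that the people you can measure are not representative of everyone who applied.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

CUTOFF = 1.0  # hypothetical interview score needed to get hired

# Simulate every candidate, including the ones who would be rejected.
candidates = []
for _ in range(10_000):
    interview = random.gauss(0, 1)
    performance = interview + random.gauss(0, 1)  # would-be job performance
    candidates.append((interview, performance))

# In real life you only observe performance for this group.
hired = [(i, p) for i, p in candidates if i > CUTOFF]


def mean(values):
    return sum(values) / len(values)


mean_all = mean([p for _, p in candidates])
mean_hired = mean([p for _, p in hired])

print(f"candidates: {len(candidates)}, hired (visible to the model): {len(hired)}")
print(f"mean performance, everyone (unknowable in practice): {mean_all:.2f}")
print(f"mean performance, hired only (what you can measure): {mean_hired:.2f}")
```

The training data contains nobody below the cutoff, so anything the model says about those candidates is an extrapolation.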
Make Yourself Available To Help
Finally, make sure you’re available to help when people have questions about your machine learning model.
You may not always be able to be there in person, but make sure you’re accessible by email, Slack, or whatever channel your users prefer.
A side effect is that you’ll build a better relationship with the people who use your models. That makes it more likely that they will trust and use your models in the future, and that they will help convince other people that you know what you are doing.
Looking back, I regret the times I wanted to be the data scientist isolated in a room with Jupyter Notebooks.
Being more proactive in trying to build relationships with the people that were using my models would have made my job a lot easier in the long run, and I would have been able to make a bigger impact.
Big Picture Takeaway
It’s important to refine your technical skills so you are able to build better machine learning models.
But when you enter the job market, it’s just as important, if not more so, to focus on the soft skills required to be successful.
The ability to communicate effectively, build relationships, and explain your work will take you a lot further than being able to tune a random forest.
After all, what good is a perfect model if no one will use it?
I believe machine learning coding will be very different in the next 3-5 years. How? More automation, a lot more.
Back when I started, scikit-learn couldn’t take Pandas DataFrames as inputs and Theano was THE way to speed up neural networks. Now some of you might be thinking: “What is Theano?”. Things change fast around here.
It pays off to develop strategic thinking and communication skills.
Just don’t take it to the extreme and become all talk and no substance.
Balance is everything.