Is Coursera's Machine Learning Specialization Worth It In 2024? Review

Provider: DeepLearning.AI, Stanford University
Teacher: Andrew Y. Ng
Price: $49/month with a 7-day free trial
Duration: Approx. 3 months if you study 9 hours per week
Prerequisites: basic Python programming skills, high school math
Level: Beginner
Certificate: Yes

Stepping into the world of machine learning can be daunting, especially when you’re trying to decipher complex topics like regression models, learning algorithms and more. You could spend hours sifting through information online, but how can you know what’s reliable and what’s not?...

July 22, 2023 · 9 min · Mario Filho

How to Get Feature Importance in XGBoost in Python

You’ve chosen XGBoost as your algorithm, and now you’re wondering: “How do I figure out which features are the most important in my model?” That’s what ‘feature importance’ is all about. It’s a way of finding out which features in your data are doing the heavy lifting when it comes to your model’s predictions. Understanding which features are important can help you interpret your model better. Maybe you’ll find a feature you didn’t expect to be important....

July 18, 2023 · 6 min · Mario Filho

Machine Learning, Data Science and AI Experts To Follow On Threads

Everyone is struggling to improve the quality of their feeds on Threads (the Instagram team’s new app) while the algorithm hasn’t yet figured out our interests. I found a few lists and have been following some of the best experts in the fields of machine learning, data science and AI who are already active on the app. This is a living document, so reach out to me on Threads if you have suggestions....

July 10, 2023 · 2 min · Mario Filho

How To Use LightGBM For Multi-Output Regression And Classification In Python

Today, we’re going to dive into the world of LightGBM and multi-output tasks. LightGBM is a powerful gradient boosting framework (like XGBoost) that’s widely used for various tasks. But what if you want to predict multiple outputs at once? That’s where multi-output regression and classification come in. Unfortunately, LightGBM doesn’t support multi-output tasks directly, but we can use scikit-learn’s MultiOutputRegressor to get around this limitation. What Is Multi-Output Regression and Classification? First, let’s break down what these terms mean....

July 6, 2023 · 5 min · Mario Filho

How To Solve Logistic Regression Not Converging in Scikit-Learn

When using the Scikit-Learn library, you might encounter a situation where your logistic regression model does not converge. You may get a warning message similar to this: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( This article aims to help you understand why this happens and how to resolve it....
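The two fixes the warning itself suggests, scaling the data and raising max_iter, can be sketched together; the badly scaled synthetic data below is invented to mimic the situation that commonly triggers the warning:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# wildly different feature scales are a common cause of lbfgs not converging
X = np.column_stack([rng.normal(size=500) * 1e4, rng.normal(size=500)])
y = (X[:, 1] > 0).astype(int)

# fix 1: standardize the features; fix 2: raise max_iter from the default 100
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

# n_iter_ shows how many iterations the solver actually needed
print(clf.named_steps["logisticregression"].n_iter_)
```

With scaled inputs the solver usually converges well under the iteration limit.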

July 1, 2023 · 6 min · Mario Filho

Does Random Forest Need Feature Scaling or Normalization?

If you are using Random Forest as your machine learning model, you don’t need to worry about scaling or normalizing your features. Random Forest is a tree-based model and hence does not require feature scaling. Tree-based models are invariant to the scale of the features, which makes them very user-friendly as this step can be skipped during preprocessing. Still, in practice you can see different results when you scale your features because of the way numerical values are represented in computers....
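The scale-invariance claim is easy to check empirically; this small experiment on synthetic data (invented here for illustration) fits the same forest on raw and rescaled features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# same data, but with one feature blown up by a factor of a million
X_scaled = X.copy()
X_scaled[:, 0] *= 1e6

a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
b = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_scaled, y)

# the split order is unchanged, so predictions typically match exactly
print((a.predict(X) == b.predict(X_scaled)).mean())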

June 29, 2023 · 5 min · Mario Filho

StandardScaler vs MinMaxScaler: What's the Difference?

The main differences between StandardScaler and MinMaxScaler lie in the way they scale the data, the range of values they produce, and the specific applications they’re suited for. StandardScaler subtracts the mean from each data point and then divides the result by the standard deviation. This results in a dataset with a mean of 0 and a standard deviation of 1. MinMaxScaler, on the other hand, subtracts the minimum value from each data point and then divides the result by the difference between the maximum and minimum values....
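The two transformations described above can be seen side by side on a toy column (invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

std = StandardScaler().fit_transform(X)  # (x - mean) / std
mm = MinMaxScaler().fit_transform(X)     # (x - min) / (max - min)

print(std.mean(), std.std())  # ~0 and ~1 after standardization
print(mm.min(), mm.max())     # 0 and 1 after min-max scaling
```

StandardScaler leaves the values unbounded, while MinMaxScaler pins them into [0, 1], which is exactly the difference in output range the post discusses.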

June 23, 2023 · 7 min · Mario Filho

How To Train A Logistic Regression Using Scikit-Learn (Python)

Logistic regression is a type of predictive model used in machine learning and statistics. Its purpose is to determine the likelihood of an outcome based on one or more input variables, also known as features. For example, logistic regression can be used to predict the probability of a customer churning, given their past interactions and demographic information. Difference Between Linear And Logistic Regression? Before diving into logistic regression, it’s important to understand its sibling model, linear regression....
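The full post is a long walkthrough; as a compressed sketch, training a logistic regression in scikit-learn and reading out churn-style probabilities looks like this (the generated dataset stands in for real customer data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic binary classification data in place of real customer records
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns one probability per class for each row
proba = model.predict_proba(X_test)
print(model.score(X_test, y_test))
```

The second column of `proba` is the estimated likelihood of the positive outcome, which is the quantity logistic regression models directly, unlike linear regression’s unbounded output.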

June 21, 2023 · 17 min · Mario Filho

How To Get Feature Importance In LightGBM (Python Example)

LightGBM is a popular gradient boosting framework that uses tree-based learning algorithms. These algorithms are excellent for handling tabular data and are widely used in various machine learning applications. One of the key aspects of understanding your model’s behavior is knowing which features contribute the most to its predictions, and that’s where feature importance comes into play. By the end of this guide, you’ll have a better grasp on the importance of your features and how to visualize them, which will help you improve your model’s performance and interpretability....

September 19, 2023 · 11 min · Mario Filho

A Guide To Normalized Cross Entropy

What Is Normalized Cross Entropy? Normalized Cross Entropy is a modified version of Cross Entropy that takes into account a baseline solution or “base level.” It essentially measures the relative improvement of your model over the selected baseline solution. The very popular Practical Lessons from Predicting Clicks on Ads at Facebook paper popularized the concept of Normalized Cross Entropy (NCE) as a metric for binary classification problems. Although the metric was likely included in the paper because they didn’t want people to know the actual cross entropy of their model, it can be a useful metric....
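A minimal sketch of the idea, assuming the common formulation where the baseline always predicts the positive-class base rate (the helper function and synthetic labels here are illustrative, not from the post):

```python
import numpy as np
from sklearn.metrics import log_loss

def normalized_cross_entropy(y_true, y_pred):
    """Model log loss divided by the log loss of predicting the base rate."""
    p = np.mean(y_true)  # baseline: constant prediction at the base rate
    baseline = log_loss(y_true, np.full(len(y_true), p))
    return log_loss(y_true, y_pred) / baseline

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
good = y * 0.8 + 0.1  # confident, mostly correct probabilities

# values below 1 mean the model beats the base-rate baseline
print(normalized_cross_entropy(y, good))
```

By construction, predicting the base rate itself yields an NCE of exactly 1, which is what makes the metric comparable across datasets with different class balances.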

June 9, 2023 · 6 min · Mario Filho