How To Solve Logistic Regression Not Converging in Scikit-Learn

When using the Scikit-Learn library, you might encounter a situation where your logistic regression model does not converge. You may get a warning message similar to this:

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

This article aims to help you understand why this happens and how to resolve it....
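The two remedies the warning itself suggests, raising max_iter and scaling the data, can be combined in a single pipeline. The following is a minimal sketch, not the article's own code: the synthetic dataset, the inflated first feature, and the value max_iter=1000 are all assumptions chosen for illustration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration). One feature is inflated to
# a much larger scale, which tends to slow the lbfgs solver down.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X[:, 0] *= 10_000

# Scale the features and raise max_iter above its default of 100.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    # Turn a ConvergenceWarning into an error so a failure cannot go unnoticed.
    warnings.simplefilter("error", category=ConvergenceWarning)
    model.fit(X, y)

# n_iter_ reports how many iterations the solver actually used.
n_iter = model.named_steps["logisticregression"].n_iter_
print("lbfgs iterations used:", n_iter)
```

If the fit still fails to converge, the warning's last hint applies: try one of the alternative solvers listed in the scikit-learn documentation.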

July 1, 2023 · 6 min · Mario Filho

Does Random Forest Need Feature Scaling or Normalization?

If you are using Random Forest as your machine learning model, you don’t need to worry about scaling or normalizing your features. Random Forest is a tree-based model and hence does not require feature scaling. Tree-based models are invariant to the scale of the features, which makes them very user-friendly as this step can be skipped during preprocessing. Still, in practice you can see different results when you scale your features because of the way numerical values are represented in computers....

June 29, 2023 · 5 min · Mario Filho

StandardScaler vs MinMaxScaler: What's the Difference?

The main differences between StandardScaler and MinMaxScaler lie in the way they scale the data, the range of values they produce, and the specific applications they’re suited for. StandardScaler subtracts the mean from each data point and then divides the result by the standard deviation. This results in a dataset with a mean of 0 and a standard deviation of 1. MinMaxScaler, on the other hand, subtracts the minimum value from each data point and then divides the result by the difference between the maximum and minimum values....
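The two formulas described above can be checked by hand on a tiny example. This sketch uses only the Python standard library rather than scikit-learn; the four input values are made up for illustration. It uses the population standard deviation, which matches how StandardScaler computes it (equivalent to numpy.std with ddof=0), and it scales to the range 0 to 1, which is MinMaxScaler's default.

```python
from statistics import mean, pstdev

values = [2.0, 4.0, 6.0, 8.0]  # hypothetical single feature column

# StandardScaler formula: (x - mean) / standard deviation
mu = mean(values)
sigma = pstdev(values)  # population standard deviation (ddof=0)
standardized = [(v - mu) / sigma for v in values]

# MinMaxScaler formula: (x - min) / (max - min)
lo, hi = min(values), max(values)
min_max_scaled = [(v - lo) / (hi - lo) for v in values]

print("standardized:", [round(v, 3) for v in standardized])
print("min-max scaled:", [round(v, 3) for v in min_max_scaled])
```

The standardized values are centered on 0 with unit standard deviation but have no fixed bounds, while the min-max values always fall between 0 and 1.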

June 23, 2023 · 7 min · Mario Filho

How To Train A Logistic Regression Using Scikit-Learn (Python)

Logistic regression is a type of predictive model used in machine learning and statistics. Its purpose is to determine the likelihood of an outcome based on one or more input variables, also known as features. For example, logistic regression can be used to predict the probability of a customer churning, given their past interactions and demographic information.

Difference Between Linear And Logistic Regression?

Before diving into logistic regression, it’s important to understand its sibling model, linear regression....

June 21, 2023 · 17 min · Mario Filho

Bagging vs Boosting vs Stacking In Machine Learning

Bagging, boosting, and stacking are three ensemble learning techniques used to improve model performance. Bagging involves training multiple models independently on random subsets of data and then combining their predictions, typically through a majority vote for classification or averaging for regression. Boosting trains weak models in sequence, with each new model focusing on correcting the errors made by the previous ones, to create a stronger model. Stacking combines multiple models by training a meta-model, which takes model predictions as input and outputs the final prediction....
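As a rough illustration of the three techniques, scikit-learn ships an estimator for each. This is a sketch under assumptions, not the article's code: the synthetic dataset, the choice of GradientBoostingClassifier as the boosting example, the base models inside the stack, and all hyperparameters are picked only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Bagging: independent models (decision trees by default), each trained
    # on a random sample of the training data.
    "bagging": BaggingClassifier(n_estimators=25, random_state=0),
    # Boosting: trees added in sequence, each correcting earlier errors.
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-model learns from the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

The accuracies on this toy dataset say nothing about which technique is better in general; the point is only how each ensemble is assembled.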

May 2, 2023 · 22 min · Mario Filho

How To Get Feature Importance In Logistic Regression

Are you looking to make sense of your logistic regression model and determine which features are truly important in predicting your target variable? It can be quite frustrating trying to understand which features are driving your model’s predictions, especially when you have a large number of them. Not to mention, the presence of correlated features can make the task even more challenging. In this tutorial, I’ll walk you through different methods for assessing feature importance in both binary and multiclass logistic regression models....

March 30, 2023 · 7 min · Mario Filho

Time Series Clustering In Python With Scikit-Learn

Clustering is an unsupervised learning technique that can help you uncover hidden patterns in your time series data. Scikit-learn has a wide range of clustering algorithms, including K-means, DBSCAN, and Agglomerative Clustering. In this tutorial, we’ll explore how to use K-means with different transformations to cluster time series data. Using the right data transformations can help you get your desired results faster than just trying different clustering algorithms over the same data....

March 17, 2023 · 11 min · Mario Filho

Multi-Step Time Series Forecasting In Python

In this tutorial, I will explain two (and a half) methods to generate multi-step forecasts using time series data. They are the recursive or autoregressive method, the direct method, and a variant of the direct method with a single model.

Preparing the Data

We will use a dataset containing information about yearly rice production.

import pandas as pd
import numpy as np
import os

data = pd.read_csv(os.path.join(path, 'rice production across different countries from 1961 to 2021....
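The difference between the recursive and direct methods named above can be sketched without any libraries. This is an illustrative toy, not the tutorial's code: the series values are made up, the helper fit_line is hypothetical, and each model uses only the latest observation as its input, which is a simplifying assumption. The single-model variant of the direct method is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


# Hypothetical yearly series (made-up numbers, not the rice dataset).
series = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 19.0, 21.0, 22.0, 24.0]
horizon = 3

# Recursive (autoregressive): fit ONE one-step-ahead model, then feed each
# prediction back in as the input for the next step.
a1, b1 = fit_line(series[:-1], series[1:])
recursive = []
last = series[-1]
for _ in range(horizon):
    last = a1 + b1 * last
    recursive.append(last)

# Direct: fit a SEPARATE model for each step h, predicting y[t + h] from y[t],
# and apply every model to the last observed value.
direct = []
for h in range(1, horizon + 1):
    a_h, b_h = fit_line(series[:-h], series[h:])
    direct.append(a_h + b_h * series[-1])

print("recursive:", [round(v, 2) for v in recursive])
print("direct:   ", [round(v, 2) for v in direct])
```

Both methods agree on the first step, since they share the same one-step model there. They can diverge afterwards because the recursive method compounds its own prediction errors, while the direct method trains each horizon on fewer examples.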

March 9, 2023 · 8 min · Mario Filho

Volatility Forecasting In Python

In this blog post, we will explore how to use Python to forecast volatility with three methods: a naive approach, the popular GARCH model, and machine learning with scikit-learn. Volatility here is the standard deviation of the returns of a financial instrument. I will give you starting points to kickstart your own research.

Installing ARCH and mlforecast

First we need to install the required packages. You can do it with pip:

pip install arch
pip install mlforecast

or with conda:...
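To make the definition of volatility above concrete, here is a small standard-library sketch. Several things in it are assumptions rather than the post's method: the prices are made up, returns are computed as log returns, the window length of 5 is arbitrary, and "naive" is interpreted as carrying the most recent observed volatility forward as the forecast.

```python
import math
from statistics import stdev

# Hypothetical closing prices (made-up numbers).
prices = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0, 103.0, 102.5]

# Log returns between consecutive prices (one common choice; an assumption).
returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Volatility as the sample standard deviation of the most recent returns.
window = 5
realized_vol = stdev(returns[-window:])

# Naive forecast (assumed meaning): next period's volatility equals the
# latest observed volatility.
naive_forecast = realized_vol

print(f"realized volatility over last {window} returns: {realized_vol:.4f}")
print(f"naive forecast for next period: {naive_forecast:.4f}")
```

A baseline like this is mainly useful as a yardstick: a GARCH or machine learning model is only worth its complexity if it beats it.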

February 18, 2023 · 9 min · Mario Filho

Multiple Time Series Forecasting With Scikit-learn

Forecasting time series is a very common task in the daily life of a data scientist, yet it gets surprisingly little coverage in beginner machine learning courses. Examples include predicting future demand for a product, city traffic, or even the weather. With accurate time series forecasts, companies can adjust their production strategies, inventory management, resource allocation and other key decisions, leading to significant cost reduction and increased efficiency. Furthermore, forecasts also allow companies to be more proactive rather than reactive, anticipating market trends and adjusting their strategies accordingly....

February 8, 2023 · 15 min · Mario Filho