What Is Normalized Cross Entropy?

Normalized Cross Entropy is a modified version of Cross Entropy that takes into account a baseline solution or “base level.”

It essentially measures the relative improvement of your model over the selected baseline solution.

The widely cited paper Practical Lessons from Predicting Clicks on Ads at Facebook popularized the concept of Normalized Cross Entropy (NCE) as a metric for binary classification problems.

Although the authors likely included this metric so they wouldn't have to disclose the actual cross entropy of their model, it can be a useful metric in its own right.

In the paper, the baseline solution is a model that predicts the same click-through rate for all ads.

What Is The Equation For Normalized Cross Entropy?

$$ \text{Normalized Cross Entropy} = \frac{-\frac{1}{N}\sum_{i=1}^{N}(y_i\log(p_i) + (1-y_i)\log(1-p_i))}{-(p_{base}\log(p_{base}) + (1-p_{base})\log(1-p_{base}))} $$

This equation differs slightly from the one used in the Facebook paper, because we are using labels $y \in \{0, 1\}$ instead of $\{-1, +1\}$.

Here’s an explanation of each element, from the inside to the outside:

  1. $p_i$: The predicted probability for the positive class of sample $i$.
  2. $y_i$: The true label for sample $i$. In this case, $y_i$ can only be 0 or 1.
  3. $y_i\log(p_i)$: The log loss contribution when the true label is positive ($y_i = 1$). If $y_i = 0$, this term becomes 0.
  4. $(1-y_i)\log(1-p_i)$: The log loss contribution when the true label is negative ($y_i = 0$). If $y_i = 1$, this term becomes 0.
  5. $-\frac{1}{N}\sum_{i=1}^{N}$: The negative average over the $N$ samples, which turns the summed log terms into the model's average log loss. This is the numerator of the metric.
  6. $p_{base}\log(p_{base}) + (1-p_{base})\log(1-p_{base})$: The negative of this term is the baseline Cross Entropy, calculated from the baseline probability $p_{base}$; it serves as the denominator of the metric.
    • $p_{base}$: The probability of the positive class predicted by the baseline solution.
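To make the formula concrete, here's a minimal NumPy sketch that translates it term by term; the variable names are illustrative, and the toy values are the same ones used in the Python walkthrough later in the post.

import numpy as np

# Toy data: true labels y_i and predicted probabilities p_i.
y = np.array([0, 1, 1, 1])
p = np.array([0.1, 0.3, 0.6, 0.9])
p_base = np.mean(y)  # baseline probability: the positive rate of the labels

# Numerator: the model's average log loss, -1/N * sum(y*log(p) + (1-y)*log(1-p)).
numerator = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Denominator: the cross entropy of always predicting p_base.
denominator = -(p_base * np.log(p_base) + (1 - p_base) * np.log(1 - p_base))

print("Normalized Cross Entropy:", numerator / denominator)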

What Is The Difference Between Cross Entropy And Normalized Cross Entropy?

Cross-entropy is a metric that measures the dissimilarity between predicted probabilities and true probabilities of the classes in a classification problem.

It evaluates how well a model performs by quantifying the divergence between its predictions and the actual labels, and it's a suitable choice when you need to compare two probability distributions.

This can be helpful in tasks like language modeling, topic modeling, or other scenarios where the goal is to generate probabilities that closely resemble the true distribution.

Normalized Cross Entropy, on the other hand, is a modification of the cross entropy metric that takes into account a baseline solution.

By dividing the cross entropy of your model’s predictions by the cross entropy of the baseline solution, it provides a relative performance measure, allowing for a better comparison between different models.
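For example, with purely hypothetical numbers, if your model's cross entropy is 0.30 and the baseline's is 0.60, then

$$ \text{Normalized Cross Entropy} = \frac{0.30}{0.60} = 0.5 $$

meaning your model's log loss is half that of the baseline.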

How To Choose The Right Baseline For Calculating Normalized Cross Entropy?

The choice of the baseline solution depends on the problem you’re trying to solve and the domain knowledge you have about the problem.

In general, the baseline should be a simple or naive approach to the problem, something that can be easily computed and serves as a reference point for comparison.

For example, in a binary classification problem, a common baseline solution could be predicting the average positive rate of the true labels in the training set.

In the case of the Facebook paper mentioned earlier, the baseline solution was a model that predicted the same click-through rate for all ads.
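For instance, with a hypothetical click-through rate of 20%, that baseline always predicts $p_{base} = 0.2$, and its cross entropy (using natural logarithms) is

$$ -(0.2\log(0.2) + 0.8\log(0.8)) \approx 0.50 $$

which becomes the denominator of the metric.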

How Does The Base Level Affect The Interpretation Of The Metric?

The base level in Normalized Cross Entropy helps you understand how well your model is performing relative to a simple or naive solution.

If the Normalized Cross Entropy is close to 1, it means that your model is performing similarly to the baseline solution, indicating that there’s a lot of room for improvement.

On the other hand, if the Normalized Cross Entropy is much lower than 1, your model is significantly outperforming the baseline; a value above 1 means it's doing worse than the baseline.

By considering the base level, the Normalized Cross Entropy metric allows for a more meaningful interpretation of your model’s performance, as it takes into account the difficulty of the problem and the performance of a simple solution.
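As a quick illustration with a made-up value, a Normalized Cross Entropy of 0.85 corresponds to a relative improvement of

$$ 1 - 0.85 = 0.15 $$

that is, the model's log loss is 15% lower than the baseline's.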

Can Normalized Cross-entropy Be Used For Regression Problems?

Normalized Cross Entropy is primarily designed for classification problems, where there are discrete classes or labels to predict.

In regression problems, where the goal is to predict continuous values, other metrics are more suitable, such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or Mean Absolute Error (MAE).

However, if you can convert a regression problem into a classification problem by discretizing the continuous targets, you might be able to use Normalized Cross Entropy in that context.

But keep in mind that this approach might not always be appropriate, and using a metric specifically designed for regression problems is typically preferred.
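To make the discretization idea concrete, here's a minimal sketch under the assumption that you binarize the continuous target with a threshold; the variable names, the threshold of 200, and the example values are purely illustrative.

import numpy as np
from sklearn.metrics import log_loss

# Hypothetical continuous targets and a model's predicted probabilities
# that each target exceeds the threshold (i.e., a classifier trained on the binarized labels).
continuous_targets = np.array([120.0, 250.0, 80.0, 310.0, 190.0])
predicted_probabilities = np.array([0.2, 0.7, 0.1, 0.9, 0.4])

# Discretize: turn the regression target into a binary label.
threshold = 200.0
binary_labels = (continuous_targets > threshold).astype(int)

# From here on, it's the usual binary-classification calculation.
cross_entropy = log_loss(binary_labels, predicted_probabilities)
p_base = np.mean(binary_labels)
base_level = -(p_base * np.log(p_base) + (1 - p_base) * np.log(1 - p_base))
print("Normalized Cross Entropy:", cross_entropy / base_level)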

How To Calculate Normalized Cross Entropy In Python

First, you’ll need to have some predictions and actual answers (also called true labels) to compare your model against. Here’s an example:

import numpy as np

predictions = np.array([0.1, 0.3, 0.6, 0.9])
true_labels = np.array([0, 1, 1, 1])

Now, let’s calculate the Cross Entropy using the log_loss function from sklearn.metrics:

from sklearn.metrics import log_loss

cross_entropy = log_loss(true_labels, predictions)
print("Cross Entropy:", cross_entropy)

Next, we need to find the base level, which is the Cross Entropy of a simple baseline model or of the currently deployed solution.

Let’s use a simple baseline model that predicts the average positive rate of the true labels.

To do this, we need to know the proportions of each class in the true labels:

proportions = np.mean(true_labels)
base_level = -proportions * np.log(proportions) - (1 - proportions) * np.log(1 - proportions)
print("Base Level:", base_level)

Finally, we can calculate the Normalized Cross Entropy by dividing the Cross Entropy by the base level:

normalized_cross_entropy = cross_entropy / base_level
print("Normalized Cross Entropy:", normalized_cross_entropy)

How To Calculate Normalized Cross Entropy For Validation Sets

When working with a training set and a validation set, the base level should be calculated from the training set labels.

The exception is when you have another model that you want to use as the baseline solution.

In that case, the baseline predictions are generated by that model for the samples in the validation set.

Let’s start by generating some example training and validation set data.

In this example, I’ll use randomly generated data, but you can replace this with your own dataset if you prefer.

import numpy as np

train_true_labels = np.random.randint(0, 2, 100)

val_predictions = np.random.rand(50)
val_true_labels = np.random.randint(0, 2, 50)

Now, we’ll calculate the Cross Entropy for our predictions on the validation set using the log_loss function from sklearn.metrics.

from sklearn.metrics import log_loss

val_cross_entropy = log_loss(val_true_labels, val_predictions)

print("Validation Cross Entropy:", val_cross_entropy)

Next, we need to find the base level, which is the Cross Entropy of our baseline solution.

As mentioned before, we’ll use the proportion of positive labels in the training set as the baseline.

train_proportions = np.mean(train_true_labels)
base_level = -train_proportions * np.log(train_proportions) - (1 - train_proportions) * np.log(1 - train_proportions)

print("Validation Set Base Level:", train_base_level)

Finally, we can calculate the Normalized Cross Entropy for the validation set by dividing its Cross Entropy by the base level:

val_normalized_cross_entropy = val_cross_entropy / base_level

print("Validation Normalized Cross Entropy:", val_normalized_cross_entropy)