
What is the Difference Between Probability and Logit?

Published in Statistical Modeling

Probability quantifies the likelihood of an event occurring, ranging from zero to one, whereas logit transforms this probability into a scale that spans all real numbers, representing the log-odds of the event.

Understanding the distinction between probability and logit is fundamental in various fields, particularly in statistics, machine learning, and data science. While both relate to the likelihood of an event, they operate on different scales and serve distinct purposes in modeling and interpretation.

Probability: The Likelihood Scale

Probability ($p$) is a measure of the likelihood that an event will occur. It is expressed as a number between 0 and 1, inclusive.

  • A probability of 0 indicates that the event is impossible.
  • A probability of 1 indicates that the event is certain to occur.
  • A probability of 0.5 indicates an equal chance of the event occurring or not occurring.

Probabilities are intuitive and easily understood by a wide audience, representing percentages when multiplied by 100 (e.g., a probability of 0.75 is a 75% chance).
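
For instance, a probability can be estimated empirically as the observed fraction of successes. Here is a minimal sketch in Python (using NumPy, with a made-up true probability of 0.75):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulate 10,000 trials of an event with a (made-up) true probability of 0.75
successes = rng.random(10_000) < 0.75

# The empirical probability is the fraction of trials in which the event occurred
p_hat = successes.mean()
print(f"Estimated probability: {p_hat:.3f}")  # close to 0.75, i.e. a 75% chance
```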

Logit: The Log-Odds Scale

The logit is a function that transforms a probability ($p$) into the log-odds. Specifically, it's the natural logarithm of the odds. The odds of an event are calculated as $p / (1-p)$, which is the ratio of the probability of the event occurring to the probability of it not occurring.

The logit function is defined as:
$ \text{logit}(p) = \ln\left(\frac{p}{1-p}\right) $

Unlike probabilities, which are bounded between 0 and 1, logits can take on any real number, $\text{logit}(p) \in (-\infty, \infty)$. This unbounded nature is crucial for statistical modeling.
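
A minimal sketch of the logit transform in Python (NumPy assumed), checking a few values:

```python
import numpy as np

def logit(p):
    """Natural log of the odds p / (1 - p); defined for 0 < p < 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

print(logit(0.5))                 # 0.0 -- even odds
print(logit([0.25, 0.75, 0.99]))  # approx. [-1.099, 1.099, 4.595]
```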

Why Logits are Used

Logits are not as intuitively understood as probabilities, but they offer significant advantages in statistical modeling, particularly in:

  1. Linear Modeling: In models like logistic regression, the relationship between predictor variables and the outcome is often linear on the logit scale. This allows the use of standard linear regression techniques to model a binary (yes/no) outcome (see the sketch after this list).
  2. Unbounded Range: Since probabilities are bounded, directly modeling them with linear regression can lead to predictions outside the [0, 1] range. Transforming probabilities to logits removes this boundary, making the output compatible with linear models.
  3. Symmetry: The logit scale is symmetrical around 0. A logit of 0 corresponds to a probability of 0.5 (even odds), positive logits correspond to probabilities greater than 0.5, and negative logits correspond to probabilities less than 0.5.
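
To make the first point concrete, here is a minimal sketch using scikit-learn on synthetic data (the seed, intercept of -1, and slope of 2 are made up for illustration). The fitted coefficients live on the logit scale; `decision_function` returns logits while `predict_proba` returns probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=0)

# Synthetic data: the log-odds of y are linear in x (intercept -1, slope 2)
x = rng.normal(size=(500, 1))
log_odds = -1 + 2 * x[:, 0]
p_true = 1 / (1 + np.exp(-log_odds))
y = (rng.random(500) < p_true).astype(int)

model = LogisticRegression().fit(x, y)
print(model.intercept_, model.coef_)  # roughly -1 and 2, on the logit scale

# decision_function returns logits; predict_proba returns probabilities
x_new = np.array([[0.0]])
print(model.decision_function(x_new))    # approx. -1 (a logit)
print(model.predict_proba(x_new)[:, 1])  # approx. 0.27 (a probability)
```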

Converting Back to Probability

To convert a logit back to a probability, the inverse of the logit function, known as the logistic function or sigmoid function, is used:

$ p = \frac{1}{1 + e^{-\text{logit}}} $

This transformation ensures that the predicted probabilities always fall within the valid range of [0, 1].
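
A minimal sketch of this inverse transform in Python (NumPy assumed):

```python
import numpy as np

def sigmoid(z):
    """Inverse of the logit: maps any real number back into (0, 1)."""
    return 1 / (1 + np.exp(-np.asarray(z, dtype=float)))

print(sigmoid(0.0))                     # 0.5
print(sigmoid([-4.595, 1.099, 4.595]))  # approx. [0.01, 0.75, 0.99]
```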

Key Differences Summarized

Here's a comparison table highlighting the core differences between probability and logit:

| Feature | Probability ($p$) | Logit ($\text{logit}(p)$) |
| --- | --- | --- |
| Definition | Likelihood of an event occurring | Natural logarithm of the odds of an event |
| Range | $p \in [0, 1]$ (from zero to one) | $\text{logit}(p) \in (-\infty, \infty)$ (any real number) |
| Interpretation | Direct, intuitive likelihood (e.g., 75% chance) | Log-odds; less intuitive but suitable for linear modeling |
| Calculation | Directly observed or estimated | $\ln(p / (1-p))$ |
| Use Case | Reporting likelihoods, decision making | Statistical modeling (e.g., logistic regression), machine learning algorithms |

Practical Insights

  • Understanding Output: When working with models like logistic regression, the raw output is often in logit form. It's essential to convert these logits back into probabilities to make them interpretable in real-world contexts.
  • Model Building: Logits provide a convenient way to model probabilities when the relationship with predictors is non-linear on the probability scale but linear on the log-odds scale.
  • Sensitivity: Small changes in probability near 0 or 1 result in very large changes in logit, making the logit scale sensitive to extreme probabilities.

For example, a probability of 0.01 has a logit of approximately -4.59, while a probability of 0.99 has a logit of approximately 4.59. A probability of 0.5 always translates to a logit of 0.
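
A quick check of these values in Python (NumPy assumed), which also illustrates the sensitivity point above:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

# Equal-looking steps in probability produce very unequal steps in logit
for p in (0.5, 0.6, 0.9, 0.99, 0.999):
    print(f"p = {p:<5}  logit = {logit(p):+.3f}")
# p = 0.5    logit = +0.000
# p = 0.6    logit = +0.405
# p = 0.9    logit = +2.197
# p = 0.99   logit = +4.595
# p = 0.999  logit = +6.907
```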