Understanding logistic regression coefficients
2025
Introduction: understanding logistic regression coefficients
Logistic regression is one of the most widely used techniques for classification problems. It is particularly useful when the goal is to predict a binary outcome, such as whether a customer will purchase a product, whether a patient has a disease, or whether an email is spam. However, while the mathematical framework behind logistic regression is relatively simple, interpreting its coefficients is not as straightforward as in linear regression.
Unlike linear regression, where coefficients represent direct changes in the dependent variable, logistic regression operates in the log-odds space. This means that each coefficient describes the effect of an independent variable on the logarithm of the odds of the event occurring, rather than on the probability itself. To derive meaningful insights, we need to understand how these coefficients translate into odds ratios, which provide a more intuitive interpretation of the model’s outputs.
In this article, we will break down the meaning of logistic regression coefficients step by step. We will explore key concepts such as odds, log-odds, and odds ratios, and show how to interpret model outputs correctly. We will also illustrate these ideas with examples, making it easier to understand how changes in predictors impact the likelihood of an event.
By the end of this article, you will have a clear understanding of:
- What logistic regression coefficients represent,
- How to compute and interpret odds ratios,
- The difference between odds, probability, and log-odds,
- Why a unit increase in a predictor does not lead to a linear probability change.
Let’s dive into the fundamentals of logistic regression coefficients and their interpretation.
Odds, Log-Odds, and Odds Ratios in Logistic Regression
The coefficients \(w_i\) in logistic regression are not interpreted in the same way as in linear regression. Instead, they represent the logarithm of the odds ratio associated with each predictor variable. This means that a one-unit increase in a variable does not correspond to a linear increase in probability but rather to a multiplicative change in the odds.
- Odds: The ratio between the probability of an event occurring and the probability of it not occurring.
\[\text{Odds} = \frac{\text{Probability of Event}}{1 - \text{Probability of Event}}\]
- Odds Ratio (OR): A measure of how much the odds change for a one-unit increase in an independent variable.
- Coefficient (\(w_i\)): Represents the change in log-odds of the outcome for each unit increase in \(X_i\).
\[\text{Log-Odds} = w_0 + w_1X_1 + w_2X_2 + \dots + w_nX_n\]
- Interpreting a coefficient on the log-odds scale:
  - \(w_i > 0\): Increases the log-odds of the event occurring.
  - \(w_i = 0\): Does not change the log-odds.
  - \(w_i < 0\): Decreases the log-odds of the event occurring.

To interpret the coefficients in a more intuitive way, we take the exponential of the coefficient:
\[\text{Odds Ratio} = e^{w_i}\]
- Interpreting the Odds Ratio:
  - Value > 1: Increases the odds of the event occurring.
  - Value = 1: Does not change the odds.
  - Value < 1: Decreases the odds of the event occurring.
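Before moving to examples, here is a minimal numerical sketch of that multiplicative behavior, using made-up coefficients: a one-unit increase in a predictor multiplies the odds by \(e^{w_1}\), no matter where you start.

```python
import numpy as np

# Made-up coefficients for illustration: w0 is the intercept, w1 the slope.
w0, w1 = -1.0, 0.7

def odds(x):
    """Odds of the event at predictor value x, i.e. exp(log-odds)."""
    return np.exp(w0 + w1 * x)

# A one-unit increase in x multiplies the odds by exp(w1), regardless of x:
print(odds(3) / odds(2))  # 2.0137...
print(np.exp(w1))         # 2.0137... (the same value)
```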
Simple Example: Modeling Disease Probability
Suppose we are modeling the probability of a person having a disease based on their age and smoking status.
\[\text{Log-Odds} = w_0 + w_1 (\text{Age}) + w_2 (\text{Smoker})\]
With the following estimated coefficients:
- \(w_0 = -2\)
- \(w_1 = 0.05\)
- \(w_2 = 1.5\)
- Age Coefficient (\(w_1 = 0.05\), continuous variable):
- Log-Odds: For each additional year of age, the log-odds of having the disease increase by 0.05.
- Odds Ratio: \(e^{0.05} \approx 1.051\)
- Interpretation: Each additional year of age increases the odds of having the disease by approximately 5.1%.
(This does not mean the probability increases by 5.1%—the effect is nonlinear.)
- Smoker Coefficient (\(w_2 = 1.5\), categorical variable):
- Log-Odds: If a person is a smoker (\(X_2 = 1\)), the log-odds of having the disease increase by 1.5 compared to non-smokers (\(X_2 = 0\)).
- Odds Ratio: \(e^{1.5} \approx 4.48\)
- Interpretation: Smokers have 4.48 times the odds of having the disease compared to non-smokers.
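Both odds ratios can be verified in one line each:

```python
import numpy as np

print(np.exp(0.05))  # ~1.051: each extra year of age raises the odds by ~5.1%
print(np.exp(1.5))   # ~4.48: smokers have ~4.48 times the odds of non-smokers
```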
Converting Odds to Probability
To make the result more interpretable, we can convert odds to probability:
\[P = \frac{\text{Odds}}{1 + \text{Odds}}\]
For example, if a non-smoker has an estimated probability of disease \(P_0\), a smoker’s probability can be found by multiplying the non-smoker’s odds by 4.48 and then applying the formula above.
Let’s calculate the probabilities for a 40-year-old non-smoker and a 40-year-old smoker using the coefficients provided.
Case 1: A 40-year-old non-smoker (\(X_2 = 0\))
1. Compute log-odds:
\[\text{Log-Odds} = -2 + (0.05 \times 40) + (1.5 \times 0) = -2 + 2 = 0\]
2. Compute odds:
\[\text{Odds} = e^0 = 1\]
3. Compute probability:
\[P = \frac{1}{1 + 1} = \frac{1}{2} = 0.5\]
Interpretation: A 40-year-old non-smoker has a 50% probability of having the disease.
Case 2: A 40-year-old smoker (\(X_2 = 1\))
1. Compute log-odds:
\[\text{Log-Odds} = -2 + (0.05 \times 40) + (1.5 \times 1) = -2 + 2 + 1.5 = 1.5\]
2. Compute odds:
\[\text{Odds} = e^{1.5} \approx 4.48\]
3. Compute probability:
\[P = \frac{4.48}{1 + 4.48} = \frac{4.48}{5.48} \approx 0.817\]
Interpretation: A 40-year-old smoker has an 81.7% probability of having the disease.
Note that the smoker’s probability is higher than the non-smoker’s, reflecting the higher odds associated with smoking status. The third step in the calculation shows that the odds ratio of 4.48 translates to a probability increase from 50% to 81.7%, a nonlinear change: the same odds ratio applied at a different baseline probability would produce a different change in probability.
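The two cases can be reproduced in a few lines; this is a minimal sketch using the logistic (sigmoid) form of the model with the coefficients assumed in this example:

```python
import numpy as np

def disease_probability(age, smoker, w0=-2.0, w1=0.05, w2=1.5):
    """Probability of disease via the logistic (sigmoid) function."""
    log_odds = w0 + w1 * age + w2 * smoker
    return 1 / (1 + np.exp(-log_odds))

print(disease_probability(40, smoker=0))  # 0.5
print(disease_probability(40, smoker=1))  # ~0.8176
```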
Key Takeaways
- The logistic regression model allows probability estimation by converting log-odds into probabilities.
- The relationship between predictors and probability is nonlinear — increases in age or smoking status multiply the odds rather than adding a fixed percentage to probability.
- The model outputs a probability between 0 and 1, which makes logistic regression suitable for classification tasks.
More advanced example: marketing campaign success prediction
In a more complex scenario, we might model the success of a marketing campaign based on customer demographics, purchase history, and campaign-specific features. The logistic regression model would provide insights into how each feature influences the likelihood of a successful campaign.
I’ve actually built a project on this topic; you can check it out here. I’ll use the coefficients from that project to illustrate how to interpret odds ratios in a more realistic setting.
Just a quick context: the project aimed to predict the success of a marketing campaign based on customer data. The model included features such as age, income, purchase history, and campaign-specific variables. A customer segmentation was performed to understand how different groups responded to the campaign. Five segments were identified: Dormant, Occasional, Engaged, Valuable, and Elite. The features and segments were used to train a regularized logistic regression model. The goal was to understand which factors influence the success of the campaign and predict the likelihood of a customer segment responding positively to the campaign. Conversion was defined as a customer accepting an offer or making a purchase.
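For context, the setup might look roughly like the sketch below. This is a hypothetical reconstruction, not the project’s actual code: the column lists, the regularization penalty, and the hyperparameters are all assumptions based on the feature names that appear in the coefficient table further down.

```python
# Hypothetical sketch of the preprocessing + model setup described above;
# column lists and hyperparameters are illustrative assumptions.
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    OneHotEncoder,
    OrdinalEncoder,
    PowerTransformer,
    StandardScaler,
)

preprocessing = ColumnTransformer([
    # drop="if_binary" keeps a single 0/1 column for two-category features
    ("one_hot_encoding", OneHotEncoder(drop="if_binary"),
     ["Segment", "HasChildren", "Complain", "HasAcceptedCmp", "Marital_Status"]),
    ("ordinal_encoding", OrdinalEncoder(),
     ["AgeGroup", "Children", "Education", "AcceptedCmpTotal"]),
    ("power_transformer", PowerTransformer(),
     ["MntWines", "MntMeatProducts", "MntFishProducts", "MntFruits",
      "MntSweetProducts", "MntGoldProds", "MntRegularProds",
      "NumStorePurchases", "NumWebPurchases", "NumCatalogPurchases"]),
    ("standard_scaling", StandardScaler(),
     ["Income", "MonthsSinceEnrolled"]),
])

model = Pipeline([
    ("preprocessing", preprocessing),
    # L1 regularization is one way to shrink weak coefficients toward zero
    ("classifier", LogisticRegression(penalty="l1", solver="liblinear")),
])
# model.fit(X, y)  # X: customer features, y: 1 if the customer converted
```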
Scikit-Learn’s `LogisticRegression` model was used to fit the data. The coefficients from the model are as follows:
Feature | Coefficient |
---|---|
one_hot_encoding__Segment_Dormant | -2.782 |
power_transformer__MntWines | -1.228 |
one_hot_encoding__Segment_Occasional | -1.152 |
power_transformer__NumStorePurchases | -1.126 |
power_transformer__MntRegularProds | -0.806 |
one_hot_encoding__HasChildren_1 | -0.690 |
one_hot_encoding__Complain_1 | -0.548 |
power_transformer__MntFishProducts | -0.434 |
ordinal_encoding__AgeGroup | -0.043 |
power_transformer__MntFruits | -0.026 |
power_transformer__MntSweetProducts | -0.019 |
standard_scaling__Income | -0.016 |
power_transformer__MntGoldProds | 0.055 |
one_hot_encoding__Segment_Engaged | 0.183 |
ordinal_encoding__Children | 0.212 |
power_transformer__NumWebPurchases | 0.256 |
one_hot_encoding__HasAcceptedCmp_1 | 0.352 |
power_transformer__NumCatalogPurchases | 0.446 |
ordinal_encoding__Education | 0.458 |
standard_scaling__MonthsSinceEnrolled | 1.017 |
one_hot_encoding__Marital_Status_Single | 1.142 |
one_hot_encoding__Segment_Valuable | 1.381 |
power_transformer__MntMeatProducts | 1.498 |
ordinal_encoding__AcceptedCmpTotal | 1.573 |
one_hot_encoding__Segment_Elite | 2.179 |
As can be seen, the categorical features underwent one-hot encoding or ordinal encoding, while the numerical features were scaled or transformed. The coefficients represent the effect of each feature on the log-odds of a customer segment responding positively to the campaign.
The coefficients are the weights that the model assigns to each feature: the larger a coefficient’s absolute value, the more important the feature is in predicting the probability of conversion, and its sign indicates the direction of the relationship between the feature and the target variable. We can summarize this as follows (a short code sketch for extracting the coefficients follows the list):
- \(w_i > 0\): Feature has a positive impact on the probability of conversion.
- \(w_i < 0\): Feature has a negative impact on the probability of conversion.
- \(w_i = 0\): Feature has little or no impact on the probability of conversion.
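Assuming a fitted pipeline like the hypothetical sketch shown earlier, the coefficients and their feature names could be pulled out and ranked like this:

```python
import pandas as pd

# Assumes `model` (the pipeline sketched earlier) has already been fitted.
feature_names = model.named_steps["preprocessing"].get_feature_names_out()
coefficients = model.named_steps["classifier"].coef_.ravel()

coef_table = (
    pd.DataFrame({"Feature": feature_names, "Coefficient": coefficients})
    .sort_values("Coefficient")
)
print(coef_table.to_string(index=False))
```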
So, we can clearly see that:
- The Elite segment has the highest positive impact on the probability of conversion;
- The more offers a customer has accepted in the past, the higher the probability of conversion;
- Customers who spend more on meat products have a higher probability of conversion;
- The Dormant segment has the highest negative impact on the probability of conversion;
- Customers who spend more on wine products have a lower probability of conversion.
Features with zero or near-zero coefficients have little or no impact on the probability of conversion. We are using a regularized logistic regression model, which penalizes large coefficients: features that add little predictive value are shrunk toward zero, keeping the model simpler and more interpretable. With that in mind, we can see that the following features have little or no impact on the probability of conversion:
- gold products;
- income;
- sweet and fruit products;
- age group.
Again, this does not mean that these features are unimportant in general. It means that, given the other features, they added little predictive value, so regularization shrank their coefficients toward zero. The remaining features are the most important ones for predicting the probability of conversion.
The odds ratio is a more intuitive measure than the log-odds: it represents the multiplicative change in the odds of conversion for a one-unit change in the feature.
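In code, the odds ratios are simply the exponentiated coefficients (continuing the hypothetical `coef_table` from the previous sketch):

```python
import numpy as np

# Exponentiating each coefficient gives the corresponding odds ratio.
coef_table["Odds Ratio"] = np.exp(coef_table["Coefficient"])
print(coef_table.sort_values("Odds Ratio").to_string(index=False))
```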
Here are the odds ratios for the features in the model:
Feature | Odds Ratio |
---|---|
one_hot_encoding__Segment_Dormant | 0.062 |
power_transformer__MntWines | 0.293 |
one_hot_encoding__Segment_Occasional | 0.316 |
power_transformer__NumStorePurchases | 0.324 |
power_transformer__MntRegularProds | 0.447 |
one_hot_encoding__HasChildren_1 | 0.502 |
one_hot_encoding__Complain_1 | 0.578 |
power_transformer__MntFishProducts | 0.648 |
ordinal_encoding__AgeGroup | 0.958 |
power_transformer__MntFruits | 0.975 |
power_transformer__MntSweetProducts | 0.981 |
standard_scaling__Income | 0.984 |
power_transformer__MntGoldProds | 1.056 |
one_hot_encoding__Segment_Engaged | 1.201 |
ordinal_encoding__Children | 1.236 |
power_transformer__NumWebPurchases | 1.292 |
one_hot_encoding__HasAcceptedCmp_1 | 1.422 |
power_transformer__NumCatalogPurchases | 1.562 |
ordinal_encoding__Education | 1.582 |
standard_scaling__MonthsSinceEnrolled | 2.765 |
one_hot_encoding__Marital_Status_Single | 3.132 |
one_hot_encoding__Segment_Valuable | 3.980 |
power_transformer__MntMeatProducts | 4.472 |
ordinal_encoding__AcceptedCmpTotal | 4.822 |
one_hot_encoding__Segment_Elite | 8.840 |
When interpreting the odds ratios, we can say that:
- An odds ratio greater than 1 means that the feature has a positive impact on the probability of conversion.
- An odds ratio less than 1 means that the feature has a negative impact on the probability of conversion.
- An odds ratio equal to 1 means that the feature has no impact on the probability of conversion.
Using `MntMeatProducts` as an example of a numerical feature, the odds ratio is ~4.47. This means that a one-unit increase in the (power-transformed) feature multiplies the odds of conversion by 4.47, which corresponds to a 347% increase in the odds (i.e., the odds become 4.47 times larger).
For `HasAcceptedCmp_1`, a binary feature, we have to consider that the OneHotEncoder was set to drop the first category when dealing with binary features, so the reference category is the one that was dropped. In this case the reference category is 0, which is why there is no `HasAcceptedCmp_0` in the coefficients list. The odds ratio is ~1.42, meaning that customers who have accepted at least one previous campaign have 1.42 times the odds of converting compared to customers who have not accepted any campaign.
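As a small standalone illustration of that encoder behavior (scikit-learn’s real `drop="if_binary"` option; the `sparse_output` argument requires scikit-learn >= 1.2):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# With drop="if_binary", a two-category feature keeps a single 0/1 column,
# so the dropped category (here 0) becomes the reference.
enc = OneHotEncoder(drop="if_binary", sparse_output=False)
demo = pd.DataFrame({"HasAcceptedCmp": [0, 1, 1, 0]})

print(enc.fit_transform(demo))      # [[0.], [1.], [1.], [0.]]
print(enc.get_feature_names_out())  # ['HasAcceptedCmp_1']
```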
Finally, let’s consider a multi-class feature: `Segment`. We set the OneHotEncoder to drop the first category only for binary features, and since `Segment` has 5 categories, we get 5 coefficients. The odds ratios of a multi-class feature are easier to interpret when each category is compared to a reference category, which can be done by dividing each category’s odds ratio by the reference category’s odds ratio. Let’s take the Dormant category as the reference; its adjusted odds ratio is then 1.
Applying this to our data:
Segment | Absolute Odds Ratio | Adjusted (Relative to Dormant) |
---|---|---|
Dormant | 0.0619 | 1.00 (Baseline) |
Occasional | 0.3161 | \(\frac{0.3161}{0.0619} = 5.11\) |
Engaged | 1.2011 | \(\frac{1.2011}{0.0619} = 19.40\) |
Valuable | 3.9805 | \(\frac{3.9805}{0.0619} = 64.28\) |
Elite | 8.8405 | \(\frac{8.8405}{0.0619} = 142.76\) |
- Occasional customers have 5.1 times the odds of converting compared to Dormant customers.
- Engaged customers have 19.4 times the odds of Dormant customers.
- Valuable customers have 64.3 times the odds of Dormant customers.
- Elite customers have 142.8 times the odds of Dormant customers.

(As before, these are ratios of odds, not of probabilities: “times the odds” does not mean “times more likely.”)
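The same rescaling can be done in a couple of lines (values taken from the table above):

```python
# Segment odds ratios from the table, rescaled relative to Dormant.
segment_odds_ratios = {
    "Dormant": 0.0619,
    "Occasional": 0.3161,
    "Engaged": 1.2011,
    "Valuable": 3.9805,
    "Elite": 8.8405,
}
baseline = segment_odds_ratios["Dormant"]
for segment, odds_ratio in segment_odds_ratios.items():
    print(f"{segment}: {odds_ratio / baseline:.2f}x the odds of Dormant")
```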
Final Thoughts and Takeaways
Understanding logistic regression coefficients is essential for interpreting the results of a logistic regression model. The coefficients represent the effect of each feature on the log-odds of the target variable. By converting the log-odds to odds ratios, we can interpret the impact of each feature on the probability of the target variable.
Key takeaways from this article include:
- Logistic regression coefficients represent the change in log-odds of the target variable for a one-unit change in the feature.
- Odds ratios provide a more intuitive interpretation of the coefficients, showing how a one-unit change in the feature affects the odds of the target variable.
- Odds ratios greater than 1 indicate a positive impact on the probability of the target variable, while odds ratios less than 1 indicate a negative impact.
- The relationship between features and the target variable is nonlinear in logistic regression, making it important to interpret the coefficients correctly.
- Regularized logistic regression models can help simplify the model and identify the most important features for predicting the target variable.
- Comparing the odds ratios of different categories of a multi-class feature to a reference category can help understand the relative impact of each category on the target variable.
I hope this article has provided you with a clear understanding of logistic regression coefficients and how to interpret them. Check out the project on marketing campaign success prediction for a practical application of these concepts. If you have any questions or feedback, feel free to reach out. Happy modeling!