Learn how Logistic Regression predicts binary outcomes using probability-based decision boundaries. This lesson covers theory, implementation, and practical use cases like spam detection and churn prediction.
Classification is a supervised learning task where the goal is to predict categorical labels rather than continuous values. Unlike regression, which outputs numbers like price or temperature, classification outputs categories like "spam" or "not spam," "yes" or "no."
This lesson focuses on binary classification using logistic regression.
Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It predicts the probability that an input belongs to a particular class.
Linear regression outputs continuous values from negative infinity to positive infinity. For classification, we need outputs between 0 and 1 representing probabilities. Linear regression cannot guarantee this constraint.
Real-world applications of logistic regression include:

- Spam detection: classifying emails as spam or not spam
- Churn prediction: identifying customers likely to cancel a service
- Medical diagnosis: estimating the probability that a patient has a condition
The sigmoid function (also called the logistic function) transforms any input value into a probability between 0 and 1.
σ(z) = 1 / (1 + e^(-z))
Where:

- z is the linear combination of the input features and weights
- e is Euler's number (approximately 2.718)
import numpy as np
import matplotlib.pyplot as plt
z = np.linspace(-10, 10, 100)
sigmoid = 1 / (1 + np.exp(-z))
plt.figure(figsize=(8, 5))
plt.plot(z, sigmoid, 'b-', linewidth=2)
plt.axhline(y=0.5, color='r', linestyle='--', label='Decision Threshold')
plt.axhline(y=0, color='gray', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='gray', linestyle='-', alpha=0.3)
plt.xlabel('z (Linear Combination)')
plt.ylabel('σ(z) (Probability)')
plt.title('Sigmoid Function')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
The sigmoid function produces an S-shaped curve. For large positive values of z, σ(z) approaches 1; for large negative values, it approaches 0; and z = 0 produces exactly 0.5.
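These properties are easy to check numerically with a small sketch of the formula above:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5 exactly
print(sigmoid(10))   # ≈ 0.99995, approaching 1
print(sigmoid(-10))  # ≈ 0.000045, approaching 0
```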
Logistic regression combines linear regression with the sigmoid function:
Step 1: z = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ (linear combination)
Step 2: p = σ(z) = 1 / (1 + e^(-z)) (apply sigmoid)
Step 3: ŷ = 1 if p ≥ 0.5, else ŷ = 0 (make prediction)
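The three steps can be sketched directly in NumPy. The weights below are made-up values for illustration, not learned ones:

```python
import numpy as np

# Hypothetical weights: intercept w0 and two feature weights
w0, w = -1.0, np.array([0.8, 0.5])
x = np.array([2.0, 1.0])  # one input sample

# Step 1: linear combination
z = w0 + np.dot(w, x)        # -1.0 + 1.6 + 0.5 = 1.1
# Step 2: apply sigmoid to get a probability
p = 1 / (1 + np.exp(-z))     # ≈ 0.75
# Step 3: threshold at 0.5 to get a class label
y_hat = int(p >= 0.5)        # 1

print(f"z = {z:.2f}, p = {p:.3f}, prediction = {y_hat}")
```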
The decision boundary is where the model switches from predicting one class to another. For logistic regression with a threshold of 0.5, this occurs when z = 0.
# Simple illustration of decision boundary concept
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
# Generate two classes
class_0 = np.random.randn(50, 2) + np.array([2, 2])
class_1 = np.random.randn(50, 2) + np.array([5, 5])
plt.figure(figsize=(8, 6))
plt.scatter(class_0[:, 0], class_0[:, 1], c='blue', label='Class 0', alpha=0.7)
plt.scatter(class_1[:, 0], class_1[:, 1], c='red', label='Class 1', alpha=0.7)
# Approximate decision boundary
x_line = np.linspace(0, 8, 100)
y_line = 7 - x_line  # the line x + y = 7 lies between the cluster centers (2,2) and (5,5)
plt.plot(x_line, y_line, 'g--', linewidth=2, label='Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Binary Classification with Decision Boundary')
plt.legend()
plt.show()
Points on one side of the boundary are classified as class 0, and points on the other side as class 1.
Logistic regression uses log loss (binary cross-entropy) as its cost function instead of mean squared error.
Cost = -1/n × Σ[yᵢ × log(pᵢ) + (1-yᵢ) × log(1-pᵢ)]
Where:

- n is the number of training samples
- yᵢ is the true label (0 or 1) for sample i
- pᵢ is the predicted probability that sample i belongs to class 1
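The cost can be computed by hand and checked against scikit-learn's implementation. A small sketch with made-up labels and probabilities:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.1, 0.8, 0.6, 0.3])  # predicted P(y=1)

# Binary cross-entropy, computed from the formula above
cost = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(f"Manual log loss:  {cost:.4f}")                      # ≈ 0.2603
print(f"sklearn log_loss: {log_loss(y_true, p_pred):.4f}")  # same value
```

Note that confident wrong predictions (e.g., p close to 0 when y = 1) are penalized much more heavily than uncertain ones, which is exactly the behavior we want from a probabilistic classifier.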
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import make_classification
# Generate synthetic binary classification data
X, y = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42
)
print(f"Features shape: {X.shape}")
print(f"Target distribution: {np.bincount(y)}")
The make_classification function creates a synthetic dataset perfect for learning classification concepts.
plt.figure(figsize=(8, 6))
plt.scatter(X[y==0, 0], X[y==0, 1], c='blue', label='Class 0', alpha=0.6)
plt.scatter(X[y==1, 0], X[y==1, 1], c='red', label='Class 1', alpha=0.6)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Binary Classification Dataset')
plt.legend()
plt.show()
Visualizing data helps you understand class separation and identify potential challenges.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
The stratify=y parameter ensures both train and test sets have the same class distribution.
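The effect of stratify is easy to verify on a small imbalanced dataset. A sketch with made-up toy labels, separate from the lesson's dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels: 80 negatives, 20 positives
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 80 + [1] * 20)

_, _, _, y_te = train_test_split(X_toy, y_toy, test_size=0.25,
                                 random_state=0, stratify=y_toy)

# The 80/20 class ratio is preserved in the test set: 20 zeros, 5 ones
print(np.bincount(y_te))  # [20  5]
```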
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)
print(f"Coefficients: {model.coef_[0]}")
print(f"Intercept: {model.intercept_[0]:.4f}")
The fit() method learns the optimal weights that minimize the log loss function.
# Predict class labels
y_pred = model.predict(X_test)
# Predict probabilities
y_prob = model.predict_proba(X_test)
print("First 5 predictions:")
print(f"Predicted labels: {y_pred[:5]}")
print(f"Predicted probabilities:\n{y_prob[:5].round(3)}")
Output:
First 5 predictions:
Predicted labels: [1 0 1 0 1]
Predicted probabilities:
[[0.12 0.88]
 [0.91 0.09]
 [0.08 0.92]
 [0.85 0.15]
 [0.03 0.97]]
The predict_proba() method returns probabilities for both classes, which sum to 1.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
Accuracy measures the proportion of correct predictions but can be misleading with imbalanced classes.
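A quick sketch shows why: on an imbalanced dataset, a "model" that always predicts the majority class looks deceptively good by accuracy alone:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 95 negatives, 5 positives — and a model that always predicts 0
y_true = np.array([0] * 95 + [1] * 5)
y_always_zero = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_always_zero))  # 0.95 — looks great
print(recall_score(y_true, y_always_zero))    # 0.0  — finds no positives
```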
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
Output:
Confusion Matrix:
[[92  8]
 [ 7 93]]
The confusion matrix shows:

- True negatives (92): class 0 samples correctly predicted as 0
- False positives (8): class 0 samples incorrectly predicted as 1
- False negatives (7): class 1 samples incorrectly predicted as 0
- True positives (93): class 1 samples correctly predicted as 1
import seaborn as sns
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1'],
            yticklabels=['Actual 0', 'Actual 1'])
plt.title('Confusion Matrix')
plt.show()
print("Classification Report:")
print(classification_report(y_test, y_pred))
Output:
Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.92      0.92       100
           1       0.92      0.93      0.93       100

    accuracy                           0.93       200
   macro avg       0.93      0.93      0.93       200
weighted avg       0.93      0.93      0.93       200
| Metric | Formula | Interpretation |
|---|---|---|
| Precision | TP / (TP + FP) | Of all positive predictions, how many were correct? |
| Recall | TP / (TP + FN) | Of all actual positives, how many were found? |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of precision and recall |
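These formulas can be verified by hand against the confusion matrix shown earlier (TN=92, FP=8, FN=7, TP=93):

```python
tp, fp, fn = 93, 8, 7  # values for class 1 from the confusion matrix above

precision = tp / (tp + fp)                          # 93 / 101 ≈ 0.92
recall = tp / (tp + fn)                             # 93 / 100 = 0.93
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.93

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

These match the class 1 row of the classification report.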
def plot_decision_boundary(model, X, y):
    """Plot decision boundary for 2D data."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(10, 6))
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
    plt.scatter(X[y==0, 0], X[y==0, 1], c='blue', label='Class 0', alpha=0.6)
    plt.scatter(X[y==1, 0], X[y==1, 1], c='red', label='Class 1', alpha=0.6)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Logistic Regression Decision Boundary')
    plt.legend()
    plt.show()
plot_decision_boundary(model, X_test, y_test)
This visualization shows how the model separates the two classes with a linear boundary.
By default, logistic regression uses 0.5 as the probability threshold. Adjusting this threshold changes the balance between precision and recall.
# Get probabilities for positive class
y_prob_positive = model.predict_proba(X_test)[:, 1]
# Try different thresholds
thresholds = [0.3, 0.5, 0.7]
for threshold in thresholds:
    y_pred_custom = (y_prob_positive >= threshold).astype(int)
    cm = confusion_matrix(y_test, y_pred_custom)
    accuracy = accuracy_score(y_test, y_pred_custom)
    print(f"\nThreshold: {threshold}")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Confusion Matrix: TN={cm[0,0]}, FP={cm[0,1]}, FN={cm[1,0]}, TP={cm[1,1]}")
When to adjust the threshold:

- Lower it (e.g., 0.3) when missing positives is costly and recall matters most, such as disease screening
- Raise it (e.g., 0.7) when false alarms are costly and precision matters most, such as spam filtering
Scikit-learn's logistic regression includes L2 regularization by default, controlled by the C parameter.
# C is the inverse of regularization strength
# Smaller C = stronger regularization
models = {
    'Weak Regularization (C=100)': LogisticRegression(C=100),
    'Default (C=1)': LogisticRegression(C=1),
    'Strong Regularization (C=0.01)': LogisticRegression(C=0.01)
}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    print(f"{name}: Accuracy = {accuracy:.4f}")
Regularization prevents overfitting by penalizing large coefficient values.
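This penalty can be observed directly: the smaller C is, the more the learned weights shrink toward zero. A self-contained sketch on a fresh synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=500, n_features=2,
                                     n_informative=2, n_redundant=0,
                                     random_state=42)

norms = {}
for C in [100, 1, 0.01]:
    clf = LogisticRegression(C=C).fit(X_demo, y_demo)
    norms[C] = np.linalg.norm(clf.coef_)
    print(f"C={C}: coefficient norm = {norms[C]:.4f}")

# Smaller C (stronger L2 penalty) produces smaller coefficient norms
```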
# Simplified spam detection example
from sklearn.feature_extraction.text import CountVectorizer
# Sample emails
emails = [
    "Win free money now click here",
    "Meeting scheduled for tomorrow at 3pm",
    "Congratulations you won lottery",
    "Please review the attached report",
    "Free gift card waiting for you",
    "Project deadline extended to Friday",
    "Claim your prize immediately",
    "Quarterly review meeting notes"
]
labels = [1, 0, 1, 0, 1, 0, 1, 0] # 1 = spam, 0 = not spam
# Convert text to features
vectorizer = CountVectorizer()
X_email = vectorizer.fit_transform(emails)
# Train model
spam_model = LogisticRegression()
spam_model.fit(X_email, labels)
# Test on new email
new_email = ["Free money win prize now"]
new_email_features = vectorizer.transform(new_email)
prediction = spam_model.predict(new_email_features)
probability = spam_model.predict_proba(new_email_features)
print(f"Prediction: {'Spam' if prediction[0] == 1 else 'Not Spam'}")
print(f"Spam probability: {probability[0][1]:.2%}")
This example demonstrates how logistic regression can classify text data after appropriate feature extraction.
Logistic regression is a fundamental binary classification algorithm that outputs probabilities using the sigmoid function.
Key takeaways:

- Despite its name, logistic regression is a classification algorithm
- The sigmoid function maps any linear combination of features to a probability between 0 and 1
- The default decision threshold of 0.5 can be adjusted to trade precision against recall
- The cost function is log loss (binary cross-entropy), not mean squared error
- Evaluate classifiers with the confusion matrix, precision, recall, and F1-score, not accuracy alone
- The C parameter controls regularization strength; smaller C means stronger regularization