Learn how Logistic Regression predicts binary outcomes using probability-based decision boundaries. This lesson covers theory, implementation, and practical use cases like spam detection and churn prediction.
Classification is a supervised learning task where the goal is to predict categorical labels rather than continuous values. Unlike regression, which outputs numbers like price or temperature, classification outputs categories like "spam" or "not spam," "yes" or "no."
This lesson focuses on binary classification using logistic regression.
Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It predicts the probability that an input belongs to a particular class.
Linear regression outputs continuous values from negative infinity to positive infinity. For classification, we need outputs between 0 and 1 representing probabilities. Linear regression cannot guarantee this constraint.
Real-world applications of logistic regression include:

- Spam detection: classifying emails as spam or not spam
- Churn prediction: identifying customers likely to cancel a service
- Medical diagnosis: estimating the probability that a patient has a condition
The sigmoid function (also called the logistic function) transforms any input value into a probability between 0 and 1.
σ(z) = 1 / (1 + e^(-z))
Where:

- z is the linear combination of the input features and weights
- e is Euler's number (approximately 2.718)
import numpy as np
import matplotlib.pyplot as plt
z = np.linspace(-10, 10, 100)
sigmoid = 1 / (1 + np.exp(-z))
plt.figure(figsize=(8, 5))
plt.plot(z, sigmoid, 'b-', linewidth=2)
plt.axhline(y=0.5, color='r', linestyle='--', label='Decision Threshold')
plt.axhline(y=0, color='gray', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='gray', linestyle='-', alpha=0.3)
plt.xlabel('z (Linear Combination)')
plt.ylabel('σ(z) (Probability)')
plt.title('Sigmoid Function')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
The sigmoid function produces an S-shaped curve. For large positive values of z, σ(z) approaches 1; for large negative values, it approaches 0; and z = 0 produces exactly 0.5.
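These properties are easy to check numerically with a small sketch of the formula above:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5 exactly
print(sigmoid(10))   # ≈ 0.99995, approaching 1
print(sigmoid(-10))  # ≈ 0.000045, approaching 0
```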
Logistic regression combines linear regression with the sigmoid function:
Step 1: z = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ (linear combination)
Step 2: p = σ(z) = 1 / (1 + e^(-z)) (apply sigmoid)
Step 3: ŷ = 1 if p ≥ 0.5, else ŷ = 0 (make prediction)
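The three steps can be sketched directly in NumPy. The weights below are made-up values for illustration, not learned ones:

```python
import numpy as np

# Hypothetical weights: intercept w0 and two feature weights
w0, w = -1.0, np.array([0.8, 0.5])
x = np.array([2.0, 1.0])  # one input sample

# Step 1: linear combination
z = w0 + np.dot(w, x)        # -1.0 + 1.6 + 0.5 = 1.1
# Step 2: apply sigmoid to get a probability
p = 1 / (1 + np.exp(-z))     # ≈ 0.75
# Step 3: threshold at 0.5 to get a class label
y_hat = int(p >= 0.5)        # 1

print(f"z = {z:.2f}, p = {p:.3f}, prediction = {y_hat}")
```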
The decision boundary is where the model switches from predicting one class to another. For logistic regression with a threshold of 0.5, this occurs when z = 0.
# Simple illustration of decision boundary concept
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
# Generate two classes
class_0 = np.random.randn(50, 2) + np.array([2, 2])
class_1 = np.random.randn(50, 2) + np.array([5, 5])
plt.figure(figsize=(8, 6))
plt.scatter(class_0[:, 0], class_0[:, 1], c='blue', label='Class 0', alpha=0.7)
plt.scatter(class_1[:, 0], class_1[:, 1], c='red', label='Class 1', alpha=0.7)
# Approximate decision boundary
x_line = np.linspace(0, 8, 100)
y_line = 7 - x_line  # the line x + y = 7 lies between the cluster centers (2,2) and (5,5)
plt.plot(x_line, y_line, 'g--', linewidth=2, label='Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Binary Classification with Decision Boundary')
plt.legend()
plt.show()
Points on one side of the boundary are classified as class 0, and points on the other side as class 1.
Logistic regression uses log loss (binary cross-entropy) as its cost function instead of mean squared error.
Cost = -1/n × Σ[yᵢ × log(pᵢ) + (1-yᵢ) × log(1-pᵢ)]
Where:

- n is the number of training samples
- yᵢ is the true label (0 or 1) for sample i
- pᵢ is the predicted probability that sample i belongs to class 1
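The cost can be computed by hand and checked against scikit-learn's implementation. A small sketch with made-up labels and probabilities:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.1, 0.8, 0.6, 0.3])  # predicted P(y=1)

# Binary cross-entropy, computed from the formula above
cost = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(f"Manual log loss:  {cost:.4f}")                      # ≈ 0.2603
print(f"sklearn log_loss: {log_loss(y_true, p_pred):.4f}")  # same value
```

Note that confident wrong predictions (e.g., p close to 0 when y = 1) are penalized much more heavily than uncertain ones, which is exactly the behavior we want from a probabilistic classifier.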
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import make_classification
# Generate synthetic binary classification data
X, y = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42
)
print(f"Features shape: {X.shape}")
print(f"Target distribution: {np.bincount(y)}")
The make_classification function creates a synthetic dataset perfect for learning classification concepts.
plt.figure(figsize=(8, 6))
plt.scatter(X[y==0, 0], X[y==0, 1], c='blue', label='Class 0', alpha=0.6)
plt.scatter(X[y==1, 0], X[y==1, 1], c='red', label='Class 1', alpha=0.6)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Binary Classification Dataset')
plt.legend()
plt.show()
Visualizing data helps you understand class separation and identify potential challenges.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
The stratify=y parameter ensures both train and test sets have the same class distribution.
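The effect of stratify is easy to verify on a small imbalanced dataset. A sketch with made-up toy labels, separate from the lesson's dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels: 80 negatives, 20 positives
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 80 + [1] * 20)

_, _, _, y_te = train_test_split(X_toy, y_toy, test_size=0.25,
                                 random_state=0, stratify=y_toy)

# The 80/20 class ratio is preserved in the test set: 20 zeros, 5 ones
print(np.bincount(y_te))  # [20  5]
```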
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)
print(f"Coefficients: {model.coef_[0]}")
print(f"Intercept: {model.intercept_[0]:.4f}")
The fit() method learns the optimal weights that minimize the log loss function.
# Predict class labels
y_pred = model.predict(X_test)
# Predict probabilities
y_prob = model.predict_proba(X_test)
print("First 5 predictions:")
print(f"Predicted labels: {y_pred[:5]}")
print(f"Predicted probabilities:\n{y_prob[:5].round(3)}")
Output:
First 5 predictions:
Predicted labels: [1 0 1 0 1]
Predicted probabilities:
[[0.12 0.88]
 [0.91 0.09]
 [0.08 0.92]
 [0.85 0.15]
 [0.03 0.97]]
The predict_proba() method returns probabilities for both classes, which sum to 1.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
Accuracy measures the proportion of correct predictions but can be misleading with imbalanced classes.
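A quick sketch shows why: on an imbalanced dataset, a "model" that always predicts the majority class looks deceptively good by accuracy alone:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 95 negatives, 5 positives — and a model that always predicts 0
y_true = np.array([0] * 95 + [1] * 5)
y_always_zero = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_always_zero))  # 0.95 — looks great
print(recall_score(y_true, y_always_zero))    # 0.0  — finds no positives
```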
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
Output:
Confusion Matrix:
[[92  8]
 [ 7 93]]
The confusion matrix shows:

- True negatives (92): class 0 samples correctly predicted as 0
- False positives (8): class 0 samples incorrectly predicted as 1
- False negatives (7): class 1 samples incorrectly predicted as 0
- True positives (93): class 1 samples correctly predicted as 1
import seaborn as sns
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1'],
            yticklabels=['Actual 0', 'Actual 1'])
plt.title('Confusion Matrix')
plt.show()
print("Classification Report:")
print(classification_report(y_test, y_pred))
Output:
Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.92      0.92       100
           1       0.92      0.93      0.93       100

    accuracy                           0.93       200
   macro avg       0.93      0.93      0.93       200
weighted avg       0.93      0.93      0.93       200
| Metric | Formula | Interpretation |
|---|---|---|
| Precision | TP / (TP + FP) | Of all positive predictions, how many were correct? |
| Recall | TP / (TP + FN) | Of all actual positives, how many were found? |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of precision and recall |
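These formulas can be verified by hand against the confusion matrix shown earlier (TN=92, FP=8, FN=7, TP=93):

```python
tp, fp, fn = 93, 8, 7  # values for class 1 from the confusion matrix above

precision = tp / (tp + fp)                          # 93 / 101 ≈ 0.92
recall = tp / (tp + fn)                             # 93 / 100 = 0.93
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.93

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

These match the class 1 row of the classification report.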
def plot_decision_boundary(model, X, y):
    """Plot decision boundary for 2D data."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(10, 6))
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
    plt.scatter(X[y==0, 0], X[y==0, 1], c='blue', label='Class 0', alpha=0.6)
    plt.scatter(X[y==1, 0], X[y==1, 1], c='red', label='Class 1', alpha=0.6)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Logistic Regression Decision Boundary')
    plt.legend()
    plt.show()
plot_decision_boundary(model, X_test, y_test)
This visualization shows how the model separates the two classes with a linear boundary.
By default, logistic regression uses 0.5 as the probability threshold. Adjusting this threshold changes the balance between precision and recall.
# Get probabilities for positive class
y_prob_positive = model.predict_proba(X_test)[:, 1]
# Try different thresholds
thresholds = [0.3, 0.5, 0.7]
for threshold in thresholds:
    y_pred_custom = (y_prob_positive >= threshold).astype(int)
    cm = confusion_matrix(y_test, y_pred_custom)
    accuracy = accuracy_score(y_test, y_pred_custom)
    print(f"\nThreshold: {threshold}")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Confusion Matrix: TN={cm[0,0]}, FP={cm[0,1]}, FN={cm[1,0]}, TP={cm[1,1]}")
When to adjust the threshold:

- Lower it (e.g., 0.3) when missing positives is costly and recall matters most, such as disease screening
- Raise it (e.g., 0.7) when false alarms are costly and precision matters most, such as spam filtering
Scikit-learn's logistic regression includes L2 regularization by default, controlled by the C parameter.
# C is the inverse of regularization strength
# Smaller C = stronger regularization
models = {
    'Weak Regularization (C=100)': LogisticRegression(C=100),
    'Default (C=1)': LogisticRegression(C=1),
    'Strong Regularization (C=0.01)': LogisticRegression(C=0.01)
}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    print(f"{name}: Accuracy = {accuracy:.4f}")
Regularization prevents overfitting by penalizing large coefficient values.
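This penalty can be observed directly: the smaller C is, the more the learned weights shrink toward zero. A self-contained sketch on a fresh synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=500, n_features=2,
                                     n_informative=2, n_redundant=0,
                                     random_state=42)

norms = {}
for C in [100, 1, 0.01]:
    clf = LogisticRegression(C=C).fit(X_demo, y_demo)
    norms[C] = np.linalg.norm(clf.coef_)
    print(f"C={C}: coefficient norm = {norms[C]:.4f}")

# Smaller C (stronger L2 penalty) produces smaller coefficient norms
```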
# Simplified spam detection example
from sklearn.feature_extraction.text import CountVectorizer
# Sample emails
emails = [
    "Win free money now click here",
    "Meeting scheduled for tomorrow at 3pm",
    "Congratulations you won lottery",
    "Please review the attached report",
    "Free gift card waiting for you",
    "Project deadline extended to Friday",
    "Claim your prize immediately",
    "Quarterly review meeting notes"
]
labels = [1, 0, 1, 0, 1, 0, 1, 0] # 1 = spam, 0 = not spam
# Convert text to features
vectorizer = CountVectorizer()
X_email = vectorizer.fit_transform(emails)
# Train model
spam_model = LogisticRegression()
spam_model.fit(X_email, labels)
# Test on new email
new_email = ["Free money win prize now"]
new_email_features = vectorizer.transform(new_email)
prediction = spam_model.predict(new_email_features)
probability = spam_model.predict_proba(new_email_features)
print(f"Prediction: {'Spam' if prediction[0] == 1 else 'Not Spam'}")
print(f"Spam probability: {probability[0][1]:.2%}")
This example demonstrates how logistic regression can classify text data after appropriate feature extraction.
Logistic regression is a fundamental binary classification algorithm that outputs probabilities using the sigmoid function.
Key takeaways:

- Despite its name, logistic regression is a classification algorithm
- The sigmoid function maps any linear combination of features to a probability between 0 and 1
- The default decision threshold of 0.5 can be adjusted to trade precision against recall
- The cost function is log loss (binary cross-entropy), not mean squared error
- Evaluate classifiers with the confusion matrix, precision, recall, and F1-score, not accuracy alone
- The C parameter controls regularization strength; smaller C means stronger regularization