Matrix operations are fundamental to machine learning, enabling efficient computation and data transformations. This section covers key operations—addition, multiplication, transposition, inversion, and element-wise operations—highlighting their role in representing datasets, performing calculations, and implementing algorithms like linear regression and neural networks.
Like vectors, matrices of the same shape can be added or subtracted element-wise.
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Element-wise addition
C = A + B
print("A + B:")
print(C)
# [[ 6 8]
#  [10 12]]
Matrix multiplication is fundamental to machine learning. It is NOT element-wise; it follows specific rules for combining rows and columns.
Rule: To multiply matrices A (m×n) and B (n×p), the inner dimensions must match. The result has shape (m×p).
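A quick sketch of this rule, using a hypothetical 3×2 and 2×4 pair: multiplying in the valid order works, while the reversed order raises an error because the inner dimensions no longer match.

```python
import numpy as np

A = np.random.rand(3, 2)  # shape (3, 2)
B = np.random.rand(2, 4)  # shape (2, 4)

# Inner dimensions match (2 == 2), so the product exists; result is (3, 4)
print((A @ B).shape)  # (3, 4)

# Reversed order: (2, 4) @ (3, 2) has inner dimensions 4 != 3
try:
    B @ A
except ValueError as e:
    print("Shape mismatch:", e)
```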
import numpy as np
# Dataset: 3 samples, 2 features
X = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
# Weights: 2 features, 1 output
W = np.array([
    [0.5],
    [0.3]
])
# Matrix multiplication: predictions for all samples
predictions = X @ W # or np.dot(X, W)
print("Predictions shape:", predictions.shape) # (3, 1)
print("Predictions:")
print(predictions)
This single matrix multiplication computes predictions for all samples simultaneously—the foundation of efficient ML implementations.
import numpy as np
# Simple example
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
# Manual calculation of A @ B
# Row 1 of A dot Column 1 of B: 1*5 + 2*7 = 19
# Row 1 of A dot Column 2 of B: 1*6 + 2*8 = 22
# Row 2 of A dot Column 1 of B: 3*5 + 4*7 = 43
# Row 2 of A dot Column 2 of B: 3*6 + 4*8 = 50
result = A @ B
print("A @ B:")
print(result)
# [[19 22]
#  [43 50]]
Each element in the result is a dot product between a row of the first matrix and a column of the second.
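To make that concrete, here is a small sketch that recomputes a single entry of A @ B as an explicit dot product, using the same A and B as above:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Entry (0, 1) of A @ B is row 0 of A dotted with column 1 of B
manual = np.dot(A[0, :], B[:, 1])  # 1*6 + 2*8 = 22
print(manual)          # 22
print((A @ B)[0, 1])   # 22
```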
Sometimes you need true element-wise multiplication, called the Hadamard product.
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Element-wise multiplication
hadamard = A * B
print("Element-wise product:")
print(hadamard)
# [[ 5 12]
#  [21 32]]
This differs from matrix multiplication and is used in operations like applying activation masks in neural networks.
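As an illustrative sketch, a binary mask applied with element-wise multiplication zeroes out selected entries, much like a dropout mask zeroes activations in a neural network (the activation and mask values here are made up for the example):

```python
import numpy as np

activations = np.array([[0.5, 1.2], [0.8, 2.0]])
mask = np.array([[1, 0], [0, 1]])  # 1 keeps a value, 0 drops it

masked = activations * mask  # Hadamard product with the mask
print(masked)
# [[0.5 0. ]
#  [0.  2. ]]
```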
The inverse of a matrix A, written A⁻¹, satisfies: A × A⁻¹ = I (identity matrix). It exists only when A is square and has a nonzero determinant.
import numpy as np
A = np.array([[4, 7],
              [2, 6]])
# Calculate inverse
A_inv = np.linalg.inv(A)
print("Inverse of A:")
print(A_inv)
# Verify: A @ A_inv should equal identity
result = A @ A_inv
print("\nA @ A_inv (should be identity):")
print(np.round(result, 10)) # Round to handle floating point
Matrix inversion is used in solving linear equations and in the closed-form solution for linear regression (Normal Equation).
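For example, solving Ax = b with an explicit inverse gives the same answer as np.linalg.solve, though solve is generally preferred in practice for speed and numerical stability. A sketch with made-up numbers:

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])
b = np.array([1.0, 2.0])

x_inv = np.linalg.inv(A) @ b     # explicit inverse
x_solve = np.linalg.solve(A, b)  # preferred in practice

print(np.allclose(x_inv, x_solve))  # True
```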
Many ML problems reduce to solving systems of linear equations: Ax = b
import numpy as np
# System: 2x + 3y = 8
# 4x + y = 6
A = np.array([[2, 3],
              [4, 1]])
b = np.array([8, 6])
# Solve for x and y
solution = np.linalg.solve(A, b)
print(f"Solution: x = {solution[0]:.2f}, y = {solution[1]:.2f}")
# Verify
print(f"Verification: A @ solution = {A @ solution}") # Should equal b
Rank indicates the number of linearly independent rows or columns—essentially, how much "unique information" the matrix contains.
import numpy as np
# Full rank matrix (all rows independent)
A = np.array([[1, 2],
              [3, 4]])
print(f"Rank of A: {np.linalg.matrix_rank(A)}") # 2
# Rank-deficient matrix (row 2 = 2 × row 1)
B = np.array([[1, 2],
              [2, 4]])
print(f"Rank of B: {np.linalg.matrix_rank(B)}") # 1
Low rank can indicate redundant features or multicollinearity in your dataset.
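A sketch of how a redundant feature shows up in practice: duplicating a column makes XᵀX singular, which is why the Normal Equation breaks down under perfect multicollinearity (the data here is made up):

```python
import numpy as np

# Feature 2 is an exact copy of feature 1 -> redundant information
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])

print(np.linalg.matrix_rank(X))  # 1, not 2

# X^T X is singular, so inverting it fails
try:
    np.linalg.inv(X.T @ X)
except np.linalg.LinAlgError:
    print("X^T X is singular")
```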
The closed-form solution for linear regression uses matrix operations:
Normal Equation: θ = (XᵀX)⁻¹Xᵀy
import numpy as np
# Training data
X = np.array([[1, 1],  # Bias term and feature
              [1, 2],
              [1, 3],
              [1, 4]])
y = np.array([2, 4, 5, 4])
# Normal equation
XtX = X.T @ X # X transpose times X
XtX_inv = np.linalg.inv(XtX) # Inverse
Xty = X.T @ y # X transpose times y
theta = XtX_inv @ Xty # Final parameters
print(f"Learned parameters: {theta}")
print(f"Intercept: {theta[0]:.2f}, Slope: {theta[1]:.2f}")
# Predictions
predictions = X @ theta
print(f"Predictions: {predictions}")
This demonstrates how matrix operations combine to solve ML problems elegantly.
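As a sanity check, the parameters from the Normal Equation should match NumPy's built-in least-squares solver; a sketch using the same training data as above:

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]])
y = np.array([2, 4, 5, 4])

theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_normal, theta_lstsq))  # True
```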