In NLP, when building word embeddings with count-based methods, it is often necessary to reduce the dimensionality of the resulting matrix, and SVD (singular value decomposition) is a standard tool for this. SVD is also a common way to compare the similarity of two matrices in generative tasks.
This article explains how the algorithm works, assuming no background in linear algebra.
Why Use SVD
In linear algebra, SVD decomposes a matrix into the product of three special matrices.
Assuming you already understand how the co-occurrence matrix is built, you will have noticed that it contains many zeros, i.e., it is very sparse.
The PPMI transformation replaces the raw counts with more informative weights, but the matrix remains very sparse, which wastes memory and computation. We therefore use the SVD algorithm to compress it into a dense, low-dimensional matrix.
Note that there are many dimensionality-reduction methods; SVD is just one of the most widely adopted.
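To make this concrete, here is a minimal sketch (the PPMI values below are made up purely for illustration) of how a sparse matrix can be turned into dense, low-dimensional word vectors using NumPy's built-in SVD:

```python
import numpy as np

# A small, hypothetical PPMI-style co-occurrence matrix (mostly zeros).
# Each row corresponds to a word, each column to a context word.
ppmi = np.array([
    [0.0, 1.8, 0.0, 0.0, 0.0],
    [1.8, 0.0, 0.8, 0.0, 0.0],
    [0.0, 0.8, 0.0, 1.2, 0.0],
    [0.0, 0.0, 1.2, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
])

# Full SVD: ppmi = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(ppmi)

# Keep only the top-k singular directions to obtain dense,
# low-dimensional word vectors (one 2-dimensional vector per word)
k = 2
word_vectors = U[:, :k] * s[:k]
print(word_vectors.shape)  # (5, 2)
```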
Working Principle of SVD
The most basic formula for SVD is as follows:

$$A = U \Sigma V^T$$
In simple terms, a matrix is expressed as the product of three matrices. Because of this property, SVD can shrink the large matrices created in NLP, and the singular values it produces can be used to compare the similarity of two matrices.
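As a quick illustration of the decomposition (a minimal sketch; the matrix below is arbitrary and np.linalg.svd is NumPy's built-in implementation):

```python
import numpy as np

# An arbitrary 2x3 matrix, just to show the shapes of the three factors
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

U, s, Vt = np.linalg.svd(A)           # full SVD
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)  # put the singular values on the diagonal

print(U.shape, Sigma.shape, Vt.shape)  # (2, 2) (2, 3) (3, 3)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```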
Next, let's discuss the composition of these three matrices.
Pay attention to the shape of each matrix: for an m×n matrix A, U is m×m, Σ is m×n, and $V^T$ is n×n. First, let's go over a few concepts.
- $V^T$: The transpose of matrix V. Transposing an m×n matrix turns its rows into the columns of an n×m matrix; transposing twice gives back the original matrix.
- Orthogonal matrix: A matrix Q is orthogonal if $Q^T Q = Q Q^T = I$, where I is the identity matrix and $Q^T$ is the transpose of Q (see the short check after this list).
- Covariance matrices: Given a matrix A, its covariance matrices are $A A^T$ and $A^T A$.
- Singular values: The values on the diagonal of the singular value matrix Σ; all entries of Σ off the diagonal are zero.
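These definitions are easy to verify numerically. The sketch below uses an arbitrary rotation matrix as the example of an orthogonal matrix:

```python
import numpy as np

# An arbitrary 2x2 rotation matrix, which is orthogonal by construction
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q^T Q = I, so Q is orthogonal
print(np.allclose(Q.T.T, Q))             # True: transposing twice returns Q

# The two covariance matrices of a rectangular matrix A
A = np.array([[4, 11, 14], [8, 7, -2]])
print((A @ A.T).shape, (A.T @ A).shape)  # (2, 2) (3, 3)
```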
Based on the initially provided formula, we can derive the following:

$$A A^T = (U \Sigma V^T)(U \Sigma V^T)^T = U \Sigma V^T V \Sigma^T U^T$$

Since V is an orthogonal matrix, $V^T V = I$, thus:

$$A A^T = U \Sigma \Sigma^T U^T$$

So the i-th singular value is $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ is the i-th eigenvalue of $A A^T$.
The question now becomes how to compute U and $V^T$. In SVD, U and V are orthogonal matrices whose column vectors are the eigenvectors of the covariance matrices of A ($A A^T$ and $A^T A$, respectively). The singular values (the elements on the diagonal of Σ) are the square roots of the eigenvalues of those covariance matrices. That is:

$$A A^T = U (\Sigma \Sigma^T) U^T, \qquad A^T A = V (\Sigma^T \Sigma) V^T, \qquad \sigma_i = \sqrt{\lambda_i}$$
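This relationship can be checked numerically (a minimal sketch, using the same example matrix as the implementation section below):

```python
import numpy as np

A = np.array([[4, 11, 14], [8, 7, -2]])

# Singular values from NumPy's built-in SVD
_, s, _ = np.linalg.svd(A)

# Square roots of the eigenvalues of A A^T, sorted in descending order
eig_vals = np.linalg.eigvalsh(A @ A.T)[::-1]

print(s)                  # [18.97...  9.48...]
print(np.sqrt(eig_vals))  # the same values
```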
It's okay if you don't fully understand this step yet; we will go through it step by step. First, you might wonder: how do we perform an eigendecomposition of a matrix?
For a given square matrix A, a non-zero vector v is an eigenvector of A if and only if Av is a scalar multiple of v; that scalar λ is the eigenvalue corresponding to the eigenvector v. That is:

$$A v = \lambda v$$
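In NumPy this is a one-liner (a minimal sketch; the 2×2 matrix below is an arbitrary example):

```python
import numpy as np

# An arbitrary 2x2 square matrix
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eig_vals, eig_vecs = np.linalg.eig(M)

# Verify M v = lambda v for every eigenpair
for i in range(len(eig_vals)):
    v = eig_vecs[:, i]
    print(np.allclose(M @ v, eig_vals[i] * v))  # True
```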
Python Implementation
By observing the code implementation, you will gain a better understanding of this algorithm.
```python
import numpy as np

# Input matrix A (2x3)
A = np.array([[4, 11, 14], [8, 7, -2]])

# Compute the covariance matrices A A^T and A^T A
AAT = A @ A.T
ATA = A.T @ A

# Eigendecomposition; eigh is meant for symmetric matrices and
# returns real eigenvalues in ascending order
eig_vals_U, eig_vecs_U = np.linalg.eigh(AAT)
eig_vals_V, eig_vecs_V = np.linalg.eigh(ATA)

# Sort eigenvalues (and matching eigenvectors) in descending order
idx_U = eig_vals_U.argsort()[::-1]
idx_V = eig_vals_V.argsort()[::-1]

eig_vals_U = eig_vals_U[idx_U]
U = eig_vecs_U[:, idx_U]

eig_vals_V = eig_vals_V[idx_V]
V = eig_vecs_V[:, idx_V]

# Build the diagonal matrix Sigma from the square roots of the eigenvalues
# (clip tiny negative values caused by floating-point error)
Sigma = np.zeros(A.shape)
singular_vals = np.sqrt(np.clip(eig_vals_U, 0, None))
for i in range(min(A.shape)):
    Sigma[i, i] = singular_vals[i]

# Eigenvectors are only determined up to sign, so flip columns of V
# where necessary to make A v_i = sigma_i u_i hold
for i in range(min(A.shape)):
    if singular_vals[i] > 1e-12 and (A @ V[:, i]) @ U[:, i] < 0:
        V[:, i] = -V[:, i]

print("U:\n", U)
print("Sigma:\n", Sigma)
print("V^T:\n", V.T)

# Sanity check: U Sigma V^T reconstructs A (up to rounding error)
print("Reconstruction:\n", U @ Sigma @ V.T)
```
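For comparison, NumPy's built-in np.linalg.svd performs the whole decomposition in a single call. Individual columns of U and V may differ from the hand-rolled version by a sign, which is expected, since eigenvectors are only determined up to sign:

```python
import numpy as np

A = np.array([[4, 11, 14], [8, 7, -2]])

U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(s)                               # singular values in descending order
print(np.allclose(A, U @ Sigma @ Vt))  # True
```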