Clarifying: SVD (Singular Value Decomposition)

In NLP tasks, when word embeddings are built with count-based methods, it is often necessary to reduce their dimensionality, and the SVD algorithm is a common choice. In addition, SVD is a commonly used way to compare the similarity of two matrices, for example in generative tasks.

This article explains how the algorithm works, aimed at readers with no background in linear algebra.

Why Use SVD

In linear algebra, SVD decomposes a matrix into the product of three special matrices.

Assuming you already understand how a co-occurrence matrix is constructed, you will notice that it contains many zeros; in other words, it is very sparse.

Applying the PPMI transformation replaces the raw counts with non-negative association scores, but the matrix is still very sparse, which wastes computational resources. We therefore use the SVD algorithm to turn this matrix into a dense, lower-dimensional one.

Note that there are many methods for dimensionality reduction; SVD is simply one of the most widely adopted.
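To make this concrete, here is a minimal sketch (the tiny co-occurrence matrix below is made up for illustration) of how NumPy's built-in SVD turns a sparse count matrix into dense, low-dimensional word vectors:

import numpy as np

# Hypothetical toy co-occurrence matrix: rows are words, columns are
# context words; most entries are zero, i.e. the matrix is sparse
C = np.array([
    [0, 2, 1, 0, 0, 0],
    [2, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 2, 0],
    [0, 1, 0, 0, 0, 1],
], dtype=float)

# Keep only the top-k singular values/vectors to get dense embeddings
U, s, Vt = np.linalg.svd(C)
k = 2
embeddings = U[:, :k] * s[:k]

print(embeddings.shape)  # (4, 2): each word is now a dense 2-d vector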

Working Principle of SVD

The most basic formula for SVD is as follows:

A = U \Sigma V^T

In simple terms, a matrix is represented as the product of three matrices. Because of this property, SVD can reduce the size of matrices created in NLP, or the singular values from the decomposition can be used to compare matrix similarity.
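Before deriving these factors by hand, we can sanity-check this claim with NumPy's built-in np.linalg.svd (using the same 2×3 matrix as the implementation later in this article):

import numpy as np

A = np.array([[4, 11, 14], [8, 7, -2]], dtype=float)

# np.linalg.svd returns U, the singular values, and V^T directly
U, s, Vt = np.linalg.svd(A)
print(U.shape, s.shape, Vt.shape)  # (2, 2) (2,) (3, 3)

# Rebuild A from the three factors: A = U Sigma V^T
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ Vt))  # True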

Next, let's discuss the composition of these three matrices.

[Figure: the shapes of the three factors. For an m×n matrix A, U is m×m, Σ is m×n with the singular values on its diagonal, and V^T is n×n.]

Pay attention to the shape of each matrix. First, let's go over a few concepts; a short numerical check follows the list.

  • V^T: the transpose of matrix V. Transposing an m×n matrix turns its rows into columns of the same index, producing an n×m matrix; transposing twice returns the original matrix.
  • Orthogonal matrix: a matrix Q satisfying Q^T Q = Q Q^T = I (where I is the identity matrix and Q^T is the transpose of Q) is called an orthogonal matrix.
  • Covariance matrix: given a matrix A, its covariance matrices are A^T A and A A^T.
  • Singular values: the values on the diagonal of the singular value matrix Σ (marked in the figure). All entries of Σ other than the diagonal are zero.
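A quick numerical check of these definitions (the rotation matrix Q below is just one arbitrary example of an orthogonal matrix):

import numpy as np

# Transpose: rows become columns, and transposing twice restores A
A = np.array([[4, 11, 14], [8, 7, -2]])
print(A.T.shape)                 # (3, 2)
print(np.array_equal(A.T.T, A))  # True

# Orthogonal matrix: a 2-d rotation satisfies Q^T Q = Q Q^T = I
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T @ Q, np.eye(2)))  # True

# Covariance matrices of A: A^T A is 3x3, A A^T is 2x2
print((A.T @ A).shape, (A @ A.T).shape)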

Starting from the formula above, right-multiply both sides by V:

AV = U \Sigma V^T V

Since V is an orthogonal matrix, V^T V = I, which gives:

AV = U \Sigma

Reading this equality one column at a time gives:

A v_i = \sigma_i u_i

So each singular value is

\sigma_i = \frac{A v_i}{u_i}
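This column-wise relation is easy to verify with NumPy (a quick check using the same example matrix as the rest of the article):

import numpy as np

A = np.array([[4, 11, 14], [8, 7, -2]], dtype=float)
U, s, Vt = np.linalg.svd(A)

# For each singular triplet, A v_i should equal sigma_i * u_i
for i in range(len(s)):
    v_i = Vt[i]      # the i-th row of V^T is the i-th column of V
    u_i = U[:, i]
    print(np.allclose(A @ v_i, s[i] * u_i))  # True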

The question now becomes how to compute U and V^T. In SVD, U and V are orthogonal matrices: the columns of U are the eigenvectors of A A^T, and the columns of V are the eigenvectors of A^T A. The singular values (the elements on the diagonal of Σ) are the square roots of the eigenvalues of these covariance matrices. That is:

\sigma_i = \sqrt{\lambda_i}
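Numerically, this relation can be confirmed by comparing the eigenvalues of A A^T with the singular values NumPy reports:

import numpy as np

A = np.array([[4, 11, 14], [8, 7, -2]], dtype=float)

# Eigenvalues of the covariance matrix A A^T, sorted in descending order
eig_vals = np.sort(np.linalg.eigvals(A @ A.T))[::-1]

# Their square roots should match the singular values of A
print(np.sqrt(eig_vals))                   # [18.97... 9.48...]
print(np.linalg.svd(A, compute_uv=False))  # the same values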

It's okay if you don't understand this step; we will go through it step by step. First, you might wonder: How do we perform eigen decomposition on a matrix?

For a given square matrix A, a non-zero vector v is an eigenvector of A if and only if Av is a scalar multiple of v, where this scalar is the eigenvalue corresponding to the eigenvector v. That is:

Av = \lambda v
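In NumPy this definition can be checked directly; M below is an arbitrary 2×2 example:

import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Each column of eig_vecs is an eigenvector of M
eig_vals, eig_vecs = np.linalg.eig(M)
for i in range(len(eig_vals)):
    v = eig_vecs[:, i]
    print(np.allclose(M @ v, eig_vals[i] * v))  # True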

Python Implementation

By observing the code implementation, you will gain a better understanding of this algorithm.

import numpy as np

# Input matrix A (2x3)
A = np.array([[4, 11, 14], [8, 7, -2]])

# Covariance matrix A^T A; its eigenvectors form the columns of V
ATA = A.T @ A

# Calculate eigenvalues and eigenvectors
eig_vals, eig_vecs = np.linalg.eig(ATA)

# Sort eigenvalues (and their eigenvectors) in descending order
idx = eig_vals.argsort()[::-1]
eig_vals = eig_vals[idx]
V = eig_vecs[:, idx]

# Singular values are the square roots of the eigenvalues;
# clip tiny negative values caused by floating-point round-off
singular_values = np.sqrt(np.clip(eig_vals, 0, None))

# Build the diagonal matrix Sigma with the same shape as A
Sigma = np.zeros(A.shape)
for i in range(min(A.shape)):
    Sigma[i, i] = singular_values[i]

# Derive U column by column from A v_i = sigma_i u_i. This keeps the
# signs of U and V consistent; computing U independently from the
# eigenvectors of A A^T can flip signs so that U Sigma V^T no longer
# reconstructs A. (This assumes A has full row rank, as it does here.)
U = np.zeros((A.shape[0], A.shape[0]))
for i in range(A.shape[0]):
    U[:, i] = A @ V[:, i] / singular_values[i]

print("U:\n", U)
print("Sigma:\n", Sigma)
print("V^T:\n", V.T)

# Verify the reconstruction A = U Sigma V^T
print("Reconstruction:\n", U @ Sigma @ V.T)