In NLP, when building word embeddings with count-based methods, it is often necessary to reduce the dimensionality of the resulting matrix, and SVD (singular value decomposition) is a standard tool for this. SVD is also a common way to compare the similarity of two matrices in generative tasks.
This article explains how the algorithm works, assuming no background in linear algebra.
Why Use SVD
In linear algebra, SVD decomposes a matrix into the product of three special matrices.
Assuming you already understand how the co-occurrence matrix is built, you will have noticed that it contains many zeros, i.e., it is very sparse.
The PPMI transformation replaces the raw counts with more informative weights, but the matrix remains very sparse, which wastes memory and computation. We therefore use the SVD algorithm to compress it into a dense, low-dimensional matrix.
Note that there are many dimensionality-reduction methods; SVD is just one of the most widely adopted.
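To make this concrete, here is a minimal sketch (the PPMI values below are made up purely for illustration) of how a sparse matrix can be turned into dense, low-dimensional word vectors using NumPy's built-in SVD:

```python
import numpy as np

# A small, hypothetical PPMI-style co-occurrence matrix (mostly zeros).
# Each row corresponds to a word, each column to a context word.
ppmi = np.array([
    [0.0, 1.8, 0.0, 0.0, 0.0],
    [1.8, 0.0, 0.8, 0.0, 0.0],
    [0.0, 0.8, 0.0, 1.2, 0.0],
    [0.0, 0.0, 1.2, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
])

# Full SVD: ppmi = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(ppmi)

# Keep only the top-k singular directions to obtain dense,
# low-dimensional word vectors (one 2-dimensional vector per word)
k = 2
word_vectors = U[:, :k] * s[:k]
print(word_vectors.shape)  # (5, 2)
```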
Working Principle of SVD
The most basic formula for SVD is as follows:

$$A = U \Sigma V^T$$
In simple terms, a matrix is expressed as the product of three matrices. Because of this property, SVD can shrink the large matrices created in NLP, and the singular values it produces can be used to compare the similarity of two matrices.
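As a quick illustration of the decomposition (a minimal sketch; the matrix below is arbitrary and np.linalg.svd is NumPy's built-in implementation):

```python
import numpy as np

# An arbitrary 2x3 matrix, just to show the shapes of the three factors
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

U, s, Vt = np.linalg.svd(A)           # full SVD
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)  # put the singular values on the diagonal

print(U.shape, Sigma.shape, Vt.shape)  # (2, 2) (2, 3) (3, 3)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```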
Next, let's discuss the composition of these three matrices.
Pay attention to the shape of each matrix: for an m×n matrix A, U is m×m, Σ is m×n, and $V^T$ is n×n. First, let's go over a few concepts.
- $V^T$: The transpose of matrix V. Transposing an m×n matrix turns its rows into the columns of an n×m matrix; transposing twice gives back the original matrix.
- Orthogonal matrix: A matrix Q is orthogonal if $Q^T Q = Q Q^T = I$, where I is the identity matrix and $Q^T$ is the transpose of Q (see the short check after this list).
- Covariance matrices: Given a matrix A, its covariance matrices are $A A^T$ and $A^T A$.
- Singular values: The values on the diagonal of the singular value matrix Σ; all entries of Σ off the diagonal are zero.
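These definitions are easy to verify numerically. The sketch below uses an arbitrary rotation matrix as the example of an orthogonal matrix:

```python
import numpy as np

# An arbitrary 2x2 rotation matrix, which is orthogonal by construction
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q^T Q = I, so Q is orthogonal
print(np.allclose(Q.T.T, Q))             # True: transposing twice returns Q

# The two covariance matrices of a rectangular matrix A
A = np.array([[4, 11, 14], [8, 7, -2]])
print((A @ A.T).shape, (A.T @ A).shape)  # (2, 2) (3, 3)
```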
Based on the initially provided formula, we can derive the following:

$$A A^T = (U \Sigma V^T)(U \Sigma V^T)^T = U \Sigma V^T V \Sigma^T U^T$$

Since V is an orthogonal matrix, $V^T V = I$, thus:

$$A A^T = U \Sigma \Sigma^T U^T$$

So the i-th singular value is $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ is the i-th eigenvalue of $A A^T$.
The question now becomes how to compute U and $V^T$. In SVD, U and V are orthogonal matrices whose column vectors are the eigenvectors of the covariance matrices of A ($A A^T$ and $A^T A$, respectively). The singular values (the elements on the diagonal of Σ) are the square roots of the eigenvalues of those covariance matrices. That is:

$$A A^T = U (\Sigma \Sigma^T) U^T, \qquad A^T A = V (\Sigma^T \Sigma) V^T, \qquad \sigma_i = \sqrt{\lambda_i}$$
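This relationship can be checked numerically (a minimal sketch, using the same example matrix as the implementation section below):

```python
import numpy as np

A = np.array([[4, 11, 14], [8, 7, -2]])

# Singular values from NumPy's built-in SVD
_, s, _ = np.linalg.svd(A)

# Square roots of the eigenvalues of A A^T, sorted in descending order
eig_vals = np.linalg.eigvalsh(A @ A.T)[::-1]

print(s)                  # [18.97...  9.48...]
print(np.sqrt(eig_vals))  # the same values
```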
It's okay if you don't fully understand this step yet; we will go through it step by step. First, you might wonder: how do we perform an eigendecomposition of a matrix?
For a given square matrix A, a non-zero vector v is an eigenvector of A if and only if Av is a scalar multiple of v; that scalar λ is the eigenvalue corresponding to the eigenvector v. That is:

$$A v = \lambda v$$
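In NumPy this is a one-liner (a minimal sketch; the 2×2 matrix below is an arbitrary example):

```python
import numpy as np

# An arbitrary 2x2 square matrix
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eig_vals, eig_vecs = np.linalg.eig(M)

# Verify M v = lambda v for every eigenpair
for i in range(len(eig_vals)):
    v = eig_vecs[:, i]
    print(np.allclose(M @ v, eig_vals[i] * v))  # True
```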
Python Implementation
By observing the code implementation, you will gain a better understanding of this algorithm.
```python
import numpy as np

# Input matrix A (2x3)
A = np.array([[4, 11, 14], [8, 7, -2]])

# Compute the covariance matrices A A^T and A^T A
AAT = A @ A.T
ATA = A.T @ A

# Eigendecomposition; eigh is meant for symmetric matrices and
# returns real eigenvalues in ascending order
eig_vals_U, eig_vecs_U = np.linalg.eigh(AAT)
eig_vals_V, eig_vecs_V = np.linalg.eigh(ATA)

# Sort eigenvalues (and matching eigenvectors) in descending order
idx_U = eig_vals_U.argsort()[::-1]
idx_V = eig_vals_V.argsort()[::-1]

eig_vals_U = eig_vals_U[idx_U]
U = eig_vecs_U[:, idx_U]

eig_vals_V = eig_vals_V[idx_V]
V = eig_vecs_V[:, idx_V]

# Build the diagonal matrix Sigma from the square roots of the eigenvalues
# (clip tiny negative values caused by floating-point error)
Sigma = np.zeros(A.shape)
singular_vals = np.sqrt(np.clip(eig_vals_U, 0, None))
for i in range(min(A.shape)):
    Sigma[i, i] = singular_vals[i]

# Eigenvectors are only determined up to sign, so flip columns of V
# where necessary to make A v_i = sigma_i u_i hold
for i in range(min(A.shape)):
    if singular_vals[i] > 1e-12 and (A @ V[:, i]) @ U[:, i] < 0:
        V[:, i] = -V[:, i]

print("U:\n", U)
print("Sigma:\n", Sigma)
print("V^T:\n", V.T)

# Sanity check: U Sigma V^T reconstructs A (up to rounding error)
print("Reconstruction:\n", U @ Sigma @ V.T)
```
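For comparison, NumPy's built-in np.linalg.svd performs the whole decomposition in a single call. Individual columns of U and V may differ from the hand-rolled version by a sign, which is expected, since eigenvectors are only determined up to sign:

```python
import numpy as np

A = np.array([[4, 11, 14], [8, 7, -2]])

U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(s)                               # singular values in descending order
print(np.allclose(A, U @ Sigma @ Vt))  # True
```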