# Singular Value Decomposition

In this section we will develop one of the most powerful ideas in linear algebra: the **singular value decomposition**. The first step on this journey is the **polar decomposition**.

## Polar decomposition

Given an $n \times n$ matrix $A$, the matrix $A'A$ is symmetric and positive semidefinite, so by the spectral theorem it can be written as $V \Lambda V'$, where $V$ is orthogonal and $\Lambda$ is diagonal with nonnegative entries. We define $\sqrt{A'A} = V\sqrt{\Lambda}\,V'$, where $\sqrt{\Lambda}$ is obtained by taking the square root of each diagonal entry of $\Lambda$. Squaring this matrix gives back $A'A$,

as befits the notation $\sqrt{A'A}$.

The matrix $\sqrt{A'A}$ is simpler to understand than $A$ because it is symmetric and positive semidefinite. At the same time, it transforms space very similarly to $A$: for every $\mathbf{x}$ we have

$$|A\mathbf{x}|^2 = \mathbf{x}' A' A \mathbf{x} = \mathbf{x}' \sqrt{A'A}\,\sqrt{A'A}\,\mathbf{x} = \left|\sqrt{A'A}\,\mathbf{x}\right|^2.$$

In other words, for all $\mathbf{x}$, the images of $\mathbf{x}$ under $A$ and under $\sqrt{A'A}$ have equal norm. If two points are the same distance from the origin, then one may be mapped to the other by an orthogonal transformation.

Therefore, for each $\mathbf{x}$, there is an orthogonal transformation from the range of $\sqrt{A'A}$ to the range of $A$ which sends $\sqrt{A'A}\,\mathbf{x}$ to $A\mathbf{x}$.

It can be shown that the orthogonal transformation mapping $\sqrt{A'A}\,\mathbf{x}$ to $A\mathbf{x}$ is the same transformation for every $\mathbf{x}$. Furthermore, even if the range of $\sqrt{A'A}$ is not all of $\mathbb{R}^n$, we can extend this orthogonal transformation to an orthogonal transformation on all of $\mathbb{R}^n$ by mapping an orthonormal basis of the orthogonal complement of the range of $\sqrt{A'A}$ to an orthonormal basis of the orthogonal complement of the range of $A$. This gives us the *polar decomposition*:

**Theorem** (Polar Decomposition)

For any $n \times n$ matrix $A$, there exists an orthogonal matrix $R$ such that

$$A = R\sqrt{A'A}.$$

This representation is useful because it represents an arbitrary square matrix as a product of matrices whose properties are easier to understand (the orthogonal matrix $R$ because it is distance- and angle-preserving, and the positive semidefinite matrix $\sqrt{A'A}$ because it is orthogonally diagonalizable, by the spectral theorem).
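To make the theorem concrete, here is a minimal sketch (not from the original text) that computes $\sqrt{A'A}$ by diagonalizing $A'A$ and then recovers the orthogonal factor $R$. The matrix is the invertible example reused in the exercise below, and `np.linalg.eigh` is assumed as the eigendecomposition routine.

```
import numpy as np

# Sketch: polar decomposition A = R * sqrt(A'A) via the eigendecomposition of A'A.
# Assumes A is invertible so that sqrt(A'A) can be inverted.
A = np.array([[1.0, 3, 4], [7, -2, 5], [-3, 4, 11]])

eigvals, V = np.linalg.eigh(A.T @ A)          # A'A = V diag(eigvals) V'
P = V @ np.diag(np.sqrt(eigvals)) @ V.T       # symmetric part sqrt(A'A)
R = A @ np.linalg.inv(P)                      # orthogonal part

print(np.allclose(R.T @ R, np.eye(3)))        # True: R is orthogonal
print(np.allclose(R @ P, A))                  # True: A = R * sqrt(A'A)
```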

**Exercise**

Let's explore a fast method of computing a polar decomposition $A = R\sqrt{A'A}$. This method actually works by calculating the orthogonal factor $R$ first and then recovering $\sqrt{A'A}$ as $R^{-1}A$ (since this is computationally faster than calculating the matrix square root directly). We call $R$ the *orthogonal part* of $A$ and $\sqrt{A'A}$ the *symmetric part* of $A$.

We set $B_0 = A$ and define the iteration

$$B_{k+1} = \frac{B_k + (B_k')^{-1}}{2}.$$

Let's see why this iteration converges to $R$, the orthogonal part of $A$.

Defining $P = \sqrt{A'A}$ and using the equation $A = RP$, show that

$$B_1 = \frac{B_0 + (B_0')^{-1}}{2} = R\,\frac{P + P^{-1}}{2}.$$

Use the prior step to explain why the $B_k$'s all have the same orthogonal part and have symmetric parts converging to the identity matrix.

Hint: consider the eigendecompositions of the symmetric parts. You may assume that the sequence defined by $x_{k+1} = \frac{1}{2}\left(x_k + \frac{1}{x_k}\right)$ converges to 1 regardless of the (positive) starting value $x_0$.

Write some code to apply this algorithm to the matrix

```
import numpy as np
A = np.array([[1,3,4],[7,-2,5],[-3,4,11]])
```

`A = [1 3 4; 7 -2 5; -3 4 11]`

and confirm that the resulting matrices $R$ and $P$ satisfy $R'R = I$ and $P^2 = A'A$.

*Solution.* Write $A = RP$ with $R$ orthogonal and $P = \sqrt{A'A}$ symmetric. Since both transposition and inversion reverse the order of matrix products, we get

$$(B_0')^{-1} = ((RP)')^{-1} = (P'R')^{-1} = (R')^{-1}(P')^{-1}.$$

Since $R$ is orthogonal, $(R')^{-1} = R$, as $R' = R^{-1}$. Since $P$ is symmetric, $(P')^{-1} = P^{-1}$. So this is equal to $RP^{-1}$, and therefore

$$B_1 = \frac{B_0 + (B_0')^{-1}}{2} = \frac{RP + RP^{-1}}{2} = R\,\frac{P + P^{-1}}{2}.$$

We see that $B_0$ and $B_1$ have the same orthogonal part, and repeating the calculation shows that all the $B_k$ have the same orthogonal part $R$. As for the symmetric parts $P_k$ (with $P_0 = P$), we see that

$$P_{k+1} = \frac{P_k + P_k^{-1}}{2}.$$

Let's see why this averaging process converges to the identity matrix. Symmetric matrices are orthogonally diagonalizable, by the spectral theorem, so we may write $P_0 = V\Lambda_0 V'$ with $V$ orthogonal and $\Lambda_0$ diagonal with positive entries. Then every $P_k$ is diagonalized by the same matrix $V$, with $\Lambda_{k+1} = \frac{1}{2}\left(\Lambda_k + \Lambda_k^{-1}\right)$.

Thus the $P_k$'s converge to the matrix $V\Lambda V'$, where $\Lambda$ is the diagonal matrix whose $i$th entry is the limit obtained when you start with the $i$th eigenvalue of $P_0$ and repeatedly apply the function $x \mapsto \frac{1}{2}\left(x + \frac{1}{x}\right)$. By the fact about this iteration given in the problem statement, we conclude that $\Lambda$ is the identity matrix, and hence that the $P_k$'s converge to $VV' = I$. Therefore, the limit of $B_k$ as $k \to \infty$ is equal to $R$.

For example:

```
import numpy as np

def polar(A, n):
    R = A
    for i in range(n):
        R = (R + np.linalg.inv(R.T)) / 2
    return R, np.linalg.solve(R, A)

I = np.eye(3)
A = np.array([[1, 3, 4], [7, -2, 5], [-3, 4, 11]])
R, P = polar(A, 100)
R.T @ R - I, P @ P - A.T @ A
```

```
using LinearAlgebra

function polar(A, n)
    R = A
    for i = 1:n
        R = (R + inv(R')) / 2
    end
    R, R \ A
end

A = [1 3 4; 7 -2 5; -3 4 11]
R, P = polar(A, 100)
R'*R - I, P^2 - A'*A
```

Both of the matrices returned on the last line have entries which are within numerical roundoff of zero.
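The scalar fact used above, that iterating $x \mapsto \frac{1}{2}\left(x + \frac{1}{x}\right)$ drives any positive starting value to 1, can itself be checked numerically. This short sketch uses a few arbitrary starting values:

```
# Check that x -> (x + 1/x)/2 converges to 1 from several positive starting values
for x0 in [0.01, 0.5, 3.0, 100.0]:
    x = x0
    for _ in range(20):
        x = (x + 1/x) / 2
    print(x0, "->", x)
```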

**Exercise**

Show that the product of two matrices with orthonormal columns has orthonormal columns.

*Solution.* If $U'U = I$ and $V'V = I$, then $(UV)'(UV) = V'U'UV = V'V = I$.
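As a quick numerical illustration (not part of the original exercise), we can manufacture matrices with orthonormal columns using the QR factorization and confirm that their product again has orthonormal columns; the sizes below are arbitrary.

```
import numpy as np

# U: 6x4 and V: 4x3, each with orthonormal columns (from reduced QR factorizations)
U, _ = np.linalg.qr(np.random.standard_normal((6, 4)))
V, _ = np.linalg.qr(np.random.standard_normal((4, 3)))

# (UV)'(UV) = V'U'UV = V'V = I, so the product has orthonormal columns
print(np.allclose((U @ V).T @ (U @ V), np.eye(3)))
```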

## The singular value decomposition

The polar decomposition tells us that any square matrix is almost the same as some symmetric matrix, and the spectral theorem tells us that a symmetric matrix is almost the same as a simple scaling along the coordinate axes. (In both cases, the phrase "almost the same" disguises a composition with an orthogonal transformation.) We should be able to combine these ideas to conclude that *any* square matrix is basically the same as a simple scaling along the coordinate axes!

Let's be more precise. Suppose that $A$ is a square matrix. The polar decomposition tells us that

$$A = R\sqrt{A'A}$$

for some orthogonal matrix $R$. The spectral theorem tells us that $\sqrt{A'A} = V\Sigma V'$ for some orthogonal matrix $V$ and a diagonal matrix $\Sigma$ with nonnegative diagonal entries.

Combining these equations, we get

$$A = RV\Sigma V'.$$

Since a product of orthogonal matrices is orthogonal, the matrix $U = RV$ is orthogonal, and we obtain the **singular value decomposition** (SVD) of $A$:

$$A = U\Sigma V',$$

where $U$ and $V$ are orthogonal matrices.
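This construction can be traced numerically. The sketch below (an illustration under the same assumption as the earlier polar-decomposition example, namely an invertible $A$) builds $U = RV$ and $\Sigma$ from the polar and spectral decompositions and checks that $U\Sigma V'$ reproduces $A$.

```
import numpy as np

A = np.array([[1.0, 3, 4], [7, -2, 5], [-3, 4, 11]])

# Spectral decomposition of A'A gives V and the squared singular values
eigvals, V = np.linalg.eigh(A.T @ A)
Sigma = np.diag(np.sqrt(eigvals))

# Polar decomposition A = R * sqrt(A'A), then U = R V
sqrtAA = V @ Sigma @ V.T
R = A @ np.linalg.inv(sqrtAA)
U = R @ V

print(np.allclose(U @ Sigma @ V.T, A))   # True: A = U Σ V'
print(np.allclose(U.T @ U, np.eye(3)))   # True: U is orthogonal
```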

We can visualize the decomposition geometrically by making a figure like the one shown below, which illustrates the successive effects of each map in the composition $U\Sigma V'$. If we draw grid lines on the *second* plot (just before $\Sigma$ is applied) and propagate those grid lines to the other plots by applying the indicated maps, then we endow the domain and range of $A$ with orthogonal sets of gridlines, with $A$ mapping one set to the other.

We can extend the singular value decomposition to rectangular matrices (that is, matrices which are not necessarily square) by adding rows or columns of zeros to a rectangular matrix to get a square matrix, applying the SVD to that square matrix, and then trimming the resulting matrix $\Sigma$ as well as either $U$ or $V$ (depending on which dimension of $A$ is smaller). We get a decomposition of the form $A = U\Sigma V'$ where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ rectangular diagonal matrix.

**Theorem** (Singular value decomposition)

Suppose that $A$ is an $m \times n$ matrix. Then there exist an $m \times m$ orthogonal matrix $U$, an $n \times n$ orthogonal matrix $V$, and an $m \times n$ rectangular diagonal matrix $\Sigma$ with nonnegative diagonal entries such that

$$A = U\Sigma V'.$$

We call $U\Sigma V'$ the **singular value decomposition** (or SVD) of $A$.

The diagonal entries of $\Sigma$ are called the **singular values** of $A$, the columns of $U$ are called the *left* singular vectors of $A$, and the columns of $V$ are called the *right* singular vectors of $A$.
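For a rectangular example, the sketch below uses NumPy's `np.linalg.svd`, which returns the singular values as a vector and the last factor already transposed; the rectangular $\Sigma$ is assembled by hand here, and the matrix size is an arbitrary choice.

```
import numpy as np

A = np.random.standard_normal((5, 3))    # a 5x3 matrix
U, s, Vt = np.linalg.svd(A)              # U: 5x5, s: length 3, Vt: 3x3

# Build the 5x3 rectangular diagonal matrix Σ and verify A = U Σ V'
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))    # True
```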

Looking at the bottom half of the figure, we see the geometric picture that underlies *principal component analysis*.

As an example of how the singular value decomposition can be used to understand the structure of a linear transformation, we introduce the **Moore-Penrose pseudoinverse** $A^+$ of a matrix $A$. If $A = U\Sigma V'$ is a singular value decomposition of $A$, then $A^+ = V\Sigma^+ U'$, where $\Sigma^+$ is obtained by transposing $\Sigma$ and replacing each of its nonzero entries with its reciprocal. The pseudoinverse behaves as much like an inverse of $A$ as possible:

- If $A$ is square and invertible, then $A^+ = A^{-1}$.
- If $A\mathbf{x} = \mathbf{b}$ has no solution, then $A^+\mathbf{b}$ is the value of $\mathbf{x}$ which minimizes $|A\mathbf{x} - \mathbf{b}|^2$ (in other words, the closest thing to a solution you can get).
- If $A\mathbf{x} = \mathbf{b}$ has multiple solutions, then $A^+\mathbf{b}$ is the solution with minimal norm.
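These three properties can be checked numerically with NumPy's `np.linalg.pinv`; the matrices below are arbitrary random examples, not taken from the original text, and the least-squares comparison uses `np.linalg.lstsq`.

```
import numpy as np

# Overdetermined system: A x = b generally has no solution, and A^+ b
# minimizes |A x - b|^2 (it matches the least-squares solution)
A = np.random.standard_normal((6, 3))
b = np.random.standard_normal(6)
print(np.allclose(np.linalg.pinv(A) @ b,
                  np.linalg.lstsq(A, b, rcond=None)[0]))

# Underdetermined system: A x = b has many solutions, and A^+ b is a genuine
# solution (in fact the one with minimal norm)
A = np.random.standard_normal((3, 6))
b = np.random.standard_normal(3)
x = np.linalg.pinv(A) @ b
print(np.allclose(A @ x, b))
```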

**Exercise**

Show that

*Solution.* Let

and let

We need to check that

The formula for the Moore-Penrose pseudoinverse is

The matrix

With a little calculation, we arrive at

## SVD and linear dependence

Linear dependence is numerically fragile: if the columns of a matrix (with more rows than columns) are linearly dependent, then perturbing the entries slightly by adding tiny independent random numbers is almost certain to result in a matrix with linearly independent columns. However, intuition suggests that subverting the principles of linear algebra in this way should not buy us much: columns which are *nearly* linearly dependent behave, for most practical purposes, much like columns which are exactly linearly dependent.

This intuition is accurate, and it highlights the utility of having a generalization of the idea of linear independence which can *quantify* how close a list of vectors is to having linear dependence relationships, rather than remaining within the confines of the binary labels "linearly dependent" or "linearly independent". The singular value decomposition provides such a tool.

**Exercise**

Define a matrix with 100 rows and 5 columns, and do it in such a way that two of the five columns are nearly equal to some linear combination of the other three. Calculate the singular values of the matrix, and make a conjecture about how the number of approximate linear dependencies could have been detected from the list of singular values.

```
import numpy as np
```

*Solution.* We see that two of the singular values are much smaller than the other three. (Remember that you have to run the cell twice to get the plot to show.)

```
import numpy as np
import matplotlib.pyplot as plt

A = np.random.standard_normal((100,5))
A[:,3] = A[:,2] + A[:,1] + 1e-2*np.random.standard_normal(100)
A[:,4] = A[:,1] - A[:,0] + 1e-2*np.random.standard_normal(100)
plt.bar(range(5), np.linalg.svd(A)[1])
```

```
using Plots, LinearAlgebra

A = randn(100, 5)
A[:,4] = A[:,3] + A[:,2] + 1e-2*randn(100)
A[:,5] = A[:,2] - A[:,1] + 1e-2*randn(100)
bar(1:5, svdvals(A), label = "singular values")
```

We conjecture that the number of approximate linear dependence relationships among the columns can be detected by counting the singular values which are much smaller than the rest.

In fact, the idea developed in this exercise is used by the NumPy function `np.linalg.matrix_rank` to calculate the rank of a matrix. Because of the roundoff errors associated with representing real numbers in memory on a computer, most matrices with float entries technically have full rank. Therefore, `np.linalg.matrix_rank` computes the singular value decomposition and returns the number of singular values which exceed a small cutoff value.
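As a small illustration of this behavior (reusing the nearly dependent matrix from the exercise above; the tolerance value here is an arbitrary choice):

```
import numpy as np

A = np.random.standard_normal((100, 5))
A[:,3] = A[:,2] + A[:,1] + 1e-2*np.random.standard_normal(100)
A[:,4] = A[:,1] - A[:,0] + 1e-2*np.random.standard_normal(100)

print(np.linalg.matrix_rank(A))           # 5: the perturbed columns are technically independent
print(np.linalg.matrix_rank(A, tol=1.0))  # 3: a larger cutoff reveals the two near-dependencies
```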

In fact, the idea developed in this exercise is used by the Julia function `rank` to calculate the rank of a matrix. Because of the roundoff errors associated with representing real numbers in memory on a computer, most matrices with float entries technically have full rank. Therefore, `rank` computes the singular value decomposition and returns the number of singular values which exceed a small cutoff value.

## SVD and image compression

We close this section with a computational exercise illustrating another widely applicable feature of the singular value decomposition.

**Exercise**

- Show that if $\mathbf{u}_1, \ldots, \mathbf{u}_n$ are the columns of $U$, $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are the columns of $V$, and $\sigma_1, \ldots, \sigma_n$ are the diagonal entries of $\Sigma$, then $A = \sigma_{1} \mathbf{u}_{1}\mathbf{v}_{1}'+\sigma_{2}\mathbf{u}_{2}\mathbf{v}_{2}'+\cdots+ \sigma_{n}\mathbf{u}_{n}\mathbf{v}_{n}'$.
- This equation is useful for *compression*, because terms with sufficiently small singular value factors can be dropped and the remaining vectors and singular values can be stored using less space. Suppose that $A$ is a $256 \times 128$ matrix. How many entries does $A$ have, and how many entries do $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$, $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ have in total?
- The code below creates a matrix $A$ with pixel values for the image shown. How many nonzero singular values does $A$ have? Explain how you can tell just from looking at the picture.

figure: img(src="/content/linear-algebra/images/zero.svg")

```
import numpy as np
import matplotlib.pyplot as plt
m = 80
n = 100
a = m // 8
b = m // 4
A = np.ones((m,n))
def pixel(i,j):
    if (a <= i <= b or m-b <= i <= m-a) and a <= j <= n-a:
        return 0
    elif (a <= j <= b or n-b <= j <= n-a) and a <= i <= m-a:
        return 0
    return 1
A = np.array([[pixel(i,j) for j in range(1,n+1)] for i in range(1,m+1)])
U, Σ, V = np.linalg.svd(A)
plt.matshow(A)
```

```
using LinearAlgebra, Plots
m = 80
n = 100
a = m ÷ 8
b = m ÷ 4
A = ones(m,n)
function pixel(i,j)
    if (a ≤ i ≤ b || m-b ≤ i ≤ m-a) && a ≤ j ≤ n - a
        0
    elseif (a ≤ j ≤ b || n-b ≤ j ≤ n-a) && a ≤ i ≤ m - a
        0
    else
        1
    end
end
A = [pixel(i,j) for i=1:m, j=1:n]
U, Σ, V = svd(A)
heatmap(A)
```

- Now add some noise to the image:

`B = A + 0.05*np.random.standard_normal((m,n))`

`B = A + 0.05 * randn(m, n)`

Display this new matrix $B$, and compare it with the image obtained by keeping only the three terms of its singular value decomposition with the largest singular values.

Hint: you can achieve this computationally either by setting the remaining singular values to 0 or by suitably indexing the matrices in the decomposition. You can use `np.diag` (in Python) or `diagm` (in Julia) to generate a diagonal matrix from the vector of singular values returned by `svd`.

```
import numpy as np
import matplotlib.pyplot as plt
```

*Solution.*

- Let $M = \sigma_1\mathbf{u}_1\mathbf{v}_1' + \cdots + \sigma_n\mathbf{u}_n\mathbf{v}_n'$. We need to show that $A = M$. We will do this by first showing that $A\mathbf{x} = M\mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^n$.

Now, for any $\mathbf{x} \in \mathbb{R}^n$,

$$M\mathbf{x} = \sigma_1\mathbf{u}_1\mathbf{v}_1'\mathbf{x} + \cdots + \sigma_n\mathbf{u}_n\mathbf{v}_n'\mathbf{x} = \sigma_1 c_1\mathbf{u}_1 + \cdots + \sigma_n c_n\mathbf{u}_n,$$

where $c_i = \mathbf{v}_i'\mathbf{x}$ is the $i$th component of $V'\mathbf{x}$. By linearity of matrix multiplication,

$$A\mathbf{x} = U\Sigma V'\mathbf{x} = U\Sigma \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = U\begin{bmatrix} \sigma_1 c_1 \\ \vdots \\ \sigma_n c_n \end{bmatrix} = \sigma_1 c_1\mathbf{u}_1 + \cdots + \sigma_n c_n\mathbf{u}_n,$$

and thus $A\mathbf{x} = M\mathbf{x}$ for every $\mathbf{x}$. Applying this with $\mathbf{x}$ equal to each standard basis vector shows that $A$ and $M$ have the same columns, so $A = M$,

as required.

- $A$ has $256 \times 128 = 32768$ entries, while $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ combined have $3(256 + 128) = 1152$ entries.
- It can be seen from the picture that $A$ has 3 kinds of columns: one whose components are all dark, another whose components are light in the middle, and the other whose components are dark on the outside and in the middle with strips of light in between. These columns are clearly linearly independent, and thus $A$ has rank 3. Therefore $A$ has 3 nonzero singular values.

We can select only the first three terms by suitably indexing the vectors, as follows:

```
U, Σ, Vt = np.linalg.svd(B)
plt.matshow(U[:,:3] @ np.diag(Σ[:3]) @ Vt[:3,:])
```

```
U, Σ, V = svd(B)
heatmap(U[:,1:3] * diagm(Σ[1:3]) * V'[1:3,:])
```