|About the Singular Value Decomposition|
Message #1 Posted by Rodger Rosenbaum on 31 Mar 2006, 1:18 p.m.
I'm going to discuss the Singular Value Decomposition (SVD) and some of its properties. I'll be doing calculations on the HP49G+ calculator; the HP48G will give the same results, but I'm not so sure about the HP49 anymore!
The book, "Linear Algebra and its Applications", by Gilbert Strang is a good reference for those who want more information. Another very nice reference on the web is the paper
I will refer to the length of a vector and the Euclidian norm of a matrix; these are calculated by the ABS command in the HP49G+ (Frobenius norm, FNORM, in the HP71). Another matrix norm is the spectral norm which is the largest singular value of a matrix, calculated by the SNRM command. The (Euclidian) distance between a couple of matrices is calculated by subtracting the two matrices and then applying the ABS command (the matrices have to be conformable for subtraction; the same size, in other words). I will be referring to pure real matrices unless I state otherwise.
Given a square n x n matrix M and an n-dimensional vector V, it is well known that forming the product M*V gives an result vector Vr which, in general, is rotated and changed in length with respect to the original vector. For a concrete example, consider the 2 x 2 matrix M:
[[ 1 2 ]
[ 3 4 ]]
and the vector V, [ 1 0 ]. This vector is oriented along the x-axis on the 2-dimensional plane. After the multiplication by M, the result is [ 1 3 ]. The length is increased by a factor of 3.16 and its direction is changed--it no longer points directly in the direction of the positive x-axis. Imagine that we start with the unit vector
[ 1 0 ] and rotate it slowly in a counter-clockwise direction; the tip of the vector will describe a circle of radius 1. Now, as the vector rotates, multiply it by the matrix M at each position, and plot the curve described by the tip of the result vector. This curve will not be a circle, because as the position of the input vector changes, the transformation caused by multiplication by M also changes. For some positions of the input vector, the multiplication causes the vector to become longer, as in the example above. For other positions, the multiplication by M causes the input vector to become shorter, such as for the vector [ .8 -.5 ] which becomes [ -.2 .4 ]. The curve plotted is an ellipse showing the maximum and minimum change in length of the input vector after multiplication by the matrix M.
The same thing happens for matrices and vectors with more than 2 dimensions. If we could plot the magnification factor caused by the matrix multiplication for all possible vectors in the hyperspace, it would be a hyperellipse in n-dimensional hyperspace. For a typical matrix, the hyperellipse would have a maximum size in one particular direction, and along the orthogonal directions relative to the maximum, we would find other multiplication factors.
The SVD can be applied to any matrix, square or rectangular, real or complex. The result of the decomposition as traditionally described in texts is three matrices, U, S, and V (the texts use the capital greek letter sigma instead of S; also the matrix V is usually the transpose of the V returned by the HP49G+, but I will describe what the calculator does).
U and V are orthogonal matrices; that is, they are square matrices whose columns and rows are of unit length, and the rows and columns are linearly independent of one another (orthogonal matrices have the property that when THEY multiply a vector, they don't change its length). In texts, the matrix S is a diagonal matrix with the singular values on the main diagonal, and zeroes elsewhere. From an m x n matrix initially on the stack, the calculator returns on the stack, U V S, with S as a vector rather than a diagonal matrix; the singular values are sorted in decreasing order in the vector S. The number of singular values is the lesser of m and n, and some of them may be zero.
Given an input matrix A, the relationship that holds is given by this expression: A = U * DIAG(S) * V. If the input matrix, A, is an m x n dimensional matrix, then the U matrix is m x m, the V matrix is n x n, and the vector S returned by the calculator must be converted to an m x n diagonal matrix with the singular values on the main diagonal before the multiplication can be carried out.
So, just what are those singular values? Above we plotted a hyperellipse showing the magnification factor caused when a matrix multiplies all possible vectors. The singular values are just the magnification factors in n orthogonal directions in n-dimensional hyperspace. To use our concrete example above, the 2 x 2 matrix has 2 singular values, namely
5.46498570422 and .365966190626.
This means that the largest magnification caused by that matrix is 5.464..., and the smallest magnification (maximum shrinkage) is .365... We saw that the effect caused by the matrix multiplication depended on the vector we operated on, so we might ask, what vector would experience the maximum magnification of 5.464...? The answer is given by the SVD (I'll explain the details later). The vector
[ -.576048436766 -.817415560470 ] (which is a unit length vector) becomes:
[ -2.21087955771 -4.99780755218 ] after premultiplication by the matrix:
[[ 1 2 ]
[ 3 4 ]]
and we can determine the length of this vector with the ABS command to be 5.46498570422, exactly equal to the largest singular value.
If a square matrix is singular, its determinant is zero, and its columns are linearly dependent. This linear dependence of the columns means that some combination of the columns is zero. How can we find the particular combination of columns that adds to zero? The SVD, of course!
If a square matrix is singular, then at least one of its singular values will be zero. The number of non-zero singular values is the rank of the matrix. In floating point arithmetic, we may not be able to exactly represent a matrix we want to work with. Let's say that a particular element of a singular matrix should be 1/3, but because of the limitations of floating point arithmetic, this element is represented as .333333333333 on the HP49G+. This peturbation will probably make the matrix as represented in the calculator not singular. But the SVD will have an extremely small singular value in the S vector, so we will know that the matrix should be considered "effectively" singular. In fact, it can be shown that the smallest singular value is the distance to the nearest singular matrix! Thus, the SVD provides us with a way to know just how close to being singular the matrix is.
Another property of the singular values is that their product is the determinant of the matrix, if it's a square matrix; this is why at least one of the singular values is zero if the matrix is square and singular. If the matrix is not square, the presence of zero singular values means that the matrix is rank deficient in the smaller of m or n. Also, the singular values are always positive numbers, even if the matrix is a complex matrix.
We can use the SVD to find the inverse of a matrix. The original matrix is reconstituted by the product U * DIAG(s) * V. If we form the product,
TRANSPOSE(V) * DIAG(1/s) * TRANSPOSE(U),
it will be found to be the inverse of the original matrix. The expression DIAG(1/S) means to form an n x m (not m x n) diagonal matrix, with the reciprocals of the singular values on the main diagonal. Since the SVD can be applied to a rectangular matrix, with m not equal to n, we might wonder what we get if we form that product with the SVD of a rectangular matrix. In that case, the result is called the pseudoinverse. The pseudoinverse can be used to solve least squares problems and ill-conditioned systems.
Premultiplication of a vector by a matrix A is the way we get a linear combination of the columns of the matrix. Each component of the vector multiplies a column of the matrix and then the products are added up like this: let the columns of the vector be denoted C1, C2, C3,...Cn and the components of the vector are V1, V2, V3,...Vn. Premultiplying the vector by the matrix forms the linear combination V1*C1 + V2*C2 + V3*C3...+Vn*Cn, the result of which is a vector. If this vector is all zeroes (length zero), we say that the columns are linearly dependent. If the result vector is nearly zero, then the matrix is nearly singular (or rank deficient if it isn't square).
The singular values can all be formed from some linear combination of columns of the matrix A, whose SVD we have. Extract the first row of the V matrix from the SVD into a vector; premultiplying this vector (which has unit length) by the matrix A will give a result vector whose length is the first (largest) singular value. We have thus found a vector which demonstrates the magnification factor which is the first singular value. The second row of the V matrix will give the second singular value and so on. If any of the singular values are zero, such as the last one, the corresponding row of the V matrix will give the combination of the columns of A that result in a zero vector for a result. The singular values can also be formed from linear combinations of the rows of matrix A in a similar way by using the columns of the U matrix.
If the matrix A is very close to a singular matrix, then the procedure of the previous paragraph can find the vector of unit length which comes closest to solving the system Ax = 0.