Re: Near, or exactly, rank-deficient matrices.
Message #17 Posted by Rodger Rosenbaum on 31 Jan 2008, 3:09 a.m.,
in response to message #15 by Rodger Rosenbaum
To continue, I'll assume the reader has access to an HP48G/HP49/HP50 calculator or emulator, running in RPN mode.
I'll also assume the reader has created a directory to work examples in, containing variables A, B, A1, B1, PA, SS, ULSS, NM, U, V, and S.
The reader should also review the first reference, and install the small programs found there.
Store the matrix A in its variable:
[[ 4. 2. 3. 3. ]
[ 5. 2. 6. 6. ]
[ 9. 2. 5. 3. ]
[ 4. 9. 4. 9. ]]
the matrix B in its variable:
[[ 22.1743265837 ]
[ 33.8228271732 ]
[ 34.6819580525 ]
[ 51.0426400707 ]]
the matrix PA in its variable:
[[ 4. 2. 3. 3. ]
[ 5.00001 2. 6. 6. ]
[ 9. 2. 5. 3. ]
[ 4. 9. 4. 9. ]]
the matrix ULSS in its variable:
[[ 2.01952510743 ]
[ 2.23362533599 ]
[ 1.20500296581 ]
[ 2.00465552820 ]]
the matrix SS in its variable:
[[ 1.997894004E-5 ]
[ 5.26288302879 ]
[ 6.25376578705 ]
[ -2.37093891693 ]]
I called this solution vector S in the previous post, but I'm changing it to avoid conflict with the SVD matrix S.
and save the matrix NM in its variable:
[[ .05 -.01 .02 -.08 ]
[ -.01 .06 -.05 .08 ]
[ 0. .01 -.09 .03 ]
[ .05 .05 .06 -.07 ]]
An important mathematical property to be aware of is this:
If you have N real numbers whose sum is always T, then as the individual numbers vary, their RMS value is a minimum when the numbers are all equal. As the differences among them increase, their RMS value increases. Here this applies particularly to the elements of a vector (or matrix). If a vector has 4 elements whose sum is a constant (or nearly a constant, as a practical matter), for example, the norm of the vector (calculated with ABS, and proportional to the RMS value of the elements) is a minimum if the elements are identical.
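The identity behind this is the usual decomposition of the mean square into squared mean plus variance; with the sum T (and hence the mean T/N) fixed, the RMS is smallest when the variance term vanishes, i.e. when every number equals T/N. In LaTeX notation:

\frac{1}{N}\sum_{i=1}^{N} x_i^2 \;=\; \left(\frac{T}{N}\right)^2 \;+\; \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \frac{T}{N}\right)^2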
For instance, let A1 be:
[[ 1. 1. 1. 1. ]
[ 2. 2. 2. 2. ]
[ 3. 3. 3. 3. ]
[ 4. 4. 4. 4. ]]
and let B1 be:
[[ 4. ]
[ 8. ]
[ 12. ]
[ 16. ]]
One obvious exact solution vector for this system is [ 1. 1. 1. 1. ]T. But another exact solution is [ 12. -10. 12. -10. ]T. The sum of the elements in each solution vector is 4, but the norm of the first is 2 and the norm of the second is 22.09+.
If we postmultiply the noise matrix NM by each of these solution vectors and calculate the norm, we get .1319+ for the first and 3.467+ for the second. The second is about 26.3 times the first. It can be shown that the ratio of the noise multiplication factors is about equal to the ratio of the norms of the two solution vectors. The relation doesn't hold exactly except as an average when many numerical experiments with many different noise matrices are conducted, but it's of the right order of magnitude for almost any single noise matrix.
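For readers following along without a calculator, here's a quick NumPy transcription of that check (the variable names are mine, not anything from the references):

# Verify the noise-multiplication claim: postmultiply NM by each
# exact solution and compare the norms of the results.
import numpy as np

NM = np.array([[ .05, -.01,  .02, -.08],
               [-.01,  .06, -.05,  .08],
               [ 0.0,  .01, -.09,  .03],
               [ .05,  .05,  .06, -.07]])

s1 = np.array([ 1.,   1.,  1.,   1.])   # first exact solution, norm 2
s2 = np.array([12., -10., 12., -10.])   # second exact solution, norm 22.09+

print(np.linalg.norm(NM @ s1))          # ~ .1319
print(np.linalg.norm(NM @ s2))          # ~ 3.467, about 26.3 times larger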
So the idea is to find approximate solutions to a linear system that have an acceptably low residual norm and a substantially reduced noise multiplication factor. This is done by using the PSU program from the first reference above. First the SVD (Singular Value Decomposition) is computed for the system matrix. Then the singular values are examined, and if any are very small in comparison to the largest, they are candidates for removal.
For example, the SVD of the PA matrix gives the following singular values:
[ 19.769 7.2897 2.8365 4.4034E-7 ]T (truncated to save space)
Clearly the last (and smallest; the singular values are always in decreasing order in the vector) is a candidate for removal.
Do this in the following way. Put the matrix PA on the stack, then press ->SV, giving the three SVD results on the stack (U, V, and S, with S on level 1). Store them in the variables S, V, and U. Then type 1 PSU B * to see a solution vector (call it PAS4):
[[ 2.01952724285 ]
[ 2.23362880284 ]
[ 1.20499966532 ]
[ 2.00465262032 ]]
Calculate the residual norm like this: PA PAS4 * B - ABS. I get 3.35E-6. It's not an exact solution, but it's good enough for government work, as we used to say in the military. It's probably much better than the precision to which your data are known.
Compare this result to ULSS, the solution of the unperturbed system (the one with the singular A matrix), found with LSQ. It's the same to the 5th place after the decimal point. The unperturbed A matrix has a nontrivial null space; that's why it's singular. The perturbed A matrix (PA) is no longer exactly singular, but one direction is very nearly a null vector, and that's why it has such a high condition number. When we solve the unperturbed system with LSQ, we get a solution with no component along the null space vector. With the perturbed system, even LSQ includes a component along the near-null vector. The only way we can get rid of it is with the SVD method, and then we get a solution vector very close to the solution of the unperturbed system with no null space component. This is what we want for minimum noise magnification.
And, by the way, Karl, even though the LSQ function was able to solve the unperturbed system, if you try to do it with the Moore-Penrose technique A^T*(A*A^T)^-1*B, you will get an error. Does this mean that it's not really underdetermined?
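As a NumPy aside, and only my guess at what that failure looks like off-calculator: for the exactly singular A, A*A^T is singular too, so the explicit inverse either errors out or, in floating point, may quietly return a wildly inaccurate result:

# The textbook minimum-norm formula for underdetermined systems
# requires A*A^T to be invertible, which it isn't here.
import numpy as np

A = np.array([[4., 2., 3., 3.],
              [5., 2., 6., 6.],
              [9., 2., 5., 3.],
              [4., 9., 4., 9.]])
B = np.array([22.1743265837, 33.8228271732, 34.6819580525, 51.0426400707])

try:
    x = A.T @ np.linalg.inv(A @ A.T) @ B
    print("inverse 'succeeded' in floating point, result is garbage:", x)
except np.linalg.LinAlgError as e:
    print("inverse failed:", e)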
We could try eliminating another singular value in the solution like this: 2 PSU B * and get (call it PAS3):
[[ 1.71082483493 ]
[ 1.87920936888 ]
[ 1.54600935358 ]
[ 2.29523695332 ]]
If we calculate the residual norm like this: PA PAS3 * B - ABS, we get 1.8418569, much larger than we would like. This shows what we could have known from our examination of the singular values. The 4th was very small and a good candidate for removal; the 3rd was too large.
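PSU's listing isn't repeated here, but from the way it's used, "n PSU" evidently builds the pseudoinverse with the n smallest singular values zeroed. A NumPy sketch of that reading (trunc_solve is my own name and construction, not the program from the reference):

# Truncated-SVD solve: discard the n_drop smallest singular values,
# invert the rest, and apply the resulting pseudoinverse to b.
import numpy as np

def trunc_solve(A, b, n_drop):
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.zeros_like(s)
    keep = len(s) - n_drop          # s is sorted largest first
    s_inv[:keep] = 1.0 / s[:keep]   # zero out the n_drop smallest
    return Vt.T @ (s_inv * (U.T @ b))

PA = np.array([[4.,      2., 3., 3.],
               [5.00001, 2., 6., 6.],
               [9.,      2., 5., 3.],
               [4.,      9., 4., 9.]])
B = np.array([22.1743265837, 33.8228271732, 34.6819580525, 51.0426400707])

for n_drop in (1, 2):               # PAS4, then PAS3
    x = trunc_solve(PA, B, n_drop)
    print(x, np.linalg.norm(PA @ x - B))   # residuals ~3.35E-6 and ~1.84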
Let's go back to the 4 ammeter problem.
The A matrix is (store in variable A1, replacing the earlier contents):
[[ 0.95 1.09 1.01 1.09 ]
[ 1.94 1.95 2.02 1.91 ]
[ 2.92 3.00 2.93 2.98 ]
[ 4.01 3.95 3.97 3.94 ]]
and the B matrix is (store in variable B1):
[[ 1. ]
[ 2. ]
[ 3. ]
[ 4. ]]
Recall A1 to the stack and execute ->SV (from the first reference). Store the stack results to S, V, and U (S is on level 1, V on level 2, U on level 3). Examine the singular values, which are in variable S: [ 10.8409016914 .147018600028 7.31158831777E-2 9.49300838352E-3 ]T
We can definitely delete the effect of the 3rd and 4th, and probably even the 2nd.
Let's compare the exact solution and the 3 solutions we get when we delete the effect of the last 3 singular values, plus the residual norm in each case (truncated to save space). The singular values are dropped smallest first: the 4th alone, then the last two, then the last three.
Solution                   Vector                                Residual norm
Exact                      [ .8826  3.1449  -.48362  -2.5482 ]T  1E-12
4th SV dropped             [ .29408  .16277  .47708  .07622 ]T   3.92E-2
3rd & 4th SV dropped       [ .40200  .14421  .33214  .13238 ]T   4.16E-2
2nd, 3rd, 4th SV dropped   [ .25209  .25349  .25286  .25197 ]T   5.4024E-2
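The same sketch reproduces the singular values and this table (trunc_solve is the hypothetical routine from the earlier snippet, repeated so this stands alone):

# Reproduce the ammeter table: exact solve, then drop 1, 2, 3 singular values.
import numpy as np

def trunc_solve(A, b, n_drop):
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.zeros_like(s)
    keep = len(s) - n_drop
    s_inv[:keep] = 1.0 / s[:keep]
    return Vt.T @ (s_inv * (U.T @ b))

A1 = np.array([[0.95, 1.09, 1.01, 1.09],
               [1.94, 1.95, 2.02, 1.91],
               [2.92, 3.00, 2.93, 2.98],
               [4.01, 3.95, 3.97, 3.94]])
B1 = np.array([1., 2., 3., 4.])

print(np.linalg.svd(A1)[1])        # the four singular values
for n_drop in (0, 1, 2, 3):        # exact, then the three truncated solutions
    x = trunc_solve(A1, B1, n_drop)
    print(n_drop, x, np.linalg.norm(A1 @ x - B1))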
Notice how the variance in the elements of the solutions becomes less and less as we drop more singular values.
Remember the minimum RMS property I mentioned above? The residual norms of all these solutions are reasonably low, but the noise magnification factor decreases as the values of the elements become nearly the same. The exact solution is especially bad because of the large values of the elements. Such large values are usually of opposite sign in order to give the correct noise-free solution. When noise is present, those large, varying values increase the norm of the noise in the predicted B vector.
Let's test the noise magnification of the 1st and last solution from above. Postmultiply NM (the noise matrix) by the exact solution above and calculate the norm (ABS function). I get .4072. Now postmultiply NM by the solution labelled "2nd, 3rd, 4th SV dropped", and then calculate the norm. I get 3.338E-2, about 12 times less.
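The corresponding NumPy check, using the solution vectors as truncated in the table:

# Noise magnification comparison for the ammeter solutions.
import numpy as np

NM = np.array([[ .05, -.01,  .02, -.08],
               [-.01,  .06, -.05,  .08],
               [ 0.0,  .01, -.09,  .03],
               [ .05,  .05,  .06, -.07]])

exact   = np.array([.8826, 3.1449, -.48362, -2.5482])
dropped = np.array([.25209, .25349, .25286, .25197])  # 2nd, 3rd, 4th SV dropped

print(np.linalg.norm(NM @ exact))    # ~ .4072
print(np.linalg.norm(NM @ dropped))  # ~ 3.34E-2, roughly 12 times less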
It is a characteristic of these tradeoffs that as the residual norm increases, the noise magnification decreases, so let's calculate the residual norm with some other solution vectors.
Solution                                                       Residual norm
[ .25 .25 .25 .25 ]T                                           7.818E-2
[ .252607010711 .252607010711 .252607010711 .252607010711 ]T   5.401E-2
[ .253280216940 .252112563361 .253205061818 .251830755168 ]T   5.389E-2
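One more NumPy check, reproducing this table's residual norms:

# Residual norms for the three near-constant candidate solutions.
import numpy as np

A1 = np.array([[0.95, 1.09, 1.01, 1.09],
               [1.94, 1.95, 2.02, 1.91],
               [2.92, 3.00, 2.93, 2.98],
               [4.01, 3.95, 3.97, 3.94]])
B1 = np.array([1., 2., 3., 4.])

for x in (np.full(4, .25),
          np.full(4, .252607010711),
          np.array([.253280216940, .252112563361, .253205061818, .251830755168])):
    print(np.linalg.norm(A1 @ x - B1))   # 7.818E-2, 5.401E-2, 5.389E-2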
The solution in the first table above, the one labeled "2nd, 3rd, 4th SV dropped", has the largest residual norm of that group, hence the lowest noise magnification. No other solution vector with a comparably small residual has lower noise magnification.
Of course, these are very small differences. Almost any of the solution vectors with elements near .25 would be satisfactory as a practical matter. Such a solution can be obtained without high-powered mathematics; visual inspection of the system is sufficient. But even with this unusually symmetric system, the SVD technique shows its virtue and gives the best solution.
With the first system, the PA system matrix, visual inspection won't help you much. Only the SVD can so easily guide you to the improved solution with reduced noise magnification.
In the old days of statistics, multiple regression analysis would often encounter a very ill-conditioned system. Ill-conditioning usually indicates that some of the columns contain redundant information. One of the techniques used was to delete a column, perform the regression (a least squares analysis), and evaluate the residual norm; then delete a different column or columns, and do it again. After trying all the possible combinations of dropped columns, the analyst would select the configuration with the best residual norm (with consideration of some other factors as well). The more modern technique is the SVD method, explained very nicely in Lawson and Hanson's classic book, "Solving Least Squares Problems".
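A toy NumPy sketch of that old exhaustive procedure (nothing from Lawson and Hanson, just my illustration): fit every subset of columns and list the residual norms, so you can see which dropped columns barely hurt the fit.

# Brute-force subset regression: least squares on every combination of
# kept columns, printing the residual norm for each configuration.
import numpy as np
from itertools import combinations

A1 = np.array([[0.95, 1.09, 1.01, 1.09],
               [1.94, 1.95, 2.02, 1.91],
               [2.92, 3.00, 2.93, 2.98],
               [4.01, 3.95, 3.97, 3.94]])
B1 = np.array([1., 2., 3., 4.])

n = A1.shape[1]
for k in range(1, n + 1):                  # number of columns kept
    for cols in combinations(range(n), k):
        idx = list(cols)
        x, *_ = np.linalg.lstsq(A1[:, idx], B1, rcond=None)
        r = np.linalg.norm(A1[:, idx] @ x - B1)
        print(cols, r)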