HP Forums

Full Version: HP Article VA058 - Boldly Going - HP-15C CE Big Matrix Woes
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
      

Boldly Going - HP-15C CE Big Matrix Woes
© 2023 Valentín Albillo


This article was created to commemorate the recent general availability of the HP-15C Collector's Edition, which boasts enormously increased processing speed (up to 180x) and memory (up to 3x more RAM,) relative to the original.

However, there's a significant problem which needs addressing.

The Problem

The problem is that although the memory is vastly increased in most 15C clones (physical or virtual), allowing the dimensioning and use of matrices larger than 8x8, some very important advanced matrix operations (namely matrix inversion, system solving and determinant) can't be confidently performed using such large matrices (unreliable results, blocked calculator) because of firmware limitations having to do with the internal LU decomposition, which are too difficult or even impossible to overcome via ad-hoc patches.

The Sleuthing

Fully aware of this, I investigated what could I do about it, and first of all I looked at my own NxN Matrix Inversion program for the HP-41C (which I wrote and published more than four decades ago,) in order to try and convert it to run on the CE, but I sadly realized that the conversion wasn't viable because the 41C program is too long (170 steps, 40 registers) and needs 11 numbered data registers to work, for a grand total of 51 registers. This would severely limit the maximum size of the matrices to process.

Secondly, the 41C program uses a lot of flags (e.g. 13+1 for 13x13 matrices), which the CE doesn't have, and additionally it uses a lot of indirect register storage and recall using several index registers, while the CE has only one, namely register I (three if you also consider the row/col index registers R0 and R1,) so this would considerably complicate and slow the conversion, needing special care with the stack contents. This would further increase program size and running time significantly, as the program uses nested loops and thus executes many hundreds or even thousands of RPN instructions to perform the inversion. In short, not viable.

Then, I investigated whether it would be possible to write a program to create and use the LU decomposition as the firmware does but without the dreaded 8x8 limitation, and though this would possibly do, it would nevertheless result in long, complicated code which would further need additional registers to work, thus limiting seriously the size of the matrices it could process. It would also need to execute hundreds or thousands of RPN instructions, not firmware microcode. Viable but difficult and with suboptimal performance, so

What to do ?

My Solution

To adopt an entirely new strategy, that's what. Come to think of it, the real problem here is that the HP-15C's firmware can't obtain the LU decomposition (an so invert/system-solve/compute the determinant) of a matrix larger than 8x8. Well, let's avoid it altogether by using some strategy which allows us to use the built-in microcoded inversion instruction for speed, but making sure it never has to invert a matrix larger than 8x8 ! Enter partitioned matrices.

A partitioned matrix is a matrix considered to be partitioned into a number of submatrices, which we'll call blocks. The blocks can be of arbitrary sizes, for example, this 5x5 matrix M can be partitioned into four unequal blocks A (4x4), B (4x1), C (1x4) and D (1x1), like this:

      │ 3 1 4 1 5 │                 │ 3 1 4 1 │       │ 5 │
      │ 9 2 6 5 3 │                 │ 9 2 6 5 │       │ 3 │
  M = │ 5 8 9 7 9 │ = | A B | , A = │ 5 8 9 7 │ , B = │ 9 │ , C = │ 6 2 6 4 │ , D = │ 3 │
      │ 3 2 3 8 4 │   | C D |       │ 3 2 3 8 │       │ 4 │
      │ 6 2 6 4 3 │

and of course the idea is that there are efficient algorithms to deal with such partitioned matrices by interacting directly with their blocks, in particular for inverting the matrix or computing its determinant, without ever needing to process any block larger than the largest one, whose size we can choose.

Note that regardless of their sizes, every set of blocks represents the same original matrix and operations performed using them will result in the same outcome, but some sizes are better one way or another. For instance, my implementation here of the algorithm to invert a matrix (similarly for computing its determinant) requires that it must be partitioned into 4 blocks and that block A must be a square matrix no larger than 8x8 (which forces block D to also be a square matrix,) because it eventually needs to compute the inverse of A using the built-in [1/x] instruction.

Also very important, to achieve maximum speed it's essential that my routine spends most of its time executing microcode, so that's why block A must be dimensioned to be as large as possible without exceeding the 8x8 limit and thus for original NxN matrices larger than 8x8 we'll dimension A to be exactly 8x8, which forces the sizes of all remaining blocks: B will then be 8x(N-8), C will be (N-8)x8, D will be (N-8)x(N-8) and as long as N ≤ 16 no block will be larger than 8x8, as required.

And last but not least, all the code here will run on any HP-15C version, including the original, the Limited Edition (patched for extra RAM or not), the Collector's Edition (in its default or extended RAM modes), the DM15's various versions and firmwares, as well as assorted true emulators.
    Note: For N < 9 no partioning into blocks is necessary. as the [1/x] instruction can be used to directly invert the matrix. However, this subroutine can be used if desired, see 5x5 Toy Example below.

The implementation Part 1: Matrix Inversion

This 29-step, 33-byte subroutine will invert in place an NxN partitioned matrix for N ≤ 16 (subject to available memory). It takes no inputs but the caller (the user or another program) must have previously dimensioned and populated the four blocks with the elements of the original matrix.

The inversion proceeds as follows:
  1. Let's have an NxN matrix M partitioned in 4 blocks A, B, C and D, where A is a non-singular square matrix of size at most 8x8:

        M = | A  B |
            | C  D | 

          
  2. Now we compute the following, in this order:

        Let A = A-1
        Let C = C x A
        Let D = (D - C x B)-1
        Let B = A x B x D 

          
  3. And finally we have:

        M-1 =  |  A + B x C    -B  |
               |   -D x C       D  |

Once it returns, the original values in the blocks will have been replaced with those of the computed inverse matrix, which the user or the caller program may proceed to output or use as desired. In other words, this subroutine can be called from the keyboard or another program (in which case it could be directly embedded into it if it's called just once) but it doesn't perform any input or output operations, it just does the inversion in place, period.

Program listing

   ►LBL E            001- 42,21,15     Comments       X    Y    Z    T
    RESULT A         002- 42,26,11     A=..
    RCL MATRIX A     003- 45,16,11     A              A
    RCL MATRIX B     004- 45,16,12     B              B    A
    RCL MATRIX C     005- 45,16,13     C              C    B    A
    RCL MATRIX A     006- 45,16,11     A              A    C    B    A
    1/x              007-       15     A=Inv(A)       A    C    B    A
    RESULT E         008- 42,26,15     E=..
     x               009-       20     E=CxA          E    B    A
    STO MATRIX C     010- 44,16,13     C=E=CxA        E    B    A
    RESULT D         011- 42,26,14     D=..
    RCL MATRIX B     012- 45,16,12     B              B    E=C  B    A
    MATRIX 6         013- 42,16, 6     D=D-CxB        D    B    A
    1/x              014-       15     D'=Inv(D)      D    B    A
    RESULT E         015- 42,26,15     E=..
     x               016-       20     E=BxD          E    A
    RESULT B         017- 42,26,12     B=..
     x               018-       20     B=AxE=AxBxD    B
    CHS              019-       16     B'=-B          B
    RESULT A         020- 42,26,11     A=..
    RCL MATRIX C     021- 45,16,13     C              C    B
    MATRIX 6         022- 42,16, 6     A'=A-BxC       A
    RESULT E         023- 42,26,15     E=..
    RCL MATRIX D     024- 45,16,14     D              D    A
    RCL MATRIX C     025- 45,16,13     C              C    D    A
     x               026-       20     E=DxC          E    A
    CHS              027-       16     E=-E=-DxC      E    A
    STO MATRIX C     028- 44,16,13     C'=E=-DxC      E    A
    RTN              029-    43,32

    • Stack registers left blank are to be considered "don't care".
    • A', B', C' and D' are the final contents of the blocks of the computed inverse matrix.
Notes:
  • This subroutine doesn't use or alter any numbered storage registers, including the three permanent index registers R0, R1 and RI.
     
  • It also doesn't use any flags, labels (other than LBL E), branching, loops (simple or nested), logic tests of any kind or scalar operations, and it executes sequentially from the first step (001) to the last step (029), executing each step just once, which means it executes 29 user-code instructions in all, not hundreds or thousands like other approaches, so it runs very fast (e.g. 0.8" to invert a 9x9 matrix).
     
  • The inversion is performed in place: once the process ends the elements of the inverse replace those of the original matrix. Reinverting the computed inverse would get back the original matrix.
     
  • If calling it manually from the keyboard, apart from having to dimension four matrices (the blocks) instead of just one (the large original), which takes but a few extra keystrokes, all the remaining keystrokes to input the data, call the subroutine and output the results are exactly the same in number, the user's not doing any additional work because of the partitioning.
Requirements:
  • The maximum size NxN matrix you can invert depends on the memory available in your physical or virtual device as per this table, which features the A, B, C, D partition into blocks that requires the least memory (other block sizes would work equally well but would require more registers, as discussed below):

          M     A       B       C       D       E     Regs  +Prog
       -----------------------------------------------------------
         9x9   8x8     8x1     1x8     1x1     1x8      89     94  {max. size with CE/96}
       10x10   8x8     8x2     2x8     2x2     2x8     116    121
       11x11   8x8     8x3     3x8     3x3     3x8     145    150
       12x12   8x8     8x4     4x8     4x4     4x8     176    181  {ditto with CE/192}
       13x13   8x8     8x5     5x8     5x5     5x8     209    214  {ditto with DM15/M1B}
       14x14   8x8     8x6     6x8     6x6     6x8     244    249  {no current device}
       15x15   8x8     8x7     7x8     7x7     7x8     281    286  {ditto}
       16x16   8x8     8x8     8x8     8x8     8x8     320    325  {ditto}

    so, e.g. if you want to invert a 9x9 matrix you need to have 94 registers available in the common pool, which includes all 81 elements (distributed among the four blocks) plus the 8 registers automatically allocated for auxiliary matrix E, plus 5 registers to hold the 33-byte subroutine itself, so 81+8+5 = 94 registers in all, as seen in the above table. The reason why auxiliary matrix E needs to be so big is because the HP-15C can't multiply two 8x8 (say) matrices in place, a third matrix (E) is needed to store the result.

    The subroutine would work as-is, unchanged, for matrices up to and including 16x16, but as of August 2023 no physical or virtual devices exist which provide more than 229 registers because the HP-15C has a RAM limit of 256 registers and 27 of those are used internally, which means that matrices 14x14 or larger can't be processed by any currently existing device. However, it might be possible that some future patch or modification could override that restriction, in which case this subroutine could process matrices up to 16x16 without any modifications.

    Also, this subroutine can work with a partition into blocks of other sizes (say a 10x10 matrix partitioned in four blocks of size 5x5) as long as they and the corresponding auxiliary matrix E fit in available memory. This can result in speeding up the inversion process, as it's faster to invert two 5x5 blocks than an 8x8 block and a 2x2 block. However, using the recommended partition A(8x8), B(8x2), C(2x8), D(2x2) requires auxiliary matrix E to hold 8x2 = 16 elements while using the partition A(5x5), B(5x5), C(5x5), D(5x5) requires auxiliary matrix E to hold 5x5 = 25 elements, nine more.
     
  • Block A must be invertible, i.e. det(A) # 0. Also, det(D-CB) # 0. For most real-life uses this will be the case.

    If these conditions aren't met (if in doubt you can check the inverse's correctness by reinverting it, which should get the original matrix back, negligible rounding errors aside), you might try exchanging or moving rows and/or columns in the original matrix, computing the inverse, and then exchanging or moving back the corresponding columns and/or rows in the computed inverse matrix. You can also try transposing the matrix, computing the inverse, and then transposing back the inverse.

Worked examples

The following examples will show you how to use the subroutine and further assume that we're using the newly released HP-15C CE (Collector's Edition) in its default mode, i.e. with just 96 registers available to store matrix elements.

1.- Worked 5x5 Toy Example
    Note: The usefulness of this toy example (you could do the inversion directly with 1/x) is twofold: first, to get to know how to use the routine and get comfortable using it and second, to ascertain that you've loaded it correctly into program memory by running it and checking the results it produces, without having to tediously and unnecessarily input and output a large number of values.
Invert the following 5x5 matrix M; the A, B, C, D blocks are:

       │ 3 1 4 1 | 5 │       │ 3 1 4 1 │       │ 5 │
       │ 9 2 6 5 | 3 │       │ 9 2 6 5 │       │ 3 │
   M = │ 5 8 9 7 | 9 │ , A = │ 5 8 9 7 │ , B = │ 9 │ , C = │ 6 2 6 4 │ , D = │ 3 │
       │ 3 2 3 8 | 4 │       │ 3 2 3 8 │       │ 4 │
       │ 6 2 6 4 | 3 │

- Initialize and dimension the blocks: A is 4x4, B is 4x1, C is 1x4 and D is 1x1:

   CF 8, FIX 4             {ensure disabled complex stack and set 4 decimal places}
   1, DIM (i), MATRIX 0    {now MEM: 01 91 05-2}
   4, ENTER, DIM A         {now MEM: 01 75 05-2}
   1, DIM B                {now MEM: 01 71 05-2}
   4, DIM C                {now MEM: 01 67 05-2}
   1, ENTER, DIM D         {now MEM: 01 66 05-2}

   USER, MATRIX 1          {set USER mode, reset row/col indexes to first element}

- Store the 16 elements of block A:

   3, STO A, 1, STO A, 4, STO A, 1, STO A,
   9, STO A, 2, STO A, 6, STO A, 5, STO A,
   5, STO A, 8, STO A, 9, STO A, 7, STO A,
   3, STO A, 2, STO A, 3, STO A, 8, STO A

- Store the 4 elements of block B:

   5, STO B, 3, STO B, 9, STO B, 4, STO B

- Store the 4 elements of block C:

   6, STO C, 2, STO C, 6, STO C, 4, STO C

- Store the single element of block D:

   3, STO D

- Compute the inverse matrix:

   E -> E    1  4   {nearly instantaneous}

- Recall the inverse matrix elements from blocks A, B, C, D:

   RCL A ->  0.0265, RCL A ->  0.3591, ..., RCL A -> 0.1638   {16 elements}
   RCL B -> -0.3685, RCL B -> -0.3254, ..., RCL B -> 0.1054   { 4 elements}
   RCL C ->  0.2747, RCL C ->  0.1004, ..., RCL C -> 0.0585   { 4 elements}
   RCL D -> -0.3227                                           { 1 element }

so we've got the following inverse matrix blocks A', B', C', D':

        │  0.0265  0.3591  0.0127 -0.0546 │        │ -0.3685 │
   A' = │ -0.2101  0.2124  0.2118 -0.1291 │ , B' = │ -0.3254 │
        │ -0.0408 -0.4286 -0.0612 -0.0408 │        │  0.7347 │
        │ -0.0794 -0.0772 -0.0381  0.1638 │        │  0.1054 │

   C' = │  0.2747  0.1004  0.0066  0.0585 │ , D' = │ -0.3227 │

and thus the 5x5 inverse matrix M' is:

        │  0.0265  0.3591  0.0127 -0.0546 -0.3685 │
        │ -0.2101  0.2124  0.2118 -0.1291 -0.3254 │
   M' = │ -0.0408 -0.4286 -0.0612 -0.0408  0.7347 │
        │ -0.0794 -0.0772 -0.0381  0.1638  0.1054 │
        │  0.2747  0.1004  0.0066  0.0585 -0.3227 │


2.- Worked 9x9 Full-fledged Example

Invert the following 9x9 matrix M:

       │ 5  3  4  7  8  0  1  2 | 6 │
       │ 6  7  2  0  5  3  4  8 | 1 │
       │ 1  0  8  4  2  5  6  7 | 3 │
       │ 8  5  0  6  1  4  2  3 | 7 │
   M = │ 4  2  6  5  3  7  0  1 | 8 │
       │ 7  1  3  2  4  8  5  6 | 0 │
       │ 0  6  1  3  7  2  8  4 | 5 │
       │ 2  8  7  1  0  6  3  5 | 4 
       │ 3  4  5  8  6  1  7  0 | 2 │

- The A, B, C, D blocks are:

       │ 5  3  4  7  8  0  1  2 │       │ 6 │
       │ 6  7  2  0  5  3  4  8 │       │ 1 │
       │ 1  0  8  4  2  5  6  7 │       │ 3 │
   A = │ 8  5  0  6  1  4  2  3 │ , B = │ 7 │ , C = │ 3  4  5  8  6  1  7  0 │ , D = │ 2 │ 
       │ 4  2  6  5  3  7  0  1 │       │ 8 │      
       │ 7  1  3  2  4  8  5  6 │       │ 0 │
       │ 0  6  1  3  7  2  8  4 │       │ 5 │
       │ 2  8  7  1  0  6  3  5 │       │ 4 │

- Initialize and dimension the blocks: A is 8x8, B is 8x1, C is 1x8, D is 1x1:

   CF 8, FIX 4             {ensure disabled complex stack and set 4 decimal places}
   1, DIM (i), MATRIX 0    {now MEM: 01 91 05-2}
   8, ENTER, DIM A         {now MEM: 01 27 05-2}
   1, DIM B                {now MEM: 01 19 05-2}
   8, DIM C                {now MEM: 01 11 05-2}
   1, ENTER, DIM D         {now MEM: 01 10 05-2}

   USER, MATRIX 1           {set USER mode, reset row/col indexes to first element}

- Store the 64 elements of block A:

   5, STO A, 3, STO A, 4, STO A, 7, STO A, 8, STO A, 0, STO A, 1, STO A, 2, STO A,
   6, STO A, 7, STO A, 2, STO A, 0, STO A, 5, STO A, 3, STO A, 4, STO A, 8, STO A,
   1, STO A, 0, STO A, 8, STO A, 4, STO A, 2, STO A, 5, STO A, 6, STO A, 7, STO A,
   8, STO A, 5, STO A, 0, STO A, 6, STO A, 1, STO A, 4, STO A, 2, STO A, 3, STO A,
   4, STO A, 2, STO A, 6, STO A, 5, STO A, 3, STO A, 7, STO A, 0, STO A, 1, STO A,
   7, STO A, 1, STO A, 3, STO A, 2, STO A, 4, STO A, 8, STO A, 5, STO A, 6, STO A,
   0, STO A, 6, STO A, 1, STO A, 3, STO A, 7, STO A, 2, STO A, 8, STO A, 4, STO A,
   2, STO A, 8, STO A, 7, STO A, 1, STO A, 0, STO A, 6, STO A, 3, STO A, 5, STO A

- Store the 8 elements of block B:

   6, STO B, 1, STO B, 3, STO B, 7, STO B, 8, STO B, 0, STO B, 5, STO B, 4, STO B

- Store the 8 elements of block C:

   3, STO C, 4, STO C, 5, STO C, 8, STO C, 6, STO C, 1, STO C, 7, STO C, 0, STO C

- Store the single element of block D:

   2, STO D

- Compute the inverse matrix:

   E -> E    1  8]    {~ 0.8 sec.}

- Recall the inverse matrix elements from blocks A, B, C, D:

   RCL A -> -0.8879, RCL A ->  1.1441, ..., RCL A ->  0.5356  {64 elements}
   RCL B ->  0.4953, RCL B -> -0.1670, ..., RCL B -> -0.3966  { 8 elements}
   RCL C -> -0.6907, RCL C ->  0.8420, ..., RCL C -> -0.6703  { 8 elements}
   RCL D ->  0.2930                                           { 1 element }

so we've got the following inverse matrix blocks A', B', C', D':

        │ -0.8879  1.1441  0.1934 -0.0150  0.9321 -0.7169 -0.2699 -0.8474 │        │  0.4953 │
        │  0.3822 -0.4559 -0.1674  0.0274 -0.4299  0.2991  0.0893  0.4501 │        │ -0.1670 │
        │ -0.5514  0.7337  0.1725 -0.1023  0.6292 -0.5107 -0.2008 -0.4859 │        │  0.3434 │
   A' = │  1.2107 -1.5120 -0.2495  0.1405 -1.2987  0.9816  0.2347  1.0938 │ , B' = │ -0.5735 │
        │  0.2241 -0.1713 -0.0954 -0.0868 -0.1196  0.1774  0.0830  0.1148 │        │ -0.0983 │
        │  0.6076 -0.8621 -0.2155  0.0166 -0.6233  0.6452  0.1842  0.6312 │        │ -0.3560 │
        │ -0.9092  1.0035  0.2297 -0.0243  0.8488 -0.6830 -0.1314 -0.7941 │        │  0.4877 │
        │  0.6424 -0.6943 -0.0413  0.0732 -0.6939  0.4720  0.1307  0.5356 │        │ -0.3966 │

   C' = │ -0.6907  0.8420  0.2012 -0.0015  0.7832 -0.6369 -0.0921 -0.6703 │ , D' = │  0.2930 │

and thus the 9x9 inverse matrix M' is:

        │ -0.8879  1.1441  0.1934 -0.0150  0.9321 -0.7169 -0.2699 -0.8474  0.4953 │
        │  0.3822 -0.4559 -0.1674  0.0274 -0.4299  0.2991  0.0893  0.4501 -0.1670 │
        │ -0.5514  0.7337  0.1725 -0.1023  0.6292 -0.5107 -0.2008 -0.4859  0.3434 │
        │  1.2107 -1.5120 -0.2495  0.1405 -1.2987  0.9816  0.2347  1.0938 -0.5735 │
   M' = │  0.2241 -0.1713 -0.0954 -0.0868 -0.1196  0.1774  0.0830  0.1148 -0.0983 │
        │  0.6076 -0.8621 -0.2155  0.0166 -0.6233  0.6452  0.1842  0.6312 -0.3560 │
        │ -0.9092  1.0035  0.2297 -0.0243  0.8488 -0.6830 -0.1314 -0.7941  0.4877 │
        │  0.6424 -0.6943 -0.0413  0.0732 -0.6939  0.4720  0.1307  0.5356 -0.3966 │
        │ -0.6907  0.8420  0.2012 -0.0015  0.7832 -0.6369 -0.0921 -0.6703  0.2930 │

Notes:
  • If in doubt, a simple way to check the inverse's correction is to just reinvert it once you've computed it and written down its elements, which should still be undisturbed in memory. Simply run the subroutine again, like this:

       E (in User mode) or GSB E (out of User mode) -> E    1 8

    and you'll get the original matrix back (ignoring negligible rounding errors,) which you can verify by using RCL in User mode, as we did in the Examples above.
     
  • Once you're done with using the subroutine you might want to free memory by resizing all blocks A, B, C, D and the auxiliary matrix E to 0x0 as well as resetting the row/col indexes and reallocating numbered storage registers R0-R.9, like this:

       MATRIX 0, MATRIX 1, 19, DIM (i)



The implementation Part 2: Determinant

This 15-step, 18-byte subroutine (half the size of the Matrix Inversion one and three times as fast) will compute and display the determinant of an NxN partitioned matrix for N ≤ 16 (subject to available memory). It takes no inputs but the caller (the user or another program) must have previously dimensioned and populated the four blocks with the elements of the original matrix.

To compute the determinant we proceed as follows:
  1. Let's have an NxN matrix M partitioned in 4 blocks A, B, C and D, where A is a square matrix of size at most 8x8 and D is a non-singular matrix:

        M = | A  B |
            | C  D | 

               
  2. Then we have:

            Det(M) = Det(A - B x D-1 x C) * Det(D)
Once it returns, the computed determinant is left in the X stack register. In other words, this subroutine can be called from the keyboard or another program (in which case it could be directly embedded into it if it's called just once) but it doesn't perform any input operations, it just computes and leaves the determinant in the X stack register (i.e. the display.)

Program listing

   ►LBL D          001- 42,21,14     Comment       X        Y        Z        T     
    RESULT D       002- 42,26,14     D=..
    RCL MATRIX D   003- 45,16,14     D             D
    MATRIX 9       004- 42,16, 9     Det(D)        Det(D)
    RCL MATRIX B   005- 45,16,12     B             B        Det(D)     
    LASTX          006-    43 36     D             D        B        Det(D)
    1/x            007-       15     D=Inv(D)      Inv(D)   B        Det(D)
    RESULT E       008- 42,26,15     E=..
    RCL MATRIX C   009- 45,16,13     C             C        Inv(D)   B        Det(D)
     x             010-       20     E=Inv(D)xC    E        B        Det(D)
    RESULT A       011- 42,26,11     A=..
    MATRIX 6       012- 42,16, 6     A=A-BxE       A        Det(D)
    MATRIX 9       013- 42,16, 9     Det(A)        Det(A)   Det(D)
     x             014-       20     Det(M)        Det(M)
    RTN            015-    43 32

Notes:

All the Notes for the Matrix Inversion subroutine exactly apply, with the following changes:
  • It sequentially executes 15 user-code instructions in all, not hundreds or thousands like other approaches, so it runs very fast (e.g. ~ 0.2" to compute the determinant of a 9x9 matrix).
     
  • After execution is complete, the original matrix is left irretrievably altered.
     
Requirements:

All the Requirements for the Matrix Inversion subroutine apply exactly, with the following changes:
  • Block D is the one which must be invertible, i.e. det(D) # 0. For most real-life uses this will be the case.

    If this condition isn't met follow the guidelines given for Matrix Inversion above.

Worked examples

The following examples will show you how to use the subroutine and further assume that we're using the newly released HP-15C CE (Collector's Edition) in its default mode, i.e. with just 96 registers available to store matrix elements.

1.- Worked 5x5 Toy Example

The 5x5 matrix used in this example is the same as the one used in the corresponding example for Matrix Inversion above, the only difference being that now you're asked to compute its determinant instead of its inverse, so just exactly follow the initialization and input instructions indicated there until you are about to perform the computation part, where you should now do the following:

- Compute the determinant:

   D -> -1,813.0000   {nearly instantaneous}

As no inverse matrix es computed, its output instructions don't apply here.

2.- Worked 9x9 Full-fledged Example

The 9x9 matrix used in this example is the same as the one used in the corresponding example for Matrix Inversion above, the only difference being that now you're asked to compute its determinant instead of its inverse, so just follow exactly the initialization and input instructions indicated there until you are about to perform the computation part, where you should now do the following:

- Compute the determinant:

   D -> -10,278,575.95   {~ 0.2 sec.; exact is -10,278,576}

As no inverse matrix es computed, its output instructions don't apply here.


Notes:

You might consider applying the same finalization instructions as described in the final Notes for Matrix Inversion above.




The implementation Part 3: All together now

There are two relevant questions:

1.- Can both subroutines fit at the same time in the default CE/96 mode ?

Yes, both are completely standalone (neither one requires the other or its results) but the Inversion routine is 33 bytes long while the Determinant routine is 18 bytes long, so they would seem to occupy 51 bytes together.

However, if you change the final RTN instruction at step 029 in the former by an R/S instruction and delete the initial LBL D at step 001 and the RTN at step 015 in the latter, you can concatenate both and the combination will fit in 49 bytes (7 registers,) which is exactly all the memory that is available for the code if you want to process a 9x9 matrix in the standard CE/96 mode.

But if using the extended CE/192 mode instead, the question is moot as both routines can be present at the same time, unmodified, and there's plenty of room for larger matrices or additional programs.

2.- Is it possible to compute both Inverse and Determinant without having to reintroduce the original data ?

Yes, it is quite possible by using either one of two simple tricks.

Assuming both routines are present in program memory, say you want to compute the inverse and the determinant of a large matrix without having to reintroduce the matrix elements. Do as follows:
    - Dimension the blocks and store the elements of the original matrix in their respective blocks

    - Execute subroutine E to compute the inverse matrix.

    - Recall and write down the elements of the inverse matrix

    - Now you have two options to avoid reintroducing the original matrix again, namely either

      a) Reinvert the matrix to get back the original matrix by executing subroutine E once again, then execute subroutine D to compute its determinant. or

      b) Execute subroutine D to compute the inverse's determinant, then (out of User mode) press 1/x and you'll get the determinant of the original matrix, as DET(A) = 1/DET(Inverse of A).



Timings

This Section includes a short 42-step timing-helper program which can be used to obtain arbitrarily accurate timings for both my matrix inverse and determinant subroutines for large N, namely 9 ≤ N ≤ 16, depending on the particular device's available memory.

The program runs on the CE/192 for 9 ≤ N ≤ 12 (the CE/96 lacks the necessary memory) as well as any other devices physical or virtual which have enough memory. The DM15 with firmware M1B might be able to also do N=13, but no current device has enough memory to do 14 ≤ N ≤ 16 though both this timing program and the subroutines themselves would work unchanged. 

Program listing  {the ► are purely cosmetic}

    001 ►LBL A        018  RCL MATRIX A     035 ►LBL 0
    002  CF 8         019  GSB 0 ►          036  STO I
    003   1           020  RCL MATRIX B     037  MATRIX 1
    004  DIM (i)      021  GSB 0 ►          038 ►LBL 1
    005  X<>Y         022  RCL MATRIX C     039   RAN#
    006  MATRIX 0     023  GSB 0 ►          040u  STO (i)  (*)
    007   8           024  RCL MATRIX D     041  GTO 1 ►
    008   -           025  GSB 0 ►          042  RTN
    009   8           026   2               043 ►LBL E        043 ►LBL D
    010  DIM C        027   0               ...  ...     or   ...  ...
    011  X<>Y         028  R/S              071  RTN          057  RTN
    012  DIM B        029  STO I
    013  ENTER        030 ►LBL 2
    014  DIM D        031   GSB E ►  or  GSB D ►
    015   8           032   DSE I
    016  ENTER        033  GTO 2 ►
    017  DIM A        034  RTN

             (*) Step 040u STO (i) must be entered in USER mode.
    Notes:

    • Steps 001-017 initialize the program (disable the complex stack, allocate all pool registers for matrices, reset all matrices to 0x0) and dimension all 4 blocks A, B, C, D to their respective sizes: 8x8, 8x(N-8), (N-8)x8 and (N-8)x(N-8).
       
    • Steps 018-028 fill up all 4 blocks with random values, then stop with the default number of loops (20) in the display, ready for the user to start the loops and the timing.
       
    • Steps 029-034 perform the specified number of loops, calling the partitioned matrix inversion subroutine (LBL E) or the determinant subroutine (LBL D) that many times, then execution ends.
       
    • Steps 035-042 implement a subroutine that fills up with random values the matrix whose descriptor is passed to it in the X stack register.
       
    • Steps 043-071 or 043-057 implement my partitioned matrix inversion subroutine (LBL E) or my determinant subroutine (LBL D), respectively, whose listings can be found above.

Usage Instructions
    Note: This step is not necessary, but if you want to check that the program has been correctly entered and works as intended, do this:

         1, STO RAN#, 9, GSB A ->  20


    and if now you recall the contents of the first element of matrices A and D you should get (in FIX 4):

         A1,1 = 0.2018D1,1 = 0.1746
This program works strictly for NxN matrices with 9 ≤ N ≤ 16 (subject to available memory; N ≤ 12 for the CE/192). Inputting N outside this range will generate an error. To time the inversion/determinant of an NxN matrix, do as follows:
  • First, run the program to dimension and quickly fill up all NxN elements with random values, then it stops with the default number of loops (20) in the display:

         FIX 4, N, GSB A -> 20

    The user can either accept this default (20 loops i.e. 20 NxN matrix inversions or determinants will be computed) or else key in another number of loops, say 50 or whatever. The more loops, the more time it takes to perform them all but the greater the timing accuracy. Assuming the user can get the final time accurate to the nearest second, the timing accuracy will be 1/#loops i.e. 1/20 = 0.05" for the default.
     
  • Now the user must simultaneously press R/S and start the timing, closely watching for the program to end and taking note of the final time.

         R/S -> E  1  8  (say, depends on N) for the inversion or some numeric value for the determinant.

    and dividing the final time by the number of loops will give the time per NxN operation accurate to 1/#loops.
       
Results

These are the times I obtain on my CE/192 when using the above program with fresh lithium batteries (yours may vary slightly.) Note that I use the default 20 loops to time the inversion subroutine but change it to 50 loops when timing the determinant subroutine, as it is about 3x faster:

                                       Inverse        Determinant
         Dim   Block A  Block D   20 loops  time    50 loops  time
       -------------------------------------------------------------
         9x9     8x8      1x1       18"     0.90"      17"    0.34"
        10x10    8x8      2x2       24"     1.20"      22"    0.44"
        11x11    8x8      3x3       31"     1.55"      29"    0.58"
        12x12    8x8      4x4       40"     2.00"      37"    0.74"

Thus, when using my matrix inversion subroutine, inverting a 9x9 matrix takes my HP-15C CE less than one second, and inverting a sizable 12x12 matrix takes about two seconds flat.

As for the determinant subroutine, it can compute the determinant of a 9x9 matrix in 1/3 of a second (i.e. it can compute three 9x9 determinants per second) and that of a 12x12 matrix in less than 3/4 of a second.




Further considerations

For those interested, there's a number of additional things to do if you feel like it, e.g.:
  1. In this article I've provided subroutines for real matrix inversion and determinant computations but I haven't provided one for system solving, which would complete the big-matrix functionality and might be quite useful for large systems.
     
  2. My subroutine for matrix inversion takes a 4-block partitioned NxN matrix as input and does the inversion in place, so the matrix inverse is also in the same form and can be manually output by the user or used by some other program as-is. However, it might be useful to write two supporting, complementary subroutines:

    1. One to convert a non-partioned NxN matrix to a 4-block partitioned matrix, as needed by the matrix inverse and determinant subroutines.
       
    2. Another one to convert the partitioned NxN matrix inverse to a single non-partitioned NxN matrix. This might simplify further operations with the matrix inverse (e.g. system solving) and would also free 3 matrix descriptors for other purposes.
     
  3. You can call my inversion subroutine from one program of yours to perform, say, Nth-degree Polynomial Fit or Nth-degree Least-Squares Polynomial Regression or Multivariate Linear Regression, or any other tasks which would benefit from using an NxN matrix inverse and/or determinant for 9 ≤ N ≤ 12 (subject to available memory.)



References

      Tzon-Tzer Lu & Sheng-Hua Shiou          Inverses of 2 × 2 Block Matrices
      John R Silvester                                       Determinants of Block Matrices


Copyrights

Copyright for this article and its contents is retained by the author. Permission to use it for non-profit purposes is granted as long as the contents aren’t modified in any way and the copyright is acknowledged.

For the purposes of this copyright, the definition of non-profit does not include publishing this article in any media for which a subscription fee is asked and thus such use is strictly disallowed without explicit written permision granted by the author
.

 
 
Epic post Valentin, another astounding piece of work. I look forward to reading it this weekend when I can dedicate the time needed to read it through and appreciate it.
Reference URL's