HP Article VA058  Boldly Going  HP15C CE Big Matrix Woes

11302023, 12:47 AM
Post: #1




HP Article VA058  Boldly Going  HP15C CE Big Matrix Woes
Boldly Going  HP15C CE Big Matrix Woes
© 2023 Valentín Albillo
This article was created to commemorate the recent general availability of the HP15C Collector's Edition, which boasts enormously increased processing speed (up to 180x) and memory (up to 3x more RAM,) relative to the original. However, there's a significant problem which needs addressing. The Problem The problem is that although the memory is vastly increased in most 15C clones (physical or virtual), allowing the dimensioning and use of matrices larger than 8x8, some very important advanced matrix operations (namely matrix inversion, system solving and determinant) can't be confidently performed using such large matrices (unreliable results, blocked calculator) because of firmware limitations having to do with the internal LU decomposition, which are too difficult or even impossible to overcome via adhoc patches. The Sleuthing Fully aware of this, I investigated what could I do about it, and first of all I looked at my own NxN Matrix Inversion program for the HP41C (which I wrote and published more than four decades ago,) in order to try and convert it to run on the CE, but I sadly realized that the conversion wasn't viable because the 41C program is too long (170 steps, 40 registers) and needs 11 numbered data registers to work, for a grand total of 51 registers. This would severely limit the maximum size of the matrices to process. Secondly, the 41C program uses a lot of flags (e.g. 13+1 for 13x13 matrices), which the CE doesn't have, and additionally it uses a lot of indirect register storage and recall using several index registers, while the CE has only one, namely register I (three if you also consider the row/col index registers R0 and R1,) so this would considerably complicate and slow the conversion, needing special care with the stack contents. This would further increase program size and running time significantly, as the program uses nested loops and thus executes many hundreds or even thousands of RPN instructions to perform the inversion. In short, not viable. Then, I investigated whether it would be possible to write a program to create and use the LU decomposition as the firmware does but without the dreaded 8x8 limitation, and though this would possibly do, it would nevertheless result in long, complicated code which would further need additional registers to work, thus limiting seriously the size of the matrices it could process. It would also need to execute hundreds or thousands of RPN instructions, not firmware microcode. Viable but difficult and with suboptimal performance, so What to do ? My Solution To adopt an entirely new strategy, that's what. Come to think of it, the real problem here is that the HP15C's firmware can't obtain the LU decomposition (an so invert/systemsolve/compute the determinant) of a matrix larger than 8x8. Well, let's avoid it altogether by using some strategy which allows us to use the builtin microcoded inversion instruction for speed, but making sure it never has to invert a matrix larger than 8x8 ! Enter partitioned matrices. A partitioned matrix is a matrix considered to be partitioned into a number of submatrices, which we'll call blocks. The blocks can be of arbitrary sizes, for example, this 5x5 matrix M can be partitioned into four unequal blocks A (4x4), B (4x1), C (1x4) and D (1x1), like this: │ 3 1 4 1 5 │ │ 3 1 4 1 │ │ 5 │ │ 9 2 6 5 3 │ │ 9 2 6 5 │ │ 3 │ M = │ 5 8 9 7 9 │ =  A B  , A = │ 5 8 9 7 │ , B = │ 9 │ , C = │ 6 2 6 4 │ , D = │ 3 │ │ 3 2 3 8 4 │  C D  │ 3 2 3 8 │ │ 4 │ │ 6 2 6 4 3 │ and of course the idea is that there are efficient algorithms to deal with such partitioned matrices by interacting directly with their blocks, in particular for inverting the matrix or computing its determinant, without ever needing to process any block larger than the largest one, whose size we can choose. Note that regardless of their sizes, every set of blocks represents the same original matrix and operations performed using them will result in the same outcome, but some sizes are better one way or another. For instance, my implementation here of the algorithm to invert a matrix (similarly for computing its determinant) requires that it must be partitioned into 4 blocks and that block A must be a square matrix no larger than 8x8 (which forces block D to also be a square matrix,) because it eventually needs to compute the inverse of A using the builtin [1/x] instruction. Also very important, to achieve maximum speed it's essential that my routine spends most of its time executing microcode, so that's why block A must be dimensioned to be as large as possible without exceeding the 8x8 limit and thus for original NxN matrices larger than 8x8 we'll dimension A to be exactly 8x8, which forces the sizes of all remaining blocks: B will then be 8x(N8), C will be (N8)x8, D will be (N8)x(N8) and as long as N ≤ 16 no block will be larger than 8x8, as required. And last but not least, all the code here will run on any HP15C version, including the original, the Limited Edition (patched for extra RAM or not), the Collector's Edition (in its default or extended RAM modes), the DM15's various versions and firmwares, as well as assorted true emulators.
The implementation Part 1: Matrix Inversion This 29step, 33byte subroutine will invert in place an NxN partitioned matrix for N ≤ 16 (subject to available memory). It takes no inputs but the caller (the user or another program) must have previously dimensioned and populated the four blocks with the elements of the original matrix. The inversion proceeds as follows:
Program listing ►LBL E 001 42,21,15 Comments X Y Z T RESULT A 002 42,26,11 A=.. RCL MATRIX A 003 45,16,11 A A RCL MATRIX B 004 45,16,12 B B A RCL MATRIX C 005 45,16,13 C C B A RCL MATRIX A 006 45,16,11 A A C B A 1/x 007 15 A=Inv(A) A C B A RESULT E 008 42,26,15 E=.. x 009 20 E=CxA E B A STO MATRIX C 010 44,16,13 C=E=CxA E B A RESULT D 011 42,26,14 D=.. RCL MATRIX B 012 45,16,12 B B E=C B A MATRIX 6 013 42,16, 6 D=DCxB D B A 1/x 014 15 D'=Inv(D) D B A RESULT E 015 42,26,15 E=.. x 016 20 E=BxD E A RESULT B 017 42,26,12 B=.. x 018 20 B=AxE=AxBxD B CHS 019 16 B'=B B RESULT A 020 42,26,11 A=.. RCL MATRIX C 021 45,16,13 C C B MATRIX 6 022 42,16, 6 A'=ABxC A RESULT E 023 42,26,15 E=.. RCL MATRIX D 024 45,16,14 D D A RCL MATRIX C 025 45,16,13 C C D A x 026 20 E=DxC E A CHS 027 16 E=E=DxC E A STO MATRIX C 028 44,16,13 C'=E=DxC E A RTN 029 43,32
Worked examples The following examples will show you how to use the subroutine and further assume that we're using the newly released HP15C CE (Collector's Edition) in its default mode, i.e. with just 96 registers available to store matrix elements. 1. Worked 5x5 Toy Example
│ 3 1 4 1  5 │ │ 3 1 4 1 │ │ 5 │ │ 9 2 6 5  3 │ │ 9 2 6 5 │ │ 3 │ M = │ 5 8 9 7  9 │ , A = │ 5 8 9 7 │ , B = │ 9 │ , C = │ 6 2 6 4 │ , D = │ 3 │ │ 3 2 3 8  4 │ │ 3 2 3 8 │ │ 4 │ │ 6 2 6 4  3 │  Initialize and dimension the blocks: A is 4x4, B is 4x1, C is 1x4 and D is 1x1: CF 8, FIX 4 {ensure disabled complex stack and set 4 decimal places} 1, DIM (i), MATRIX 0 {now MEM: 01 91 052} 4, ENTER, DIM A {now MEM: 01 75 052} 1, DIM B {now MEM: 01 71 052} 4, DIM C {now MEM: 01 67 052} 1, ENTER, DIM D {now MEM: 01 66 052} USER, MATRIX 1 {set USER mode, reset row/col indexes to first element}  Store the 16 elements of block A: 3, STO A, 1, STO A, 4, STO A, 1, STO A, 9, STO A, 2, STO A, 6, STO A, 5, STO A, 5, STO A, 8, STO A, 9, STO A, 7, STO A, 3, STO A, 2, STO A, 3, STO A, 8, STO A  Store the 4 elements of block B: 5, STO B, 3, STO B, 9, STO B, 4, STO B  Store the 4 elements of block C: 6, STO C, 2, STO C, 6, STO C, 4, STO C  Store the single element of block D: 3, STO D  Compute the inverse matrix: E > E 1 4 {nearly instantaneous}  Recall the inverse matrix elements from blocks A, B, C, D: RCL A > 0.0265, RCL A > 0.3591, ..., RCL A > 0.1638 {16 elements} RCL B > 0.3685, RCL B > 0.3254, ..., RCL B > 0.1054 { 4 elements} RCL C > 0.2747, RCL C > 0.1004, ..., RCL C > 0.0585 { 4 elements} RCL D > 0.3227 { 1 element } so we've got the following inverse matrix blocks A', B', C', D': │ 0.0265 0.3591 0.0127 0.0546 │ │ 0.3685 │ A' = │ 0.2101 0.2124 0.2118 0.1291 │ , B' = │ 0.3254 │ │ 0.0408 0.4286 0.0612 0.0408 │ │ 0.7347 │ │ 0.0794 0.0772 0.0381 0.1638 │ │ 0.1054 │ C' = │ 0.2747 0.1004 0.0066 0.0585 │ , D' = │ 0.3227 │ and thus the 5x5 inverse matrix M' is: │ 0.0265 0.3591 0.0127 0.0546 0.3685 │ │ 0.2101 0.2124 0.2118 0.1291 0.3254 │ M' = │ 0.0408 0.4286 0.0612 0.0408 0.7347 │ │ 0.0794 0.0772 0.0381 0.1638 0.1054 │ │ 0.2747 0.1004 0.0066 0.0585 0.3227 │ 2. Worked 9x9 Fullfledged Example Invert the following 9x9 matrix M: │ 5 3 4 7 8 0 1 2  6 │ │ 6 7 2 0 5 3 4 8  1 │ │ 1 0 8 4 2 5 6 7  3 │ │ 8 5 0 6 1 4 2 3  7 │ M = │ 4 2 6 5 3 7 0 1  8 │ │ 7 1 3 2 4 8 5 6  0 │ │ 0 6 1 3 7 2 8 4  5 │ │ 2 8 7 1 0 6 3 5  4 │ │ 3 4 5 8 6 1 7 0  2 │  The A, B, C, D blocks are: │ 5 3 4 7 8 0 1 2 │ │ 6 │ │ 6 7 2 0 5 3 4 8 │ │ 1 │ │ 1 0 8 4 2 5 6 7 │ │ 3 │ A = │ 8 5 0 6 1 4 2 3 │ , B = │ 7 │ , C = │ 3 4 5 8 6 1 7 0 │ , D = │ 2 │ │ 4 2 6 5 3 7 0 1 │ │ 8 │ │ 7 1 3 2 4 8 5 6 │ │ 0 │ │ 0 6 1 3 7 2 8 4 │ │ 5 │ │ 2 8 7 1 0 6 3 5 │ │ 4 │  Initialize and dimension the blocks: A is 8x8, B is 8x1, C is 1x8, D is 1x1: CF 8, FIX 4 {ensure disabled complex stack and set 4 decimal places} 1, DIM (i), MATRIX 0 {now MEM: 01 91 052} 8, ENTER, DIM A {now MEM: 01 27 052} 1, DIM B {now MEM: 01 19 052} 8, DIM C {now MEM: 01 11 052} 1, ENTER, DIM D {now MEM: 01 10 052} USER, MATRIX 1 {set USER mode, reset row/col indexes to first element}  Store the 64 elements of block A: 5, STO A, 3, STO A, 4, STO A, 7, STO A, 8, STO A, 0, STO A, 1, STO A, 2, STO A, 6, STO A, 7, STO A, 2, STO A, 0, STO A, 5, STO A, 3, STO A, 4, STO A, 8, STO A, 1, STO A, 0, STO A, 8, STO A, 4, STO A, 2, STO A, 5, STO A, 6, STO A, 7, STO A, 8, STO A, 5, STO A, 0, STO A, 6, STO A, 1, STO A, 4, STO A, 2, STO A, 3, STO A, 4, STO A, 2, STO A, 6, STO A, 5, STO A, 3, STO A, 7, STO A, 0, STO A, 1, STO A, 7, STO A, 1, STO A, 3, STO A, 2, STO A, 4, STO A, 8, STO A, 5, STO A, 6, STO A, 0, STO A, 6, STO A, 1, STO A, 3, STO A, 7, STO A, 2, STO A, 8, STO A, 4, STO A, 2, STO A, 8, STO A, 7, STO A, 1, STO A, 0, STO A, 6, STO A, 3, STO A, 5, STO A  Store the 8 elements of block B: 6, STO B, 1, STO B, 3, STO B, 7, STO B, 8, STO B, 0, STO B, 5, STO B, 4, STO B  Store the 8 elements of block C: 3, STO C, 4, STO C, 5, STO C, 8, STO C, 6, STO C, 1, STO C, 7, STO C, 0, STO C  Store the single element of block D: 2, STO D  Compute the inverse matrix: E > E 1 8] {~ 0.8 sec.}  Recall the inverse matrix elements from blocks A, B, C, D: RCL A > 0.8879, RCL A > 1.1441, ..., RCL A > 0.5356 {64 elements} RCL B > 0.4953, RCL B > 0.1670, ..., RCL B > 0.3966 { 8 elements} RCL C > 0.6907, RCL C > 0.8420, ..., RCL C > 0.6703 { 8 elements} RCL D > 0.2930 { 1 element } so we've got the following inverse matrix blocks A', B', C', D': │ 0.8879 1.1441 0.1934 0.0150 0.9321 0.7169 0.2699 0.8474 │ │ 0.4953 │ │ 0.3822 0.4559 0.1674 0.0274 0.4299 0.2991 0.0893 0.4501 │ │ 0.1670 │ │ 0.5514 0.7337 0.1725 0.1023 0.6292 0.5107 0.2008 0.4859 │ │ 0.3434 │ A' = │ 1.2107 1.5120 0.2495 0.1405 1.2987 0.9816 0.2347 1.0938 │ , B' = │ 0.5735 │ │ 0.2241 0.1713 0.0954 0.0868 0.1196 0.1774 0.0830 0.1148 │ │ 0.0983 │ │ 0.6076 0.8621 0.2155 0.0166 0.6233 0.6452 0.1842 0.6312 │ │ 0.3560 │ │ 0.9092 1.0035 0.2297 0.0243 0.8488 0.6830 0.1314 0.7941 │ │ 0.4877 │ │ 0.6424 0.6943 0.0413 0.0732 0.6939 0.4720 0.1307 0.5356 │ │ 0.3966 │ C' = │ 0.6907 0.8420 0.2012 0.0015 0.7832 0.6369 0.0921 0.6703 │ , D' = │ 0.2930 │ and thus the 9x9 inverse matrix M' is: │ 0.8879 1.1441 0.1934 0.0150 0.9321 0.7169 0.2699 0.8474 0.4953 │ │ 0.3822 0.4559 0.1674 0.0274 0.4299 0.2991 0.0893 0.4501 0.1670 │ │ 0.5514 0.7337 0.1725 0.1023 0.6292 0.5107 0.2008 0.4859 0.3434 │ │ 1.2107 1.5120 0.2495 0.1405 1.2987 0.9816 0.2347 1.0938 0.5735 │ M' = │ 0.2241 0.1713 0.0954 0.0868 0.1196 0.1774 0.0830 0.1148 0.0983 │ │ 0.6076 0.8621 0.2155 0.0166 0.6233 0.6452 0.1842 0.6312 0.3560 │ │ 0.9092 1.0035 0.2297 0.0243 0.8488 0.6830 0.1314 0.7941 0.4877 │ │ 0.6424 0.6943 0.0413 0.0732 0.6939 0.4720 0.1307 0.5356 0.3966 │ │ 0.6907 0.8420 0.2012 0.0015 0.7832 0.6369 0.0921 0.6703 0.2930 │ Notes:
The implementation Part 2: Determinant This 15step, 18byte subroutine (half the size of the Matrix Inversion one and three times as fast) will compute and display the determinant of an NxN partitioned matrix for N ≤ 16 (subject to available memory). It takes no inputs but the caller (the user or another program) must have previously dimensioned and populated the four blocks with the elements of the original matrix. To compute the determinant we proceed as follows:
Program listing ►LBL D 001 42,21,14 Comment X Y Z T RESULT D 002 42,26,14 D=.. RCL MATRIX D 003 45,16,14 D D MATRIX 9 004 42,16, 9 Det(D) Det(D) RCL MATRIX B 005 45,16,12 B B Det(D) LASTX 006 43 36 D D B Det(D) 1/x 007 15 D=Inv(D) Inv(D) B Det(D) RESULT E 008 42,26,15 E=.. RCL MATRIX C 009 45,16,13 C C Inv(D) B Det(D) x 010 20 E=Inv(D)xC E B Det(D) RESULT A 011 42,26,11 A=.. MATRIX 6 012 42,16, 6 A=ABxE A Det(D) MATRIX 9 013 42,16, 9 Det(A) Det(A) Det(D) x 014 20 Det(M) Det(M) RTN 015 43 32 Notes: All the Notes for the Matrix Inversion subroutine exactly apply, with the following changes:
All the Requirements for the Matrix Inversion subroutine apply exactly, with the following changes:
Worked examples The following examples will show you how to use the subroutine and further assume that we're using the newly released HP15C CE (Collector's Edition) in its default mode, i.e. with just 96 registers available to store matrix elements. 1. Worked 5x5 Toy Example The 5x5 matrix used in this example is the same as the one used in the corresponding example for Matrix Inversion above, the only difference being that now you're asked to compute its determinant instead of its inverse, so just exactly follow the initialization and input instructions indicated there until you are about to perform the computation part, where you should now do the following:  Compute the determinant: D > 1,813.0000 {nearly instantaneous} As no inverse matrix es computed, its output instructions don't apply here. 2. Worked 9x9 Fullfledged Example The 9x9 matrix used in this example is the same as the one used in the corresponding example for Matrix Inversion above, the only difference being that now you're asked to compute its determinant instead of its inverse, so just follow exactly the initialization and input instructions indicated there until you are about to perform the computation part, where you should now do the following:  Compute the determinant: D > 10,278,575.95 {~ 0.2 sec.; exact is 10,278,576} As no inverse matrix es computed, its output instructions don't apply here. Notes: You might consider applying the same finalization instructions as described in the final Notes for Matrix Inversion above. The implementation Part 3: All together now There are two relevant questions: 1. Can both subroutines fit at the same time in the default CE/96 mode ? Yes, both are completely standalone (neither one requires the other or its results) but the Inversion routine is 33 bytes long while the Determinant routine is 18 bytes long, so they would seem to occupy 51 bytes together. However, if you change the final RTN instruction at step 029 in the former by an R/S instruction and delete the initial LBL D at step 001 and the RTN at step 015 in the latter, you can concatenate both and the combination will fit in 49 bytes (7 registers,) which is exactly all the memory that is available for the code if you want to process a 9x9 matrix in the standard CE/96 mode. But if using the extended CE/192 mode instead, the question is moot as both routines can be present at the same time, unmodified, and there's plenty of room for larger matrices or additional programs. 2. Is it possible to compute both Inverse and Determinant without having to reintroduce the original data ? Yes, it is quite possible by using either one of two simple tricks. Assuming both routines are present in program memory, say you want to compute the inverse and the determinant of a large matrix without having to reintroduce the matrix elements. Do as follows:
 Execute subroutine E to compute the inverse matrix.  Recall and write down the elements of the inverse matrix  Now you have two options to avoid reintroducing the original matrix again, namely either
b) Execute subroutine D to compute the inverse's determinant, then (out of User mode) press 1/x and you'll get the determinant of the original matrix, as DET(A) = 1/DET(Inverse of A). Timings This Section includes a short 42step timinghelper program which can be used to obtain arbitrarily accurate timings for both my matrix inverse and determinant subroutines for large N, namely 9 ≤ N ≤ 16, depending on the particular device's available memory. The program runs on the CE/192 for 9 ≤ N ≤ 12 (the CE/96 lacks the necessary memory) as well as any other devices physical or virtual which have enough memory. The DM15 with firmware M1B might be able to also do N=13, but no current device has enough memory to do 14 ≤ N ≤ 16 though both this timing program and the subroutines themselves would work unchanged. Program listing {the ► are purely cosmetic} 001 ►LBL A 018 RCL MATRIX A 035 ►LBL 0 002 CF 8 019 GSB 0 ► 036 STO I 003 1 020 RCL MATRIX B 037 MATRIX 1 004 DIM (i) 021 GSB 0 ► 038 ►LBL 1 005 X<>Y 022 RCL MATRIX C 039 RAN# 006 MATRIX 0 023 GSB 0 ► 040u STO (i) (*) 007 8 024 RCL MATRIX D 041 GTO 1 ► 008  025 GSB 0 ► 042 RTN 009 8 026 2 043 ►LBL E 043 ►LBL D 010 DIM C 027 0 ... ... or ... ... 011 X<>Y 028 R/S 071 RTN 057 RTN 012 DIM B 029 STO I 013 ENTER 030 ►LBL 2 014 DIM D 031 GSB E ► or GSB D ► 015 8 032 DSE I 016 ENTER 033 GTO 2 ► 017 DIM A 034 RTN (*) Step 040u STO (i) must be entered in USER mode.
Usage Instructions
1, STO RAN#, 9, GSB A > 20 and if now you recall the contents of the first element of matrices A and D you should get (in FIX 4): A1,1 = 0.2018, D1,1 = 0.1746
Results These are the times I obtain on my CE/192 when using the above program with fresh lithium batteries (yours may vary slightly.) Note that I use the default 20 loops to time the inversion subroutine but change it to 50 loops when timing the determinant subroutine, as it is about 3x faster: Inverse Determinant Dim Block A Block D 20 loops time 50 loops time  9x9 8x8 1x1 18" 0.90" 17" 0.34" 10x10 8x8 2x2 24" 1.20" 22" 0.44" 11x11 8x8 3x3 31" 1.55" 29" 0.58" 12x12 8x8 4x4 40" 2.00" 37" 0.74" Thus, when using my matrix inversion subroutine, inverting a 9x9 matrix takes my HP15C CE less than one second, and inverting a sizable 12x12 matrix takes about two seconds flat. As for the determinant subroutine, it can compute the determinant of a 9x9 matrix in 1/3 of a second (i.e. it can compute three 9x9 determinants per second) and that of a 12x12 matrix in less than 3/4 of a second. Further considerations For those interested, there's a number of additional things to do if you feel like it, e.g.:
References TzonTzer Lu & ShengHua Shiou Inverses of 2 × 2 Block Matrices John R Silvester Determinants of Block Matrices Copyrights Copyright for this article and its contents is retained by the author. Permission to use it for nonprofit purposes is granted as long as the contents aren’t modified in any way and the copyright is acknowledged. For the purposes of this copyright, the definition of nonprofit does not include publishing this article in any media for which a subscription fee is asked and thus such use is strictly disallowed without explicit written permision granted by the author. All My Articles & other Materials here: Valentin Albillo's HP Collection 

11302023, 02:19 AM
Post: #2




RE: HP Article VA058  Boldly Going  HP15C CE Big Matrix Woes
Epic post Valentin, another astounding piece of work. I look forward to reading it this weekend when I can dedicate the time needed to read it through and appreciate it.
Bob Prosperi 

« Next Oldest  Next Newest »

User(s) browsing this thread: 1 Guest(s)