Post Reply 
Matrix Division on the 28C/S (and later)
03-04-2020, 01:41 PM
Post: #21
RE: Matrix Division on the 28C/S (and later)
(03-04-2020 09:23 AM)Werner Wrote:  
Code:
@ dot test 1: input a, result = DOT([a 1 -1],[1 1 1])
>LBL "DT1" 
 3
 1
 NEWMAT
 EDIT
 Rv
 \->
 1
 \->
 -1
 EXITALL
 ENTER
 SIGN
 ABS
 DOT
 END

Slightly changing the test to DOT( [ a -1 1 ] , [ 1 1 1] ), I noticed this difference between the 71B and the 28S/42S (and later RPL machines I guess):
Using a=1E-20 (or any values <5E-16):
71B: result is 1E-15 !
28S/42S: result is correctly rounded to 0.

Something I will investigate for the HP71 Math ROM 2 :-)

J-F
Visit this user's website Find all posts by this user
Quote this message in a reply
03-04-2020, 04:39 PM (This post was last modified: 03-04-2020 07:00 PM by Albert Chan.)
Post: #22
RE: Matrix Division on the 28C/S (and later)
(03-04-2020 12:29 PM)Thomas Okken Wrote:  I'll write the new version of DOT next week, I think. I'll have to add FMA to the "phloat" functions, using bid128_fma() in the decimal version, and (I think) std::fma() in the binary version, and then implement the functions from your citations.
It looks pretty straightforward.

DotFMA probably will not gain much accuracy.
Example, if we fuse for every step, with tiny a, such that a+1=1

DotFMA([a 1 -1], [1 1 1]) = (a+1) - 1 = 1-1 = 0

However, fusing result at the end help (essentially "double" precision until final sum)
see CompDot, CompDotFMA, http://rnc7.loria.fr/louvet_poster.pdf


(03-04-2020 01:41 PM)J-F Garnier Wrote:  Slightly changing the test to DOT( [ a -1 1 ] , [ 1 1 1] ), I noticed this difference between the 71B and the 28S/42S
(and later RPL machines I guess):
Using a=1E-20 (or any values <5E-16):
71B: result is 1E-15 !
28S/42S: result is correctly rounded to 0.

It seems 71B internal 15-digits precision is using Round-to-zero (chopping excess digits)
Say, a = 1.00000000001e-15

DOT([a 1 -1], [1 1 1]) = (1.0000 00000 00000 10000 00000 01) - 1 ≈ (1.0000 00000 00000) - 1 = 0

DOT([a -1 1], [1 1 1]) = (-0.99999 99999 99998 99999 999999) + 1 ≈ (-0.99999 99999 99998) + 1 = 2E-15
Find all posts by this user
Quote this message in a reply
03-05-2020, 06:41 AM
Post: #23
RE: Matrix Division on the 28C/S (and later)
(03-04-2020 04:39 PM)Albert Chan Wrote:  DotFMA probably will not gain much accuracy.
Example, if we fuse for every step, with tiny a, such that a+1=1

DotFMA([a 1 -1], [1 1 1]) = (a+1) - 1 = 1-1 = 0

However, fusing result at the end help (essentially "double" precision until final sum)
see CompDot, CompDotFMA, http://rnc7.loria.fr/louvet_poster.pdf

Well, yes, that's the idea: don't use the FMA for the dot product itself, but for the correction term, a bit like Kahan summation, but for dot products.

Cheers, Werner
Find all posts by this user
Quote this message in a reply
03-06-2020, 05:48 AM (This post was last modified: 03-06-2020 06:12 AM by Thomas Okken.)
Post: #24
RE: Matrix Division on the 28C/S (and later)
First stab at compensated DOT, real-real case only:

Code:
-        for (i = 0; i < size; i++)
-            dot += rm1->array->data[i] * rm2->array->data[i];
+        phloat *x = rm1->array->data;
+        phloat *y = rm2->array->data;
+        phloat s = x[0] * y[0];
+        phloat c = fma(x[0], y[0], -s);
+        for (i = 1; i < size; i++) {
+            phloat p = x[i] * y[i];
+            phloat pp = fma(x[i], y[i], -p);
+            phloat xx = s + p;
+            phloat zz = xx - s;
+            phloat ss = (s - (xx - zz)) + (p - zz);
+            s = xx;
+            c = c + (pp + ss);
+        }
+        phloat dot = s + c;

I guess Werner's test case is a bit too easy, since it's returning a for arbitrarily low values, not just down to 1e-67. But maybe my implementation is wrong!

I uploaded a test version for Windows here. Decimal only, since VC++ 2008 doesn't appear to support FMA in <math.h>; I guess I'll have to switch to a newer version if I'm going through with this. The code does build without complaint for MacOS and Linux; I haven't tried Android or iOS yet.

I optimistically pushed this initial code to my master branch.
Visit this user's website Find all posts by this user
Quote this message in a reply
03-06-2020, 12:05 PM
Post: #25
RE: Matrix Division on the 28C/S (and later)
Hello Thomas, no that is correct, the FMA 'compensated dot' algorithm is actually not the same as just using 68 digits ;-) As long as we're just performing a DOT product with one argument being a vector with all 1's, we're performing additions only, and we can actually do the same in RPN:
(based on Dekker's double precision algorithms, but modified a bit to work for decimal floating point arithmetic):

Code:
extended precision summation
In: [x]
Out: Sum(x[i])

>LBL "\S"
 EDIT
 ABS
 CLST
 LASTX
>LBL 10
 XEQ DS+
 J+
 RCLEL
 FC? 77
 GTO 10
 EXITALL
 CLX
 +
 + @ leave this off to see the full double-precision result
 END

@ add a number to a double-precision number
@        L    X    Y    Z    T
>LBL "DS+"        x    ts    hs
 X<>Y
 X<> ST Z        hs    x    ts
 XEQ "S+"        tr    hr    ts
 X<>Y
 X<> ST Z        ts    tr    hr
 +
>LBL "Q+" @ 'quick add 2 numbers'
 X<>Y
 RCL+ ST Y
 STO- ST L
 X<>Y
 RCL+ ST L
 RTN
>LBL "S+" @ add 2 numbers, result in double precision
 LSTO "."
 RCL+ ST Y
 RCL- ST Y
 STO- "."
 X<> ST L
 STO- ST L
 X<>Y
 RCL+ ST L
 RCL+ "."
 END

I can probably do the same for the full-fledged DOT.. I have the 'multiply two numbers and obtain a double-precision result' routines as well. The only thing I don't have for now is time ;-)

Cheers, Werner
Find all posts by this user
Quote this message in a reply
03-06-2020, 05:20 PM
Post: #26
RE: Matrix Division on the 28C/S (and later)
(03-06-2020 05:48 AM)Thomas Okken Wrote:  I uploaded a test version for Windows here. Decimal only, since VC++ 2008 doesn't appear to support FMA in <math.h>;
I guess I'll have to switch to a newer version if I'm going through with this.

For machines that does not have hardware FMA, we can implement it in software.
see Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates, page 18 to 20

lua> function split(x) -- assume IEEE double
:      local hi = (2^27+1) * x
:      hi = hi - (hi - x)
:      return hi, x - hi
:      end

lua> function product(a, b)
:      local x = a*b
:      local a1, a2 = split(a)
:      local b1, b2 = split(b)
:      return x, -x + a1*b1 + a2*b1 + a1*b2 + a2*b2
:      end

lua> product(1.1, 1.2)
1.32     -4.44089209850063e-018
Find all posts by this user
Quote this message in a reply
03-07-2020, 04:08 PM
Post: #27
RE: Matrix Division on the 28C/S (and later)
I implemented the real-complex and complex-complex cases, and moved all three cases to separate functions, so they can be easily reused later in other functions. Those other functions have yet to be done, but the improved DOT is now complete.

I uploaded test versions for Android, Windows, Mac, and Linux to https://thomasokken.com/free42/download/test/
and the source code is in the master branch in GitHub.
For the Windows build, I switched from Visual C++ 2008 to 2019, so I was able to build the Binary version again.
Visit this user's website Find all posts by this user
Quote this message in a reply
03-08-2020, 06:11 PM (This post was last modified: 03-08-2020 06:18 PM by J-F Garnier.)
Post: #28
RE: Matrix Division on the 28C/S (and later)
(03-04-2020 04:39 PM)Albert Chan Wrote:  
(03-04-2020 01:41 PM)J-F Garnier Wrote:  Slightly changing the test to DOT( [ a -1 1 ] , [ 1 1 1] ), I noticed this difference between the 71B and the 28S/42S
(and later RPL machines I guess):
Using a=1E-20 (or any values <5E-16):
71B: result is 1E-15 !
28S/42S: result is correctly rounded to 0.

It seems 71B internal 15-digits precision is using Round-to-zero (chopping excess digits)
Say, a = 1.00000000001e-15

DOT([a 1 -1], [1 1 1]) = (1.0000 00000 00000 10000 00000 01) - 1 ≈ (1.0000 00000 00000) - 1 = 0

DOT([a -1 1], [1 1 1]) = (-0.99999 99999 99998 99999 999999) + 1 ≈ (-0.99999 99999 99998) + 1 = 2E-15

There is another similar truncating effect, instead of proper rounding with complex numbers:
>(1+1E-10,1)*(1-1E-10,1)
(-1.E-15,2)

Contrary to the previous case, it is still present in the 42S, 28S, 48 series up to the 49G+.
So I will not attempt to fix this effect (I don't call it bug...).

J-F
Visit this user's website Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 1 Guest(s)