Threaded Mode | Linear Mode

J-F Garnier · 03-04-2020, 01:41 PM

(03-04-2020 09:23 AM)Werner Wrote:

Code:

@ dot test 1: input a, result = DOT([a 1 -1],[1 1 1]) >LBL "DT1" 3 1 NEWMAT EDIT Rv \-> 1 \-> -1 EXITALL ENTER SIGN ABS DOT END

Slightly changing the test to DOT( [ a -1 1 ] , [ 1 1 1] ), I noticed this difference between the 71B and the 28S/42S (and later RPL machines I guess):
Using a=1E-20 (or any values <5E-16):
71B: result is 1E-15 !
28S/42S: result is correctly rounded to 0.

Something I will investigate for the HP71 Math ROM 2 :-)

J-F

Albert Chan · (This post was last modified: 03-04-2020 07:00 PM by Albert Chan.)

(03-04-2020 12:29 PM)Thomas Okken Wrote: I'll write the new version of DOT next week, I think. I'll have to add FMA to the "phloat" functions, using bid128_fma() in the decimal version, and (I think) std::fma() in the binary version, and then implement the functions from your citations.
It looks pretty straightforward.

DotFMA probably will not gain much accuracy.
Example, if we fuse for every step, with tiny a, such that a+1=1

DotFMA([a 1 -1], [1 1 1]) = (a+1) - 1 = 1-1 = 0

However, fusing result at the end help (essentially "double" precision until final sum)
see CompDot, CompDotFMA, http://rnc7.loria.fr/louvet_poster.pdf

(03-04-2020 01:41 PM)J-F Garnier Wrote: Slightly changing the test to DOT( [ a -1 1 ] , [ 1 1 1] ), I noticed this difference between the 71B and the 28S/42S
(and later RPL machines I guess):
Using a=1E-20 (or any values <5E-16):
71B: result is 1E-15 !
28S/42S: result is correctly rounded to 0.

It seems 71B internal 15-digits precision is using Round-to-zero (chopping excess digits)
Say, a = 1.00000000001e-15

DOT([a 1 -1], [1 1 1]) = (1.0000 00000 00000 10000 00000 01) - 1 ≈ (1.0000 00000 00000) - 1 = 0

DOT([a -1 1], [1 1 1]) = (-0.99999 99999 99998 99999 999999) + 1 ≈ (-0.99999 99999 99998) + 1 = 2E-15

Werner · 03-05-2020, 06:41 AM

(03-04-2020 04:39 PM)Albert Chan Wrote: DotFMA probably will not gain much accuracy.
Example, if we fuse for every step, with tiny a, such that a+1=1

DotFMA([a 1 -1], [1 1 1]) = (a+1) - 1 = 1-1 = 0

However, fusing result at the end help (essentially "double" precision until final sum)
see CompDot, CompDotFMA, http://rnc7.loria.fr/louvet_poster.pdf

Well, yes, that's the idea: don't use the FMA for the dot product itself, but for the correction term, a bit like Kahan summation, but for dot products.

Cheers, Werner

Thomas Okken · (This post was last modified: 03-06-2020 06:12 AM by Thomas Okken.)

First stab at compensated DOT, real-real case only:

Code:

-        for (i = 0; i < size; i++)

-            dot += rm1->array->data[i] * rm2->array->data[i];

+        phloat *x = rm1->array->data;

+        phloat *y = rm2->array->data;

+        phloat s = x[0] * y[0];

+        phloat c = fma(x[0], y[0], -s);

+        for (i = 1; i < size; i++) {

+            phloat p = x[i] * y[i];

+            phloat pp = fma(x[i], y[i], -p);

+            phloat xx = s + p;

+            phloat zz = xx - s;

+            phloat ss = (s - (xx - zz)) + (p - zz);

+            s = xx;

+            c = c + (pp + ss);

+        }

+        phloat dot = s + c;

I guess Werner's test case is a bit too easy, since it's returning a for arbitrarily low values, not just down to 1e-67. But maybe my implementation is wrong!

I uploaded a test version for Windows here. Decimal only, since VC++ 2008 doesn't appear to support FMA in <math.h>; I guess I'll have to switch to a newer version if I'm going through with this. The code does build without complaint for MacOS and Linux; I haven't tried Android or iOS yet.

I optimistically pushed this initial code to my master branch.

Werner · 03-06-2020, 12:05 PM

Hello Thomas, no that is correct, the FMA 'compensated dot' algorithm is actually not the same as just using 68 digits ;-) As long as we're just performing a DOT product with one argument being a vector with all 1's, we're performing additions only, and we can actually do the same in RPN:
(based on Dekker's double precision algorithms, but modified a bit to work for decimal floating point arithmetic):

Code:

extended precision summation

In: [x]

Out: Sum(x[i])

>LBL "\S"

 EDIT

 ABS

 CLST

 LASTX

>LBL 10

 XEQ DS+

 J+

 RCLEL

 FC? 77

 GTO 10

 EXITALL

 CLX

 +

 + @ leave this off to see the full double-precision result

 END

@ add a number to a double-precision number

@        L    X    Y    Z    T

>LBL "DS+"        x    ts    hs

 X<>Y

 X<> ST Z        hs    x    ts

 XEQ "S+"        tr    hr    ts

 X<>Y

 X<> ST Z        ts    tr    hr

 +

>LBL "Q+" @ 'quick add 2 numbers'

 X<>Y

 RCL+ ST Y

 STO- ST L

 X<>Y

 RCL+ ST L

 RTN

>LBL "S+" @ add 2 numbers, result in double precision

 LSTO "."

 RCL+ ST Y

 RCL- ST Y

 STO- "."

 X<> ST L

 STO- ST L

 X<>Y

 RCL+ ST L

 RCL+ "."

 END

I can probably do the same for the full-fledged DOT.. I have the 'multiply two numbers and obtain a double-precision result' routines as well. The only thing I don't have for now is time ;-)

Cheers, Werner

Albert Chan · 03-06-2020, 05:20 PM

(03-06-2020 05:48 AM)Thomas Okken Wrote: I uploaded a test version for Windows here. Decimal only, since VC++ 2008 doesn't appear to support FMA in <math.h>;
I guess I'll have to switch to a newer version if I'm going through with this.

For machines that does not have hardware FMA, we can implement it in software.
see Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates, page 18 to 20

lua> function split(x) -- assume IEEE double
: local hi = (2^27+1) * x
: hi = hi - (hi - x)
: return hi, x - hi
: end

lua> function product(a, b)
: local x = a*b
: local a1, a2 = split(a)
: local b1, b2 = split(b)
: return x, -x + a1*b1 + a2*b1 + a1*b2 + a2*b2
: end

lua> product(1.1, 1.2)
1.32 -4.44089209850063e-018

Thomas Okken · 03-07-2020, 04:08 PM

I implemented the real-complex and complex-complex cases, and moved all three cases to separate functions, so they can be easily reused later in other functions. Those other functions have yet to be done, but the improved DOT is now complete.

I uploaded test versions for Android, Windows, Mac, and Linux to https://thomasokken.com/free42/download/test/
and the source code is in the master branch in GitHub.
For the Windows build, I switched from Visual C++ 2008 to 2019, so I was able to build the Binary version again.

J-F Garnier · (This post was last modified: 03-08-2020 06:18 PM by J-F Garnier.)

(03-04-2020 04:39 PM)Albert Chan Wrote:
(03-04-2020 01:41 PM)J-F Garnier Wrote: Slightly changing the test to DOT( [ a -1 1 ] , [ 1 1 1] ), I noticed this difference between the 71B and the 28S/42S
(and later RPL machines I guess):
Using a=1E-20 (or any values <5E-16):
71B: result is 1E-15 !
28S/42S: result is correctly rounded to 0.

It seems 71B internal 15-digits precision is using Round-to-zero (chopping excess digits)
Say, a = 1.00000000001e-15

DOT([a 1 -1], [1 1 1]) = (1.0000 00000 00000 10000 00000 01) - 1 ≈ (1.0000 00000 00000) - 1 = 0

DOT([a -1 1], [1 1 1]) = (-0.99999 99999 99998 99999 999999) + 1 ≈ (-0.99999 99999 99998) + 1 = 2E-15

There is another similar truncating effect, instead of proper rounding with complex numbers:
>(1+1E-10,1)*(1-1E-10,1)
(-1.E-15,2)

Contrary to the previous case, it is still present in the 42S, 28S, 48 series up to the 49G+.
So I will not attempt to fix this effect (I don't call it bug...).

J-F