Threaded Mode | Linear Mode

Thomas Okken · (This post was last modified: 03-06-2020 06:12 AM by Thomas Okken.)

First stab at compensated DOT, real-real case only:

Code:

-        for (i = 0; i < size; i++)

-            dot += rm1->array->data[i] * rm2->array->data[i];

+        phloat *x = rm1->array->data;

+        phloat *y = rm2->array->data;

+        phloat s = x[0] * y[0];

+        phloat c = fma(x[0], y[0], -s);

+        for (i = 1; i < size; i++) {

+            phloat p = x[i] * y[i];

+            phloat pp = fma(x[i], y[i], -p);

+            phloat xx = s + p;

+            phloat zz = xx - s;

+            phloat ss = (s - (xx - zz)) + (p - zz);

+            s = xx;

+            c = c + (pp + ss);

+        }

+        phloat dot = s + c;

I guess Werner's test case is a bit too easy, since it's returning a for arbitrarily low values, not just down to 1e-67. But maybe my implementation is wrong!

I uploaded a test version for Windows here. Decimal only, since VC++ 2008 doesn't appear to support FMA in <math.h>; I guess I'll have to switch to a newer version if I'm going through with this. The code does build without complaint for MacOS and Linux; I haven't tried Android or iOS yet.

I optimistically pushed this initial code to my master branch.