Threaded Mode | Linear Mode

robve · 04-04-2021, 05:44 PM

(04-03-2021 03:17 AM)emece67 Wrote: These are the results for my Python script:

By comparing the implementations I found out that the VB code in the "Article" is not really based on the WP-34S RPN code. They use the tanh-sinh variant by Michalski and Mosig. I've included this variant in the updated writeup, explained this interesting variant in some detail, and added their VB code to the appendix for comparison. The appendix contains both their VB code based on Michalski and Mosig and also their DE1 VB code that looks somewhat similar to your Python code.

The Michalski and Mosig variant requires fewer operations. I've aggressively optimized this further to just one exp() per inner loop iteration.

- Rob

emece67 · 04-04-2021, 08:52 PM

(04-04-2021 05:44 PM)robve Wrote: By comparing the implementations I found out that the VB code in the "Article" is not really based on the WP-34S RPN code. They use the tanh-sinh variant by Michalski and Mosig.

AFAIR, the VB article is prior to the WP34S code. When I wrote the WP34S (and Python) programs I used the spreadsheet to get some example integrals. I didn't look at the code inside, though.

Regards.

robve · 04-05-2021, 05:34 PM

(04-03-2021 03:17 AM)emece67 Wrote: These are the results for my Python script:

Thank for sharing. I rewrote the Python code today in C with exactly the same functionality and updated the document with the results and other additions. In this way I can compare apples to apples using IEEE floating point.

I noticed that I had to set eps=1e-15 in the C code I wrote based on your Python implementation, because the error threshold variable with eps=1e-9 is only thr=3e-4, returning results with large errors even for the easy integrals. I also used n=6 levels to make the comparison fair.

The values with the C implementation with eps=1e-15 are the same as your numbers, except when the number of points is larger due to my choice of n=6 levels:

Code:

# (points, est.err)

1 (47,8e-12)

2 (49,5e-12)

3 (49,1e-8+)

4 (49,2e-13)

5 (49,7e-12)

6 (49,5e-8)

7 (49,2e-12)

8 (25,5e-7)

9 (25,5e-7)

10 (395,2e-4)

11 (395,2e-4)

12 (49,8e-8)

13 (395,2e-4)

14 (25,4e-7)

15 (25,4e-7)

16 (335,1e-16)

17 (49,7e-10)

18 (49,2e-12)

19 (395,1e-3)

20 (395,6e-3)

21 (395,6e-3)

Integral 16 is the only one that differs for a different reason, because it converges with 137 points in your case, but with IEEE double precision and eps=1e-15 in the C version, it takes 335 points.

I noticed a few things in the Python code, including:

      try:
        fpl = f(bpa2 + bma2*r)
      except Exception:
        fpl = 0;
      p = fpl*w
      try:
        fmi = f(bpa2 + bma2/r) if expsinh else f(bpa2 - bma2*r)
      except Exception:
        fmi = 0

I may be mistaken, but this sets fpl and fmi to zero when an exception occurs. This may put some downward pressure on the quadrature towards zero and could lead to delayed convergence if the weight isn't yet insignificant, in turn requiring more points that in turn could cause more exceptions in the same area. I did some comparisons and it looks like interpolating/extrapolating the missing function values is best (rather than terminating the iterations). A simple way to interpolate is to reuse the previous fpl and fmi, although more elaborate schemes could be used. Think about functions that have a large nonzero limit at an endpoint.

On the other hand, the Python code does not get as close to the endpoints (due to tmax) as the other implementations. By comparison, the VB implementation uses a fixed tmax=6.56, which I am looking into.

- Rob

emece67 · 04-06-2021, 06:48 AM

(04-05-2021 05:34 PM)robve Wrote: I noticed a few things in the Python code, including:

      try:
        fpl = f(bpa2 + bma2*r)
      except Exception:
        fpl = 0;
      p = fpl*w
      try:
        fmi = f(bpa2 + bma2/r) if expsinh else f(bpa2 - bma2*r)
      except Exception:
        fmi = 0

I may be mistaken, but this sets fpl and fmi to zero when an exception occurs. This may put some downward pressure on the quadrature towards zero and could lead to delayed convergence if the weight isn't yet insignificant, in turn requiring more points that in turn could cause more exceptions in the same area. I did some comparisons and it looks like interpolating/extrapolating the missing function values is best (rather than terminating the iterations). A simple way to interpolate is to reuse the previous fpl and fmi, although more elaborate schemes could be used. Think about functions that have a large nonzero limit at an endpoint.

On the other hand, the Python code does not get as close to the endpoints (due to tmax) as the other implementations. By comparison, the VB implementation uses a fixed tmax=6.56, which I am looking into.

- Rob

I've just tested re-using the previous fpl or fmi when the exception raises, but the end results are the same.

The idea behind tmax is described succinctly in another thread here (the expression for N, although I later reworked it to directly get tmax). With such tmax one ensures that the integrand is never evaluated at the extremes. Maybe this is the reason the modification above (reusing fpl/fmi) does not show any benefits.

Regards.

Albert Chan · 04-06-2021, 12:49 PM

(04-06-2021 06:48 AM)emece67 Wrote: I've just tested re-using the previous fpl or fmi when the exception raises, but the end results are the same.

I preferred original default of 0.0, instead of previous point.

My first post here mentioned problems if u-transformation is applied.
Instead of sharp point, when f(0) or f(1) = inf, it widened the problematic edge.

(03-28-2021 06:59 PM)Albert Chan Wrote: >>> from mpmath import *
>>> f = lambda x: 1/sqrt(-ln(x))
>>> quadts(f, [0,1], error=True)
(mpf('1.7724538503750329'), mpf('1.0e-10'))
>>>
>>> uf = lambda u: 6*u*(1-u) * f(u*u*(3-2*u))
>>> quadts(uf, [0,1], error=True)
Traceback (most recent call last):
...
ZeroDivisionError

This is not just an issue of u-transformation.
Perhaps integrand already have fuzzy bad edge(s).

Previous point for default will have "upward pressure" for the fuzzy edge.
(this assumed previous point is next to current point; false if calculating points in parallel)

Default of 0.0 will have "downward pressure" for the fuzzy edge.
However, with double-exponential transform, I think more of the bad edge is closer to zero.

Example, using above f, uf, and we apply u-transform to uf, getting uuf.

Run double_exponential() with different behavior for "bad" edge, then square the result (should be pi)
First number is points evaluated, reported from double_exponential() third entry.

Code:

        Default Zero           Previous Point

f   25->3.14159258765585   25->3.14159258765585

uf  47->3.14159262045834   49->3.14159263570079

uuf 39->3.14159264565517  191->3.14160003440478

---

I think setting 0.0 for any exceptions may be overkill.
Example: if f(x) = x^p, and p is not yet defined, quadrature should raised NameError.

Trap ArithmeticError is good enough (= OverflowError, ZeroDivisionError, FloatingPointError)

On the other hand, Inf/NaN can occur without raising exceptions.

>>> ln(0), atanh(1), exp(inf), inf-inf
(mpf('-inf'), mpf('+inf'), mpf('+inf'), mpf('nan'))

We should also test for these, like this patch for double_exponential.py

Code:

      try: y = f(bpa2 + bma2*r)

      except ArithmeticError: y = inf

      p = 0.0 if y-y else y*w

      try: y = f(bpa2 + bma2/r) if expsinh else f(bpa2 - bma2*r)

      except ArithmeticError: y = inf

      if y-y == 0: p += y/w if expsinh else y*w

      p *= ch

      wsl += p

      tnfe += 2

emece67 · 04-06-2021, 03:09 PM

(04-06-2021 12:49 PM)Albert Chan Wrote: We should also test for these, like this patch for double_exponential.py

Code:

try: y = f(bpa2 + bma2*r) except ArithmeticError: y = inf p = 0.0 if y-y else y*w try: y = f(bpa2 + bma2/r) if expsinh else f(bpa2 - bma2*r) except ArithmeticError: y = inf if y-y == 0: p += y/w if expsinh else y*w p *= ch wsl += p tnfe += 2

Sorry, cannot see the difference between such code and the original one replacing Exception by ArithmeticError:

Code:

      try:

        fpl = f(bpa2 + bma2*r)

      except ArithmeticError:

        fpl = 0;

      p = fpl*w

      try:

        fmi = f(bpa2 + bma2/r) if expsinh else f(bpa2 - bma2*r)

      except ArithmeticError:

        fmi = 0

      tnfe += 2

      p += fmi/w if expsinh else fmi*w

      p *= ch

      wsl += p

Albert Chan · 04-06-2021, 04:19 PM

Hi, emece67

>>> f = atanh
>>> uf = lambda u: 6*u*(1-u) * f(u*u*(3-2*u))
>>> uuf = lambda u: 6*u*(1-u) * uf(u*u*(3-2*u))

>>> log(2) # expected, for ∫(f(x), x=0..1)
mpf('0.69314718055994529')

Default 0.0 + ArithmeticError + Inf/NaN test, we get this:

(mpf('0.69314718055994518'), mpf('2.7138055025410779e-8'), 47, 3, 0) # uf
(mpf('0.69314718055994484'), mpf('4.525002594846228e-11'), 71, 4, 0) # uuf

Previous Point + ArithmeticError + Inf/NaN test, we get this:

(mpf('0.69314718055995395'), mpf('2.713805868914676e-8'), 49, 3, 0) # uf
(mpf('0.69314725666345778'), mpf('7.6148712890855563e-8'), 83, 4, 0) # uuf

But, without Inf/NaN test, both get this:

(mpf('+inf'), mpf('nan'), 171, 5, 0) # uf
(mpf('+inf'), mpf('nan'), 133, 5, 0) # uuf

This is because atanh(1) does not raise an exception, just return +inf

emece67 · (This post was last modified: 04-06-2021 06:31 PM by emece67.)

OK, I see now. I've changed it to:

Code:

      try:

        fpl = f(bpa2 + bma2*r)

      except ArithmeticError:

        fpl = 0;

      p = fpl*w if isnormal(fpl) else 0

      try:

        fmi = f(bpa2 + bma2/r) if expsinh else f(bpa2 - bma2*r)

      except ArithmeticError:

        fmi = 0

      tnfe += 2

      p += (fmi/w if expsinh else fmi*w) if isnormal(fmi) else 0

      p *= ch

      wsl += p

which seems more readable to me. I've committed it to GitHub.

Regards.

robve · 04-06-2021, 08:58 PM

Hi Albert & emece67,

I tested the qthsh and wp34s C implementations against 818 integrals from the spreadsheet downloaded from https://newtonexcelbach.com/2020/10/29/n...ture-v-5-0

I rewrote the 818 functions in C (using a bit of sed wizardry) then ran all 818 test cases in C to use IEEE double fp with n=6 levels and eps=1e-9 for qthsh and eps=1e-15 for wp34s (because eps=1e-15 sets thr=3e-7 in wp34s which is about the same threshold as qthsh's 1e-8).

The zip file includes a spreadsheet with the results for the the 818 integrands, produced with some bash scripts that generate C code that is compiled and run on each integrand. The relative errors are reported by the methods and the exact result is compared as an absolute error. An ALERT is shown when the relative error AND the absolute error are both larger than 1e-9. Many ALERT are close to 1e-9 so can in principle be ignored, typically indicating difficulties with Tanh-Sinh convergence. However, there are several ALERT that indicate more serious issues with convergence, including some NaN. I hope this is useful.

The qthsh C (and BASIC) source code is updated in this thread, because "fixing" the underflow "problem" in the inner loop caused problems with non-convergence for a few functions, notably with 25*exp(-25*x) over [0,10] (row 63 in the spreadsheet). I also updated the draft report to correct this.

The 818 integrals are mostly functions composed of the usual trig and other transcendental functions. There are a couple with ABS and one with SIGN. But nothing as difficult as a step function, for example.

Note that the inf/NaN singularities are ignored in these implementations (i.e. "interpolated" by using previous points to produce "upward pressure"), but I agree that this is something to look into, as Albert pointed out.

Compared to other integration methods, there is nothing "standard" about Tanh-Sinh rules and implementations, they are all different! The Michalski & Mosig rule is a bit cheaper to execute. But as a consequence, this rule produces point clouds that are a little bit more dense in the middle of the interval (see the chart in the draft report). Other Tanh-Sinh points and weights could be defined to produce even denser "point clouds" between the endpoints, perhaps to improve convergence for integrands with more "mass" in the middle. However, other transformations might be more appropriate as Albert suggests.

- Rob

robve · 04-06-2021, 11:36 PM

(04-06-2021 04:19 PM)Albert Chan Wrote: Default 0.0 + ArithmeticError + Inf/NaN test, we get this:

(mpf('0.69314718055994518'), mpf('2.7138055025410779e-8'), 47, 3, 0) # uf
(mpf('0.69314718055994484'), mpf('4.525002594846228e-11'), 71, 4, 0) # uuf

Previous Point + ArithmeticError + Inf/NaN test, we get this:

(mpf('0.69314718055995395'), mpf('2.713805868914676e-8'), 49, 3, 0) # uf
(mpf('0.69314725666345778'), mpf('7.6148712890855563e-8'), 83, 4, 0) # uuf

It may not be necessarily the case that choosing zero as default is better than reusing the previous point, or vice versa. However, I suspect that if a function in the limit approaches a nonzero finite value, then reusing previous points as an approximation of that value could be more accurate, if these weighted values contributes to the sum.

Among the 818 integrations with qthsh, I found that almost all of them produced the same integral regardless of the choice of zero or reuse by default. There are only 8 integrals that differ (I've added the limits):

When fp=0 and fm=0 is better as default for f on [a,b]:

Code:

LN(COS(x))        [0,1.570796327]    lim x->pi/2+ f(x)=-inf

LN(COS(x))^2        [0,1.570796327]    lim x->pi/2+ f(x)=+inf

(x-SIN(x))/(1-COS(x))    [0,3.141592654]    lim x->0- f(x)=0

LN(2*TAN(x))        [0,1.570796327]    lim x->0- f(x)=-inf   lim x->pi/2+ f(x)=undefined

SQRT(TAN(x))        [0,1.570796327    lim x->pi/2+ f(x)=undefined

When reusing previous points is better as default for f on [a,b]:

Code:

1/(1+COS(x)^x)        [-1.570796327,1.570796327]    lim x->pi/2+ f(x)=1       lim x->-pi/2- f(x)=0

x^2/(1-COS(x))        [0,1.570796327]        lim x->0- f(x)=0

x*SIN(x)/(1-COS(x))    [0,3.141592654]        lim x->0- f(x)=0

To determine these 8 integrals and categorize them I've used the absolute error of the method compared to the exact result, i.e. NOT the estimated error by the method.

When reusing previous points by default is better, then the absolute error with the exact value is substantially better by several orders in magnitude:

Code:

1/(1+COS(x)^x) 4.44E-16 versus 6.89E-11

x^2/(1-COS(x)) 6.27E-10 versus 1.76E-08

x*SIN(x)/(1-COS(x)) 5.24E-11 versus 3.41E-08

For the other default case of zero being better, the absolute error difference is small, mostly by a factor of about 2, so this looks more like noise.

Not a formal proof, but the empirical support appears somewhat convincing.

- Rob

robve · (This post was last modified: 04-07-2021 02:25 PM by robve.)

(04-06-2021 06:26 PM)emece67 Wrote: OK, I see now. I've changed it to:
...

There is also this part:

Code:

    while True:

      # try to avoid the computation of too many exp functions...

      ch = cosh(t)

      # pi/2*sinh(t)

      pi2sh = pi2*sqrt(ch**2 - 1)

      ecs = exp(pi2sh) if expsinh else cosh(pi2sh)

We can try a little harder to avoid computation of too many functions. Strength reduction is also applicable to the WP34S code to remove an exp() and then a sqrt() from the inner loop:

Code:

    ex = eh = exp(t)

    if level > 0:

      eh *= eh

    while True:

      ch = (ex+1/ex)/2               # cosh(t)

      pi2sh = pi2*(ex-1/ex)/2        # pi/2*sinh(t)

      ecs = exp(pi2sh) if expsinh else cosh(pi2sh)

      ...

      t = j*h

      ex *= eh

Another sqrt() can be removed from the inner loop by reusing exp(pi2sh) to replace cosh(pi2sh) and to get tanh(pi2sh) almost for free:

Code:

      ch = (ex+1/ex)/2               # cosh(t)

      pi2sh = pi2*(ex-1/ex)/2        # pi/2*sinh(t)

      r = ecs = exp(pi2sh)

      if not expsinh:

        ecs += 1/r         # 2*cosh(pi2sh)

        r -= 1/r           # 2*sinh(pi2sh)

        if tanhsinh:

          r /= ecs         # tanh(pi2sh)

        ecs /= 2           # cosh(pi2sh)

      w = 1/ecs**2 if tanhsinh else ecs

      ...

      t = j*h

      ex *= eh

- Rob

Edit: corrected code in the last block, typo made when translating from C

emece67 · 04-07-2021, 02:08 PM

The way the abscissas and weights are computed is an artifact of both sinh and tanh being much slower, in the wp34s, than cosh, exp and sqrt. Surely this code can be optimized when moving to another architectures, so I need to devote some time to this question.

Regards.

Albert Chan · 04-07-2021, 07:04 PM

My translated qthsh C code to lua

Code:

function qthsh(f, a, b, c, n, eps)  -- tanh-sinh quadrature

    c = c or 1                      -- decay factor, 0 to 1

    n = n or 6                      -- 6 is optimal, 7 take longer

    eps = eps or 1e-9               -- relative error

    local s, d = f((b+a)/2), (b-a)/2

    local h, k, fp, fm, err = 2, 0, 0, 0

    repeat

        h = h/2

        local p, t = 0, exp(h)

        local eh = k > 0 and t*t or t

        repeat

            local u = exp(1/t-t)

            local r = 2*u/(1+u)

            local x, y = d*r

            if a+x ~= a then

                y = f(a+x)

                if y-y == 0 then fp=y else fp=c*fp end

            end

            if b-x ~= b then

                y = f(b-x)

                if y-y == 0 then fm=y else fm=c*fm end

            end

            y = (t+1/t)*r/(1+u) * (fp+fm)   -- apply weight

            p, t = p+y, t*eh

        until abs(y) <= eps*abs(p)

        s, err, k = s+p, abs(s-p), k+1

    until err <= 10*eps*abs(s) or k>n

    return d*s*h, err/abs(s)

end

Instead of deciding settings of default 0.0 vs previous point, I added a decay factor c.
If we hit some "bad" points, it will use previous point, multiply by this decaying factor.

c=0 ≡ Default 0.0
c=1 ≡ Previous point (current default setting)

lua> function makeu(f) return function(u) return f(u*u*(3-2*u)) * 6*u*(1-u) end end
lua> uf = makeu(atanh)

lua> qthsh(uf, 0, 1, 0) -- default 0.0
0.6931471805598393 6.9295721273908944e-012
lua> qthsh(uf, 0, 1, 1) -- previous point
0.693147180709479 2.2281417025345338e-010

Also, limits of integration can now be swapped.

lua> qthsh(atanh, 1, 0) -- = qthsh(atanh, 1, 0, 1, 6, 1e-9)
-0.6931471805599405 9.370022523658699e-015

---

I also removed r==0 or r==1 test.
Reaching both extreme r is very unlikely.

To hit r=0, t ≥ e^6.62, with weight underflowed to 0.0
So, even if we reached it, no harm is done.

To hit r=1, it required t=1.
This implied h must be so small that t = exp(h) = 1., or h < 2^-53.

I did some timing test on my laptop.
To reach n=18 (h=2^-18), it take about 1 second.
Each additional level take about doubled the time.

Extrapolated to n=53 (h=2^-53), it take 2^(53-18) = 2^35 sec > 1000 years ! (*)
It does not worth the code to test it.

(*) At h=2^-53, exp(h) = 1+h+h^2/2+... is slightly above half-way, thus should round-up.
However, we assume exp(x) is not "mpfr" accurate.

>>> from gmpy2 import *
>>> exp(mpfr(2)**-53)
mpfr('1.0000000000000002')

emece67 · (This post was last modified: 04-07-2021 07:32 PM by emece67.)

Hi,

Just uploaded to GitHub a new version of the Python script. Based on the ideas of Rob about applying strength reduction. The speed increase is about 20%. Precision and TNFE suffered a little bit, though. Variables j and t have vanished, the latter was replaced by expt —being equal to exp(t)— and j is no longer needed. The inner loop only needs an exp() function and no sqrt() (as Rob showed). Surely the wp34s code will also benefit from this (provided someone does it, I'm not inclined now to work on it). Also, fixed a problem on Rob's code that made the sinh-sinh case not to work (r was twice than expected).

Thanks & regards.

robve · 04-08-2021, 01:29 AM

(04-07-2021 07:31 PM)emece67 Wrote: Just uploaded to GitHub a new version of the Python script. Based on the ideas of Rob about applying strength reduction. The speed increase is about 20%. Precision and TNFE suffered a little bit, though.

Well, this is wrong I'm afraid. I ran a comparison of the original WP34S code written in C and the optimized WP34S code written in C. I only made the small change to replace:

Code:

      double ch = cosh(t);

      double ecs = cosh(M_PI/2 * sqrt(ch*ch - 1));

      double r = sqrt(ecs*ecs - 1)/ecs;

      double w = 1/(ecs*ecs);

with these optimizations:

Code:

    ex = eh = exp(h);

    if (k > 0)

      eh *= eh;

    do {

      double ch = (ex+1/ex)/2; // = cosh(t)

      double u = exp(M_PI*(ex-1/ex)/4);

      double ecs = u+1/u;      // = 2*cosh(M_PI*(ex-1/ex)/4);

      double r = (u-1/u)/ecs;  // = tanh(pi/2*sinh(t))

      double w = 4/(ecs*ecs);

      ...

      ex *= eh;

The optimized code evaluates the same number of functions for all 818 integrals. There is no difference.

Secondly, the absolute error of the results to the exact result is actually smaller for every integral with the optimized version!

I will be happy to share the spreadsheet but you can run the experiment with the zip file I shared earlier by making the changes to the wp34s.sh code as shown above.

- Rob

Albert Chan · (This post was last modified: 04-08-2021 04:03 PM by Albert Chan.)

qthsh version 2, using expm1() instead of exp(), for possibly more accuracy.
We can use below identity, to scale from x = expm1(a), y = expm1(b), to expm1(a+b)

(x+1)*(y+1) - 1 = x*y + x + y

Code:

expm1 = require'mathx'.expm1

function qthsh2(f, a, b, c, n, eps) -- tanh-sinh quadrature

    c = c or 1                      -- decay factor, 0 to 1

    n = n or 6                      -- 6 is optimal, 7 take longer

    eps = eps or 1e-9               -- relative error

    local s, d = f((b+a)/2), (b-a)/2

    local h, k, fp, fm, err = 2, 0, 0, 0

    repeat

        h = h/2

        local p, t = 0, expm1(h)

        local t0 = k==0 and t or t*t + 2*t

        repeat

            local u = exp(-t - t/(t+1))

            local r = 2*u/(1+u)

            local x, y = d*r

            if a+x ~= a then

                y = f(a+x)

                if y-y == 0 then fp=y else fp=c*fp end

            end

            if b-x ~= b then

                y = f(b-x)

                if y-y == 0 then fm=y else fm=c*fm end

            end

            y = r/(1+u) * (fp+fm)

            y = t*t/(t+1)*y + 2*y   -- apply weight

            p, t = p+y, t0*t + t0 + t

        until abs(y) <= eps*abs(p)

        s, err, k = s+p, abs(s-p), k+1

    until err <= 10*eps*abs(s) or k>n

    return d*s*h, err/abs(s)

end

Redo example from https://www.hpmuseum.org/forum/thread-79...#pid145564

lua> function f(x) n=n+1; local y=expm1(x); return (x+x*y)/(1+y*y) end
lua> function f01(t) local y=t*t-t; return f((2*t-1)/y) * (2*y+1)/(y*y) end

lua> n = 0
lua> qthsh2(f01, 0, 1, 0)
0.8165947838638208 3.915835799526105e-009
lua> n
93

∫(f(x), x=-inf .. inf) = ∫(f01(t), t=0 .. 1) = 3/8*log(2)*pi ≈ 0.8165947838638508

f(x) will decay to 0.0 for both ends, even if intermediate calculations overflowed.

Thus, default 0.0 is appropriate here. Had we use previous point, we get this:

lua> n = 0
lua> qthsh2(f01, 0, 1, 1)
0.8165983039584289 4.3106914389618785e-006
lua> n
178

Comment: too bad expm1 version make no difference.

lua> qthsh(f01, 0, 1, 0)
0.816594783863821 3.915834575907313e-009
lua> qthsh(f01, 0, 1, 1)
0.8165983039584306 4.310691436990493e-006

emece67 · 04-08-2021, 11:20 AM

(04-08-2021 01:29 AM)robve Wrote: I only made the small change to replace:

Code:

double ch = cosh(t); double ecs = cosh(M_PI/2 * sqrt(ch*ch - 1)); double r = sqrt(ecs*ecs - 1)/ecs; double w = 1/(ecs*ecs);
with these optimizations:

Code:

ex = eh = exp(h); if (k > 0) eh *= eh; do { double ch = (ex+1/ex)/2; // = cosh(t) double u = exp(M_PI*(ex-1/ex)/4); double ecs = u+1/u; // = 2*cosh(M_PI*(ex-1/ex)/4); double r = (u-1/u)/ecs; // = tanh(pi/2*sinh(t)) double w = 4/(ecs*ecs); ... ex *= eh;

The optimized code evaluates the same number of functions for all 818 integrals. There is no difference.

Secondly, the absolute error of the results to the exact result is actually smaller for every integral with the optimized version!

I will be happy to share the spreadsheet but you can run the experiment with the zip file I shared earlier by making the changes to the wp34s.sh code as shown above.

- Rob

With these small changes in the Python code (in fact I copied it from your post above) I do see differences in the results. Usually only in the last digit, but enough to get (a really small) decrement on correct digits and a small but noticeable increment in TNFE (this time mainly in the sin(40πx) case). In 32 out of 160 cases the true absolute error get (slightly) worse. Of them, 14 have mutated from 0 true absolute error, to != 0.

Adding the other modifications (suppressing the remaining sqrt, j and t, that's: the code now on GitHub) degraded TNFE mainly because one of the integrals (1 - sqrt(2)*cosh(x)/sqrt(cosh(2*x)), from -inf to +inf) needed one more level. This particular case worsened dramatically, from 13 correct digits with strength reduction to only 4 with these changes). I think this is the only case that worth the effort to study it.

Regards.

robve · 04-08-2021, 01:32 PM

(04-08-2021 11:20 AM)emece67 Wrote: With these small changes in the Python code (in fact I copied it from your post above) I do see differences in the results. Usually only in the last digit, but enough to get (a really small) decrement on correct digits and a small but noticeable increment in TNFE (this time mainly in the sin(40πx) case). In 32 out of 160 cases the true absolute error get (slightly) worse. Of them, 14 have mutated from 0 true absolute error, to != 0.

Looks like this falls within machine precision MachEps? Is this just noise or is this significant? Perhaps Python mpmath mp.dps is too small?

One of the reasons why the current WP34S code must be worse compared to the optimized version is simple: the current WP34S code uses two square roots to compute sinh and tanh from cosh. This approach is almost always less accurate compared to computing sinh and tanh directly, which the optimizations perform with the usual hyperbolic formulas based on exp. Furthermore, the fewer arithmetic operations to produce these, the better, as in the optimized code.

Using exp(t) in the inner loop instead of the strength-reduced expt recurrence should be the most accurate. Try it.

However, the expt strength-reduced recurrence is accurate enough when the number of inner loop iterations are limited, which we know is the case. The expt error to exp(t) is at most \( (1+\epsilon)^m-1 \) for \( m \) inner loop iterations. Consider for example \( \epsilon=10^{-15} \) and \( m=100 \) then the error of expt to exp(t) is at most \( 10^{-13} \). But IEEE 754 MachEps is smaller and the integration results are better than that because the inner loop does not exceed 100 iterations (requiring 200 function evaluations) and is typically much lower.

From the 818 integrals tested, for those that do not produce exactly the same integration result, 754 produce smaller errors after optimization of the WP34S code. Almost all of the remaining differ in the last 15'th digit or within MachEps which is more noise so the difference can be discarded as insignificant.

Note that Tanh-Sinh is not suitable for some types of integrals. Integrands can be ill-conditioned, i.e. a small change in x can cause a very large change in y. One can find examples with the "right conditions" when the optimized version differs from the non-optimized version, but this is not at all related to the numerical stability of the (optimized) algorithm.

I would suggest to consider the Michalski & Mosig rule to accelerate the WP34S Tanh-Sinh further with qthsh. The spreadsheet I shared also shows the qthsh has far fewer ALERT than WP34S. For example x^(-3/4)*(1-x)^(-0.25)/(3-2*x) completely fails with the WP34S algorithm after 395 evaluations, but is accurate with qthsh after just 69 evaluations (integral 113 in the list).

Therefore, I am very confident that the suggested optimizations are more than worthwhile to reduce the computational cost significantly (especially on a calculator) and to produce more accurate results in (almost) every case.

Just my 2c.

- Rob

Albert Chan · 04-08-2021, 01:58 PM

(04-08-2021 11:20 AM)emece67 Wrote: integrals (1 - sqrt(2)*cosh(x)/sqrt(cosh(2*x)), from -inf to +inf) needed one more level.
This particular case worsened dramatically, from 13 correct digits with strength reduction to only 4 with these changes)

I believe there is nothing to worry about.
Integrand suffered from subtraction cancellations.
A slight shift to the nodes/weights (even if more precise) does not translate to better result.

Even for 4 good digits, consider yourself lucky. mpmath quadts only get 3.

>>> f = lambda x: 1-sqrt(2)*cosh(x)/sqrt(cosh(2*x))
>>> quadts(f, [-inf, inf], error=True)
(mpf('-0.69306562029329821'), mpf('0.001'))

Actually, mpmath "cheated", calcuating in higher precision.
See what happen if we pre-calculated sqrt(2), under mp.dps=15 setting.

>>> f2 = lambda x, k=sqrt(2): 1-k*cosh(x)/sqrt(cosh(2*x))
>>> quadts(f2, [-inf, inf], error=True) # failed convergence
(mpf('-1886.3975082035695'), mpf('1.0'))

This is a weakness of tanh-sinh quadrature. It relied on accurate sample points.
Gauss-Legendre quadrature, using curve-fitting of samples, handle "fuzzy" more easily.

>>> quadgl(f2, [-inf, inf], error=True)
(mpf('-0.69314718056249158'), mpf('1.0e-14'))

We could rewrite integrand, to perform better. Let c = cosh(x), d = cosh(2x)
With half-angle formula, we have c*c = (1+d)/2

1 - √(2)*c/√(d) = 1 - √(1+1/d) = -(1/d) / (1 + sqrt(1+1/d))

>>> def f3(x): y=1/cosh(2*x); return -y/(1 + sqrt(1+y))
...
>>> quadts(f3, [-inf, inf], error=1)
(mpf('-0.69314718055994529'), mpf('1.0e-29'))

(04-06-2021 11:36 PM)robve Wrote: When reusing previous points is better as default for f on [a,b]:

Code:

1/(1+COS(x)^x) [-1.570796327,1.570796327] lim x->pi/2+ f(x)=1 lim x->-pi/2- f(x)=0 x^2/(1-COS(x)) [0,1.570796327] lim x->0- f(x)=0 x*SIN(x)/(1-COS(x)) [0,3.141592654] lim x->0- f(x)=0

Same issue with all of these. A simple rewrite "sharpen" integrand greatly.

First one had the form 1/(1+u). With identity 1/(1+u) + 1/(1+1/u) = 1/(1+u) + u/(1+u) = 1

∫(1/(1+cos(x)^x), x=-pi/2 .. pi/2) = ∫(1, x=0 .. pi/2) = pi/2

With identity sin(x/2)^2 = (1-cos(x))/2, we can rewrite the others:

x^2/(1-cos(x)) = x^2 / (2*sin(x/2)^2) = (x/sin(x/2))^2 / 2
x*sin(x)/(1-cos(x)) = (x*2*sin(x/2)*cos(x/2) / (2*sin(x/2)^2)) = x / tan(x/2)

Albert Chan · 04-08-2021, 04:27 PM

(04-08-2021 01:32 PM)robve Wrote: Using exp(t) in the inner loop instead of the strength-reduced expt recurrence should be the most accurate. Try it.

I did. It make no difference.

Same conclusion when I compared strength-reduced exp vs supposed more accurate expm1.
I even tried "improved u", when t get closer to 1. Again, no difference.

u = exp(1/t - t) = exp((1-t)*(1+1/t))

Think of this as Newton's method.
Once starting to converge, every iteration doubled accuracy.
In other words, correction term only required half precision. (guess had the other half)

Same with quadrature, required precision keep reducing.
Inaccuracy of strength-reduced expt does not contribute to the sum.