HP PPL and Statistical distributions

HP PPL and Statistical distributions - Printable Version

+- HP Forums (https://www.hpmuseum.org/forum)
+-- Forum: HP Calculators (and very old HP Computers) (/forum-3.html)
+--- Forum: HP Prime (/forum-5.html)
+--- Thread: HP PPL and Statistical distributions (/thread-16635.html)

Pages: 1 2

HP PPL and Statistical distributions - robmio - 04-11-2021 08:49 AM

Goodmorning everyone. Can any of you point me to an algorithm written in HP PPL for the calculation of the "studentized range q distribution" CDF?

Thanks, Roberto.

RE: HP PPL and Statistical distributions - robmio - 04-11-2021 09:51 AM

In truth, HP PRIME is able to directly solve the integral that calculates the "P_Value" of the "studentized range q distribution". However, if the HP Prime emulator resolves the integral in about 1 second, the "hardware" HP Prime G2 resolves the integral in about 33 seconds. I therefore ask if there is an algorithm written in HP PPL to quickly calculate the "P_Value" of "studentized range q distribution", even using the HP Prime G2.
Thanks again, Roberto.

#cas
Studentized_Range_q_P_Value(q,k,ν):=
BEGIN
LOCAL funzione;
funzione:=(√(2*π)*k*ν^(ν/2))/
(Gamma(ν/2)*2^(ν/2-1))*∫(s^(ν-1)*
e^((-ν*s^2/2))/(√(2*π))*∫(e^((-z^2)/2)/
(√(2*π))*(normald_cdf(0,1,z+q*s)-
normald_cdf(0,1,z))^(k-1),z,
-(∞),+∞),s,0,+∞);
RETURN 1-funzione;
END;
#end

RE: HP PPL and Statistical distributions - compaqdrew - 04-13-2021 05:43 AM

I did some digging on this. The root of the issue is a typical problem of mathematical calculation. Expressions which are nice for math, are not always nice numerically. Even my Mathematica struggles on it. You need either a simpler expression or a fast approximate method. Of course the nature of either is it will be less general than the pure form, but how to handle that depends a lot on your application.

On the expression side, Wikipedia has nice closed-form solutions for special values of k, if you can be satisfied with a less general solution. If those don't work it might be possible to get some bounds for your application and ask a big boy CAS like Mathematica if it can find a nice solution. I did ask it to simplify the general formula on the offchance it can prove some new math for us, it appears it cannot.

The simplest and typical implementation is to use tables and interpolate. For example, this is the approach given here. I also saw an approach like this recommended in one of the older papers on this function. Basically, load up a matrix with the table, do lookups and average nearby values. One of the papers I read recommended harmonic interpolation between table entries but that may be overkill.

Another approach is to use a numerical method to slice open the nasty integrals, such as some kind of quadrature rule. Glancing at the formula, I'd guess you use some Laguerre method on one integral and a Hermite method on the other. Fortunately we don't have to guess, there are known algorithms for this function. Here is an older method (indeed, seems to be Laguerrish) and here is a modern one from R. Some of these methods have limitations on the parameters. There's also this survey of similar methods which has a large appendix with implementations. Unfortunately none of these are HP PPL, but it's probably straightforward to translate.

RE: HP PPL and Statistical distributions - robmio - 04-13-2021 08:45 AM

Hello, thank you very much for such a thorough answer. I'll ask you another question: the calculation of numerical integrals, by utilising the HP 50g calculator, provides, in setting the values, the calculation accuracy. For example, if I set a "Fixed 3" "number format", the accuracy with which the numerical integral is calculated includes up to the third decimal number. By setting a "low" accuracy, the HP 50g calculates the integral more quickly. Is it possible to set, in the HP PRIME calculator, a "low" calculation accuracy, in order to speed up the calculation of the integral?

best regards, Roberto Mioni.

RE: HP PPL and Statistical distributions - compaqdrew - 04-13-2021 08:18 PM

This might be better answered by somebody who knows the calculator better. Part of the issue is that for a function like Integrate, it tries a lot of different methods. I believe that Prime, Giac/Xcas, and some HP48 routines can take a crack at integration, and each of those codebases have many methods inside them. I think some of these methods do look at CAS or Home precision, but I'm not sure all of them do. Based on the errors I get running your program, it seems a lot of time is spent trying different methods, and so that may be an issue regardless of accuracy.

You might have better luck using a specific method. There is a builtin romberg function, I don't know if it can handle your integrals but it's worth a try to replace them with romberg calls and see how it performs. There's also some numerical methods for the Prime written by Jhonatan Peretz that you can try, in particular he has a Legendere quadrature that might work well.

Back to your question, there are specific integration methods that trade off time for accuracy. Monte carlo integration is the first that comes to mind, since you can reduce the iteration count. There is a double integral version for PPL by Eddie Shore, but it's probably simple enough you can port the simpler single algorithm from anywhere. The trick will be the infinite integration limits. Usually you can just use large/small values if the contribution outside the range will be small.

RE: HP PPL and Statistical distributions - Albert Chan - 05-18-2021 03:38 PM

(04-13-2021 08:18 PM)compaqdrew Wrote: The trick will be the infinite integration limits.
Usually you can just use large/small values if the contribution outside the range will be small.

This is indeed the case.

Recently, we discussed tanh-sinh, exp-sinh, sinh-sinh algorithm here
Combined as 1, we have quad (my Lua implementation here, emace67 double_exponential.py here).

First, I clean-up CAS code, and run in XCas (s lower limit changed to 0. produced some speed-up)

Code:

Studentized_Range_q_P_Value(q,k,ν):=

BEGIN

return 1 -

√(2/π) * k * (ν/2)^(ν/2) / Gamma(ν/2)

* ∫(s^(ν-1)*exp(-ν*s*s/2)

* ∫(exp(-z*z/2) * normald_cdf(z,z+q*s)^(k-1)

, z = -∞ .. +∞)

, s = 0. .. +∞);

END;

Picked an example, from studentized-range-q-table
From first page, P-value = 0.10, k=8, df=12, we have q = 4.511

XCas> Studentized_Range_q_P_Value(4.511, 8, 12)

Evaluation time: 6.24, returned 0.100017379207

Now, rewrite code in lua, using quad to integrate.

Code:

do

mathx = require'mathx'

local erf, tgamma = mathx.erf, mathx.tgamma

local exp, sqrt, pi, huge = math.exp, math.sqrt, math.pi, math.huge

local quad = require'quad'.quad

local function cdf(a, b)    -- area from a to b

    a = erf(0.7071067811865476*a)

    b = erf(0.7071067811865476*b)

    return (b-a) * 0.5

end

function Studentized_Range_q_P_Value(q,k,v)

    local c = sqrt(2/pi) * k * (v/2)^(v/2) / tgamma(v/2)

    local k1, v1 = k-1, v-1

    local function fs(s)

        local function fz(z) return exp(-z*z/2) * cdf(z,z+q*s)^k1 end

        return s^v1 * exp(-v*s*s/2) * quad(fz, -huge, huge)

    end

    return 1 - c * quad(fs, 0, huge)

end

end

lua> Studentized_Range_q_P_Value(4.511, 8, 12)
0.10001737920797527

We got the same answer ... But, LuaJIT take 0.003 second Big Grin

RE: HP PPL and Statistical distributions - robmio - 05-18-2021 05:23 PM

Hello Albert Chan, how do I write the lua code in the HP PRIME?

Thanks a lot, Roberto

RE: HP PPL and Statistical distributions - robmio - 05-18-2021 06:03 PM

Is it possible to implement the Lua implementation of quad () for HP PRIME (perhaps with Python)?

RE: HP PPL and Statistical distributions - Albert Chan - 05-18-2021 11:15 PM

Hi, robmio

Here is the translated quad, in PPL.
I hope this is right, as I am learning PPL as I code.

Compare to Lua or Python, there are some gothas:

1. all local variables have to declare up front.
2. there is no Lua elseif or Python elif equivalent.
3. there is no logic precedence: a OR b AND c translated to ((a OR b) AND c)
4. comparison operators has "sugar" in it, which may translated wrongly.

Cas> a > b > c → a > b AND b > c ... looks good
Cas> a > b > c > d → ((a>b) AND (b>c)) > d ???
Cas> (a>b) == (mode == 2) → a>b and b == mode == 2 ???

Code:

#cas

quad6(f, a, b, d, n, eps)

BEGIN

LOCAL k, c, t0, t1, t, p, s, err;

LOCAL fp, fm, u, r, x, y, mode;

fp := fm := k := c := 0.;

t0 := 7.38905609893065; // e^2

a := approx(a);

b := approx(b);

mode := (a-a!=0) + 2*(b-b!=0);

IF mode == 0 THEN       // tanh-sinh

    c,d,s := d, (b-a)/2, f((a+b)/2)

ELSE

    IF mode == 3 THEN   // sinh-sinh

        s := f(c)

    ELSE                // exp-sinh

        c, s := a, b;   // c = finite edge

        IF mode == 1 THEN c, s := b, a END

        IF s < 0 THEN d:=-d END

        s := f(c+d)

    END

END

REPEAT

    p, t, t1 := 0, sqrt(t0), t0;

    IF k==0 THEN t1 := t END

    t0 := t;

    IF mode == 0 THEN           // tanh-sinh

        REPEAT

            u := exp(1/t-t);

            r := 2*u/(1+u);

            x := d*r;

            IF a+x != a THEN

                y := f(a+x);

                IF y-y==0 THEN fp:=y ELSE fp*=c END

            END

            IF b-x != b THEN

                y := f(b-x);

                IF y-y==0 THEN fm:=y ELSE fm*=c END

            END

            y := (t+1/t)*r/(1+u) * (fp+fm);

            p, t := p+y, t*t1;

        UNTIL abs(y) <= eps*abs(p);

    ELSE

        t *= 0.5;

        REPEAT

            r := exp(t-0.25/t);

            u := r; fp := 0;

            IF mode == 3 THEN   // sinh-sinh

                r := 0.5*r - 0.5/r;

                u := 0.5*u + 0.5/u;

                y := f(c - d*r);

                IF y-y==0 THEN fp := y*u END

            ELSE                // exp-sinh

                x := c + d/r;

                IF x == c THEN break END

                y := f(x);

                IF y-y==0 THEN fp := y/u END

            END

            y := f(c + d*r);

            IF y-y == 0 THEN fp += y*u END

            fp *= (t+0.25/t);

            p, t := p+fp, t*t1;

        UNTIL abs(fp) <= eps*abs(p);

    END

    s, err, k := s+p, abs(s-p), k+1;

// print(k, d*ldexp(s,1-k), err/abs(s));

UNTIL err <= 10*eps*abs(s) OR k>n;

IF mode==1 OR (mode==3 and a>b) THEN d:=-d END

return d*ldexp(s,1-k), err/abs(s);

END;

quad(f,a,b) := quad6(f,a,b,1,6,1e-9);

#end

Usage:

Cas> pdf(x) := exp(-x*x/2) / sqrt(2*pi)

Cas> quad(pdf, -1, 1) → [0.682689492137, 2.08159857137e−14]
Cas> quad(pdf, 1, inf) → [0.158655253928, 1.81990772641e−10]
Cas> quad(pdf, 1, -inf) → [-0.841344746, 7.74963867073e−11]
Cas> quad(pdf, -inf, 1) → [ 0.841344746, 7.74963867073e−11]
Cas> quad(pdf, -inf, inf) → [1., 8.45545855554e−13]

RE: HP PPL and Statistical distributions - Albert Chan - 05-19-2021 02:39 PM

I was unable to get double integrals using only quad. Any ideas ?

Second best option, wrap outer integral with quad, inner with int.

Code:

#cas // Studentized P_Value, from q,k,v

SRP_qkv(q,k,v):=

1 - √(2/π)*k*(v/2)^(v/2)/Gamma(v/2)

* quad(s -> s^(v-1)*exp(-v*s*s/2) * 

  int(exp(-z*z/2) * normald_cdf(z,z+q*s)^(k-1)

, z, -inf, inf)

, 0., inf) (1)

#end

Cas> SRP_qkv(4.511, 8, 12) → 0.100017379208, finished in 0.45 second

For unknown reason, Studentized_Range_q_P_Value(4.511, 8, 12) hang the emulator.
Is there a way to interrupt calculations, besides restarting emulator ?

RE: HP PPL and Statistical distributions - Albert Chan - 05-19-2021 04:56 PM

(05-19-2021 02:39 PM)Albert Chan Wrote: I was unable to get double integrals using only quad. Any ideas ?

Problem solved ! Smile

Parameters of function also required declared as local variables.
I had discovered this when checking memory of Cas variables.

I would have guessed parameters local by default ... weird.

Code:

#cas // Studentized P_Value, from q,k,v

SRP_qkv(q,k,v)

BEGIN

LOCAL s, z;

1 - √(2/π)*k*(v/2)^(v/2)/Gamma(v/2)

* quad(s -> s^(v-1)*exp(-v*s*s/2) * 

  quad(z -> exp(-z*z/2) * normald_cdf(z,z+q*s)^(k-1)

, -inf, inf) (1)

, 0, inf) (1)

END

#end

Cas> SRP_qkv(4.511, 8, 12) → 0.100017379208, finished in 0.184 second

RE: HP PPL and Statistical distributions - robmio - 05-19-2021 06:00 PM

Dear Albert Chan, it's strange, even your latest version doesn't work on the HP PRIME emulator. Where am I doing wrong?

RE: HP PPL and Statistical distributions - Albert Chan - 05-19-2021 06:35 PM

Hi, robmio

Your screenshot was the one I get, without LOCAL s,z;

I did a emulator reset, to be sure: build 2.1.14181 (2018 10 16)
Then, load back quad and SRP_qkv ... Everything is working fine.

After running SRP_qkv(4.511, 8, 12) again. These are in Cas variables:

SRP_qkv (Function) 2KB
quad (Function) 0.25KB
quad6 (Function) 9KB

Paradoxically, with hard reset, LOCAL s,z; is not needed anymore.

Code:

#cas // Studentized P_Value, from q,k,v

SRP_qkv(q,k,v) :=

1 - √(2/π)*k*(v/2)^(v/2)/Gamma(v/2)

* quad(s -> s^(v-1)*exp(-v*s*s/2) * 

  quad(z -> exp(-z*z/2) * normald_cdf(z,z+q*s)^(k-1)

, -inf, inf) (1)

, 0, inf) (1)

#end

RE: HP PPL and Statistical distributions - robmio - 05-19-2021 07:05 PM

I followed your instructions - it works fine!

RE: HP PPL and Statistical distributions - Albert Chan - 05-20-2021 12:40 PM

Now that we have it running, we may optimize a bit.

Inner integral, \(\int_{-∞}^∞\), work best if bulk of area inside \(\int_{-d}^d\) (quad defaulted d = 1)

Inner integrand is bell-shaped, with center shifted left, we like to expand d = 4 (*)

One way is simple substitution: \(\int_{-∞}^∞ f(x)\,dx = \int_{-∞}^∞ f(4u)(4\,du) \)
Now, LHS \(\int_{-4}^4\) = RHS \(\int_{-1}^1\)

This is the reason quad6() exposed other parameters. We can simply set d = 4.

Code:

#cas // Studentized P_Value, from q,k,v

SRP_qkv(q,k,v) :=

1 - √(2/π)*k*(v/2)^(v/2)/Gamma(v/2)

* quad (s -> s^(v-1) * exp(-v*s*s/2) * 

  quad6(z -> exp(-z*z/2) * normald_cdf(z,z+q*s)^(k-1)

, -inf, inf, 4., 6, 1e-9) (1)

, 0, inf) (1)

#end

Cas> SRP_qkv(4.511, 8, 12) → 0.100017379208, finished in 0.11 second

---

(*) without shift, quad(z->exp(-z^2/2), -inf, inf) work better with d=8
However, if shifted, it might get worse, as d=9 example shows.

lua> Q = require'quad'
lua> f = function(z) return exp(-z*z/2) end
lua> Q.quad(Q.count(f), -huge, huge, 1), Q.n
2.506628274630998 119
lua> Q.quad(Q.count(f), -huge, huge, 2), Q.n
2.506628274630838 101
lua> Q.quad(Q.count(f), -huge, huge, 4), Q.n
2.506628274630999 43
lua> Q.quad(Q.count(f), -huge, huge, 8), Q.n
2.506628274630997 33
lua> Q.quad(Q.count(f), -huge, huge, 9), Q.n
2.506628274630931 55

RE: HP PPL and Statistical distributions - robmio - 05-20-2021 04:10 PM

Optimization cuts execution times in half. thank you very much

RE: HP PPL and Statistical distributions - Albert Chan - 05-22-2021 12:50 PM

A nice thing about infinity, you can compress it, shift it, and it is still infinity.

cdf function, Φ(z) = 1/2 * (1+erf(z/√2)).
We can substitute x=√2*z, and avoid calls to Φ(z), which internally called erf.

XCas moyal.cc Wrote:static gen normal_cdf(const gen & g,GIAC_CONTEXT){
return rdiv(erf(ratnormal(plus_sqrt2_2*g),contextptr)+plus_one,2,contextptr)
}

The substitution also simplified exp(-z*z/2) to exp(-x*x)

Do the same way for outer integral, letting y = √2*s, and pull out the constants, we have:
(d adjusted to offset effect of scaling: 4/√2 ≈ 2.8, 1/√2 ≈ 0.71)

Code:

#cas // Studentized P_Value, from q,k,v

SRP_qkv(q,k,v) :=

1 - k*2^(2-k) * v^(v/2)/(Gamma(v/2)*√π)

* quad6(y -> y^(v-1) * exp(-v*y*y) *

  quad6(x -> exp(-x*x) * (erf(x+q*y)-erf(x))^(k-1)

, -inf, inf, 2.8, 6, 1e-9) (1)

, 0.0, inf, 0.71, 6, 1e-9) (1)

#end

Cas> SRP_qkv(4.511, 8, 12) → 0.100017379208, finished in 0.085 second

RE: HP PPL and Statistical distributions - robmio - 05-22-2021 05:18 PM

Thanks a lot to Albert Chan who reduced the computation time to 1 second / 1.2 seconds (HP PRIME G2).
In some posts I have noticed that the HP PRIME calculator becomes even faster if programmed in PYTHON language. Therefore, I tried to translate Albert Chan's quad6-program from #Cas ... #end to PYTHON. However, I was unable to get the program to work - where am I wrong? Below the program:

Code:

#PYTHON EXPORT quadr6

from sys import *

from math import *

f=argv[0]

a=float(argv[1])

b=float(argv[2])

d=float(argv[3])

n=float(argv[4])

eps=float(argv[5])

fp = fm = k = c = 0.

t0 = 7.38905609893065 # e^2

mode = (a-a != 0) + 2* (b-b != 0)

if mode == 0: # tanh-sinh

    c, d, s = d, (b-a)/2, f((a+b)/2)

else:

    if mode == 3: # sinh-sinh

        s = f(c)

    else: # exp-sinh

        if mode == 2:

            c = a

        else:

            c =b

        if bool(a>b) == bool(mode==2):

            d = −d

        s = f(c+d)

while err >=10*eps*abs(s) or k<n:

    p, t, t1 = 0, sqrt(t0), t0

    if k==0:

        t1 = t

    t0 = t

    if mode == 0:

        while abs(y) >= eps*abs(p):

            u = exp(1/t-t)

            r = 2*u/(1+u)

            x = d*r

            if a+x != a:

                y = f(a+x)

                if y-y == 0:

                    fp = y

                else:

                    fp *= c

            if b-x != b:

                y = f(b-x)

                if y-y==0:

                    fm =y

                else:

                    fm *=c

            y = (t+1/t)*r/(1+u) * (fp+fm)

            p, t = p+y, t*t1

    else:

        t *= 0.5

        while abs(fp) >= eps*abs(p):

            r = exp(t-0.25/t)

            u = r

            fp =0

            if mode == 3:

                r = 0.5*r - 0.5/r

                u = 0.5*u + 0.5/u

                y = f(c-d*r)

                if y-y == 0:

                    fp = y*u

            else:

                x = c + d/r

                if x == c:

                    break

                y = f(x)

                if y-y == 0:

                    fp = y/u

            y = f(c+d*r)

            if y-y == 0:

                fp += y*u

            fp *= (t+0.25/t)

            p, t = p+fp, t*t1

    s, err, k = s+p, abs(s-p), k+1

if mode == 1 or (mode==3 and a>b):

    d = −d

ris=[d*ldexp(s,1-k), err/abs(s)] 

print(ris) 

#end

EXPORT Quadratura_Python(f,a,b,d,n,eps)

BEGIN

PYTHON(quadr6,f,a,b,d,n,eps);

END;

RE: HP PPL and Statistical distributions - Albert Chan - 05-22-2021 06:46 PM

Some issues with PYTHON program (assumed it is true Python):

1. Turning repeat until to while loop must ensure it run at least once.
Loop counts must also match. It might be safer translate repeat-until as:

while 1:
...
if until_condition: break

2. True Python does not turn 1./0. to +inf, but raise ZeroDivisionError instead.
Why Python's ZeroDivisionError for floating-point type is a bad and unnecessary feature

In fact, Python seldom return float('inf'), for floating math operations.
(1e300*1e300 return +inf, but 1e300 ** 2 raise OverflowError ... weird)

>>> from math import *
>>> log(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: math domain error
>>> exp(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: math range error

We have to catch exceptions instead. see emece67 double_exponential.py

This is why quad.lua translated to PPL. It behave closer to Lua:

CAS> 1./0. → +Inf
CAS> log(0.) → -Inf
CAS> exp(1000.) → +Inf

RE: HP PPL and Statistical distributions - Albert Chan - 05-23-2021 01:32 PM

PPL allowed comparing complex numbers with symbolic inf (not DOM_float inf = 1.0/0)

CAS> inf > 3+4i → 1
CAS> -inf > 3+4i → 0
CAS> type(approx(inf)) → DOM_SYMBOLIC

This feature extended quad for complex limits, without changes Smile

CAS> f(z) := (2/√π) *e^(-z*z)
CAS> erf(3+4i) → -120.186991395-27.7503372936*i
CAS> quad(f, 0, 3+4i)(1) → -120.186991395-27.750337294*i
CAS> quad(f, 3+4i, inf)(1) → 121.186991394+27.7503372904*i = 1 - erf(3+4i)
CAS> quad(f, inf, 3+4i)(1) → -121.186991394-27.7503372904*i = erf(3+4i) - 1