Internal Computation of Cumulative Distributions

The Museum of HP Calculators

HP Forum Archive 16

Internal Computation of Cumulative Distributions
Message #1 Posted by Les Wright on 18 June 2006, 12:12 a.m.

I am just full of questions tonight!
This takes a tangent a bit on a related thread from earlier in the week regarding the accuracy and precision of built-in calculator functions.
I am particularly interested in the upper-tail probababity distribution functions (UTPC, UTPT, UTPN, UTPF) which are part of the catalog of the 48 and 49 series and were introduced in the HP28 series if not sooner.
Lets take the upper-tail normal distribution function, UTPN. In the aforementioned earlier thread a contributor compared the results of 0 1 x UTPN for an impressive range of integer values of x to those of Mathematica, and the concurrence to all 12 digits is impressive.
Maybe this is a trade secret, but what does the calculator use to achieve such precision, and so fast at that? I am guessing series or continued fraction developments that converge quickly due to being hardware encoded.
Any ideas out there? In my quest for more precise algorithms to compute a UTPN analog on, say my HP42S or 41 series, I am assuming that there must be something out there more refined than the polynomial and rational approximations due to Cecil Hastings found in Abramowitz and Stegun, and elsewhere. Summing series works well but is slow and makes it difficult to use in Solver scenarios where one wishes to find the distribution score associated with a different probability, as a series needs to be summed for each guess.
Any enlightment is always greeted with great gratitude and occasional awe.
Les

Re: Internal Computation of Cumulative Distributions
Message #2 Posted by Han on 18 June 2006, 2:18 a.m.,
in response to message #1 by Les Wright

You can use pretty much any dissassembler to take a peek at the code for the command. Jazz is a good program just for this type of thing.

Re: Internal Computation of Cumulative Distributions
Message #3 Posted by James M. Prange (Michigan) on 18 June 2006, 3:24 a.m.,
in response to message #2 by Han

Or on a 49 series, use Nosy or maybe CQIF, which are both available at hpcalc.org.
Regards,
James

Re: Internal Computation of Cumulative Distributions
Message #4 Posted by Rodger Rosenbaum on 18 June 2006, 3:48 a.m.,
in response to message #1 by Les Wright

The references on this page should get you started to finding what you're looking for:
http://citeseer.ist.psu.edu/186363.html


Re: Internal Computation of Cumulative Distributions
Message #5 Posted by Les Wright on 18 June 2006, 1:21 p.m.,
in response to message #4 by Rodger Rosenbaum

Thank you so much.
I will admit that, as someone without mathematical training beyond a little calculus, statistics, and abstract algebra in university, much of the literature may be hard for me to penetrate, but it is a big start.
I will look at the disssembler programs too but must confess that I don't understand assembler or machine code so it may not help me. I will need to understand the "brains" of my calculators better.
The thing that has fascinated me about the HP28/48/49 UTPN function is how for larger x the command 0 1 x UTPN does not give 0, but rather a very tiny figure with large negative exponent. This is obviously NOT the result of the stardard series developments of the cumulative distribution, which, for x > 8 or so produce 1 within 12-digit precision, which on subtraction from 1 to get the upper tail valu gives 0. However, 0 1 47 UTPN gives impressively 1.77975640867E-482, which apparently is correct, and I ask amazedly "how does it do that?" When I look at Abramowitz and Stegun, the continued fraction developments look more promising since the emphasis is on successive quotients and products rather than sums and differences, and this is more likely to preserve all the digits while the zeros before them accumulate. That said, I suspect the HP routine is likely more sophisticated, and probably selects a different algorithm according to the size of the argument.
My efforts to extend the Hastings approximations a few terms by least squares means using Levenberg-Marquart is not as fruitful as I would hope it would be, and I have learned since that this is usually not a good way to go for a number of reasons. I am fascinated by the methods outline but not well explained in Hastings' 1955 text that aim not to minimize the total sum of the squared deviations but rather seek to "level" the error curve so the maximum absolute deviation is minimized. The chebyshev() function in Maple, to my great disappointment, produces large and unwieldy polynomials that do well in a narrow range of argument but go bonkers quickly outside of it. At least the Hastings approximations are accurate and well-behaved throughout the entire desired domain (the various transformations ensure that), even if I am having no luck extending them to 12-digit precision.
Anyway, thanks for the references. I will bone up on my reading now.
Les


Re: Internal Computation of Cumulative Distributions
Message #6 Posted by Rodger Rosenbaum on 20 June 2006, 5:21 p.m.,
in response to message #5 by Les Wright

Les, what is the input argument range of Hasting's approximation of the UTPN function?

Re: Internal Computation of Cumulative Distributions
Message #7 Posted by Les Wright on 21 June 2006, 1:11 a.m.,
in response to message #6 by Rodger Rosenbaum

All nonnegative reals.

Re: Internal Computation of Cumulative Distributions
Message #8 Posted by Rodger Rosenbaum on 21 June 2006, 1:31 a.m.,
in response to message #7 by Les Wright

As you said, 0 1 47 UTPN on the HP49G+ gives 1.77975640867E-482; what does Hastings give for that input?
Is the Hastings approximation short enough that you could give it to us?

Re: Internal Computation of Cumulative Distributions
Message #9 Posted by Les Wright on 21 June 2006, 11:55 a.m.,
in response to message #8 by Rodger Rosenbaum

To avoid typing error I link you to the appropriate page in Abraham and Stegun here.
I have been playing around with 26.2.19 and and 26.2.17 is also of interest to me.
I don't have my relevant calc on hand now, but I recall that if you use the complement of each of these formulae to give Q(x), the upper-tail probability, Q(47) is of the order of 2.1e-482 or 2.2e-482 vs. the actual value of about 1.77e-482.
Thanks to the recent acquisition of Forman Acton's great book and some further playing around in Maple I am learning quite a bit more about approximation theory and minimax methods. But the Hastings book, though of admittedly historical interest now, continues to fascinate me. I am quickly learning that Hastings and his team used a minimax "equal ripple" criterion to carry out the various approximations, though such wording is not to be found in the book. Owners of the book will know that it consists for panels from a filmstrip presentation used to support lectures on the subject. Like in modern day PowerPoint handouts, there seem to be some gaps in the narrative that were probably best filled in by "being there".
Formula 26.2.19 is, at least in its Q(x) complement version, what is used by the SigmaNormd routine in the HP41 Stats Pac. It is surprisingly good for a fairly short formula. My hope was to be able to understand either Hastings' or more contemporary minimax methods to extend it by a term or two or to extend the coefficients out to 12 digits to get 10 or maybe 11 digits on a 12 digit calc like the HP42S. (I know that rounding error will make 12 digits elusive if not impossible, unless there is some way for the programmer to access the extra internal digits.)
I may be looking at some sort of composite routine. In Maple I have figured the the series representation converges rapidly for z < 1 and the continued fraction version for z > 2. The place in between can maybe be served by one or perhaps two rational approximations of the Remez or Chebyshev-Pade ilk.
Why do I want to do this if the most impressive UTPN function already exists? Simple--just because. I actually find the cerebral challenge oddly gratifying. What I do for a living has no useful application for any of this. My family and friends think I have lost my mind.
Thanks for turning your mind to my little intellectual adventure.
Les

Re: Internal Computation of Cumulative Distributions
Message #10 Posted by Rodger Rosenbaum on 22 June 2006, 5:51 a.m.,
in response to message #9 by Les Wright

Les said:
I have been playing around with 26.2.19 and and 26.2.17 is also of interest to me.
I don't have my relevant calc on hand now, but I recall that if you use the complement of each of these formulae to give Q(x), the upper-tail probability, Q(47) is of the order of 2.1e-482 or 2.2e-482 vs. the actual value of about 1.77e-482.
I don't get this result, Les. From formula 26.2.19 (with the "1-" removed from the front of the formula), I get 1.57028276E-78 when x=47.
I also note that the claimed error, |e(x)| < 1.5E-7, is an absolute error, and not a relative error. The input argument range of 0 <= x < infinity is only possible because the function Q(x) goes to zero as x goes to infinity, so any series approximation that goes to zero as x goes to infinity will give that error for large values of x. If you examine the relative error of 26.2.19, it gets large rapidly as x gets larger than about 3.
Formula 26.2.12 works well for large x. You'll also notice that Q(x) for large values of x are given at the bottom of pages 972 and 973.

Re: Internal Computation of Cumulative Distributions
Message #11 Posted by Les Wright on 22 June 2006, 3:14 p.m.,
in response to message #10 by Rodger Rosenbaum

I trust your results better than mine. I was responding from memory, and obviously misremembered.
In Maple, albeit an older version, I am getting some promising rational approximations using the minimax() function in the numapprox package. I think the key is cutting up the desired domain into 4 or 5 shorter bits and coming up with an approximation for each.
Also, I think I am just going to have to be happy with 10 digits. 12 digits is difficult to get due to rounding error, and 11 digits seems to require more complicated formulae than 10 digits.
I will pay more attention to the asymptotic approximation you point out. It is probably a lot easier to implement on an HP41 or HP42S than a recursive continued fraction routine.
Thanks!
Les

Re: Internal Computation of Cumulative Distributions
Message #12 Posted by Les Wright on 22 June 2006, 3:31 p.m.,
in response to message #11 by Les Wright

Drat--
Maybe I am miscoding it in Maple, but so far I am finding the asymptotic expansion 26.2.12 pretty useless for what I want it for. Remember, within the limits of calculators I am interested in, "large" means 20 on the HP41 and 47 on the HP42.
I will check my work and keep you posted.
Les

Re: Internal Computation of Cumulative Distributions
Message #13 Posted by Rodger Rosenbaum on 22 June 2006, 10:47 p.m.,
in response to message #11 by Les Wright

I tweaked the Hastings formula, 26.2.19, adding two more terms and changing the final exponent. This augmented version has about two orders of magnitude lower error (1.5E-9). The parameters are:

d1 = .066490394 d2 = .028736092 d3 = .0048418539 d4 = .00021102572 d5 = 8.6356160E-5 d6 = 1.6802053E-5 d7 = -2.2584744E-6 d8 = 4.2881737E-7

The full expression for Q(x) is:
Q(x)=.5*(1+d1*x+d2*x^2+d3*x^3+d4*x^4+d5*x^5+d6*x^6+d7*x^7+d8*x^8)^-12
This formula is optimized for the range 1 < x < 7. I was able to get better accuracy from the extra 2 terms if I didn't go down to x=0, but rather only down to x=1.
Formula 26.2.11 seems to be the best one for low values of x. I found that 9 terms of 26.2.11 will give 10 digit accuracy, and 11 terms will give 13 digits of accuracy for 0 <= x <= 1.
Formula 26.2.12 works best for large values of x. In fact, if you examine this formula, you will see that as x becomes very large, the series approaches 1, and Z(x)/x approaches Q(x). However, this formula doesn't work well for small values of x. When x is small, the series part diverges as you add more terms. I find that 22 terms will give 11 digit accuracy for x > 7, and for x = 47, it gives 47 digit accuracy (strictly coincidental).
So, the thing to do (so far) would be to use the 9-term 26.2.11 for the range 0 <= x <= 1, the augmented 26.2.19 for 1 < x < 7, and the 22-term 26.2.12 for 7 < x.
Is your email address as given here in the forum info correct? I should email you some pix.

Re: Internal Computation of Cumulative Distributions
Message #14 Posted by Les Wright on 23 June 2006, 11:43 a.m.,
in response to message #13 by Rodger Rosenbaum

This is really cool! Can you tell me how you did the "tweaking"? Did you use a minimax routine, or was it mostly trial and error? Do you work in Maple, Mathematica, or some other CAS? Perhaps a similar tweaking of 26.2.17, which is already pretty accurate to begin with, is similarly possible.
My Maple encoding of the asymptotic expansion was in error (i messed up the indices in the sum and product operators), but I got the results you described after correcting. On pitfall is that the series is actually divergent, so one needs to keep the argument large enough and not go out to too many terms.
I am really keen to find some way to evaluate continued fractions in an RPN environment. It is easy enough to do in RPL--just port the Numerical Recipes routine, for example--but a bit more challenging on, as, an HP41. There must be a routine out there someplace. I will check the software library, but would be grateful for tips or leads.
My publicized email is correct and you can send me what you like.
Les

Re: Internal Computation of Cumulative Distributions
Message #15 Posted by Les Wright on 23 June 2006, 12:04 p.m.,
in response to message #14 by Les Wright

Routine 17 on this page may be close to what I seek. I am quite impressed that the routine incorporates a series AND continued fraction routine in only 78 steps. erf(), erfc() and the cumulative normal distribution are of course connected so adapting this can be a reasonable alternative to seeking a highly accurate closed form rational or polynomial approximation.
Les

Re: Internal Computation of Cumulative Distributions
Message #16 Posted by Les Wright on 23 June 2006, 2:26 p.m.,
in response to message #15 by Les Wright

I think this is my answer.
It gives full digit precision, or close to it, and doesn't seem too slow at all, though I have yet to test it on a real HP41 (I just have my PDA with P41CX, but have set the slowest CPU setting).
Since erfc() and the upper tail normal probability are related by a simple transformation relation, since the core integral is basically the same, it is a simple matter to adapt the program to give what I need, and it is easy enough to port to both HP42S and HP41C series.
I also have an answer regarding the continued fraction issue. The author of this routine quite simply fixed the number of iterations at 24, which in this case is more than enough for arguments above 2. With the examples I have tested I have got full precision most of the time, though occasionally I lose the last digit to rounding.
I will punch it into a "real" HP41 tonight to see how swift it is.
Les

[ Return to Index | Top of Index ]

Go back to the main exhibit hall