[WP34s] Parallel function

The Museum of HP Calculators

HP Forum Archive 21

[WP34s] Parallel function
Message #1 Posted by Dieter on 4 Nov 2012, 6:39 a.m.

As far as I can see the current implementation of the 34s parallel function (cf. sourceforge.net) works this way:
x*y par(x,y) = ----- x+y
This may lead to overflow as soon as the product in the nominator becomes too large, even if the result falls within the working range. And for (almost) any positive x, y it will.
The current code also checks whether x*y is zero in order to test if either x or y (or both) are zero -- in this case the returned result is zero as well. For very small values of x and/or y this may lead to underflow, returning a plain zero where the true result actually is very small, but not zero.
Example:
x y 34s || true ----------------------------------------- 1 E+4000 1 E+4000 infinity 5 E+3999 1 E-4000 1 E-4000 0 5 E-4001
There are numerous ways to compute the parallel function, and there usually is one best way to do it for a certain combination of the two arguments. Not all, but most overflow/underflow problems can be solved if the function would work this way:
1 par(x,y) = --------- 1/x + 1/y
001 LBL"PAR" 002 x=0? 003 RTN 004 1/x 005 x<>y 006 x=0? 007 RTN 008 1/x 009 + 010 1/x 011 RTN
That's why I would suggest an update of the parallel function. The code above is just a first example -- I am sure it can be done better (e.g. without slight rounding errors in the last digit). What do you think?
Dieter

Edited: 4 Nov 2012, 6:42 a.m.

Re: [WP34s] Parallel function
Message #2 Posted by Marcus von Cube, Germany on 4 Nov 2012, 11:00 a.m.,
in response to message #1 by Dieter

The problem with this implementation is that none of the arguments may be zero (which is valid and should return zero).

Re: [WP34s] Parallel function
Message #3 Posted by Dieter on 4 Nov 2012, 12:48 p.m.,
in response to message #2 by Marcus von Cube, Germany

Please take a look at the suggested code, especially line 002/003 and 006/007. Of course any of the two arguments may be zero -- in this case zero is returned. That's the same behaviour as in the current implementation and how you say it is supposed to work ("should return zero").
Dieter

Re: [WP34s] Parallel function
Message #4 Posted by fhub on 4 Nov 2012, 1:53 p.m.,
in response to message #3 by Dieter

Yes, it returns zero, but the stack (and LastX) is not correctly set.
Franz

Re: [WP34s] Parallel function
Message #5 Posted by Dieter on 4 Nov 2012, 2:22 p.m.,
in response to message #4 by fhub

The suggested code is not intended to replace the current implementation literally. It's a suggestion for a different way to compute the parallel function, here in the way it could be done in a standard 34s user program. An XROM routine with a similar approach would use the respective standard commands on entry (XIN DYADIC) and exit (xOUT xOUT_NORMAL) that also take care of the stack and last X.
Once again: the idea here is just the suggestion of a different way to compute the parallel function, thus overcoming the current limitations. Maybe (probably) there is an even better solution - what's your idea here ?-)
Dieter

Re: [WP34s] Parallel function
Message #6 Posted by Walter B on 4 Nov 2012, 2:38 p.m.,
in response to message #5 by Dieter

Franz, an XROM routine following the line Dieter suggested would return a proper stack and Lastx AFAIUI.

Re: [WP34s] Parallel function
Message #7 Posted by Marcus von Cube, Germany on 4 Nov 2012, 3:33 p.m.,
in response to message #3 by Dieter

Agreed, I just did look at the formula, not the suggested implementation.
I'm pretty sure that Pauli had a specific reason to implement || the way it is now. XROM executes in double precision with an extended exponent range so exponent overflow isn't really an issue here.

Re: [WP34s] Parallel function
Message #8 Posted by Dieter on 4 Nov 2012, 4:26 p.m.,
in response to message #7 by Marcus von Cube, Germany

Marcus,
Quote:
XROM executes in double precision with an extended exponent range so exponent overflow isn't really an issue here.
That's only true if the 34s is set to standard precision. Yes, for XROM routines in double precision the working range exceeds 1E+6000, which is more than enough to handle every input in standard precision, i.e. up to 1E+384. However, the user may just as well have the device set to DP, et voila...
Please take a look at the two examples I posted: values like 1E+4000 can be easily entered in DP mode - and will lead to overflow resp. underflow, just as shown there.
[DBLON] 4000 [10^x] [ENTER] [||] => infinity -4000 [10^x] [ENTER] [||] => 0
Dieter

Re: [WP34s] Parallel function
Message #9 Posted by Marcus von Cube, Germany on 4 Nov 2012, 4:34 p.m.,
in response to message #8 by Dieter

I doubt that 10^4000 is a meaningful physical dimension...
Exponents up to 999 can be entered directly in DP and they will not lead to an overflow here.
I'm still not sure why Pauli decided to use the present algorithm but I assume he had a cancellation issue in mind.

Re: [WP34s] Parallel function
Message #10 Posted by Paul Dale on 4 Nov 2012, 4:50 p.m.,
in response to message #9 by Marcus von Cube, Germany

Performance actually -- x*y / (x+y) is faster to calculate than 1 / (1/x+1/y) and given that full accuracy in double precision wasn't a requirement, I chose the former.
- Pauli

Re: [WP34s] Parallel function
Message #11 Posted by Paul Dale on 4 Nov 2012, 4:30 p.m.,
in response to message #7 by Marcus von Cube, Germany

When I implemented these two in xrom, I didn't consider the edge cases in double precision -- the code as is will work fine for single precision which was my main concern at the time. We've always said double precision accuracy is not guaranteed. This is still the case. Double precision was intended for internal implementation and exposed at the request of this community.
The previous C version would have worked in double precision too -- it has a much larger exponent range again.
Still, I don't mind improving our double precision performance if the cost is low which it seems to be here.
- Pauli

Re: [WP34s] Parallel function
Message #12 Posted by Dieter on 4 Nov 2012, 4:42 p.m.,
in response to message #11 by Paul Dale

As usual, it's a tradeoff: the current method is potentially a bit more accurate, at least in the last digit. Try x = y = 15 and the method I posted will be +2 ULP off (7.500....002 instead of 7.5). So the best possible implementation would use (at least) two different ways to compute the result, for instance depending on x*y returning "infinity" or not. ;-)
Dieter

Re: [WP34s] Parallel function
Message #13 Posted by Paul Dale on 4 Nov 2012, 4:48 p.m.,
in response to message #12 by Dieter

It would also be necessary to check x*y under flowing to zero.
- Pauli

Re: [WP34s] Parallel function
Message #14 Posted by Dieter on 4 Nov 2012, 5:09 p.m.,
in response to message #13 by Paul Dale

Ok, what about this one:
001 LBL"PAR" 002 RCL* Y 003 x!=0? 004 infinity? 005 SKIP 004 006 X<>Y 007 RCL+ L 008 / 009 RTN 010 X<> L 011 X=0? 012 RTN 013 1/x 014 X<>Y 015 X=0? 016 RTN 017 1/x 018 + 019 1/x 020 RTN
Just a thought, without any further testing. ;-)
Dieter

Re: [WP34s] Parallel function
Message #15 Posted by Paul Dale on 4 Nov 2012, 6:36 p.m.,
in response to message #14 by Dieter

It is worse than needing a comparison against zero -- if x*y goes subnormal it is possible to have as few as one significant digit in the product -- better to switch formulas before this point I think.
It is starting to seem not worth the effort...
- Pauli

Re: [WP34s] Parallel function
Message #16 Posted by Werner on 5 Nov 2012, 2:41 a.m.,
in response to message #1 by Dieter

Or, the best of both worlds:
*LBL"PAR" X=0? GTO 00 ENTER RCL+ ST Z / *LBL 00 * RTN
(code 42S-style, as I don't own a 34S. Sadly, perhaps ;-)

Re: [WP34s] Parallel function
Message #17 Posted by Paul Dale on 5 Nov 2012, 2:45 a.m.,
in response to message #16 by Werner

This would suffer the same problems as my original code -- it uses the same formula.
42S style isn't far removed from 34S -- the 42S was a guiding light for the project in many ways.
- Pauli

Re: [WP34s] Parallel function
Message #18 Posted by Werner on 5 Nov 2012, 2:50 a.m.,
in response to message #17 by Paul Dale

No, it doesn't. Try it.
Werner
perhaps too short a reply: it does use the same formula, but a different order of operations, and avoids overflow. Try it
Edited: 5 Nov 2012, 2:52 a.m.

Re: [WP34s] Parallel function
Message #19 Posted by Paul Dale on 5 Nov 2012, 3:04 a.m.,
in response to message #18 by Werner

Yeah, it avoids the overflow, not thinking straight a the moment. It also seems like it avoids the underflow as well.
It probably needs a test for x = -y to produce a -infinity result but this is easy.
Thanks.
- Pauli

Re: [WP34s] Parallel function
Message #20 Posted by Dieter on 5 Nov 2012, 3:09 p.m.,
in response to message #18 by Werner

Great - I knew there was a better way to do this. ;-) However, the code may behave differently depending on the order of the two arguments, i.e. whether X or Y is the larger value. Since par(x,y) = par(y,x) I think this should be avoided. I am not quite sure, but maybe the code works best if X is the larger value (avoids underflow here and there). A simple x<y?  x<>y could do the trick. At least it returns consistent results for par(x,y) and par(y,x).
BTW - during these tests with very large numbers like 10⁶⁰⁰⁰ I noticed a quite ..."special behaviour" of the 10^x function. Usually integer arguments should return exact powers of ten, but somewhere there is a point where - in DP mode - this is no longer guaranteed:
500 [10^x] => 1.00000...0 E+500 999 [10^x] => 1.00000...0 E+999 2000 [10^x] => 9.99999...9 E+1999 6000 [10^x] => 9.99999...8 E+5999
Dieter

Re: [WP34s] Parallel function
Message #21 Posted by Walter B on 5 Nov 2012, 3:38 p.m.,
in response to message #20 by Dieter

I feared such academic problems would arise. We should have left DP closed to the public - but we didn't. Perhaps we shall rethink that decision ...

Re: [WP34s] Parallel function
Message #22 Posted by Dieter on 8 Nov 2012, 9:02 a.m.,
in response to message #21 by Walter B

Let's get back to the facts.

The "problem" here is the error. Not the error report.

Without DP mode there was no way to design and test XROM functions in user code, this way replacing former internal routines like, for instance, my humble contributions, the Normal quantile or Lambert's W. If you don't like the average user to access this mode, simply have it mapped to a special key sequence and do not mention it in the manual. Maybe it can be shown on the last page in small print and ROT13-encoded. #-)

The error in the exponential functions can be explained with a closer look at the present code. Functions like 2^x or 10^x are implemented by calling the general power function y^x, using the relation y^x = e^x * ln y. All this is done with 39-digit internal precision. It can be shown that during the evaluation of 10⁶⁰⁰⁰ an error of 1 ULP in ln 10 will cause a relative error of 10^-34 or 1 unit in the 34th digit. The shown errors are larger, so obviously ln 10 has a slightly larger error.

The current (internal) constants catalogue includes exact values for ln 2 and ln 10. Using these instead of evaluating the logs at runtime would yield two significant advantages:
First of all there is a substantial gain in speed for 2^x and 10^x, as the 34s log functions are generally on the slow side.
Second, the mentioned cases 10²⁰⁰⁰ and 10⁶⁰⁰⁰ should return more accurate results: the 39-digit value of ln 10 has a roundoff error of merely 1,1 E-39 which means that the relative error of the final result should be within 1 E-35. Even with an additional error of 1 ULP in 6000 * ln 10 it's still 1 E-34.

That's why I think this error can be fixed easily.
Dieter

Re: [WP34s] Parallel function
Message #23 Posted by Paul Dale on 8 Nov 2012, 3:52 p.m.,
in response to message #22 by Dieter

You're correct I'm not using the builtin log2 and log10 constants for 2^x and 10^x :-( I missed this optimisation unfortunately. I do use them for log₂ and log₁₀ at least.
Now to see if the code can be easily changed to pass in the log of base....
- Pauli

Re: [WP34s] Parallel function
Message #24 Posted by Paul Dale on 8 Nov 2012, 3:58 p.m.,
in response to message #23 by Paul Dale

The next build will include this optimisation :)
- Pauli

Re: [WP34s] Parallel function
Message #25 Posted by Walter B on 9 Nov 2012, 6:19 p.m.,
in response to message #22 by Dieter

Thanks for the details. As you see, this detailed report triggered some improvement, especially points 3 and 4 :-) No doubt the original error report wasn't the problem, but it wasn't leading to a solution yet - the second report was far better in that aspect. Thanks again.
BTW, DP mode is covered in an appendix of the manual - no easy way to hide it even more ;-)

Re: [WP34s] Parallel function
Message #26 Posted by Marcus von Cube, Germany on 5 Nov 2012, 4:04 p.m.,
in response to message #20 by Dieter

We do not special case 10^integer. The error you mention is 2 ULPs. I think we can live with that.

[ Return to Index | Top of Index ]

Go back to the main exhibit hall