# HP Forums

Full Version: Faster inverse gamma and factorial for the WP 34S
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
I noticed that the inverse gamma library routine included with the 34S (discussed here) was quite slow for small inputs. I've added a very simple second algorithm that uses the secant method and converges faster for small inputs: < ~2.3 in single precision mode and < ~1e17 in double precision mode. For example, 1.6 vs 5.3 seconds in single precision mode or 2.1 vs 13.5 seconds in double precision mode with 0.89 as the input. While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.

Code:
```LBL'![^-1]'           // Inverse factorial   LocR 10   SF .00 LBL'[GAMMA][^-1]'     // Inverse gamma   LocR 10   FS?S D       // Set flag D and save old status     SF .01   STOM .01   SSIZE8   STOS .02     // Input in .02   // Estimate inverse gamma   # [pi]   STO Z        // Save for later use (SP limit)   # 86   /   STO T        // Save for later use (max.error)   RCL+ Y   # [pi]   STO+ X   SQRT   /   LN   STO .00   # eE   /   W[sub-p]   STO/ .00   # 20   x[<->] .00   // Loop counter in .00   # 1/2   +   DBL?     SKIP 1   SKIP 2   x[<=]? .00   // Reuse loop counter for another purpose     GTO'SM'    // DP: ~result <= 20   x[<=]? T     GTO'SM'    // SP: ~result <= 3.14   // Find precise result for large arguments using   // the original invgamma code from the 34S library. LBL'LL'        // Loop for large arguments   [<->] XZZX   GAMMA        // X=g(Xn-1), Y=T=input, Z=Xn-1   /   +/-   INC X   RCL Z   LN   RCL* L   STO+ X   DEC X   RCL/ T   /   STO+ X   [<->] XZZY   -            // X=Xn, Y=Xn-1, Z=input   DSZ .00      // Safeguard against slow convergence     CNVG? 3       GTO'RX'  // Return X   GTO'LL'      // Loop   // Find precise result for small arguments using the secant method LBL'SM'   [<->] XXXX   GAMMA   RCL- .02   [<->] XYXY   SIGN   RCL* A       // Max error: we need ~ 0.027, close enough   -            // X=Xn-1, Y=f(Xn-2), Z=Xn-2 LBL'SL'        // Loop for small arguments   [<->] XXYZ   GAMMA   RCL .02   DSZ .00      // Safeguard against slow convergence     CNVG? 3       GTO'RZ'  // Return Z   -            // X=f(Xn-1), Y=Xn-1, Z=f(Xn-2), T=Xn-2   STO- Z   RCL Y        // X=Xn-1, Y=f(Xn-1), Z=Xn-1, T=f(Xn-2)-f(Xn-1), A=Xn-2   RCL- A   RCL/ T   SPEC?        // T=f(Xn-2)-f(Xn-1) was zero     GTO'RZ'    // Return Z   RCL* Y   RCL+ Z       // X=Xn, Y=f(Xn-1), Z=Xn-1   GTO'SL'      // Loop LBL'RZ'        // Return value in Z   x[<->] Z LBL'RX'        // Return value in X   FS? .00      // Called as inverse factorial?     DEC X   x[<->] .02   STO L   RCLS .02   FC? .01      // Restore flag D     CF D   RCLM .01   END```
Edit: Fixed code.
(12-04-2014 06:14 AM)Bit Wrote: [ -> ]While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.
Bit: When I was helping Torsten Manz improve his HP-15C simulator, I did a lot of research on good numerical algorithms, and one of the better examples of good coding and well selected numerical algorithms was the "Python" math library. The guys who wrote the Python math library used some really good algorithms. I also found that the Wolfram Alpha web site is a nice reference to check the accuracy of your algorithms. I don't know that your particular functions "Inverse Gamma" & "Inverse Factorial" are included in the Python math library, but if they are, they will probably be fast, accurate and interesting to look at as examples. Hope this helps, Barry
(02-03-2015 05:05 PM)BarryMead Wrote: [ -> ]
(12-04-2014 06:14 AM)Bit Wrote: [ -> ]While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.
Bit: When I was helping Torsten Manz improve his HP-15C simulator, I did a lot of research on good numerical algorithms, and one of the better examples of good coding and well selected numerical algorithms was the "Python" math library. The guys who wrote the Python math library used some really good algorithms. I also found that the Wolfram Alpha web site is a nice reference to check the accuracy of your algorithms. I don't know that your particular functions "Inverse Gamma" & "Inverse Factorial" are included in the Python math library, but if they are, they will probably be fast, accurate and interesting to look at as examples. Hope this helps, Barry
Thank you for the advice. Unfortunately the inverse functions don't seem to be included in this library, but if you were referring to another math library, please let me know.
(02-06-2015 01:39 AM)Bit Wrote: [ -> ]If you were referring to another math library, please let me know.
Bit I did a google search for "python math libraries inverse gamma" and found this code snippet here.
Without knowledge of your intended goal, and qualitative evaluation of the function in operation, I wouldn't be able to tell you if it is of any value or not.
Code:
``` import numpy as np import math import scipy.special def _lambert_w(z):   """   Lambert W function, principal branch.   See http://en.wikipedia.org/wiki/Lambert_W_function   Code taken from http://keithbriggs.info/software.html   """   eps=4.0e-16   em1=0.3678794411714423215955237701614608   assert z>=-em1, 'LambertW.py: bad argument %g, exiting.'%z   if 0.0==z:        return 0.0   if z<-em1+1e-4:       q=z+em1       r=math.sqrt(q)       q2=q*q       q3=q2*q       return\        -1.0\        +2.331643981597124203363536062168*r\        -1.812187885639363490240191647568*q\        +1.936631114492359755363277457668*r*q\        -2.353551201881614516821543561516*q2\        +3.066858901050631912893148922704*r*q2\        -4.175335600258177138854984177460*q3\        +5.858023729874774148815053846119*r*q3\        -8.401032217523977370984161688514*q3*q   if z<1.0:       p=math.sqrt(2.0*(2.7182818284590452353602874713526625*z+1.0))       w=-1.0+p*(1.0+p*(-0.333333333333333333333+p*0.152777777777777777777777))   else:       w=math.log(z)   if z>3.0:        w-=math.log(w)   for i in xrange(10):       e=math.exp(w)       t=w*e-z       p=w+1.0       t/=e*p-0.5*(p+1.0)*t/p       w-=t       if abs(t)<eps*(1.0+abs(w)):            return w   raise AssertionError, 'Unhandled value %1.2f'%z def _gamma_inverse(x):   """   Inverse the gamma function.   http://mathoverflow.net/questions/12828/inverse-gamma-function   """   k=1.461632 # the positive zero of the digamma function, scipy.special.psi   assert x>=k, 'gamma(x) is strictly increasing for x >= k, k=%1.2f, x=%1.2f' % (k, x)   C=math.sqrt(2*np.pi)/np.e - scipy.special.gamma(k) # approximately 0.036534   L=np.log((x+C)/np.sqrt(2*np.pi))   gamma_inv = 0.5+L/_lambert_w(L/np.e)   return gamma_inv```
(02-06-2015 03:33 AM)BarryMead Wrote: [ -> ]
(02-06-2015 01:39 AM)Bit Wrote: [ -> ]If you were referring to another math library, please let me know.
Bit I did a google search for "python math libraries inverse gamma" and found this code snippet here.
Without knowledge of your intended goal, and qualitative evaluation of the function in operation, I wouldn't be able to tell you if it is of any value or not.
Code:
``` import numpy as np ...```

That piece of code implements the same initial approximation that's employed in the 34S algorithm but doesn't go further. Here's some explanation and some more. That approximation seems quite good (but I'm not an expert). What I've been trying to do is make the subsequent iterations converge faster.
(02-06-2015 03:57 AM)Bit Wrote: [ -> ]That approximation seems quite good (but I'm not an expert). What I've been trying to do is make the subsequent iterations converge faster.
I haven't tried your program or confirmed it's accuracy with wolfram alpha, but it seems like you have already improved upon the widely accepted and well established algorithms in math geek circles. Nice Job!
Bit: I have a question related to testing out your inverse gamma function. When I used cut and paste to
grab your code into a text file, and tried to compile it with the wp34s_asm.pl program I couldn't get it to
make a "Ram" image to test the program out in ram. Do you know of a way to get the assembler
to generate a ram image that can be copied to the wp34s.dat file for testing of a program?
If I try to generate a "ROM" image then I have to manipulate whole libraries and run into library
management complexities. I know I could manually key the program into RAM, but that is
prone to human typo errors. I have never gotten the assembler to do that simple task have you?
(02-06-2015 04:38 AM)BarryMead Wrote: [ -> ]Bit: I have a question related to testing out your inverse gamma function. When I used cut and paste to
grab your code into a text file, and tried to compile it with the wp34s_asm.pl program I couldn't get it to
make a "Ram" image to test the program out in ram. Do you know of a way to get the assembler
to generate a ram image that can be copied to the wp34s.dat file for testing of a program?
If I try to generate a "ROM" image then I have to manipulate whole libraries and run into library
management complexities. I know I could manually key the program into RAM, but that is
prone to human typo errors. I have never gotten the assembler to do that simple task have you?

What you need is the wp34s_lib program, a wp34s.dat memory file and the code saved in a text file (I've just made a few minor changes so copy-paste it again). The following command will add the contents of 'sourcefile' to your memory file:
tools/wp34s_lib.pl sourcefile -ilib path_to_wp34s.dat -f -state
(02-06-2015 05:28 AM)Bit Wrote: [ -> ]tools/wp34s_lib.pl sourcefile -ilib path_to_wp34s.dat -f -state
Thanks, Bit.
I had gotten part of this to work in the past, but for some reason if your original program from RAM that
you disassemble contains any "HOTKEY" labels A .. D then the assembler chokes on the program
and you get nothing when you try to reassemble the program. I think this is a design error in the assembler personally. I think people should be able to keystroke any program they want into ram and be able to
disassemble it into a text file and reassemble it into a ram image without error.
Inverse Gamma Accuracy Issue.
Bit: When I enter the following into wolfram alpha:
1.5=gamma(x) solve for x, I get an answer: 2.6627663453201472954412987

When I enter 1.5 and use the compiled inverse gamma function on the wp34sgui_Bit.exe eumulator, in double precision mode I get 2.705794548255370.

If I use the same compiled wp34s.dat image on the QT emulator, or the wp34sgui.exe (non _Bit) emulator, I get the same answer as wolfram alpha.
Also if I use the _Bit emulator in single precision mode I seem to get the same answer (less digits) as wolfram alpha.
Why is that?
(02-06-2015 07:39 AM)BarryMead Wrote: [ -> ]Inverse Gamma Accuracy Issue.
Bit: When I enter the following into wolfram alpha:
1.5=gamma(x) solve for x, I get an answer: 2.6627663453201472954412987

When I enter 1.5 and use the compiled inverse gamma function on the wp34sgui_Bit.exe eumulator, in double precision mode I get 2.705794548255370.

If I use the same compiled wp34s.dat image on the QT emulator I get the same answer as wolfram alpha.
Why is that?

My builds support a few extra instructions and because of that the instruction codes are different and wp34s.dat files that contain programs aren't compatible with the official versions. You'd need another wp34s.op file to be able to compile the source correctly for my builds.

There's no secret sauce: You can apply my patches to a source tree, recompile everything and generate a wp34s.op file. But I'll include it in my next release to make it easy.
(02-06-2015 07:58 AM)Bit Wrote: [ -> ]You'd need another wp34s.op file to be able to compile the source correctly for my builds.
Thanks again Bit. I was puzzled by the discrepancy. I look forward to your next build with
the custom opcode file.
In the mean time, I stepped through the compiled program to find where it differed from the source listing and corrected the offending opcode.
It was the "DBL?". Which got mis-compiled into "DBLOFF". When I fixed that one bad instruction, the program works to full accuracy.
(02-06-2015 07:58 AM)Bit Wrote: [ -> ]You'd need another wp34s.op file to be able to compile the source correctly for my builds.

Bit: Does this mean that the wp34s-lib.dat file is special for your compiles as well. If so could you please include that file in the next build as well as the custom wp34s.op opcode file?

With both of these files included we can run the wp34sgui_Bit.exe program and it will behave exactly
like the calc_xtal_full.bin realbuild. And we would be able to add/remove library routines to/from
the standard Bit build.

Eventually I would LOVE to duplicate your full toolchain setup on my Linux system. I know that is a big undertaking, but with it I could contribute more to the community.

So far I have not been able to get the
yagarto-bu-2.20.1_gcc-4.6.0-c-c++_nl-1.19.0_gdb-7.2_eabi_20110328.exe
which installs to do anything under wine on my linux system once installed.
I am not sure what have done wrong. Thanks ever so much for all of your valuable time and patience.

Barry
(02-06-2015 02:45 PM)BarryMead Wrote: [ -> ]
(02-06-2015 07:58 AM)Bit Wrote: [ -> ]You'd need another wp34s.op file to be able to compile the source correctly for my builds.

Bit: Does this mean that the wp34s-lib.dat file is special for your compiles as well. If so could you please include that file in the next build as well as the custom wp34s.op opcode file?
Yes and yes: I'll include those two files in my next builds. I didn't in the past because I didn't think the emulator would be used for anything but quickly testing the new features.

(02-06-2015 02:45 PM)BarryMead Wrote: [ -> ]Eventually I would LOVE to duplicate your full toolchain setup on my Linux system. I know that is a big undertaking, but with it I could contribute more to the community.
It's not very difficult actually, but this is not the right thread to discuss it. I've sent you a message.
(12-04-2014 06:14 AM)Bit Wrote: [ -> ]I noticed that the inverse gamma library routine included with the 34S (discussed here) was quite slow for small inputs. I've added a very simple second algorithm that uses the secant method and converges faster for small inputs: < ~2.3 in single precision mode and < ~1e17 in double precision mode. For example, 1.6 vs 5.3 seconds in single precision mode or 2.1 vs 13.5 seconds in double precision mode with 0.89 as the input. While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.

The standard lib that comes with the 34s firmware contains a digamma routine - so why not use the obvious and setup a simple Newton solver for Γ-1(a):

x := x + (a/Γ(x) – 1) / Ψ(x)

The idea of using Ψ was already mentioned in the 2013 thread, but the solution posted there replaced Ψ with an approximation. Using "the real thing" seems to work better, at least for small arguments.

Since the digamma routine does not preserve the stack (at least the current version doesn't) two registers are used. R0 stores the input a, R1 holds the current approximation. The code is simple and straightforward:

Code:
```LBL"IGM" STO 00 #pi #086 / + CNST 87   // that's sqrt(2*pi) / LN RCL X #eE / Wp / #1/2 + LBL 00 STO 01 XEQ'Ψ' RCL 00 RCL 01 Γ - RCL/L RCL/Y RCL 01 ENTER RCL+Z CNVG? 03 RTN GTO 00```

Timings:
An input of a = 0,89 returns the result in 1,9 (SP) resp. 2,7 (DP) seconds.

The above code is a first quick and dirty version, so take care. But it seems to run almost as fast as your version while it's much more compact and needs no separate handling of small or large arguments.

Dieter
(02-07-2015 07:34 PM)Dieter Wrote: [ -> ]The standard lib that comes with the 34s firmware contains a digamma routine - so why not use the obvious and setup a simple Newton solver for Γ-1(a):

x := x + (a/Γ(x) – 1) / Ψ(x)

That's a good idea, thank you. When I have the time, I'll see if this method is any faster if used with the built-in digamma routine (which is a compile time option, not available by default).
(02-08-2015 01:51 AM)Bit Wrote: [ -> ]That's a good idea, thank you. When I have the time, I'll see if this method is any faster if used with the built-in digamma routine (which is a compile time option, not available by default).
If you combine the source code for the Digamma function with Dieter's Inverse Gamma function, the combined program is longer than Bit's original Inverse Gamma function (in RAM memory steps). It might be worth the extra memory space if it is significantly faster. But since it is not part of the standard flash library, it is not really fair to say it is a shorter program.
If the digamma route is interesting and custom images are acceptable, I've implemented this function both in xrom (trunk/xrom/digamma.wp34s) and native C (trunk/unused/digamma.c).

- Pauli
(02-08-2015 02:07 AM)BarryMead Wrote: [ -> ]If you combine the source code for the Digamma function with Dieter's Inverse Gamma function, the combined program is longer than Bit's original Inverse Gamma function (in RAM memory steps).

Yes, that's true of course. But you can also include the relevant parts of the current digamma function into the inverse gamma routine. As far as I can tell, only positive arguments occur, and also some overhead in digamma can be removed. The final result may look like this:

Code:
```LBL"IGM" STO 00 #pi #086 STO 04 / + CNST 87   // that's sqrt(2*pi) / LN RSD 05    // make sure x>=-1 RCL X #eE / RSD 15    // and x > -1/e Wp / #1/2 + LBL 00 STO 01 Clx STO 02 #008 RCL 01 1/x STO+02 x<> L INC X x<?Y BACK 005 1/x STO 03 x² FILL #020 / #021 1/x - * #001 SDR 001 + * DEC X * #012 / RCL 03 LN - #1/2 RCL*03 - RCL-02 RCL 00 RCL 01 Γ - RCL/L RCL/Y RCL 01 ENTER RCL+Z CNVG? 03 RTN DSZ 04 GTO 00 ERR 20```

This version has 69 lines and it runs even faster. Γ-1(0,89) now is returned in 1,6 (SP) resp. 2,3 (DP) seconds.

The code also addresses a problem for arguments that are too close to the Gamma minimum at 0,88560 31944 10888 70027 88159 00582 58873 32... Here the current inverse Gamma implementation throws an error (Lambert's W is undefined for x<–1/e). This is handled by the two RSDs.

Arguments very close to the mentioned Gamma minimum still take longer, but they return a result instead of an error message. That is: they do, except in some extremely close cases. Γ-1(0,88560 31944 10888 70027 88159 1) still works, but accuracy may be limited. Arguments even closer will not converge within reasonable time, so the code is limited to 86 iterations. Otherwise a "No root found" error is generated. In this case X and Y hold the two last approximations and the last adjustment is in Z, so that the user at least gets a result along with a rough impression of its accuracy.

Dieter
(02-08-2015 03:34 PM)Dieter Wrote: [ -> ]This version has 69 lines and it runs even faster. Γ-1(0,89) now is returned in 1,6 (SP) resp. 2,3 (DP) seconds.
Wow Dieter that is nice. I am going to compile and experiment with this one. Thanks!
Pages: 1 2
Reference URL's
• HP Forums: https://www.hpmuseum.org/forum/index.php
• :