HP Forums
Faster inverse gamma and factorial for the WP 34S - Printable Version

+- HP Forums (http://www.hpmuseum.org/forum)
+-- Forum: Not HP Calculators (/forum-7.html)
+--- Forum: Not quite HP Calculators - but related (/forum-8.html)
+--- Thread: Faster inverse gamma and factorial for the WP 34S (/thread-2553.html)

Pages: 1 2


Faster inverse gamma and factorial for the WP 34S - Bit - 12-04-2014 06:14 AM

I noticed that the inverse gamma library routine included with the 34S (discussed here) was quite slow for small inputs. I've added a very simple second algorithm that uses the secant method and converges faster for small inputs: < ~2.3 in single precision mode and < ~1e17 in double precision mode. For example, 1.6 vs 5.3 seconds in single precision mode or 2.1 vs 13.5 seconds in double precision mode with 0.89 as the input. While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.

Code:
LBL'![^-1]'           // Inverse factorial
  LocR 10
  SF .00

LBL'[GAMMA][^-1]'     // Inverse gamma
  LocR 10
  FS?S D       // Set flag D and save old status
    SF .01
  STOM .01
  SSIZE8
  STOS .02     // Input in .02

  // Estimate inverse gamma
  # [pi]
  STO Z        // Save for later use (SP limit)
  # 86
  /
  STO T        // Save for later use (max.error)
  RCL+ Y
  # [pi]
  STO+ X
  SQRT
  /
  LN
  STO .00
  # eE
  /
  W[sub-p]
  STO/ .00
  # 20
  x[<->] .00   // Loop counter in .00
  # 1/2
  +

  DBL?
    SKIP 1
  SKIP 2
  x[<=]? .00   // Reuse loop counter for another purpose
    GTO'SM'    // DP: ~result <= 20
  x[<=]? T
    GTO'SM'    // SP: ~result <= 3.14

  // Find precise result for large arguments using
  // the original invgamma code from the 34S library.
LBL'LL'        // Loop for large arguments
  [<->] XZZX
  GAMMA        // X=g(Xn-1), Y=T=input, Z=Xn-1
  /
  +/-
  INC X
  RCL Z
  LN
  RCL* L
  STO+ X
  DEC X
  RCL/ T
  /
  STO+ X
  [<->] XZZY
  -            // X=Xn, Y=Xn-1, Z=input
  DSZ .00      // Safeguard against slow convergence
    CNVG? 3
      GTO'RX'  // Return X
  GTO'LL'      // Loop

  // Find precise result for small arguments using the secant method
LBL'SM'
  [<->] XXXX
  GAMMA
  RCL- .02
  [<->] XYXY
  SIGN
  RCL* A       // Max error: we need ~ 0.027, close enough
  -            // X=Xn-1, Y=f(Xn-2), Z=Xn-2

LBL'SL'        // Loop for small arguments
  [<->] XXYZ
  GAMMA
  RCL .02
  DSZ .00      // Safeguard against slow convergence
    CNVG? 3
      GTO'RZ'  // Return Z
  -            // X=f(Xn-1), Y=Xn-1, Z=f(Xn-2), T=Xn-2
  STO- Z
  RCL Y        // X=Xn-1, Y=f(Xn-1), Z=Xn-1, T=f(Xn-2)-f(Xn-1), A=Xn-2
  RCL- A
  RCL/ T
  SPEC?        // T=f(Xn-2)-f(Xn-1) was zero
    GTO'RZ'    // Return Z
  RCL* Y
  RCL+ Z       // X=Xn, Y=f(Xn-1), Z=Xn-1
  GTO'SL'      // Loop

LBL'RZ'        // Return value in Z
  x[<->] Z
LBL'RX'        // Return value in X
  FS? .00      // Called as inverse factorial?
    DEC X
  x[<->] .02
  STO L
  RCLS .02
  FC? .01      // Restore flag D
    CF D
  RCLM .01
  END
Edit: Fixed code.


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-03-2015 05:05 PM

(12-04-2014 06:14 AM)Bit Wrote:  While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.
Bit: When I was helping Torsten Manz improve his HP-15C simulator, I did a lot of research on good numerical algorithms, and one of the better examples of good coding and well selected numerical algorithms was the "Python" math library. The guys who wrote the Python math library used some really good algorithms. I also found that the Wolfram Alpha web site is a nice reference to check the accuracy of your algorithms. I don't know that your particular functions "Inverse Gamma" & "Inverse Factorial" are included in the Python math library, but if they are, they will probably be fast, accurate and interesting to look at as examples. Hope this helps, Barry


RE: Faster inverse gamma and factorial for the WP 34S - Bit - 02-06-2015 01:39 AM

(02-03-2015 05:05 PM)BarryMead Wrote:  
(12-04-2014 06:14 AM)Bit Wrote:  While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.
Bit: When I was helping Torsten Manz improve his HP-15C simulator, I did a lot of research on good numerical algorithms, and one of the better examples of good coding and well selected numerical algorithms was the "Python" math library. The guys who wrote the Python math library used some really good algorithms. I also found that the Wolfram Alpha web site is a nice reference to check the accuracy of your algorithms. I don't know that your particular functions "Inverse Gamma" & "Inverse Factorial" are included in the Python math library, but if they are, they will probably be fast, accurate and interesting to look at as examples. Hope this helps, Barry
Thank you for the advice. Unfortunately the inverse functions don't seem to be included in this library, but if you were referring to another math library, please let me know.


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-06-2015 03:33 AM

(02-06-2015 01:39 AM)Bit Wrote:  If you were referring to another math library, please let me know.
Bit I did a google search for "python math libraries inverse gamma" and found this code snippet here.
Without knowledge of your intended goal, and qualitative evaluation of the function in operation, I wouldn't be able to tell you if it is of any value or not.
Code:

import numpy as np
import math
import scipy.special

def _lambert_w(z):
  """
  Lambert W function, principal branch.
  See http://en.wikipedia.org/wiki/Lambert_W_function
  Code taken from http://keithbriggs.info/software.html
  """
  eps=4.0e-16
  em1=0.3678794411714423215955237701614608
  assert z>=-em1, 'LambertW.py: bad argument %g, exiting.'%z
  if 0.0==z: 
      return 0.0
  if z<-em1+1e-4:
      q=z+em1
      r=math.sqrt(q)
      q2=q*q
      q3=q2*q
      return\
       -1.0\
       +2.331643981597124203363536062168*r\
       -1.812187885639363490240191647568*q\
       +1.936631114492359755363277457668*r*q\
       -2.353551201881614516821543561516*q2\
       +3.066858901050631912893148922704*r*q2\
       -4.175335600258177138854984177460*q3\
       +5.858023729874774148815053846119*r*q3\
       -8.401032217523977370984161688514*q3*q
  if z<1.0:
      p=math.sqrt(2.0*(2.7182818284590452353602874713526625*z+1.0))
      w=-1.0+p*(1.0+p*(-0.333333333333333333333+p*0.152777777777777777777777))
  else:
      w=math.log(z)
  if z>3.0: 
      w-=math.log(w)
  for i in xrange(10):
      e=math.exp(w)
      t=w*e-z
      p=w+1.0
      t/=e*p-0.5*(p+1.0)*t/p
      w-=t
      if abs(t)<eps*(1.0+abs(w)): 
          return w
  raise AssertionError, 'Unhandled value %1.2f'%z

def _gamma_inverse(x):
  """
  Inverse the gamma function.
  http://mathoverflow.net/questions/12828/inverse-gamma-function
  """
  k=1.461632 # the positive zero of the digamma function, scipy.special.psi
  assert x>=k, 'gamma(x) is strictly increasing for x >= k, k=%1.2f, x=%1.2f' % (k, x)
  C=math.sqrt(2*np.pi)/np.e - scipy.special.gamma(k) # approximately 0.036534
  L=np.log((x+C)/np.sqrt(2*np.pi))
  gamma_inv = 0.5+L/_lambert_w(L/np.e)
  return gamma_inv



RE: Faster inverse gamma and factorial for the WP 34S - Bit - 02-06-2015 03:57 AM

(02-06-2015 03:33 AM)BarryMead Wrote:  
(02-06-2015 01:39 AM)Bit Wrote:  If you were referring to another math library, please let me know.
Bit I did a google search for "python math libraries inverse gamma" and found this code snippet here.
Without knowledge of your intended goal, and qualitative evaluation of the function in operation, I wouldn't be able to tell you if it is of any value or not.
Code:

import numpy as np
...

That piece of code implements the same initial approximation that's employed in the 34S algorithm but doesn't go further. Here's some explanation and some more. That approximation seems quite good (but I'm not an expert). What I've been trying to do is make the subsequent iterations converge faster.


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-06-2015 04:16 AM

(02-06-2015 03:57 AM)Bit Wrote:  That approximation seems quite good (but I'm not an expert). What I've been trying to do is make the subsequent iterations converge faster.
I haven't tried your program or confirmed it's accuracy with wolfram alpha, but it seems like you have already improved upon the widely accepted and well established algorithms in math geek circles. Nice Job!


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-06-2015 04:38 AM

Bit: I have a question related to testing out your inverse gamma function. When I used cut and paste to
grab your code into a text file, and tried to compile it with the wp34s_asm.pl program I couldn't get it to
make a "Ram" image to test the program out in ram. Do you know of a way to get the assembler
to generate a ram image that can be copied to the wp34s.dat file for testing of a program?
If I try to generate a "ROM" image then I have to manipulate whole libraries and run into library
management complexities. I know I could manually key the program into RAM, but that is
prone to human typo errors. I have never gotten the assembler to do that simple task have you?


RE: Faster inverse gamma and factorial for the WP 34S - Bit - 02-06-2015 05:28 AM

(02-06-2015 04:38 AM)BarryMead Wrote:  Bit: I have a question related to testing out your inverse gamma function. When I used cut and paste to
grab your code into a text file, and tried to compile it with the wp34s_asm.pl program I couldn't get it to
make a "Ram" image to test the program out in ram. Do you know of a way to get the assembler
to generate a ram image that can be copied to the wp34s.dat file for testing of a program?
If I try to generate a "ROM" image then I have to manipulate whole libraries and run into library
management complexities. I know I could manually key the program into RAM, but that is
prone to human typo errors. I have never gotten the assembler to do that simple task have you?

What you need is the wp34s_lib program, a wp34s.dat memory file and the code saved in a text file (I've just made a few minor changes so copy-paste it again). The following command will add the contents of 'sourcefile' to your memory file:
tools/wp34s_lib.pl sourcefile -ilib path_to_wp34s.dat -f -state


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-06-2015 06:09 AM

(02-06-2015 05:28 AM)Bit Wrote:  tools/wp34s_lib.pl sourcefile -ilib path_to_wp34s.dat -f -state
Thanks, Bit.
I had gotten part of this to work in the past, but for some reason if your original program from RAM that
you disassemble contains any "HOTKEY" labels A .. D then the assembler chokes on the program
and you get nothing when you try to reassemble the program. I think this is a design error in the assembler personally. I think people should be able to keystroke any program they want into ram and be able to
disassemble it into a text file and reassemble it into a ram image without error.


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-06-2015 07:39 AM

Inverse Gamma Accuracy Issue.
Bit: When I enter the following into wolfram alpha:
1.5=gamma(x) solve for x, I get an answer: 2.6627663453201472954412987

When I enter 1.5 and use the compiled inverse gamma function on the wp34sgui_Bit.exe eumulator, in double precision mode I get 2.705794548255370.

If I use the same compiled wp34s.dat image on the QT emulator, or the wp34sgui.exe (non _Bit) emulator, I get the same answer as wolfram alpha.
Also if I use the _Bit emulator in single precision mode I seem to get the same answer (less digits) as wolfram alpha.
Why is that?


RE: Faster inverse gamma and factorial for the WP 34S - Bit - 02-06-2015 07:58 AM

(02-06-2015 07:39 AM)BarryMead Wrote:  Inverse Gamma Accuracy Issue.
Bit: When I enter the following into wolfram alpha:
1.5=gamma(x) solve for x, I get an answer: 2.6627663453201472954412987

When I enter 1.5 and use the compiled inverse gamma function on the wp34sgui_Bit.exe eumulator, in double precision mode I get 2.705794548255370.

If I use the same compiled wp34s.dat image on the QT emulator I get the same answer as wolfram alpha.
Why is that?

My builds support a few extra instructions and because of that the instruction codes are different and wp34s.dat files that contain programs aren't compatible with the official versions. You'd need another wp34s.op file to be able to compile the source correctly for my builds.

There's no secret sauce: You can apply my patches to a source tree, recompile everything and generate a wp34s.op file. But I'll include it in my next release to make it easy.


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-06-2015 08:09 AM

(02-06-2015 07:58 AM)Bit Wrote:  You'd need another wp34s.op file to be able to compile the source correctly for my builds.
Thanks again Bit. I was puzzled by the discrepancy. I look forward to your next build with
the custom opcode file.
In the mean time, I stepped through the compiled program to find where it differed from the source listing and corrected the offending opcode.
It was the "DBL?". Which got mis-compiled into "DBLOFF". When I fixed that one bad instruction, the program works to full accuracy.


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-06-2015 02:45 PM

(02-06-2015 07:58 AM)Bit Wrote:  You'd need another wp34s.op file to be able to compile the source correctly for my builds.

Bit: Does this mean that the wp34s-lib.dat file is special for your compiles as well. If so could you please include that file in the next build as well as the custom wp34s.op opcode file?

With both of these files included we can run the wp34sgui_Bit.exe program and it will behave exactly
like the calc_xtal_full.bin realbuild. And we would be able to add/remove library routines to/from
the standard Bit build.

Eventually I would LOVE to duplicate your full toolchain setup on my Linux system. I know that is a big undertaking, but with it I could contribute more to the community.

So far I have not been able to get the
yagarto-bu-2.20.1_gcc-4.6.0-c-c++_nl-1.19.0_gdb-7.2_eabi_20110328.exe
which installs to do anything under wine on my linux system once installed.
I am not sure what have done wrong. Thanks ever so much for all of your valuable time and patience.

Barry


RE: Faster inverse gamma and factorial for the WP 34S - Bit - 02-07-2015 04:57 PM

(02-06-2015 02:45 PM)BarryMead Wrote:  
(02-06-2015 07:58 AM)Bit Wrote:  You'd need another wp34s.op file to be able to compile the source correctly for my builds.

Bit: Does this mean that the wp34s-lib.dat file is special for your compiles as well. If so could you please include that file in the next build as well as the custom wp34s.op opcode file?
Yes and yes: I'll include those two files in my next builds. I didn't in the past because I didn't think the emulator would be used for anything but quickly testing the new features.

(02-06-2015 02:45 PM)BarryMead Wrote:  Eventually I would LOVE to duplicate your full toolchain setup on my Linux system. I know that is a big undertaking, but with it I could contribute more to the community.
It's not very difficult actually, but this is not the right thread to discuss it. I've sent you a message.


RE: Faster inverse gamma and factorial for the WP 34S - Dieter - 02-07-2015 07:34 PM

(12-04-2014 06:14 AM)Bit Wrote:  I noticed that the inverse gamma library routine included with the 34S (discussed here) was quite slow for small inputs. I've added a very simple second algorithm that uses the secant method and converges faster for small inputs: < ~2.3 in single precision mode and < ~1e17 in double precision mode. For example, 1.6 vs 5.3 seconds in single precision mode or 2.1 vs 13.5 seconds in double precision mode with 0.89 as the input. While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.

The standard lib that comes with the 34s firmware contains a digamma routine - so why not use the obvious and setup a simple Newton solver for Γ-1(a):

x := x + (a/Γ(x) – 1) / Ψ(x)

The idea of using Ψ was already mentioned in the 2013 thread, but the solution posted there replaced Ψ with an approximation. Using "the real thing" seems to work better, at least for small arguments.

Since the digamma routine does not preserve the stack (at least the current version doesn't) two registers are used. R0 stores the input a, R1 holds the current approximation. The code is simple and straightforward:

Code:
LBL"IGM"
STO 00
#pi
#086
/
+
CNST 87   // that's sqrt(2*pi)
/
LN
RCL X
#eE
/
Wp
/
#1/2
+
LBL 00
STO 01
XEQ'Ψ'
RCL 00
RCL 01
Γ
-
RCL/L
RCL/Y
RCL 01
ENTER
RCL+Z
CNVG? 03
RTN
GTO 00

Timings:
An input of a = 0,89 returns the result in 1,9 (SP) resp. 2,7 (DP) seconds.

The above code is a first quick and dirty version, so take care. But it seems to run almost as fast as your version while it's much more compact and needs no separate handling of small or large arguments.

Dieter


RE: Faster inverse gamma and factorial for the WP 34S - Bit - 02-08-2015 01:51 AM

(02-07-2015 07:34 PM)Dieter Wrote:  The standard lib that comes with the 34s firmware contains a digamma routine - so why not use the obvious and setup a simple Newton solver for Γ-1(a):

x := x + (a/Γ(x) – 1) / Ψ(x)

That's a good idea, thank you. When I have the time, I'll see if this method is any faster if used with the built-in digamma routine (which is a compile time option, not available by default).


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-08-2015 02:07 AM

(02-08-2015 01:51 AM)Bit Wrote:  That's a good idea, thank you. When I have the time, I'll see if this method is any faster if used with the built-in digamma routine (which is a compile time option, not available by default).
If you combine the source code for the Digamma function with Dieter's Inverse Gamma function, the combined program is longer than Bit's original Inverse Gamma function (in RAM memory steps). It might be worth the extra memory space if it is significantly faster. But since it is not part of the standard flash library, it is not really fair to say it is a shorter program.


RE: Faster inverse gamma and factorial for the WP 34S - Paul Dale - 02-08-2015 07:02 AM

If the digamma route is interesting and custom images are acceptable, I've implemented this function both in xrom (trunk/xrom/digamma.wp34s) and native C (trunk/unused/digamma.c).

- Pauli


RE: Faster inverse gamma and factorial for the WP 34S - Dieter - 02-08-2015 03:34 PM

(02-08-2015 02:07 AM)BarryMead Wrote:  If you combine the source code for the Digamma function with Dieter's Inverse Gamma function, the combined program is longer than Bit's original Inverse Gamma function (in RAM memory steps).

Yes, that's true of course. But you can also include the relevant parts of the current digamma function into the inverse gamma routine. As far as I can tell, only positive arguments occur, and also some overhead in digamma can be removed. The final result may look like this:

Code:
LBL"IGM"
STO 00
#pi
#086
STO 04
/
+
CNST 87   // that's sqrt(2*pi)
/
LN
RSD 05    // make sure x>=-1
RCL X
#eE
/
RSD 15    // and x > -1/e
Wp
/
#1/2
+
LBL 00
STO 01
Clx
STO 02
#008
RCL 01
1/x
STO+02
x<> L
INC X
x<?Y
BACK 005
1/x
STO 03

FILL
#020
/
#021
1/x
-
*
#001
SDR 001
+
*
DEC X
*
#012
/
RCL 03
LN
-
#1/2
RCL*03
-
RCL-02
RCL 00
RCL 01
Γ
-
RCL/L
RCL/Y
RCL 01
ENTER
RCL+Z
CNVG? 03
RTN
DSZ 04
GTO 00
ERR 20

This version has 69 lines and it runs even faster. Γ-1(0,89) now is returned in 1,6 (SP) resp. 2,3 (DP) seconds.

The code also addresses a problem for arguments that are too close to the Gamma minimum at 0,88560 31944 10888 70027 88159 00582 58873 32... Here the current inverse Gamma implementation throws an error (Lambert's W is undefined for x<–1/e). This is handled by the two RSDs.

Arguments very close to the mentioned Gamma minimum still take longer, but they return a result instead of an error message. That is: they do, except in some extremely close cases. Γ-1(0,88560 31944 10888 70027 88159 1) still works, but accuracy may be limited. Arguments even closer will not converge within reasonable time, so the code is limited to 86 iterations. Otherwise a "No root found" error is generated. In this case X and Y hold the two last approximations and the last adjustment is in Z, so that the user at least gets a result along with a rough impression of its accuracy.

Dieter


RE: Faster inverse gamma and factorial for the WP 34S - BarryMead - 02-08-2015 05:04 PM

(02-08-2015 03:34 PM)Dieter Wrote:  This version has 69 lines and it runs even faster. Γ-1(0,89) now is returned in 1,6 (SP) resp. 2,3 (DP) seconds.
Wow Dieter that is nice. I am going to compile and experiment with this one. Thanks!