Threaded Mode | Linear Mode

Bit · (This post was last modified: 02-06-2015 05:30 AM by Bit.)

I noticed that the inverse gamma library routine included with the 34S (discussed here) was quite slow for small inputs. I've added a very simple second algorithm that uses the secant method and converges faster for small inputs: < ~2.3 in single precision mode and < ~1e17 in double precision mode. For example, 1.6 vs 5.3 seconds in single precision mode or 2.1 vs 13.5 seconds in double precision mode with 0.89 as the input. While it's a noticeable improvement, it's probably far from optimal since I'm not an expert on numerical algorithms, so I'd be interested in better versions.

Code:

LBL'![^-1]'           // Inverse factorial

  LocR 10

  SF .00

LBL'[GAMMA][^-1]'     // Inverse gamma

  LocR 10

  FS?S D       // Set flag D and save old status

    SF .01

  STOM .01

  SSIZE8

  STOS .02     // Input in .02

  // Estimate inverse gamma

  # [pi]

  STO Z        // Save for later use (SP limit)

  # 86

  /

  STO T        // Save for later use (max.error)

  RCL+ Y

  # [pi]

  STO+ X

  SQRT

  /

  LN

  STO .00

  # eE

  /

  W[sub-p]

  STO/ .00

  # 20

  x[<->] .00   // Loop counter in .00

  # 1/2

  +

  DBL?

    SKIP 1

  SKIP 2

  x[<=]? .00   // Reuse loop counter for another purpose

    GTO'SM'    // DP: ~result <= 20

  x[<=]? T

    GTO'SM'    // SP: ~result <= 3.14

  // Find precise result for large arguments using

  // the original invgamma code from the 34S library.

LBL'LL'        // Loop for large arguments

  [<->] XZZX

  GAMMA        // X=g(Xn-1), Y=T=input, Z=Xn-1

  /

  +/-

  INC X

  RCL Z

  LN

  RCL* L

  STO+ X

  DEC X

  RCL/ T

  /

  STO+ X

  [<->] XZZY

  -            // X=Xn, Y=Xn-1, Z=input

  DSZ .00      // Safeguard against slow convergence

    CNVG? 3

      GTO'RX'  // Return X

  GTO'LL'      // Loop

  // Find precise result for small arguments using the secant method

LBL'SM'

  [<->] XXXX

  GAMMA

  RCL- .02

  [<->] XYXY

  SIGN

  RCL* A       // Max error: we need ~ 0.027, close enough

  -            // X=Xn-1, Y=f(Xn-2), Z=Xn-2

LBL'SL'        // Loop for small arguments

  [<->] XXYZ

  GAMMA

  RCL .02

  DSZ .00      // Safeguard against slow convergence

    CNVG? 3

      GTO'RZ'  // Return Z

  -            // X=f(Xn-1), Y=Xn-1, Z=f(Xn-2), T=Xn-2

  STO- Z

  RCL Y        // X=Xn-1, Y=f(Xn-1), Z=Xn-1, T=f(Xn-2)-f(Xn-1), A=Xn-2

  RCL- A

  RCL/ T

  SPEC?        // T=f(Xn-2)-f(Xn-1) was zero

    GTO'RZ'    // Return Z

  RCL* Y

  RCL+ Z       // X=Xn, Y=f(Xn-1), Z=Xn-1

  GTO'SL'      // Loop

LBL'RZ'        // Return value in Z

  x[<->] Z

LBL'RX'        // Return value in X

  FS? .00      // Called as inverse factorial?

    DEC X

  x[<->] .02

  STO L

  RCLS .02

  FC? .01      // Restore flag D

    CF D

  RCLM .01

  END

Edit: Fixed code.