Threaded Mode | Linear Mode

lcwright1964 · (This post was last modified: 08-24-2015 08:28 AM by lcwright1964.)

(08-23-2015 09:01 PM)Dieter Wrote: Now maybe you or someone else may setup a "really" optimized approximation including the final 3,84... coefficient. ;-)

Dieter, I have tracked down much of my gamma-for-calculators musings, and am blown away to note that I did much of this nearly a decade ago--indeed, I have found emails with Glen Pugh and Viktor Toth from 2006! Feels like yesterday.

At any rate, I think I have come to some conclusions but have never systematically set them out. Here are some.

1. A Lanczos approach, using either the original formula of Lanczos himself, other derivations due to Viktor Toth and Paul Godfrey, and optimized versions using Glen Pugh's research like I have attempted here, just cannot "shine" unless one can take advantage of a calculator's full internal precision to store constants and carry on interim calculations to more digits than ultimately get displayed. This is why Angel's M-CODE implementation is so good. It does this. Lanczos does away with the need for the Pochhammer shift-and-divide for small arguments, and it therefore saves space and time. However, accessing and using those internal guard digits are crucial, because the constants themselves are rounded approximations and the whole thing begins with rounding error even before the first calculation is made. So I don't think I will go ahead and program my 10-digit Lanczos on the 41 or 67/97. It is a curiosity, but the performance will disappoint compared to other options. (Afterthought--I did manage a SysRPL Lanczos gamma for HP49g+/HP50G years ago that used internal extended precision, but it was redundant as the calculator's built in gamma was fine. My incomplete gamma implementation was more original, but I digress...)

2. Spouge seems tempting for calculators because it is Lanczos-like in its performance for small arguments, and one computes coefficients simply and on the fly. However, it needs more terms, the coefficients calculated get truncated to 10 digits (on the 41 and 67) before they are used, and all of the rounding issues come into play. It would do better in MCODE or SysRPL, but would take longer than Lanczos unless one pre-computed and stored the coefficients. And if one is going to go to that trouble one would opt for Lanczos, which needs fewer terms to begin with.

3. Peter Luschny's page discourages the use of Lanczos and various takes on the Stirling series, preferring convergent continued fraction approaches like Stieltjes and his own variants. I have actually worked with a Stieltjes CF up to four terms, and it is easy to program and seems to get nine digits reliably on the 41. However, the Pochhammer shift-and-divide is still needed for small arguments, and in my case the cut off seems to be 6 or 7. It is easy to program as the coefficients are small fractions that can be expressed exactly.

4. This all said, on the HP41 I still have not been able to better JM Baillard's GAM+ implementation using five terms of the Stirling series for lnGamma and a shift-divide cut-off of 5. He cleverly factors and rearranges things to keep the constants small and exact, and in my experience he gets 10 digits much of the time and when there is just 9 the last is off by 2 or 3 ULP at most. He also cleverly sends the code to FACT when the argument is an integer, giving exact results in those cases. I have been tempted to try extending his work by one term, to the one with 691/360360 as coefficient, but I really don't know if I will improve on things much aside from lowering the shift-divide cut-off slightly, at the price of taking longer to compute the series every time.

5. I have also discovered that programming gamma or factorial for these calculators raises some questions about how to best implement exponentiation and how to handle the factors out in front. Should one just take the logarithm of everything, add up, and exponentiate the lot just once? Or just some of it, with certain of the terms added to the series before exponentiation and the rest computed directly and multiplied in? And if so, which ones? I have learned that fiddling with this has impacts on rounding that can change answers and degrade performance, and I don't know for sure what is best, since the LN, e^x, SQRT, and y^x functions that crop up at this step introduce their own rounding error when they are used in interim calculations. Baillard, without comment, seems to have hit upon something, since his handling seems better than any tweaking I do. Moreover, one has to introduce some rearrangement or scaling to avoid overflow even when the final result is within range. One approach I have seen is computing the z^z factor (or similar) that crops up as (z^(z/2))^2, so that one avoids overflow when z > 56 or so, but that means more interim results truncated on being sent to the stack, and more propagated rounding error. It really can get complicated!

I think what I am concluding is that Namir's implementation of Spouge here, and our efforts to refine it, is an excellent academic exercise but really Spouge is not the best way to "do" gamma for our old friends the 41 and 67/97. MCODE that accesses three extra digits will improve accuracy, and I agree that Angel's choice of a straight-forward Lanczos approach with constants represented internally to more than 10 digits of precision (Viktor give 12 digits) should work well, and I am keen to know if a formula optimized based on Glen Pugh's work can fair even better. But there would be no practical use for a Spouge MCODE version, because the on-the-fly coefficient calculations take time, even if the results are more accurate. Of course a Stieltjes CF or Stirling series approach, taken to enough terms, would give good results too in MCODE, but the shift/divide requirement for small arguments adds execution time. So overall it seems that doing what Angel has done is the way to go there, and I wish I knew how to program MCODE so I could see how my own version fares--it sure does well in Mathematica, even when I experiment with constraining working precision.

As for FOCAL on the 41 and keystroke programming the 67? Well, I can't improve on JMB's GAM+ unless I best understand and appreciate it, so I will leave it alone except for trying to port it stroke-for-stroke to the 67/97 (as much as that is possible). When done that I will post to the appropriate forum.

This raises the question: what use is Spouge? I can see it being useful in arbitrary precision environments where you determine the desired number of terms and compute the coefficients as you go based on desired precision and the easily computed error bound. But even still I don't think that is what most arbitrary precision implementations do. Luschny's page talks of algorithms due to Rutishauser and Akiyama-Tanigawa which compute Bernoulli numbers on the fly and as such produce a Stieltjes CF expansion as large or small as required depending on one's a priori desired number of correct digits. Mathematica and Maple, or course, give Gamma with complete accuracy based on one's desired accuracy, with the limitations being computer power and time. I wonder how they do it?

But that's another question for another forum.

Les

PS. I should share that a lot of my experimenting and testing of FOCAL code has used Thomas Okken's Free42 on several platforms, as RAW files created from text with hp41uc work there too. But Free42 displays 12-digits, not 10, and computes everything internally to up to 25 digits and doesn't truncate on display, so of course it makes my 41 programming efforts look better by nullifying rounding error. FWIW, I think Thomas went with Lanczos in programming Free42's built-in GAMMA, but I haven't looked at his source code in years...