Threaded Mode | Linear Mode

Dieter · 08-23-2015, 09:01 PM

(08-22-2015 11:51 PM)lcwright1964 Wrote: Getting back to the original topic, I have used that very same Maple worksheet and Pugh's recommendations to give this n = 4 (i.e. 5 coefficient) version for a 10-digit FOCAL environment:

(0.3264892413e-1 + 1.188688981/z - 1.068410309/(z+1) + .1887119902/(z+2) - 0.2745021709e-2/(z+3)) * ((z+ 3.840881909)/exp(1))^(z-1/2)

In an ideal situation (i.e., a couple of guard digits kicking around) this is supposed to give between 9.5 and 11 edd in the z = 0 to 70 range

I admit I do not have the means nor the mathematical skills to determine a set of coefficients optimized for the domain z = 0...70 or 71, but I tried a simplified approach in that I assumed the last coefficient 3,840881909 to be "exact". ;-)

This way a simple linear equation system can be used to get an optimized set of constants for the desired range, and indeed the results are similar to what you posted. The optimized coefficients now yield a max. relative error within ±4 E–11. Which translates to "10,3 edd" or better. Compared to the original data set this means almost one more valid digit.

However, this accuracy level requires constants with more than 10 significant digits. With coefficients rounded to 10 digits a relative error within ±7 E–11 is possible. As you already mentioned, on a 10-digit calculator like the '41 or '67 of course the effective accuracy is less since the whole calculation is done with merely 10 digits.

Now maybe you or someone else may setup a "really" optimized approximation including the final 3,84... coefficient. ;-)

Dieter

lcwright1964 · (This post was last modified: 08-24-2015 08:28 AM by lcwright1964.)

(08-23-2015 09:01 PM)Dieter Wrote: Now maybe you or someone else may setup a "really" optimized approximation including the final 3,84... coefficient. ;-)

Dieter, I have tracked down much of my gamma-for-calculators musings, and am blown away to note that I did much of this nearly a decade ago--indeed, I have found emails with Glen Pugh and Viktor Toth from 2006! Feels like yesterday.

At any rate, I think I have come to some conclusions but have never systematically set them out. Here are some.

1. A Lanczos approach, using either the original formula of Lanczos himself, other derivations due to Viktor Toth and Paul Godfrey, and optimized versions using Glen Pugh's research like I have attempted here, just cannot "shine" unless one can take advantage of a calculator's full internal precision to store constants and carry on interim calculations to more digits than ultimately get displayed. This is why Angel's M-CODE implementation is so good. It does this. Lanczos does away with the need for the Pochhammer shift-and-divide for small arguments, and it therefore saves space and time. However, accessing and using those internal guard digits are crucial, because the constants themselves are rounded approximations and the whole thing begins with rounding error even before the first calculation is made. So I don't think I will go ahead and program my 10-digit Lanczos on the 41 or 67/97. It is a curiosity, but the performance will disappoint compared to other options. (Afterthought--I did manage a SysRPL Lanczos gamma for HP49g+/HP50G years ago that used internal extended precision, but it was redundant as the calculator's built in gamma was fine. My incomplete gamma implementation was more original, but I digress...)

2. Spouge seems tempting for calculators because it is Lanczos-like in its performance for small arguments, and one computes coefficients simply and on the fly. However, it needs more terms, the coefficients calculated get truncated to 10 digits (on the 41 and 67) before they are used, and all of the rounding issues come into play. It would do better in MCODE or SysRPL, but would take longer than Lanczos unless one pre-computed and stored the coefficients. And if one is going to go to that trouble one would opt for Lanczos, which needs fewer terms to begin with.

3. Peter Luschny's page discourages the use of Lanczos and various takes on the Stirling series, preferring convergent continued fraction approaches like Stieltjes and his own variants. I have actually worked with a Stieltjes CF up to four terms, and it is easy to program and seems to get nine digits reliably on the 41. However, the Pochhammer shift-and-divide is still needed for small arguments, and in my case the cut off seems to be 6 or 7. It is easy to program as the coefficients are small fractions that can be expressed exactly.

4. This all said, on the HP41 I still have not been able to better JM Baillard's GAM+ implementation using five terms of the Stirling series for lnGamma and a shift-divide cut-off of 5. He cleverly factors and rearranges things to keep the constants small and exact, and in my experience he gets 10 digits much of the time and when there is just 9 the last is off by 2 or 3 ULP at most. He also cleverly sends the code to FACT when the argument is an integer, giving exact results in those cases. I have been tempted to try extending his work by one term, to the one with 691/360360 as coefficient, but I really don't know if I will improve on things much aside from lowering the shift-divide cut-off slightly, at the price of taking longer to compute the series every time.

5. I have also discovered that programming gamma or factorial for these calculators raises some questions about how to best implement exponentiation and how to handle the factors out in front. Should one just take the logarithm of everything, add up, and exponentiate the lot just once? Or just some of it, with certain of the terms added to the series before exponentiation and the rest computed directly and multiplied in? And if so, which ones? I have learned that fiddling with this has impacts on rounding that can change answers and degrade performance, and I don't know for sure what is best, since the LN, e^x, SQRT, and y^x functions that crop up at this step introduce their own rounding error when they are used in interim calculations. Baillard, without comment, seems to have hit upon something, since his handling seems better than any tweaking I do. Moreover, one has to introduce some rearrangement or scaling to avoid overflow even when the final result is within range. One approach I have seen is computing the z^z factor (or similar) that crops up as (z^(z/2))^2, so that one avoids overflow when z > 56 or so, but that means more interim results truncated on being sent to the stack, and more propagated rounding error. It really can get complicated!

I think what I am concluding is that Namir's implementation of Spouge here, and our efforts to refine it, is an excellent academic exercise but really Spouge is not the best way to "do" gamma for our old friends the 41 and 67/97. MCODE that accesses three extra digits will improve accuracy, and I agree that Angel's choice of a straight-forward Lanczos approach with constants represented internally to more than 10 digits of precision (Viktor give 12 digits) should work well, and I am keen to know if a formula optimized based on Glen Pugh's work can fair even better. But there would be no practical use for a Spouge MCODE version, because the on-the-fly coefficient calculations take time, even if the results are more accurate. Of course a Stieltjes CF or Stirling series approach, taken to enough terms, would give good results too in MCODE, but the shift/divide requirement for small arguments adds execution time. So overall it seems that doing what Angel has done is the way to go there, and I wish I knew how to program MCODE so I could see how my own version fares--it sure does well in Mathematica, even when I experiment with constraining working precision.

As for FOCAL on the 41 and keystroke programming the 67? Well, I can't improve on JMB's GAM+ unless I best understand and appreciate it, so I will leave it alone except for trying to port it stroke-for-stroke to the 67/97 (as much as that is possible). When done that I will post to the appropriate forum.

This raises the question: what use is Spouge? I can see it being useful in arbitrary precision environments where you determine the desired number of terms and compute the coefficients as you go based on desired precision and the easily computed error bound. But even still I don't think that is what most arbitrary precision implementations do. Luschny's page talks of algorithms due to Rutishauser and Akiyama-Tanigawa which compute Bernoulli numbers on the fly and as such produce a Stieltjes CF expansion as large or small as required depending on one's a priori desired number of correct digits. Mathematica and Maple, or course, give Gamma with complete accuracy based on one's desired accuracy, with the limitations being computer power and time. I wonder how they do it?

But that's another question for another forum.

Les

PS. I should share that a lot of my experimenting and testing of FOCAL code has used Thomas Okken's Free42 on several platforms, as RAW files created from text with hp41uc work there too. But Free42 displays 12-digits, not 10, and computes everything internally to up to 25 digits and doesn't truncate on display, so of course it makes my 41 programming efforts look better by nullifying rounding error. FWIW, I think Thomas went with Lanczos in programming Free42's built-in GAMMA, but I haven't looked at his source code in years...

lcwright1964 · (This post was last modified: 08-24-2015 09:28 AM by lcwright1964.)

(08-23-2015 09:01 PM)Dieter Wrote: Now maybe you or someone else may setup a "really" optimized approximation including the final 3,84... coefficient. ;-)

Not quite sure what you did to tweak things. I started with the 3.84... (which is actually 4.34 - .5) parameter as provided by Pugh's research--this is in an appendix to the thesis. The table values Pugh produced gives the values of the parameter that provides the best performance of the Lanczos formula with a given number of terms. However, Pugh examined not only real arguments, but allowable complex arguments too, and his reported max relative errors reflect this. So it could be that there is something with the same parameter and the same number of terms that does even better if one confines oneself to not only the positive real line, but the very specific subdomain of about 0 < z < 70. I don't know what you did to optimize my function as presented, but I would like to know how you went about it and what you got.

The theory behind where Pugh got his optimized parameters is discussed in the thesis and much of it sails over my head. Like I said above, I started with the parameter for my desired number of terms, run things through a Maple worksheet that ports Viktor Toth's C++ implementation of Godfrey's matrix approach to computing the coefficients (Pugh references this in the thesis), compute everything to an arbitrary number of digits, and in this case just present 10 digits as that is what would be used in FOCAL.

If you are curious about more digits with this number of terms, here is 16 digit precision to match the precision of Pugh parameter in full:

(0.3264892412943410e-1 + 1.188688981251149/z - 1.068410309377202/(z+1) + .1887119901913100/(z+2) - 0.2745021708565992e-2/(z+3)) * ((z + 3.840881908632582)/exp(1))^(z-1/2)

If curious you could try your optimization method on my n = 5 result, reproduced here with 16-digit coefficients:

(0.9446965718573752e-2 + 1.367185253712206/z - 1.839535625334429/(z+1) + .6838535775172448/(z+2) - 0.6542905498247415e-1/(z+3) + 0.6572071053708592e-3/(z+4)) * ((z + 5.081000210222555)/exp(1))^(z-1/2)

Finally, Angel uses an n = 6 version as calculated by Lanczos himself, but rearranged by Viktor Toth. Here is my n = 6 version using Pugh's suggested parameter, again expressed with 16-digit coefficients for your optimizing pleasure:

(0.2849626875039468e-2 + 1.519901663012889/z - 2.706750200734360/(z+1) + 1.554955033608236/(z+2) - .3214751818926898/(z+3) + 0.1886890596091082e-1/(z+4) - 0.1321092292846819e-3/(z+5)) * ((z + 6.279505747515941)/exp(1))^(z-1/2)

Glen Pugh continued to refine his work in subsequent years, and in 2006 shared with me an extended and corrected table of his parameters, up to n = 100. It doesn't include all of the columns of the original thesis table, but it does have the relevant ones. It is attached.

I am keen to learn about your strategy of optimizing these formulae for use on a specific real domain. If you are squeezing out an extra digit of accuracy, maybe we could generate something that is shorter for Angel's MCODE yet does just as well. If you have managed to produce an improved n = 4 formula with 10-digit constants that gives 10+ edd in the 0 to 70 domain, then perhaps Angel might have something to consider that offers real improvement over the excellent results his existing code already gets with an n = 6 formula using an arbitrary parameter that we know from Pugh's research is not optimal.

Let me know how you are doing this. You really have me curious! If you can start by sending me your optimized version of, say, my n = 4 result, that would really be fun to play with in Mathematica, particularly if you can produce one with more digits based on what I have given above.

Les

Dieter · (This post was last modified: 08-24-2015 07:39 PM by Dieter.)

Les, first of all thank you very much for sharing your insights regarding the Gamma function. There is a lot of interesting information in your post, and yes: the most crucial point is the limited working precision of the calculators we use.

(08-24-2015 09:23 AM)lcwright1964 Wrote: So it could be that there is something with the same parameter and the same number of terms that does even better if one confines oneself to not only the positive real line, but the very specific subdomain of about 0 < z < 70.

Right, that's the basic idea. I have been using such approximations (mostly rational ones) for other functions where the domain is limited by the calculator's working range. Take for instance the Normal quantile function. There is the well-known Hastings approximation which seems work for any argment greater than zero. But since the '41 only works down to 1E–99, the same (2;3) rational approximation may be used with a different set of coefficients optimized for exactly this working range. Et voilà: the resulting error is much lower and we can squeeze out another valid digit.

(08-24-2015 09:23 AM)lcwright1964 Wrote: I don't know what you did to optimize my function as presented, but I would like to know how you went about it and what you got.

OK, here we go.

First of all, my approximation determines the factorial instead of Gamma, so the argument is off by one. But the basic idea is the same for both cases. Simply take a look at the Lanczos approximation:

x! = [a0 + a1/(x+1) + a2/(x+2) + a3/(x+3) + a4/(x+4)] * ((x+1+c)/e)^(x+1-0,5)

There are six unknowns: a0...a4 and c, the constant in the second factor. This non-linear twist makes things a bit more complicated, since I have no tools for solving non-linear equation systems. So I simply decided to set c to the value you already metioned, i.e. c = 3,84088190863. This way the remaining coefficients can be determined quite easily – you can set up a simple linear equation system:

[a0 + a1 * 1/(x+1) + a2 * 1/(x+2) + a3 * 1/(x+3) + a4 * 1/(x+4)] = x! / ((x+4,84088190863)/e)^(x+0,5)

Do this for five different x, spread over the range 0...70, and you get a linear system with five equations in five unknowns. Which can be solved easily and gives five coefficients a0...a4.

Now plot an error graph and see where the error has its maxima and minima. Adjust your five nodes so that the error peaks even out and start over again. After some iterations the nodes were at 0, near 1.54 and 5.43, at 16.6 and 53. This way I got the following 12-digit result:

Code:

a0 =  3,2648 92414 09 E-02

a1 =  1,1886 88980 04

a2 = -1,0684 10297 87

a3 =  1,8871 19655 59 E-01

a4 = -2,7450 07080 40 E-03

With c = 3,84088190863 the relative error for 0!...70! oscillates nicely between plus and minus 3,9 E–11. Which means 10,4 edd.

(08-24-2015 09:23 AM)lcwright1964 Wrote: If curious you could try your optimization method on my n = 5 result, reproduced here with 16-digit coefficients:

I did all this in a simple Excel spreadsheet with standard 15-digit accuracy. My Gamma/Factorial implementation (the reference against which the error is calculated) has 12-13 valid digits, so all this is a bit of working at the limit. That's why I do not trust my results for the n=5 case, although the resulting coefficients are very close to the ones you posted. But at least I can say that another term will reduce the relative error in 0...70 to not much more than about ±10^–12.

(08-24-2015 09:23 AM)lcwright1964 Wrote: I am keen to learn about your strategy of optimizing these formulae for use on a specific real domain. If you are squeezing out an extra digit of accuracy, maybe we could generate something that is shorter for Angel's MCODE yet does just as well.

If you have managed to produce an improved n = 4 formula with 10-digit constants that gives 10+ edd in the 0 to 70 domain, then perhaps Angel might have something to consider that offers real improvement over the excellent results his existing code already gets with an n = 6 formula using an arbitrary parameter that we know from Pugh's research is not optimal.

What I can say is, that n=4 and 12-digit coefficients allow for a relative error less than ±4 E–11 (if evaluated exactly), so the 10-digit result should exact or maybe ±1 ULP off. Which seems to be comparable to the current Sandmath GAMMA implementation. One more term would probably yield 11 digits, or even twelve (if evaluated with sufficient precision).

(08-24-2015 09:23 AM)lcwright1964 Wrote: Let me know how you are doing this. You really have me curious! If you can start by sending me your optimized version of, say, my n = 4 result, that would really be fun to play with in Mathematica, particularly if you can produce one with more digits based on what I have given above.

Well, it's not magic. Just simple (really simple) math. ;-)

Dieter

lcwright1964 · 08-24-2015, 08:40 PM

(08-24-2015 07:32 PM)Dieter Wrote: Well, it's not magic. Just simple (really simple) math. ;-)

Eureka, I get it! You are basically applying some of the principles of the Remez algorithm in a manual way--i.e., you sample some points along the range in question, plot the error curve, and tweak things in the polynomial or rational approximation to smooth out the peaks and valleys. I haven't thought about this sort of stuff in ages. It frankly never occurred to me that one might take a certain rational or polynomial approximation and make it better this way. I always thought of this sort of minimax optimization as creating such formulae, not tweaking existing ones. Clever! Will try that myself.

And thank you for your kind words on my thoughts. I have taken some time off work this summer and the loss of structure has turned my sleep around, so I am up musing about this and other things at all odd hours.

Thanks for sharing your approach. I think I follow you very clearly, and it is most informative indeed.

Les

Dieter · (This post was last modified: 08-24-2015 11:19 PM by Dieter.)

(08-24-2015 08:40 PM)lcwright1964 Wrote: Eureka, I get it! You are basically applying some of the principles of the Remez algorithm in a manual way--i.e., you sample some points along the range in question, plot the error curve, and tweak things in the polynomial or rational approximation to smooth out the peaks and valleys.

Exactly. Usually I do these approximations even closer to the Remez method.

I now took another look at the n=4 coefficients, and guess what, it gets even better. I varied the c constant a bit and – for x=0...70 – the result did improve. With c=3,83 and 12-digit coefficients a relative error within ±2 E–11 is possible. That's 10,7 edd.

With 15-digit coefficients I just arrived at ±1,7 E–11. I wonder where we might get with an optimized set of all six coefficients, i.e. including c.

Dieter

lcwright1964 · 08-25-2015, 01:56 AM

(08-24-2015 11:14 PM)Dieter Wrote: Exactly. Usually I do these approximations even closer to the Remez method.

I have been looking at the MinimaxApproximation function in Mathematica which does this sort of thing. Alas, it seems to me it can only produce rational approximations in the strict sense--i.e. the numerator and denominator are each a polynomial in a certain argument.

With these Lanczos approximations, the series is in 1/z, 1/z+1, 1/z+2, etc., rather than just constant, z, z^2, z^3, etc. This is where Viktor Toth's rearrangement comes into play, to give

(z!/stuff in front)*Product(z+i, i=0..n) ~= polynomial in z of degree n

as the original approximation to be maximized.

Or I could just stay with the original form, as you have, and fiddle with things in a spread sheet Smile

Stuff to think about!

Les

Dieter · (This post was last modified: 08-25-2015 09:43 PM by Dieter.)

(08-25-2015 01:56 AM)lcwright1964 Wrote: With these Lanczos approximations, the series is in 1/z, 1/z+1, 1/z+2, etc., rather than just constant, z, z^2, z^3, etc. This is where Viktor Toth's rearrangement comes into play, to give

(z!/stuff in front)*Product(z+i, i=0..n) ~= polynomial in z of degree n

as the original approximation to be maximized.

All I know is that Mathematica and Maple (and most probably also others) offer some tools for rational and other approximations. But I do not have access to such software so I cannot say more about this.

However...

(08-25-2015 01:56 AM)lcwright1964 Wrote: Or I could just stay with the original form, as you have, and fiddle with things in a spread sheet

...this approach has its special advantages. You can taylor an approximation exactly to meet your specific needs. For instance you may want to have better accuracy over a certain interval while for the rest a somewhat less accurate result will do. Or you may define a different error measure, for instance a max. number of ULPs the approximation can be off. That's what I like about the manual approach.

Addendum:

Les, you wanted an approxmation for n=5. I did some calculations on Free42 with 34-digit BCD precision. The exact coefficients are not fixed yet, but with c=5.081 it looks like the relative error for x=0...70 is about ±4 E–13.

Dieter

lcwright1964 · 08-25-2015, 11:29 PM

Hey Dieter,

This is excellent. I have learned that Free42 is an excellent place to observe the behaviour of HP41 programs. The programs are completely compatible without modification, 12-digits are displayed, and everything is done internally to 25 digits (not 34 like you say--that is DP mode in the WP34s). Indeed, working with JMB's GAM+ routine I have concluded there is likely little benefit to adding a term to the Stirling series. With ample available precision on Free42 GAM+ unedited computes the Gamma of 5.000000001, just a tad about the shift-divide cutoff, to about 10.4 edd. A further term can't improve on this on the HP41--indeed, even more computations will add to rounding error. And there is little to be gained by lowering the shift-divide cutoff, as adding a term likely adds more operations than what might be saved by lowering the cutoff.

For your Lanczos formula your reciprocals start at 1/z+1 instead of 1/z in my versions. Does this permit you to make sure the error curve passes through z = 0?

Les

Dieter · 08-26-2015, 09:50 AM

(08-25-2015 11:29 PM)lcwright1964 Wrote: This is excellent. I have learned that Free42 is an excellent place to observe the behaviour of HP41 programs. The programs are completely compatible without modification...

I agree that Free42 is an excellent piece of software, but I would not recommend it for testing HP41 programs. The different precision and working range may and will lead to different results. For instance numbers may not round to zero while they do on a physical '41, so x=0? and x=y? tests may behave differently and iterations may not terminate. Since the '42 supports complex arithmetics, some operations (e.g. the log of a negative number) will return a complex result instead of an error message (which may be the intended behaviour of the original HP41 program). The same is true for the larger working range for which the program may not have been designed. In general, Free42 does not use HP code, so possibly even simple 12-digit results may differ from those of a hardware '42.

But of course there is a solution: V41 essentially behaves like a real HP41 and so I think this is the best option for testing programs for this calculator.

(08-25-2015 11:29 PM)lcwright1964 Wrote: ...12-digits are displayed, and everything is done internally to 25 digits (not 34 like you say--that is DP mode in the WP34s).

This may be true for the original Free42, but there also are newer versions that use 34 digit BCD arithmetics ("Free42 Decimal"). And so indeed 3 [sqrt] [x²] 3 [–] returns –1E–33. Take a look at the Free42 website, where it says:

"Free42 Decimal uses the Intel Decimal Floating-Point Math Library; it uses IEEE 754-2008 quadruple precision decimal floating-point, which consumes 16 bytes per number, and gives 34 decimal digits of precision, with exponents ranging from -6143 to +6144."

(08-25-2015 11:29 PM)lcwright1964 Wrote: Indeed, working with JMB's GAM+ routine I have concluded there is likely little benefit to adding a term to the Stirling series. With ample available precision on Free42 GAM+ unedited computes the Gamma of 5.000000001, just a tad about the shift-divide cutoff, to about 10.4 edd. A further term can't improve on this on the HP41--indeed, even more computations will add to rounding error. And there is little to be gained by lowering the shift-divide cutoff, as adding a term likely adds more operations than what might be saved by lowering the cutoff.

This essentially matches my results. I think the only way to get 10 digits out of an HP41 is 13-digit MCode. Here my approximations may be useful since they deliver the same accuracy level as the current Sandmath GAMMA implementation, but with two terms less. I tried the n=4 approximation on my trusted 12-digit HP35s and I got 10 valid digits, here and there the last one rounded the wrong way (i.e. ±1 ULP). Maybe Angel is listening. ;-)

BTW I now have an optimized approximation for n=5. The largest relative error for x = 0...70 (i.e. 1...71 for Gamma) now is ±3,9 E–13. I did this on Free42 with 34-digit support, so I'm not sure what the required working precision might be. I guess something like 16 digits would get you more or less close to this.

(08-25-2015 11:29 PM)lcwright1964 Wrote: For your Lanczos formula your reciprocals start at 1/z+1 instead of 1/z in my versions. Does this permit you to make sure the error curve passes through z = 0?

That's simply because my approximations calculate x! resp. Gamma(x+1). Yes, the error is zero at z=0, but that's simply because that's one of the nodes in the Remez simulation. ;-)

Dieter

lcwright1964 · 08-26-2015, 12:50 PM

(08-26-2015 09:50 AM)Dieter Wrote:
(08-25-2015 11:29 PM)lcwright1964 Wrote: This is excellent. I have learned that Free42 is an excellent place to observe the behaviour of HP41 programs. The programs are completely compatible without modification...

I agree that Free42 is an excellent piece of software, but I would not recommend it for testing HP41 programs. The different precision and working range may and will lead to different results....

But of course, and that's kind of my point. What I should have been clear about is that it lets me test various Gamma implementations, written in FOCAL, in a high precision environment where 12 digits--two more--are show. So I can play around with extra terms and shift-divide cutoffs and see where it is I get 10 digit accuracy where this is ample working precision. This is important to know, because when execution is confined to the 10 digit environment of the HP41 trying to squeeze out another digit by extending by an extra term or altering the shift/divide threshold (where relevant) will NOT improve things, and indeed more computations will be done that will add to rounding error. That is what I meant.

For example, in the GAM+ program of JMB that I love so much, it is evident that he worked it up in a higher precision environment, because the number of terms used and the selection of 5 as the shift/divide threshold gives at LEAST 10 good digits when there is ample working precision. This will be degraded somewhat in the HP41 10 digit environment, but adding another term, as contemplated, will NOT improve it there. It will give more digits when they are available, such as on the Free42, but won't confer any benefit on the HP41. That's what I mean.

And thank you for correcting me on the number of internal digits in Free42 Decimal. It used to be about 25-digit precision when Thomas used the BCD-10000 library. I didn't know he switched to the Intel quadruple precision library. I do suspect that not all of the Free42's functions have been extended to the higher precision. For example, I have a little program that computes the upper and lower incomplete gammas to machine convergence, and can inefficiently use them as a gold standard for the "complete" gamma--e.g., Gamma[5.5] = UpperIncompleteGamma[5.5,5.5] + LowerIncompleteGamma[5.5,5.5]. (I know this is NOT an efficient way to compute Gamma, since these incomplete gamma routines compute series and continued fractions to dozens of terms to machine convergence, but it is fast enough in Free42 at native processor speed, yet would take much longer on a real 41C or 42S.) Testing this against the Free42's built in GAM shows the latter to do no better than about 23 or 24 digits for a few arguments, so the older code is still there despite the more precise quadruple precision numbers. Practically speaking this makes no difference, and I suspect Thomas will leave it alone. The good thing about Free42's high internal precision is that you know that what you see in the display to 12 digits is completely correct (and then some) provided the algorithms are correct.

Les

Dieter · (This post was last modified: 08-27-2015 06:08 AM by Dieter.)

(08-26-2015 12:50 PM)lcwright1964 Wrote: But of course, and that's kind of my point. What I should have been clear about is that it lets me test various Gamma implementations, written in FOCAL, in a high precision environment where 12 digits--two more--are show. So I can play around with extra terms and shift-divide cutoffs and see where it is I get 10 digit accuracy where this is ample working precision.

OK – that's of course a useful application. I often do this kind of stuff with the 34s. Here you can even choose between 16 and 34 digits. ;-)

(08-26-2015 12:50 PM)lcwright1964 Wrote: For example, in the GAM+ program of JMB that I love so much, it is evident that he worked it up in a higher precision environment, because the number of terms used and the selection of 5 as the shift/divide threshold gives at LEAST 10 good digits when there is ample working precision. This will be degraded somewhat in the HP41 10 digit environment, but adding another term, as contemplated, will NOT improve it there. It will give more digits when they are available, such as on the Free42, but won't confer any benefit on the HP41. That's what I mean.

That's essentially the same point I mentioned in a post earlier in this thread regarding the Spouge method, when I said that a=7 instead of 12.5 will yield the same or even better accuracy on a real '41.

(08-26-2015 12:50 PM)lcwright1964 Wrote: And thank you for correcting me on the number of internal digits in Free42 Decimal. It used to be about 25-digit precision when Thomas used the BCD-10000 library. I didn't know he switched to the Intel quadruple precision library. I do suspect that not all of the Free42's functions have been extended to the higher precision.
(...)
Testing this against the Free42's built in GAM shows the latter to do no better than about 23 or 24 digits for a few arguments, so the older code is still there despite the more precise quadruple precision numbers.

Ah, yes, this may be possible. This is were the WP34s performs better. Usually you get 30+ digits in DP mode. Although only the 16 digits in SP mode are granted.

Finally, here are the coefficients for the n=5 case, optimized for x=0..70 (resp. 1...71 for Gamma). They differ slightly from what you posted, but the largest relative error here is as small as 3,75 E–13.

Code:

 c =  5,081

a0 =  9,44696770446255924982608 E-3

a1 =  1,36718522540552597224102

a2 = -1,83953548282705973597554

a3 =  6,83853459467545887340920 E-1

a4 = -6,54290299376343116198247 E-2

a5 =  6,57205667880559284049346 E-4

The values are rounded to 24 digits, but I think 16 should do. I did a short test on a WP34s in SP mode and it looks like the error stays below 3,8 E–13.

Edit: I did a final iteration which, for c = 5,081 and sufficient working precision, gave a max. rel. error of 3,7491 E–13 over the mentioned interval.

Dieter

Dieter · 08-27-2015, 11:21 PM

(08-24-2015 09:23 AM)lcwright1964 Wrote: If you can start by sending me your optimized version of, say, my n = 4 result, that would really be fun to play with in Mathematica, particularly if you can produce one with more digits based on what I have given above.

OK, this is what I got until now for the n=4. I think this should be close to the optimum. If evaluated exactly, the maximum relative error for x = 0...70 is 1,599 E–11. But also with 13 digits in MCode the result should be sufficiently close to this.

Here are two sets of coefficients with 16 resp. 12-13 digits:

Code:

      16 digits                         12-13 digits

---------------------------------------------------------

 c =  3,8306414                    c =  3,8306414

a0 =  3,298498349730427 E-2       a0 =  3,29849834973 E-2

a1 =  1,187103291606358           a1 =  1,187103291606

a2 = -1,062638608755227           a2 = -1,062638608755

a3 =  1,860986212445602 E-1       a3 =  1,86098621245 E-1

a4 = -2,629985803711166 E-3       a4 = -2,62998580380 E-3

Dieter

Dieter · (This post was last modified: 09-19-2015 05:34 AM by Dieter.)

(08-27-2015 11:21 PM)Dieter Wrote: OK, this is what I got until now for the n=4. I think this should be close to the optimum. If evaluated exactly, the maximum relative error for x = 0...70 is 1,599 E–11.

So that's 1...71 for the Gamma function.

Since at least negative arguments require the reflection formula, the approximation should handle values down to 0.5, i.e. where x = 1–x. So I set up another approximation which was optimized for 0.5≤x≤71. Here are the new coefficients. 10 to 13 digits are enough to keep the error within ±3.2 E–11.

Code:

 c =  3,838

a0 =  3,27431510667 E-02

a1 =  1,188242945545

a2 = -1,0667849602

a3 =  1,87974297577 E-01

a4 = -2,712278856 E-03

This way arguments below 0.5 can be handled by the reflection formula.

And, for the record, here are the coefficients of a n=5 approximation for the same domain (0,5≤x≤71). Evaluated exactly, the relative error is within ±4,2 E–13. At x=0,5 the error is close to zero. Within the mentioned error bounds the approximation may be used down to x=0,057.

Code:

 c =  5,081

a0 =  9,4469677044570384442 E-03

a1 =  1,3671852254158201121

a2 = -1,8395354829397881396

a3 =  6,8385345981848506558 E-01

a4 = -6,5429030358398827535 E-02

a5 =  6,5720584070280550576 E-04

These are 20 digits, but a few less should do either. ;-)

Dieter