Threaded Mode | Linear Mode

Dieter · 12-19-2015, 09:00 AM

(12-19-2015 01:39 AM)rprosperi Wrote:
(12-18-2015 11:26 PM)quantalume Wrote: I don't think anyone has any business doing problems with dof over 70 on a pocket calculator, let alone a vintage one. :-)

Ahh, I see you are new here. I'm afraid common sense about what folks want and expect out of pocket calculators does not apply here. Which of course is what makes it so interesting!

I remember a discussion regarding the WP34s statistical distributions where the question was how to handle degrees of freedom > 10000. And recently there was a thread with a complaint because of problems with a parameter of 1E+9. ;-)

Dieter

Dieter · 12-19-2015, 09:18 AM

(12-18-2015 11:26 PM)quantalume Wrote: I replaced the LBL 70 - 90 with LBL 02 - 05 (single byte). This reduced program length by 7 bytes and improved speed slightly..

I often choose two-byte LBLs and the corresponding 3-byte GTOs because they are able to record their target address for longer jumps. You know that once a GTO has found the LBL it jumps to, this position is recorded within the GTO command ("compiled GTO") so that subsequent branches are executed much faster. Here the 3-byte-versions allow for longer jumps than their 2-byte counterparts, so just to be sure I prefer the former.

In this program two-byte GTOs (resp. single-byte LBLs) may be sufficient, at least in most cases. What makes me wonder is your remark that this speeded up the program slightly. I would have expected no change at best.

Dieter

Namir · 12-19-2015, 02:37 PM

(12-18-2015 10:25 PM)Dieter Wrote:
(12-17-2015 12:52 AM)Namir Wrote: Here is a Visual Basic implementation for the cumulative distribution function for the F statistics. It is based on the Handbook of Mathematical Function.

Namir, thank you very much for this extensive implementation.
But why do you do it this complicated instead of a simple regularized Beta function? Which could be used for other distributions as well.

Could you try x=3 with 1 and 5 degrees of freedom? I tried your code in Excel VBA (with the usual syntax modifications for my older version) and got a result of 0,535... while the correct result is 0,856... The problem occurs for df1=1 and df2 odd. Maybe I did something wrong here?

BTW I like the Gamma implementation which seems very exact. It seems to use a polynomial approximation for x=0...1. For comparison, here's my humble Gamma(k) function which I use for the distributions – after all only integers and half-integers have to be handled:

Code:

If k = Int(k) Then g = 1 Else g = sqrt(Pi) Do While k > 1.25 k = k - 1 g = g * k Loop Gamma = g

;-)

Dieter

Thanks Dieter for the observation. I found the bug:

If Df2 > 1 And Df1 > 0 Then

And changed it (in the original thread) to:

If Df2 > 1 And Df1 > 1 Then

No I get the same result for Fcfd(1,5,3).

Namir

quantalume · 12-19-2015, 06:22 PM

(12-19-2015 09:18 AM)Dieter Wrote: I often choose two-byte LBLs and the corresponding 3-byte GTOs because they are able to record their target address for longer jumps. You know that once a GTO has found the LBL it jumps to, this position is recorded within the GTO command ("compiled GTO") so that subsequent branches are executed much faster. Here the 3-byte-versions allow for longer jumps than their 2-byte counterparts, so just to be sure I prefer the former.

In this program two-byte GTOs (resp. single-byte LBLs) may be sufficient, at least in most cases. What makes me wonder is your remark that this speeded up the program slightly. I would have expected no change at best.

You obviously know a lot more about 41C programming than I do. I only measured the execution speed difference one time, and it was something like 0.25 seconds out of 17. It may well have come down to human error. Now that I have F distribution on my calc, perhaps I should repeat the experiment multiple times and determine the significance level of my measurement. ;-)

My only wish now would be trying to squeeze the program onto a single card (224 bytes). It currently stands at 252 bytes with the 1-byte LBLs, and the only other economy that immediately stands out is replacing the alpha LBLs, which wouldn't be enough. However, your beta function is valuable enough on its own, so perhaps it should become a permanent part of my calculator's memory.

David

Dieter · 12-19-2015, 09:16 PM

(12-19-2015 06:22 PM)quantalume Wrote: It may well have come down to human error. Now that I have F distribution on my calc, perhaps I should repeat the experiment multiple times and determine the significance level of my measurement. ;-)

This would require a t-test – and maybe that's the solution for the following problem:

(12-19-2015 06:22 PM)quantalume Wrote: My only wish now would be trying to squeeze the program onto a single card (224 bytes). It currently stands at 252 bytes with the 1-byte LBLs, and the only other economy that immediately stands out is replacing the alpha LBLs, which wouldn't be enough.

Ah, the good old 224 byte limit – I remember the days when I used the card reader more often than I do now. I think we don't get this program below this limit without sacrificing relevant functionality. So the solution might be another one: expand it (!), so that it fills two cards. ;-)

I just did some slight improvements, it's about 200 lines now. But now that we have a regularized Beta function: what about adding Student's t distribution? It can be evaluated easily using the same Beta function. And it allows to test the significance of means two from different samples. ;-)

Two cards with Fisher's F and Student's t distribution don't sound like a bad idea to me. And later you may add two more cards for the Normal and Chi² distribution, both based on the regularized Gamma function.

Dieter

quantalume · 12-20-2015, 12:28 AM

(12-19-2015 09:16 PM)Dieter Wrote:
(12-19-2015 06:22 PM)quantalume Wrote: It may well have come down to human error. Now that I have F distribution on my calc, perhaps I should repeat the experiment multiple times and determine the significance level of my measurement. ;-)

This would require a t-test – and maybe that's the solution for the following problem:

Is the t-test not a special case of one-way ANOVA?

(12-19-2015 09:16 PM)Dieter Wrote: Two cards with Fisher's F and Student's t distribution don't sound like a bad idea to me. And later you may add two more cards for the Normal and Chi² distribution, both based on the regularized Gamma function.

It sounds like a very good plan, though I fear we are reinventing the wheel at some point. That's perhaps inevitable when working with vintage calculators. ;-) Do you know anything about Ángel's XMSTAT ROM? All I can find is a quick-reference guide.

Ángel Martin · 12-20-2015, 06:59 AM

(12-20-2015 12:28 AM)quantalume Wrote: It sounds like a very good plan, though I fear we are reinventing the wheel at some point. That's perhaps inevitable when working with vintage calculators. ;-) Do you know anything about Ángel's XMSTAT ROM? All I can find is a quick-reference guide.

The XMSTAT module is in flux at the moment. The QRG shows that it contains all those distributions (Chi2, Normal, T-Student, F-Snedecor, and Binomial) - and lots of other statistical functions. The Distributions part in particular is a mix of old and new code, and I'm torn about getting rid of the old code and replace it with the newer one (not Dieter's, that was even newer). On one hand I like the ingenuous approach taken in the older programs but the code takes too log and occupies much space.... so we'll see.

'AM

Dieter · 12-20-2015, 01:17 PM

(12-20-2015 06:59 AM)Ángel Martin Wrote: On one hand I like the ingenuous approach taken in the older programs

OK – I just realized the difference between ingenious and ingenuous. *8)
Now what exactly is this ingenuous approach in the older programs?

Dieter

Dieter · (This post was last modified: 12-20-2015 05:03 PM by Dieter.)

(12-19-2015 02:37 PM)Namir Wrote: Thanks Dieter for the observation. I found the bug:

If Df2 > 1 And Df1 > 0 Then

And changed it (in the original thread) to:

If Df2 > 1 And Df1 > 1 Then

No I get the same result for Fcfd(1,5,3).

Yes, this example now returns the correct result.
But then try df2=1 and df1 odd. The problem persists:
The CDF for x=3, df1=5, df2=1 is returned as 0,839 while the true result is 0,588...

Instead of all these nested if-then-else-elseif constructions for odd and even dof and various complicated trig-power-series: why don't you simply use the regularized Beta function? Simple, reliable, straightforward and easy to implement.

Dieter

Ángel Martin · 12-20-2015, 05:27 PM

(12-20-2015 01:17 PM)Dieter Wrote: Now what exactly is this ingenuous approach in the older programs?

The ability to do more (or the same) with less. Now it's easy for me to call 1/GMF, BETA or ERF as convenient MCODE functions in the arsenal but back then it was a more demanding task. As I say it's in flux, will probably end up going the "newer" way regardless... yielding to the advantages and so jettison the nostalgic component/

Namir · 12-20-2015, 11:15 PM

(12-20-2015 05:02 PM)Dieter Wrote: [
Yes, this example now returns the correct result.
But then try df2=1 and df1 odd. The problem persists:
The CDF for x=3, df1=5, df2=1 is returned as 0,839 while the true result is 0,588...

Instead of all these nested if-then-else-elseif constructions for odd and even dof and various complicated trig-power-series: why don't you simply use the regularized Beta function? Simple, reliable, straightforward and easy to implement.

Dieter

I am implementing the equations in Abramowitz & Stegun's handbook of math. Show me some VB code using the Beta function?

Namir

Dieter · (This post was last modified: 12-21-2015 07:56 PM by Dieter.)

(12-20-2015 11:15 PM)Namir Wrote: I am implementing the equations in Abramowitz & Stegun's handbook of math.

OK, what about A&S 26.6.2 then?
Here the Fisher CDF is defined by means of the regularized Beta function. ;-)

(12-20-2015 11:15 PM)Namir Wrote: Show me some VB code using the Beta function?

The following code uses the continued fraction method according to A&S 26.5.8. It is based on the modified Lentz algorithm as suggested in "Numerical Recipes in C", chapter 6.4, with some changes and simplifications. For fast convergence and improved accuracy x should be low, so either x = df2/(df2 + F*df1) is used, or 1–x = F*df1/(df2 + x*df1) with swapped parameters a and b.

I have split the functionality into four independent routines so that these can be used for other applications as well. For instance for Student's t or the Binomial distribution.

Note that here the upper tail integral Q(F) is calculated. For the lower tail P(F) simply exchange ibeta and 1–ibeta in the two if-then-else branches of FCDF.

Code:

Const Pi = 3.141592653589793

Function FCDFQ(ByVal F, ByVal df1, ByVal df2)

x1 = df2 / (df2 + F * df1)

x2 = F * df1 / (df2 + F * df1)

' calculate regularized Beta using min(x, 1-x)

If x1 < x2 Then

   FCDFQ = ibeta(x1, df2 / 2, df1 / 2)

Else

   FCDFQ = 1 - ibeta(x2, df1 / 2, df2 / 2)

End If

End Function

Function ibeta(ByVal x, ByVal a, ByVal b)

tiny = 1E-300   ' a value close to smallest floating point number

c = 1

d = 1 - (a + b) * x / (a + 1)

If (Abs(d) < tiny) Then d = tiny

d = 1 / d

h = d

m = 1

Do

   amm = a + m + m

   aa = m * (b - m) * x / ((amm - 1) * amm)

   d = 1 + aa * d

   If (Abs(d) < tiny) Then d = tiny

   c = 1 + aa / c

   If (Abs(c) < tiny) Then c = tiny

   d = 1 / d

   h = (d * c) * h

   aa = -(a + m) * (a + b + m) * x / (amm * (amm + 1))

   d = 1 + aa * d

   If (Abs(d) < tiny) Then d = tiny

   c = 1 + aa / c

   If (Abs(c) < tiny) Then c = tiny

   d = 1 / d

   adj = d * c

   h = h * adj

   m = m + 1

Loop Until (adj = 1) Or (m > 200)

If m > 200 Then

   ibeta = -1    ' flag error due to no convergence, should never be returned

Else

   ibeta = h * x ^ a * (1 - x) ^ b / a / betafn(a, b)

End If

End Function

Function hGamma(ByVal k)  ' evaluates Gamma for k=0.5, 1, 1.5, 2, 2.5, 3, ...

k = Int(2 * k + 0.5) / 2 ' round input to integer or half-integer

If k = Int(k) Then g = 1 Else g = Sqr(Pi)

Do While k > 1.25

  k = k - 1

  g = g * k

Loop

hGamma = g

End Function

Function betafn(ByVal v, ByVal w)

betafn = hGamma(v) * hGamma(w) / hGamma(v + w)

End Function

This is what I currently use in Excel for calculating the upper F integral.

Dieter

Namir · 12-21-2015, 09:52 PM

(12-21-2015 07:24 PM)Dieter Wrote: This is what I currently use in Excel for calculating the upper F integral.

Dieter

Thanks for your code. I will look at it tomorrow.

Two days ago, I found some interesting C code for F CDF in an old edition of Numerical Recipe. Today I looked at the third edition of Numerical Recipes and was very pleased to see a lot of code for probability distribution functions, Gamma, incomplete Gamma, inverse incomplete gamma, inverse error functions, and so on. It is a wonderful reference. I translated several functions into Visual Basic and tested them. They all did well.

Namir

Namir · 12-22-2015, 05:54 AM

Thanks Dieter. I also found the code for ibeta in Numerical Recipes. I like your compact function hGamma (for multiples of 0.5) because it works well the Beta function with arguments that are multiples of 0.5 which is well suited to calculate the F CDF function.

I found a lot of interesting function in Numerical Recipes (3rd ed) in the chapter that discusses the Beta, incomplete Gamma, and so on. I was surprised to experiment with the book's implementation of function Gammaln and see that exp(Gammaln(x)) gives good results for Gamma(x) even small values of x. Until now, I had the impression that Gammaln was good only for large values of x.

Namir

Dieter · (This post was last modified: 12-22-2015 07:01 AM by Dieter.)

(12-22-2015 05:54 AM)Namir Wrote: I found a lot of interesting function in Numerical Recipes (3rd ed) in the chapter that discusses the Beta, incomplete Gamma, and so on. I was surprised to experiment with the book's implementation of function Gammaln and see that exp(Gammaln(x)) gives good results for Gamma(x) even small values of x. Until now, I had the impression that Gammaln was good only for large values of x.

The NR version I read calculates Gamma with an approximation which is good for about 10 digits (if evaluated in double precision). Of course lnGamma often is the only way to handle large arguments. In statistics we often deal with quotients of Gamma values, or more generally with quotients of very large numbers where the result is of perfectly "normal" order. Here using logs is very useful.

The accuracy limits of a lnGamma function show up at large results: even if lnGamma is exact to, say, 16 significant digits and the integer part has four digits, ±1 ULP of potential error equals ±1 E–12. Which in turn means that Gamma = exp(lnGamma) may have a relative error of 1 E–12, so we get no more than 12 digits accuracy for Gamma itself. On the other hand at least there is no overflow error. ;-)

Dieter

Paul Dale · 12-22-2015, 07:30 AM

(12-22-2015 05:54 AM)Namir Wrote: Numerical Recipes

This source is somewhat frowned upon in numerical analysis circles. There are a lot of simplifications in their algorithms and limited or no error handling. These books also gloss over the details a bit too much. I don't remember the details of several of the sources I used for the 34S but I think it inherited some of these algorithms which has hurt in the statistical distributions e.g. Sad

There definitely are better implementations of the incomplete beta and gamma functions and these will be on the cards for the next (am I really that crazy ????) calculator firmware I help develop.

- Pauli

Dieter · (This post was last modified: 12-22-2015 08:51 PM by Dieter.)

(12-22-2015 07:30 AM)Paul Dale Wrote: This source is somewhat frowned upon in numerical analysis circles. There are a lot of simplifications in their algorithms and limited or no error handling.

I think the idea of the NR book simply is to show a first approach that works "good enough". A real world implementation often needs more care. And after all these are just "recipes". The real cooking is up to you. ;-)

(12-22-2015 07:30 AM)Paul Dale Wrote: There definitely are better implementations of the incomplete beta and gamma functions

Could you say a bit more about this? I have used the suggested method (A&S 26.5.8) for the regularized beta function in the 35s distributions pack as well as in Excel's VBA – e.g. for testing the various quantile functions ;-) – and I did not find much to complain about. But I always like to hear about better methods.

Edit: among my PDF files I just found a paper by A. R. DiDonato and M. P. Jarnagin, titled "The Efficient Calculation of the Incomplete Beta-Function Ratio for Half-Integer Values of the Parameters a, b". I seem to remember that you mentioned it some time ago. However, the discussed methods are not trivial.

Dieter

Paul Dale · 12-22-2015, 09:34 PM

(12-22-2015 07:50 PM)Dieter Wrote: Could you say a bit more about this?

I don't have my references to hand at the moment. There was the one for the incomplete beta that split the domain in four regions instead of two which avoided the slow convergence we're seeing on the 34S for large arguments.

I don't think I'd seen the DiDonato and Jarnagin reference before. Handy for the statistical functions.

Pauli

Dieter · (This post was last modified: 12-25-2015 03:00 PM by Dieter.)

(12-21-2015 07:24 PM)Dieter Wrote: This is what I currently use in Excel for calculating the upper F integral.

Addendum: here is a slightly more compact version of the regularized Beta function. While in the previous implementation one loop calculates both the odd and even terms of the continued fraction, here is a version that computes them separately:

Code:

Function ibeta(ByVal x, ByVal a, ByVal b)

tiny = 1E-300

c = 1

d = 1 - (a + b) * x / (a + 1)

If (Abs(d) < tiny) Then d = tiny

d = 1 / d

h = d

k = 2

Do

   m = k \ 2

   r = x / ((a + k - 1) * (a + k))

   If k Mod 2 = 0 Then

      aa = m * (b - m) * r

   Else

      aa = -(a + m) * (a + b + m) * r

   End If

   c = aa / c + 1

   If (Abs(c) < tiny) Then c = tiny

   d = aa * d + 1

   If (Abs(d) < tiny) Then d = tiny

   d = 1 / d

   adj = c * d

   h = h * adj

   k = k + 1

Loop Until (adj = 1) Or (m > 200)

If m > 200 Then

   ibeta = -1

Else

   ibeta = h * x ^ a * (1 - x) ^ b / a / betaf(a, b)

End If

End Function

In this particular case I think the underflow tests (if abs(x)<tiny then x=tiny) follow an addition of 1, so maybe they can be replaced with "if x=0 then...". Maybe the floating point experts can comment on this.

Dieter

Namir · 12-26-2015, 03:03 PM

Thanks for the update Dieter. Hope you are enjoying the holidays. May 2016 brings us more new cool algorithms, smartphone apps, and new machines (just wishing).

Namir