HP Forums

Full Version: Binomial Probability Distribution
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
This program calculates the odds of a specified number of successes from independent trials with a known probability of success.

This requires a combinations function named COMB that calculates nCr. If you need one, see here for a few options.

Inputs:
t: Number of trials
z: Individual probability of success
y: Minimum number of successes
x: Maximum number of successes

Probability will be returned to x.

Example:
What is the probability of obtaining at least two even numbers when rolling five standard 6-sided dice?

5 ENTER
.5 ENTER
2 ENTER
5 XEQ BINPR

Result: 0.8125 (81.25%)

Edit: Rewritten with Dieter's suggested optimizations. It's a bit longer, but faster. I'm sure there's still room for more elegant stack/register use here. I'll wait until those shake out before I pencil the new listing into my Moleskine. Smile

Running it with inputs 120, 1/6, 40, 120 yields 6.419629920E-6 in just over a minute. Compare to the version on my 48 using a direct summation of terms and repeated COMB calls returning 6.4196298769E-6. That's an error of 6.71E-7%, which isn't a terrible sacrifice for having the program finish inside of a lunar cycle (or possibly solar orbit) now.

Code:
t: Number of trials
z: Individual probability of success
y: Minimum number of successes
x: Maximum number of successes

01 LBL "BINPR"
02 X<>Y
03 -
04 STO 00
05 RDN
06 STO 01
07 X<>Y
08 STO 02
09 LASTX
10 STO 03
11 XEQ "COMB"
12 RCL 01
13 RCL 03
14 Y^X
15 *
16 1
17 RCL 01
18 -
19 RCL 02
20 RCL 03
21 -
22 Y^X
23 *
24 RCL 00
25 X>0?
26 GTO 00
27 RDN
28 RTN
29 LBL 00
30 RDN
31 STO 04
32 RCL 01
33 ENTER
34 ENTER
35 1
36 X<>Y
37 -
38 /
39 X<>Y
40 LBL 01
41 RCL Y
42 *
43 RCL 03
44 1
45 +
46 /
47 RCL 02
48 RCL 03
49 -
50 *
51 ST+ 04
52 1
53 ST+ 03
54 RDN
55 DSE 00
56 GTO 01
57 RCL 04
58 RTN
59 END
(05-24-2015 05:58 PM)Dave Britten Wrote: [ -> ]This program calculates the odds of a specified number of successes from independent trials with a known probability of success.

This requires a combinations function named COMB that calculates nCr. If you need one, see here.

Dave, it looks like your program evaluates COMB in every single loop so that it should not run particulary fast. But there is a simple fix: once nCr is known, the following nCr+1 can be evaluated very easily from this. Even the whole Binomial PDF for x+1 can be calculated from the PDF of the previous x, so that the slow power function in the loop can be avoided as well.

Dieter
(05-25-2015 09:51 PM)Dieter Wrote: [ -> ]
(05-24-2015 05:58 PM)Dave Britten Wrote: [ -> ]This program calculates the odds of a specified number of successes from independent trials with a known probability of success.

This requires a combinations function named COMB that calculates nCr. If you need one, see here.

Dave, it looks like your program evaluates COMB in every single loop so that it should not run particulary fast. But there is a simple fix: once nCr is known, the following nCr+1 can be evaluated very easily from this. Even the whole Binomial PDF for x+1 can be calculated from the PDF of the previous x, so that the slow power function in the loop can be avoided as well.

Dieter

Thanks for the tip. I'll see if I can incorporate that. I've just always been doing the looped nCr approach.
(05-25-2015 10:11 PM)Dave Britten Wrote: [ -> ]Thanks for the tip. I'll see if I can incorporate that. I've just always been doing the looped nCr approach.

I just tried it and it works well.
Let n and p be the distribution parameters, and k1 resp. k2 the lower/upper bound of the random variable. Then this is the basic idea:
  • Calculate Comb(n, k1).
  • Use this to calculate the probability B(k1,n,p).
    This is the initial value of the sum you want to get.
  • Repeat the following loop k2–k1 times:
  • Calculate B(k+1) = B(k) * (n–k)/(k+1) * p/(1–p)
  • Add this to the sum, increase k by 1.
You can even prestore the constant p/(1–p) so that it has to be calculated only once.

Dieter
(05-25-2015 10:38 PM)Dieter Wrote: [ -> ]
(05-25-2015 10:11 PM)Dave Britten Wrote: [ -> ]Thanks for the tip. I'll see if I can incorporate that. I've just always been doing the looped nCr approach.

I just tried it and it works well.
Let n and p be the distribution parameters, and k1 resp. k2 the lower/upper bound of the random variable. Then this is the basic idea:
  • Calculate Comb(n, k1).
  • Use this to calculate the probability B(k1,n,p).
    This is the initial value of the sum you want to get.
  • Repeat the following loop k2–k1 times:
  • Calculate B(k+1) = B(k) * (n–k)/(k+1) * p/(1–p)
  • Add this to the sum, increase k by 1.
You can even prestore the constant p/(1–p) so that it has to be calculated only once.

Dieter

That appears to mesh with what I came up with: COMB(n, r+1)=COMB(n, r)/(r+1)*(n-r).
First post updated with a new listing. Thanks for the algebra optimization tips.
(05-24-2015 05:58 PM)Dave Britten Wrote: [ -> ]Running it with inputs 120, 1/6, 40, 120 yields 6.419629920E-6 in just over a minute. Compare to the version on my 48 using a direct summation of terms and repeated COMB calls returning 6.4196298769E-6. That's an error of 6.71E-7%, which isn't a terrible sacrifice for having the program finish inside of a lunar cycle (or possibly solar orbit) now.

The HP41 works with 10 digits, the HP48 with 12. For the former 1/6 equals 0,166 666 666 7, for the latter it's 0,166 666 666 667.

This means:
  • The correct result for p=1/6 is 6,4196298765918... E–6
  • The correct 10-digit result for p=0,1666666667 is 6,419629908 E–6.
    So the HP41 is pretty close.
  • The correct 12-digit result for p=0,166666666667 is 6,41962987691 E–6.
    So the HP48 is very close. Since COMB is an internal function and thus most probably exact in all 12 digits, this could be expected.
  • Compared to the true result with p exactly 1/6, both calculators offer similar accuracy: 8 out of 10 digits resp. 10 out of 12. ;-)
BTW my program returns the same result as yours. It has 70 steps including the COMB routine and the example runs in 53 seconds.

Dieter
(05-26-2015 03:47 PM)Dave Britten Wrote: [ -> ]First post updated with a new listing. Thanks for the algebra optimization tips.

Great. Here's my solution. It has 74 steps including its own COMB routine. If you want to use your own external function, simply replace steps 16...32 with your COMB command. The program then is down to 58 steps.

The code below implements another feature that I consider very useful especially for large n resp. wide ranges between k1 and k2.
Take for instance this example:

n=100, p=0,1, k=10...90
The result is 0,5487 and requires the summation of 80 single PDFs. The previous version of my program required 43,5 seconds for this.

However, there is not much sense in adding the PDFs for k=50, 60 or even 90. They are so small that they do not change the final result. At k=40 the PDF already has dropped to 2,47 E–15, so the addition loop can stop here, or even earlier.

This is checked by the program version below: if an additional PDF does not change the CDF sum, the program exits. In the example above the last added term is the PDF for k=36, which was 1,16 E–11 so that it did not change the final result of 0,5487. Which of course is returned much faster now: instead of 43,5 s only 18,5 s are required – less than half the time.

Finally, here is the code:

Code:
01 LBL"BINCDF"
02 x<y?
03 x<>y
04 INT
05 STO 04
06 RDN
07 INT
08 STO 03
09 RDN
10 STO 02
11 RDN
12 INT
13 STO 01
14 RCL 01
15 RCL 03
16 -
17 RCL 03
18 x<y?
19 x<>y     // leave min(n, n-k1) in Y
20 SIGN     // set x:=1
21 x>y?     // min(n, n-k1) = 0 ?
22 GTO 03   // then Comb=1, skip Comb loop
23 LBL 01
24 RCL Z    // calculate Comb(n, k1) iteratively
25 *
26 RCL Y
27 /
28 DSE Z
29 LBL 00   // dummy label = NOP
30 DSE Y
31 GTO 01
32 LBL 03
33 RCL 02
34 RCL 03
35 y^x
36 *
37 1
38 RCL 02
39 -
40 ST/ 02   // prestore p/(1-p)
41 RCL 01
42 RCL 03
43 -
44 STO 01
45 y^x
46 *
47 RCL 04
48 RCL 03
49 -
50 x=0?     // # PDF loops = 0 ?
51 GTO 04   // then quit
52 STO 04
53 x<>y
54 ENTER
55 LBL 02   // stack: PDF(k) CDF(k) z  t
56 RCL 01
57 RCL 02
58 *
59 *        // this way CDF(k) is pushed to T
60 SIGN
61 ST- 01
62 ST+ 03
63 x<> L
64 RCL 03
65 /        // stack: X=PDF(k), Y=CDF(k), Z=CDF(k-1), T=CDF(k-1)
66 +        // PDF in L, current and previous CDF in X and Y
67 x=y?     // result same as before?
68 GTO 04   // then quit
69 LastX    // else recall last PDF
70 DSE 04   // decrement counter
71 GTO 02   // and continue with next k
72 LBL 04
73 RDN
74 END

Now try your example:
Code:
120 [ENTER] 6 [1/x] 40 [ENTER] 120
XEQ"BINCDF"   =>   6,419629920 E–6

The result now is returned in 25 seconds. That's more than twice as fast as the standard approach since only PDFs between k=41 and k=59 had to be summed up. That's merely 19 loops instead of 80(!):

Code:
LastX  => 1,7254 E–16  (last summed = first negligible PDF(k))
RCL 03 => 59  (last k)
RCL 04 => 62  (number of saved loops + 1)

Dieter
Good idea with the early exit. I'll have to play around with that a bit.
(05-28-2015 01:44 AM)Dave Britten Wrote: [ -> ]Good idea with the early exit. I'll have to play around with that a bit.

Here is an update with the Comb loop counting up the denominator:

Code:
.. ...
21 x>y?     // min(n, n-k1) = 0 ?
22 GTO 03   // then Comb=1, skip Comb loop
23 RCL 01
24 LBL 01
25 *        // calculate Comb(n, k1) iteratively
26 DSE L
27 RCL 01
28 LastX
29 -
30 ST/ Y
31 X<> L
32 DSE Z
33 GTO 01
34 RDN
35 LBL 03
.. ...

That's three more steps, on the other hand the RCL 01 command in line 14 now can be omitted. So all in all it's two steps more. But accuracy improves – try your example once again:

Code:
120 [ENTER] 6 [1/x] 40 [ENTER] 120
XEQ"BINCDF"   =>   6,419629908 E–6

That's the exact result for a ten-digit value of 1/6. :-)

Dieter
Reference URL's