(42S) Binomial probability distribution with large input support
09-25-2019, 03:16 PM (This post was last modified: 10-01-2019 07:05 PM by Dave Britten.)
Post: #1
 Dave Britten Senior Member Posts: 1,453 Joined: Dec 2013
(42S) Binomial probability distribution with large input support
This program will calculate a cumulative binomial probability distribution between the specified lower and upper bounds on the number of successful trials. By making extensive use of logarithms and the log gamma function, probabilities for extremely large numbers of trials/successes can be computed without overflowing the intermediate combinatoric calculations.

Results are returned to X. This program uses numbered registers 00, 01, and 02.

Usage:

XEQ "BINOM"

You will be prompted to enter 4 arguments. Press [R/S] after entering each value.

N - Total number of trials
P - Probability of success of a single trial
MIN - Minimum number of successes
MAX - Maximum number of successes

MIN and MAX may be set to the same value if you only want to calculate the binomial probability mass of a single outcome. Note that MIN and MAX must be equal if MIN is so large that rounding would cause MIN+1=MIN, otherwise the program will go into an infinite loop. You can still calculate probability mass for absurdly large inputs like N = 4E20, P = 1/6, MIN = MAX = 4E20/6.

The calculation starts after entering MAX. After a few moments (or a lot of moments if you're doing a very large range of successes), results will be in X. Press [R/S] to run another calculation; you will be prompted for new input values.

Example:

Suppose you have a six-sided die you want to ensure is fair (i.e. it produces a truly uniform distribution). You roll the die 40,000 times, and get a result of one 6,378 times. Knowing that the mean of the distribution is 6,666.67 (N * P), you thus want to determine the odds of rolling any result outside the range 6,379 to 6,954 times.

[XEQ] "BINOM" [ENTER]
40000 [R/S]
6 [1/x] [R/S]
6379 [R/S]
6954 [R/S]

Result of rolling within the range: 99.989%
Result of rolling outside the range: 100% - 99.989% = 0.0112%

Conclusion: The die is probably not fair.

The algorithm

Notation
B(n, r, P): Binomial probability mass function for number of trials n, number of successes r, and individual probability of success P.

The program starts by computing ln(B(N, MIN, P)) using the natural logarithm of the gamma function:

ln(B(n, r, P))
= ln(nCr * P^r * (1 - P)^(n - r))
= ln(n!/r!/(n-r)! * P^r * (1 - P)^(n - r))
= lnΓ(n+1) - lnΓ(r+1) - lnΓ(n-r+1) + r * ln(P) + (n-r) * ln(1-P)

This allows direct computation of the probability using very large values of n or r that would cause the COMB function to overflow.

The program for the log gamma function is a direct conversion of the BASIC program shown here: http://www.rskey.org/fx850p This conversion is nothing novel, just a lot of stack stunts, and stack saving/restoration so it can be used from other programs.

Each successive term ln(B(N, r+1, P)) up to r+1=MAX is computed using:

B(n, r+1, P) = B(n, r, P) * (n-r)/(r+1) * P/(1-P)

...thus...

ln(B(n, r+1, P)) = ln(B(n, r, P)) + ln(n-r) - ln(r+1) + ln(P) - ln(1-P)

The antilogs of all of these terms are summed to produce a final cumulative probability.

Program Code

Code:
00 { 172-Byte Prgm } 01▸LBL "LNGAMM" 02 STO "GX" 03 R↓ 04 STO "GY" 05 R↓ 06 STO "GZ" 07 R↓ 08 STO "GT" 09 R↓ 10 1 11 X<>Y 12▸LBL 00 13 5 14 X<Y? 15 GTO 01 16 R↓ 17 STO× ST Y 18 1 19 + 20 GTO 00 21▸LBL 01 22 R↓ 23 1188 24 1/X 25 RCL÷ ST Y 26 RCL÷ ST Y 27 1680 28 1/X 29 - 30 RCL÷ ST Y 31 RCL÷ ST Y 32 1260 33 1/X 34 + 35 RCL÷ ST Y 36 RCL÷ ST Y 37 360 38 1/X 39 - 40 RCL÷ ST Y 41 RCL÷ ST Y 42 12 43 1/X 44 + 45 RCL ST Z 46 X↑2 47 RCL× ST Z 48 1/X 49 PI 50 × 51 2 52 × 53 LN 54 2 55 ÷ 56 RCL- ST Z 57 RCL ST Z 58 LN 59 RCL× ST T 60 + 61 RCL ST Y 62 RCL÷ ST T 63 + 64 RCL "GX" 65 STO ST L 66 R↓ 67 RCL "GT" 68 RCL "GZ" 69 RCL "GY" 70 R↑ 71 CLV "GX" 72 CLV "GY" 73 CLV "GZ" 74 CLV "GT" 75 END 00 { 160-Byte Prgm } 01▸LBL "BINOM" 02▸LBL 00 03 INPUT "N" 04 INPUT "P" 05 INPUT "MIN" 06 INPUT "MAX" 07 RCL "N" 08 1 09 + 10 XEQ "LNGAMM" 11 RCL "MIN" 12 1 13 + 14 XEQ "LNGAMM" 15 - 16 RCL "N" 17 RCL- "MIN" 18 1 19 + 20 XEQ "LNGAMM" 21 - 22 RCL "P" 23 LN 24 RCL× "MIN" 25 + 26 RCL "P" 27 +/- 28 LN1+X 29 RCL "N" 30 RCL- "MIN" 31 × 32 + 33 STO 00 34 0 35 STO 01 36 RCL "MIN" 37 STO 02 38▸LBL 03 39 RCL 00 40 E↑X 41 STO+ 01 42▸LBL 04 43 RCL 02 44 RCL "MAX" 45 X≤Y? 46 GTO 08 47 RCL "N" 48 RCL- 02 49 LN 50 RCL 02 51 LN1+X 52 - 53 RCL "P" 54 ENTER 55 LN 56 X<>Y 57 +/- 58 LN1+X 59 - 60 + 61 STO+ 00 62 1 63 STO+ 02 64 GTO 03 65▸LBL 08 66 RCL 01 67 STOP 68 GTO 00 69 END binom.zip (Size: 479 bytes / Downloads: 0)
09-28-2019, 10:39 AM
Post: #2
 wawa Junior Member Posts: 29 Joined: Dec 2013
RE: (42S) Binomial probability distribution with large input support
It works great ! thank you for this. Your program is much better than the one I used.
09-28-2019, 03:07 PM (This post was last modified: 09-28-2019 03:11 PM by Dave Britten.)
Post: #3
 Dave Britten Senior Member Posts: 1,453 Joined: Dec 2013
RE: (42S) Binomial probability distribution with large input support
(09-28-2019 10:39 AM)wawa Wrote:  It works great ! thank you for this. Your program is much better than the one I used.

Thanks, there's still some room for improvement, of course. I need to make it fail faster when trying the initial COMB approach. I think that could be done easily enough by first checking if COMB overflows at MIN or MAX, or at the mean (N * P rounded) if it lies between MIN and MAX. Being able to remove all the "SF 25...FC?C 25" in the main loop would improve the speed too.

I did a test run with N=4,000,000, P=0.2, MIN=0, MAX=100,000, and on USB power, it took about 9 minutes to return 0. About half of that was the program waiting for COMB to overflow. Compare that to my Casio fx-9860GII, which returns the same result in about 2 seconds using the BinomialCD function. I'm not sure it's worth the effort trying to make it select an algorithm based on which it thinks will be faster. It depends on how large MIN is (the COMB algorithm can start from MIN rather than 0), and how slow COMB will be between MIN and MAX. I'll probably just add a simple FS? at the beginning to allow forcing the recursive-sum-of-logs algorithm.

I'll probably tackle negative and inverse binomials after I get this buttoned up to my liking.
10-01-2019, 07:13 PM
Post: #4
 Dave Britten Senior Member Posts: 1,453 Joined: Dec 2013
RE: (42S) Binomial probability distribution with large input support
I took another pass at this and removed the COMB algorithm altogether, and replaced the calculation of the initial term B(N, MIN, P) with one based on the natural logarithm of the gamma function. This allows it to start directly from r = MIN instead of iterating from 0, yielding a probability mass in constant time with no loops. The cumulative probability is still a loop that sums the logarithms of the successive terms. I'm not sure if there's a way to do that without looping.
 « Next Oldest | Next Newest »

User(s) browsing this thread: 1 Guest(s)