(42S) Binomial probability distribution with large input support

09252019, 03:16 PM
(This post was last modified: 10012019 07:05 PM by Dave Britten.)
Post: #1




(42S) Binomial probability distribution with large input support
This program will calculate a cumulative binomial probability distribution between the specified lower and upper bounds on the number of successful trials. By making extensive use of logarithms and the log gamma function, probabilities for extremely large numbers of trials/successes can be computed without overflowing the intermediate combinatoric calculations.
Results are returned to X. This program uses numbered registers 00, 01, and 02. Usage: XEQ "BINOM" You will be prompted to enter 4 arguments. Press [R/S] after entering each value. N  Total number of trials P  Probability of success of a single trial MIN  Minimum number of successes MAX  Maximum number of successes MIN and MAX may be set to the same value if you only want to calculate the binomial probability mass of a single outcome. Note that MIN and MAX must be equal if MIN is so large that rounding would cause MIN+1=MIN, otherwise the program will go into an infinite loop. You can still calculate probability mass for absurdly large inputs like N = 4E20, P = 1/6, MIN = MAX = 4E20/6. The calculation starts after entering MAX. After a few moments (or a lot of moments if you're doing a very large range of successes), results will be in X. Press [R/S] to run another calculation; you will be prompted for new input values. Example: Suppose you have a sixsided die you want to ensure is fair (i.e. it produces a truly uniform distribution). You roll the die 40,000 times, and get a result of one 6,378 times. Knowing that the mean of the distribution is 6,666.67 (N * P), you thus want to determine the odds of rolling any result outside the range 6,379 to 6,954 times. [XEQ] "BINOM" [ENTER] 40000 [R/S] 6 [1/x] [R/S] 6379 [R/S] 6954 [R/S] Result of rolling within the range: 99.989% Result of rolling outside the range: 100%  99.989% = 0.0112% Conclusion: The die is probably not fair. The algorithm Notation B(n, r, P): Binomial probability mass function for number of trials n, number of successes r, and individual probability of success P. The program starts by computing ln(B(N, MIN, P)) using the natural logarithm of the gamma function: ln(B(n, r, P)) = ln(nCr * P^r * (1  P)^(n  r)) = ln(n!/r!/(nr)! * P^r * (1  P)^(n  r)) = lnΓ(n+1)  lnΓ(r+1)  lnΓ(nr+1) + r * ln(P) + (nr) * ln(1P) This allows direct computation of the probability using very large values of n or r that would cause the COMB function to overflow. The program for the log gamma function is a direct conversion of the BASIC program shown here: http://www.rskey.org/fx850p This conversion is nothing novel, just a lot of stack stunts, and stack saving/restoration so it can be used from other programs. Each successive term ln(B(N, r+1, P)) up to r+1=MAX is computed using: B(n, r+1, P) = B(n, r, P) * (nr)/(r+1) * P/(1P) ...thus... ln(B(n, r+1, P)) = ln(B(n, r, P)) + ln(nr)  ln(r+1) + ln(P)  ln(1P) The antilogs of all of these terms are summed to produce a final cumulative probability. Program Code Code: 00 { 172Byte Prgm } binom.zip (Size: 479 bytes / Downloads: 0) 

09282019, 10:39 AM
Post: #2




RE: (42S) Binomial probability distribution with large input support
It works great ! thank you for this. Your program is much better than the one I used.


09282019, 03:07 PM
(This post was last modified: 09282019 03:11 PM by Dave Britten.)
Post: #3




RE: (42S) Binomial probability distribution with large input support
(09282019 10:39 AM)wawa Wrote: It works great ! thank you for this. Your program is much better than the one I used. Thanks, there's still some room for improvement, of course. I need to make it fail faster when trying the initial COMB approach. I think that could be done easily enough by first checking if COMB overflows at MIN or MAX, or at the mean (N * P rounded) if it lies between MIN and MAX. Being able to remove all the "SF 25...FC?C 25" in the main loop would improve the speed too. I did a test run with N=4,000,000, P=0.2, MIN=0, MAX=100,000, and on USB power, it took about 9 minutes to return 0. About half of that was the program waiting for COMB to overflow. Compare that to my Casio fx9860GII, which returns the same result in about 2 seconds using the BinomialCD function. I'm not sure it's worth the effort trying to make it select an algorithm based on which it thinks will be faster. It depends on how large MIN is (the COMB algorithm can start from MIN rather than 0), and how slow COMB will be between MIN and MAX. I'll probably just add a simple FS? at the beginning to allow forcing the recursivesumoflogs algorithm. I'll probably tackle negative and inverse binomials after I get this buttoned up to my liking. 

10012019, 07:13 PM
Post: #4




RE: (42S) Binomial probability distribution with large input support
I took another pass at this and removed the COMB algorithm altogether, and replaced the calculation of the initial term B(N, MIN, P) with one based on the natural logarithm of the gamma function. This allows it to start directly from r = MIN instead of iterating from 0, yielding a probability mass in constant time with no loops. The cumulative probability is still a loop that sums the logarithms of the successive terms. I'm not sure if there's a way to do that without looping.


« Next Oldest  Next Newest »

User(s) browsing this thread: 1 Guest(s)