(42S) Binomial probability distribution with large input support
|
09-25-2019, 03:16 PM
(This post was last modified: 10-01-2019 07:05 PM by Dave Britten.)
Post: #1
|
|||
|
|||
(42S) Binomial probability distribution with large input support
This program will calculate a cumulative binomial probability distribution between the specified lower and upper bounds on the number of successful trials. By making extensive use of logarithms and the log gamma function, probabilities for extremely large numbers of trials/successes can be computed without overflowing the intermediate combinatoric calculations.
Results are returned to X. This program uses numbered registers 00, 01, and 02. Usage: XEQ "BINOM" You will be prompted to enter 4 arguments. Press [R/S] after entering each value. N - Total number of trials P - Probability of success of a single trial MIN - Minimum number of successes MAX - Maximum number of successes MIN and MAX may be set to the same value if you only want to calculate the binomial probability mass of a single outcome. Note that MIN and MAX must be equal if MIN is so large that rounding would cause MIN+1=MIN, otherwise the program will go into an infinite loop. You can still calculate probability mass for absurdly large inputs like N = 4E20, P = 1/6, MIN = MAX = 4E20/6. The calculation starts after entering MAX. After a few moments (or a lot of moments if you're doing a very large range of successes), results will be in X. Press [R/S] to run another calculation; you will be prompted for new input values. Example: Suppose you have a six-sided die you want to ensure is fair (i.e. it produces a truly uniform distribution). You roll the die 40,000 times, and get a result of one 6,378 times. Knowing that the mean of the distribution is 6,666.67 (N * P), you thus want to determine the odds of rolling any result outside the range 6,379 to 6,954 times. [XEQ] "BINOM" [ENTER] 40000 [R/S] 6 [1/x] [R/S] 6379 [R/S] 6954 [R/S] Result of rolling within the range: 99.989% Result of rolling outside the range: 100% - 99.989% = 0.0112% Conclusion: The die is probably not fair. The algorithm Notation B(n, r, P): Binomial probability mass function for number of trials n, number of successes r, and individual probability of success P. The program starts by computing ln(B(N, MIN, P)) using the natural logarithm of the gamma function: ln(B(n, r, P)) = ln(nCr * P^r * (1 - P)^(n - r)) = ln(n!/r!/(n-r)! * P^r * (1 - P)^(n - r)) = lnΓ(n+1) - lnΓ(r+1) - lnΓ(n-r+1) + r * ln(P) + (n-r) * ln(1-P) This allows direct computation of the probability using very large values of n or r that would cause the COMB function to overflow. The program for the log gamma function is a direct conversion of the BASIC program shown here: http://www.rskey.org/fx850p This conversion is nothing novel, just a lot of stack stunts, and stack saving/restoration so it can be used from other programs. Each successive term ln(B(N, r+1, P)) up to r+1=MAX is computed using: B(n, r+1, P) = B(n, r, P) * (n-r)/(r+1) * P/(1-P) ...thus... ln(B(n, r+1, P)) = ln(B(n, r, P)) + ln(n-r) - ln(r+1) + ln(P) - ln(1-P) The antilogs of all of these terms are summed to produce a final cumulative probability. Program Code Code: 00 { 172-Byte Prgm } binom.zip (Size: 479 bytes / Downloads: 2) |
|||
09-28-2019, 10:39 AM
Post: #2
|
|||
|
|||
RE: (42S) Binomial probability distribution with large input support
It works great ! thank you for this. Your program is much better than the one I used.
|
|||
09-28-2019, 03:07 PM
(This post was last modified: 09-28-2019 03:11 PM by Dave Britten.)
Post: #3
|
|||
|
|||
RE: (42S) Binomial probability distribution with large input support
(09-28-2019 10:39 AM)wawa Wrote: It works great ! thank you for this. Your program is much better than the one I used. Thanks, there's still some room for improvement, of course. I need to make it fail faster when trying the initial COMB approach. I think that could be done easily enough by first checking if COMB overflows at MIN or MAX, or at the mean (N * P rounded) if it lies between MIN and MAX. Being able to remove all the "SF 25...FC?C 25" in the main loop would improve the speed too. I did a test run with N=4,000,000, P=0.2, MIN=0, MAX=100,000, and on USB power, it took about 9 minutes to return 0. About half of that was the program waiting for COMB to overflow. Compare that to my Casio fx-9860GII, which returns the same result in about 2 seconds using the BinomialCD function. I'm not sure it's worth the effort trying to make it select an algorithm based on which it thinks will be faster. It depends on how large MIN is (the COMB algorithm can start from MIN rather than 0), and how slow COMB will be between MIN and MAX. I'll probably just add a simple FS? at the beginning to allow forcing the recursive-sum-of-logs algorithm. I'll probably tackle negative and inverse binomials after I get this buttoned up to my liking. |
|||
10-01-2019, 07:13 PM
Post: #4
|
|||
|
|||
RE: (42S) Binomial probability distribution with large input support
I took another pass at this and removed the COMB algorithm altogether, and replaced the calculation of the initial term B(N, MIN, P) with one based on the natural logarithm of the gamma function. This allows it to start directly from r = MIN instead of iterating from 0, yielding a probability mass in constant time with no loops. The cumulative probability is still a loop that sums the logarithms of the successive terms. I'm not sure if there's a way to do that without looping.
|
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 1 Guest(s)