(11C) Probability of No Repetitions
01-10-2020, 12:42 PM (This post was last modified: 01-11-2020 01:02 AM by Gamo.)
Post: #1 Gamo Senior Member Posts: 699 Joined: Dec 2016
(11C) Probability of No Repetitions
This program was adapted from HP-55 Statistic Book (Page 12)

Reference:
E. Parzen, Modern Probability Theory and its Applications,
John Wiley and Sons, 1960 (CH. 2 Page 46)

As stated in the book:

Probability of No Repetitions in a Sample

Suppose a sample of size n is drawn with replacement from population
containing m different objects. Let P be the probability that there are no
repetitions in the sample, then

P = [1- (1/m)][1- (2/m)]....[1- (n-1/m)]

Given integer m, n such that m ≥ n ≥ 1 this program finds the probability P.

Remark:
The execution time of the program depends on n; the larger n is, the longer it takes.
------------------------------------------------------------------------------
Example: HP-55 Statistic Book page 13

In a room containing n persons, what is the probability that no two or more
persons have the same birthday for n = 4, 23, 48?

Note: m = 365 // 365 is the days amount of birthday

[USER] [FIX] 2

365 [A] display 365.00 // Enter m

4 [B] display brieftly 0.98 then 2.00 // Enter n

23 [B] display briefly 0.49 the 51.00

48 [B] display briefly 0.04 then 96.00

In a room for the probability that at lease two of them will
have the same birthday

4 people in a room P = 0.98 or only 2%
23 people in a room P = 0.49 or 51%
48 people in a room P = 0.04 or 96%
--------------------------------------------------------------
Program:
Code:
 LBL A  // m STO 1 RTN LBL B  // n STO 2 1 STO 0 LBL 0 RCL 1 RCL 2 1 - STO 2 0 X=Y GTO 1 Rv X<>Y ÷ 1 X<>Y - STOx0 GTO 0 LBL 1 RCL 0 PSE 1 RCL 0 RND - EEX 2 x RTN

Gamo 1/2020
01-11-2020, 01:46 AM (This post was last modified: 01-11-2020 09:02 PM by Albert Chan.)
Post: #2
 Albert Chan Senior Member Posts: 1,591 Joined: Jul 2018
RE: (11C) Probability of No Repetitions
(01-10-2020 12:42 PM)Gamo Wrote:  Probability of No Repetitions in a Sample

Suppose a sample of size n is drawn with replacement from population
containing m different objects. Let P be the probability that there are no
repetitions in the sample, then

P = [1- (1/m)][1- (2/m)]....[1- (n-1)/m]

This is similar to thread (42S) Probability of Same Birthday Day

We can do approximation the same way: products of terms = exp of sum of ln(term)
Sum is approximated by integral with correction (bolded expression below)

If we let u = 1-x/m, we have x = m-m*u

XCas> f := ln(1-x/m)
XCas> expand(subst(int(f) - f/2 + diff(f)/12, x = m-m*u)) ﻿ ﻿ ﻿→ m*u - 1/(12*m*u) - ln(u)/2 - m*u*ln(u)
XCas> g(m,u) := -1/(12*m*u) - ln(u)*(0.5 + m*u) + m*u ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ // ≈ sum formula
XCas> p(m,n) := exp(g(m, 1.-n/m) - g(m, 1.-1/m)) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿﻿ ﻿ ﻿ ﻿ ﻿﻿// ≈ product(1-x/m, x=1 .. n-1)

XCas> p(365, 4) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 0.9836440875
XCas> p(365,23) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 0.4927027657
XCas> p(365,48) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 0.03940202712
01-11-2020, 02:15 PM (This post was last modified: 04-28-2020 11:50 PM by Albert Chan.)
Post: #3
 Albert Chan Senior Member Posts: 1,591 Joined: Jul 2018
RE: (11C) Probability of No Repetitions
(01-10-2020 12:42 PM)Gamo Wrote:  P = [1- (1/m)][1- (2/m)]....[1- (n-1)/m]

Perhaps it is more useful by adding a (1-0/m) term, we have P = perm(m,n) / mn

Using the same idea to approximate ln(perm(m,n)):

XCas> f := ln(m-x)
XCas> expand(subst(int(f) - f/2 + diff(f)/12, x = m-u)) ﻿ ﻿ ﻿ ﻿ ﻿→ -1/(12*u) - ln(u)/2 - u*ln(u) + u
XCaS> g(u) := -1/(12*u) - ln(u)/2 - u*ln(u) + u
XCas> ln_perm(m,n) := g(m-n) - g(m) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ // integral limit x = 0 to n, u=m-x

ln_perm() can be simplified and more accurate by putting tiny terms up front, and use logp1()

$$\large \ln(perm(m,n)) ≈ {-n\over 12m(m-n)} - [n + (m-n+½)\ln(1 - n/m)]+ n \ln(m)$$

For ln(P), above last term cancelled out.

XCas> ln_P(m,n) := -n/(12*m*(m-n)) - (n + (m-n+0.5)*ln(1-n/m))
XCas> ln_perm(m,n) := ln_P(m,n) + n*ln(m)

XCas> e^ln_P(365, 4) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 0.9836440875
XCas> e^ln_P(365,23) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 0.4927027657
XCas> e^ln_P(365,48) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 0.03940202712

Trivia: ln_perm(m,n) = ln_perm(m,a) + ln_perm(m-a, n-a), same as real permutation function.

if m ≈ n, ln_perm(m,n) is not very accurate.
With above trivia, splitting ln_perm(m,n) does not help.
However, we can replace the 2nd term with real ln(perm(m-a,n-a))

Example: ln(perm(69,68)) = ln(69!) ≈ 226.1905483
XCas> ln_perm(69,68) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 226.1882765 // slightly worse than log of Stirling's formula.
XCas> ln_perm(68,67) + ln(2) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 226.1902224 // perm(69-67, 68-67) = perm(2,1) = 2
XCas> ln_perm(68,66) + ln(6) ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿→ 226.1904485 // perm(69-66, 68-66) = perm(3,2) = 6