(32S, 32SII) Birthday paradox, hash collision probability
04-27-2020, 10:43 PM (This post was last modified: 04-30-2020 01:51 PM by Albert Chan.)
Post: #6
 Albert Chan Senior Member Posts: 1,676 Joined: Jul 2018
RE: (32S, 32SII) Birthday paradox, hash collision probability
(04-27-2020 07:54 PM)Albert Chan Wrote:  x + log(1-x) = x - (x + x^2/2 + x^3/3 + x^4/4 + ...) = -(x^2/2 + x^3/3 + x^4/4 + ...)

If we let y = x/(2-x) → x = 2*y - x*y

x + log(1-x) = x - 2*(y + y^3/3 + y^5/5 + ...) = -x*y - 2*(y^3/3 + y^5/5 + ...)

→ ln(P(n,s)) ≈ -s/(12*n*(n-s)) + 2*(n-s+0.5)(y^3/3 + y^5/5 + ...) - (s-1)y

I find that y^3 term already gives very good approximation.
Drop the first term, this is my revised ln_nr(n, s), shorter and more accurate.

Code:
function ln_nr(n,s)       -- log(probability of no repetition)
local y = s/(n-s+n)   -- assumed small s, big n
return (2/3*(n-s+0.5)*y*y - (s-1))*y
end

function P_nr(n,s)        -- return P_nr, 1 - P_nr
local x = ln_nr(n,s)
local y = exp(x)
return y, 1-y+y*(log(y)-x)
end

lua> P_nr(365,30)
0.2936840560221026 ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿0.7063159439778974
lua> P_nr(2^128,5e12)
0.9999999999999633 ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿3.6734198463188465e-014
lua> P_nr(2^128,2.6153e18)
0.9900001600944208 ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ ﻿0.009999839905579181

We can even drop y^3 term ...

lua> function ln_nr(n,s) return s*(s-1)/(s-n-n) end

lua> P_nr(365,30)
0.2885585859237581 ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ 0.711441414076242
lua> P_nr(2^128,5e12)
0.9999999999999633 ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ 3.6734198463188465e-014
lua> P_nr(2^128,2.6153e18)
0.9900001600944208 ﻿ ﻿ ﻿ ﻿ ﻿ ﻿ 0.009999839905579181

Update: switched sign of y so x and y has same sign. This make it clear that x + log(1-x) ≤ 0
 « Next Oldest | Next Newest »