Birthday paradox, 8^8, etc.
05-18-2015, 01:11 PM
Post: #1
I came across this interesting site earlier this morning:

Essentially, it's 8 multiple choice questions, each with 8 different answers, and the goal is to try to match people up with others who provided the same responses for all 8.

I immediately thought of the birthday paradox (which probably says as much about me as my answers would), and ran a few numbers.

The stats at the bottom of the page call out "59,295 TESTS 914 MATCHES". The number of matches seems surprising, but then this wouldn't be a "paradox" if the results weren't counter-intuitive. With a selection space of 8^8 (16,777,216), you hit 50% odds of a collision with a sample size of about 4,822. (Based on solving the approximation 1-e^(-n^2/(2*p))=0.5, where p is the selection space, i.e. 8^8. Running a looping product program on my 32S confirms 49.99% after crunching numbers for a few minutes.)
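For anyone who wants to reproduce those two figures without a calculator, here is a minimal Python sketch. It is not the 32S program, just the same idea: solve the approximation for n, then run the exact looping product. The function names are made up for illustration, and it assumes every one of the 8^8 answer combinations is equally likely.

```python
import math

def approx_threshold(space, target=0.5):
    """Solve 1 - exp(-n^2 / (2*space)) = target for n."""
    return math.sqrt(-2.0 * space * math.log(1.0 - target))

def collision_probability(n, space):
    """Exact chance of at least one collision among n uniform draws.

    Multiplies the probabilities that each new draw misses all earlier ones.
    """
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (space - i) / space
    return 1.0 - p_no_collision

if __name__ == "__main__":
    space = 8 ** 8  # 16,777,216 possible answer combinations
    print("approximate 50% point:", approx_threshold(space))
    print("exact probability at n = 4822:", collision_probability(4822, space))
```

The approximate threshold lands between 4,822 and 4,823, and the exact product at n = 4,822 comes out just under 50%.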

Now, that's all based on purely random selections, which I'm reasonably certain this data set is not. Some responses will be more popular than others, and there are surely correlations between certain responses on different questions.

How does one calculate the expected number of matches for a given sample size (whether a match is with the same person or a different person is irrelevant)? It wouldn't be a binomial probability distribution, as these are not independent trials. Better yet, how can we empirically obtain the distribution (presumably close to normal) of the number of matches? I'd be interested in finding out how far above the mean they currently are.
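One possible starting point, assuming uniform and independent answers (which, as noted above, the real data almost certainly is not): by linearity of expectation, each of the C(n,2) pairs of tests matches with probability 1/p, so the expected number of matching pairs is C(n,2)/p even though the pairs are not independent. The sketch below also estimates how many tests have at least one match, since it is not clear which of the two the site's "MATCHES" figure counts. The function names are hypothetical.

```python
from math import comb

def expected_matching_pairs(n, space):
    """Expected number of pairs of tests with identical answers (uniform model)."""
    return comb(n, 2) / space

def expected_tests_with_a_match(n, space):
    """Expected number of tests that share their answers with at least one other test."""
    p_no_partner = (1.0 - 1.0 / space) ** (n - 1)
    return n * (1.0 - p_no_partner)

if __name__ == "__main__":
    n, space = 59_295, 8 ** 8
    print("expected matching pairs:", expected_matching_pairs(n, space))
    print("expected tests with a match:", expected_tests_with_a_match(n, space))
```

Under this uniform model the expected number of matching pairs for 59,295 tests is roughly 105, so whichever way the site counts, a figure of 914 would sit well above the uniform baseline. That is consistent with some answers being much more popular than others.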

Or would it be easier to just break out Visual Studio and Monte Carlo method it for an hour? :)
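A Monte Carlo run along those lines could look something like the Python sketch below (Python rather than a Visual Studio project, purely for brevity). It draws n uniform random answer sets per trial, counts matching pairs, and reports the mean and standard deviation across trials. The uniform draw is an assumption; modelling popular or correlated answers would need per-answer weights that aren't available here. All names are illustrative.

```python
import random
from collections import Counter
from statistics import mean, stdev

def count_matching_pairs(n, space, rng):
    """Draw n uniform answer sets and count pairs that are identical."""
    counts = Counter(rng.randrange(space) for _ in range(n))
    # A group of c identical answer sets contributes c*(c-1)/2 pairs.
    return sum(c * (c - 1) // 2 for c in counts.values())

def simulate(n, space, trials, seed=0):
    """Repeat the experiment and return the pair count from each trial."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [count_matching_pairs(n, space, rng) for _ in range(trials)]

if __name__ == "__main__":
    results = simulate(59_295, 8 ** 8, trials=20)
    print("mean matching pairs:", mean(results))
    print("standard deviation:", stdev(results))
```

Increasing `trials` gives a smoother picture of the spread, and a histogram of `results` would show how close to normal the distribution is.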

Messages In This Thread
Birthday paradox, 8^8, etc. - Dave Britten - 05-18-2015 01:11 PM
