|[WP34s] Inverse discrete probability distributions|
Message #1 Posted by Eduardo Duenez on 15 May 2012, 12:30 p.m.
The post will be long, but I see no other way to make what I think is a very important point clearly enough.
I was helping my daughter with her homework. She had to spin a roulette many times and count the frequencies of different outcomes. I wrote a 34s program to do a Monte Carlo simulation. That way she could do about 1000 rolls in 2 seconds (a loop calling RAN# and increasing the appropriate register each time).
Just for fun, I decided to write a second program to do the same but using the inverse binomial distribution instead, only to quickly realize that the implementation of inverse discrete distributions on the 34s is "off by one".
Let me explain. For simplicity let us consider the geometric distribution, say with parameter p=0.5 stored in J.
Such a geometric variable takes the value m=0 with probability 1/2, m=1 with probability 1/4, m=2 with prob=1/8, etc.
The distribution function (the function Geom in the 34s), is the function F(x) defined for all real values of x as follows:
F(x) = 0 if x<0
F(x) = 1/2 if 0<=x<1
F(x) = 3/4 if 1<=x<2
F(x) = 7/8 if 2<=x<3
F(x) = 1 - 1/2^n if n-1<=x<n
and so on ad infinitum. It's OK to put F(-infinity)=0 and F(+infinity)=1 if one is so inclined.
The "inverse distribution" F^(-1)(x), that is, the inverse function to F(x) above is, strictly speaking only defined for all real values of x, but only for the values of x=0,1/2,3/4,7/8,..., etc. And worse, it is not *uniquely* defined for those values. For instance, we could take F^(-1)(1/2) to be equal to any number x such that 0<=x<1. This is mathematically correct, strictly speaking.
The question becomes: What is the more *useful* way to define the inverse functions?
The 34s calls that inverse distribution function above Geom-1. Currently, the 34s calculates this function as follows:
Geom-1(x) = 0 if x<0,
Geom-1(x) = -1 if 0<=x<1/2,
Geom-1(x) = 0 if 1/2<=x<3/4,
Geom-1(x) = 1 if 3/4<=x<7/8,
Geom-1(x) = 2 if 7/8<=x<15/16,
The output for x<0 notwithstanding (which I propose to make simply undefined), the values returned for x>=0 are almost always one unit smaller than they ought to be. The values returned should be:
Geom-1(x) is undefined if x<0 (one can probably argue this should be either -1, or -infinity, or NaN. The point is that nobody should ever need to evaluate an inverse distribution function when the argument is not in the range 0<=x<=1.),
Geom-1(x) = 0 if 0<=x<=1/2,
Geom-1(x) = 1 if 1/2<x<=3/4,
Geom-1(x) = 2 if 3/4<x<=7/8,
Geom-1(x) = 3 if 7/8<x<=15/16,
Unfortunately, it seems as if all of the discrete inverse distributions are "off by 1" in the 43s currently (Caveat: I only checked the geometric and binomial).
I want to point out an important immediate corollary of fixing the inverse distributions as I propose above: One can use the inverse distribution (that is, the "inverse transform method" as explained in these notes) to go from a uniform RAN# variable to a variable with the desired distribution:
generates a random variable distributed geometrically. This method fails with the current implementation of the inverse discrete random variables. If we replace Geom-1 by, say, Binom-1, Expon-1 or Phi-1 we get a binomially, exponentially or normally distributed random variable, regardless of whether the distribution is discrete or continuous. Elegant and consistent.
PS: I have the source code of the probability distributions, but I could not quickly make sense of it so as to point out more specifically how to implement the changes. Sorry.
[Edited to exchange some <= to < in order that the examples I give indeed be inverse to the original distribution.]
Edited: 15 May 2012, 11:40 p.m. after one or more responses were posted