Re: *NEW* WP 34S bug list Message #31 Posted by Dieter on 17 Apr 2011, 11:32 a.m., in response to message #1 by Walter B
I took a look at stats.c (revision 672) and the code for the normal cdf and its inverse. At some points it looks familiar. ;-) However, here are some suggestions for improvements and even a bugfix:
- The quantile is determined by solving ln [1 + (cdf(x)-p)/p] for x, as suggested in an earlier post. However, if I get the code right, it actually evaluates ln[1+(cdf(x)-p)]/p. Obviously lines 902 and 903 should get swapped here.
The log1+x may even be omitted since the initial guesses are rather close to the final result now. Without log1+x we simply solve for the relative error to become zero (or less than 1E-16). I haven't tried this, so please check. The log may be useful since it makes the remaining error more linear which might lead to faster convergence.
- The normal CDF is evaluated either by a series (center) or a continued fraction expansion (tails), with s split at x=10. This means that even for x=5 or x=8 the series is used, which converges very slowly here (in fact the series works best for |x|<1). That's why the split between both methods should be done much earlier, usually somewhere between 2 and 3. In this range both methods require roughly the same time so that the result is obtained as fast as possible. But there's another, even more important reason for an earlier split: the series expansion finally adds a constant 0,5 - which means that there is a loss of precision because of cancelled digits. With 39 digits this might be tolerable as we only need 16 digits in the final result. But at x=10 the CDF is 7,6 E-24, so things get very close here. Mathematically, the best point to split would be quantile(0,1) = 1,28155. As mentioned, a value between 2 and 3 would be better in terms of execution speed, so the optimum here is quantile(0,01) = 2,3263. Up to this point only one digit is lost in the CDF. So just one guard digit is required, while we got more than 20 of those. ;-) And the final result is obtained much faster than before.
This of course means that the number of terms for the CF calculation has to be known. Right now 20 terms are used for any value > 10, which should give about 20 valid digits. For this precision I would suggest to try n = 5 + k/(|x|-1) with k = 150...180. This seems to be a good estimate for |x| > 2.
BTW the CF implementaion has a minor error which usually does not show up in the result but should be removed anyway. It was obviously copied from an earlier post (also found in "normal-formulas") that includes the same error:
"not quite right" ;-) correct
------------------------------------------------------------
n := [ see above ] n := [ see above ]
s := 1/n s := 0
for c := n-1 downto 1 do for c := n downto 1 do
s := c / (s + x) s := c / (s + x)
s := s + x s := s + x
cdf := pdf(x) / s cdf := pdf(x) / s
This should be corrected.
At the moment the CF method is used only for large negative arguments, so x = +5 or x = +8 are calculated by the series expansion. I strongly suggest to use the series only for arguments with an absolute value less than a certain limit, and finally return the result or 1-result. As mentioned, the split between CF and series should be done somewhere near x = 2...3.
- The documentation ("formulas") includes a series for the inverse error function. This can be used here as well. I also mentioned a short (four-term) series for the inverse quantile which gives exact results near the center (p = 0,495...0,505). This is quite useful since here the solver may return a slightly imprecise result, because very close to the center there are hundreds or even millions of quantiles that return the same CDF.
Finally two remarks on the date functions:
- The split between Julian and Gregorian dates now is done in 1752, which is the date the British Empire and its colonies happened to switch. Please, use the "official" date, which is 4 October 1582 (last Julian date) resp. 15 October 1582 (first Gregorian date). I do not know of any other calendar algorithm that would use a different date. Okay, Unix may be an exception. ;-)
- There also seems to be an extensive error check for illegal dates. This could be simplified by converting the date to its Julian day number and back again. If this result is the same as the original date it's valid, otherwise it's not. ;-)
Dieter
|