|My solutions & Comments [LONG]|
Message #1 Posted by Valentin Albillo on 26 Nov 2003, 4:53 a.m.
Thanks to all of you who took interest in my humble Mini-Challenge
and did contribute your solutions, I hope you had a good time
attacking it. My original solutions have more or less been duplicated by some of
you (congratulations !), so these are my original versions and relevant comments:
Best regards from V.
- The standard, 'loop' solution:
01 LBL A
02 STO I
05 LBL 0
06 DSE I
08 DSE I
09 GTO 0
This is pretty straightforward, short and sweet, it's probably the simplest approach anyone would try, and has the maximum range (up to N=60), but it has a big problem: it's painfully slow. Something more elaborate is definitely required.
- Pascal's solution, based in
f(n) = (2n)!/(2^n * n!)
is more or less
the first one to be stumbled upon by any sophisticated user dealing with this problem.
It's fast and short, but regrettably N cannot exceed 34 because else
(2n)! would exceed 10^100. This is my original version for this kind of solution, which
was one step shorter and used RI instead of R0, because R0 is dedicated
for matrix operations, while RI is *not* specific for indexing as a number
of registers can work as indexes in the HP-15C:
01 LBL A
02 STO I
06 RCL I
- Stephan's solution (and several others as well) is the second one to
be found, in an attempt to expand the range beyond 34. It makes good
use of the [Py,x] function, using this formula:
f(n) = P(2*n, n)/2^n
which works up to n = 52, but no further. My original version for this kind of solution
used instead the formula
f(n) = P(2*n-1, n)/2^(n-1)
which, though more complicated, can still be evaluated in just one more
step. The idea was that P(2*n-1, n) is a number *smaller* than P(2*n, n) is
(exactly half, to be precise) so perhaps the range could be extended to
n = 53, instead of n = 52. Unfortunately that doesn't pay here, because
while P(106, 53) exceeds 1E100, P(105, 53) exceeds that limit too, so
nothing is gained in this case. My program evaluating that formula was:
01 LBL A
02 STO I
03 RCL I
04 DSE I
09 RCL I
which works for N=1 to N=52. Notice that for N=1, the DSE I at step 04
does decrement I to 0 and so causes step 05 to be skipped, but as it is a RCL+I, it would simply
add 0 to the X register, so skipping it is of no consequence whatsoever and thus the
final result is valid.
The problem with this kind of solution is that while it greatly extends
the range from N <= 34 to N <= 52, it still doesn't reach the maximum
possible value N = 60, and worse, for large N it's *very* slow.
- Finally, Csaba was the first to stumble upon my intended original best solution
for this mini-challenge, making use of the beautiful formula:
f(n) = 2^n * (n-1/2)!/Sqrt(Pi)
which nicely avoids the evaluation of (2*n)!, so greatly extending the
range for N. It's quite peculiar that now the term 2^n appears in the numerator,
rather than the denominator.
That this formula actually works is easily demonstrated by making use of two simple, well-known facts,
n! = n * (n-1)! and (1/2)! = Sqrt(Pi)/2
given that, we have:
2^n * (n-1/2)! / Sqrt(Pi)
and my original solution was:
= 2^n * (n-1/2)*(n-3/2)* ... * 5/2 * 3/2 * (1/2)! / Sqrt(Pi)
= 2^n * (2*n-1)/2 * (2*n-3)/2 * ... 5/2 * 3/2 * (1/2)! /Sqrt(Pi)
= 2^n * (2*n-1) * (2*n -3) * ... * 5 * 3 * Sqrt(Pi)/2 / 2^(n-1) / Sqrt(Pi)
= (2*n-1) * (2*n-3) * ... * 5 * 3, q.e.d.
01 LBL A
which differs from Csaba's only in that he does first the multiplication,
then divides by Sqrt(Pi), while I did exactly the opposite. In general, it's more advisable to first perform
the division, then the multiplication, in order to avoid any possible overflow
though in this particular case there's no overflow for N=60 in either case, but
Csaba's version has a maximum limit of N = 60.436+ without overflowing, while
my version extends the maximum limit to N = 60.555+, which is the theoretically
largest possible value such that f(N) < 1E100.
There's also another important difference in this case between multiplying first,
dividing second (as Csaba does) and doing the opposite as my solution does, and
it has to do with rounding errors. For N = 10, both solutions perform as follows:
Csaba's solution: f(10) = 654,729,074.8
My original solution: f(10) = 654,729,075
Exact value: f(10) = 654,729,075
this is, Csaba's ordering results in a greater rounding error in this particular case.
It's thus advisable to perform the division first for reasons other than to avoid
potential overflows and reduced maximum range.
My improved original solution also catered for the fact Stefan rediscovered, i.e.,
by performing a one-time initialization consisting in storing 1/2 and Sqrt(Pi)
in registers you can save as much as 4 program steps and roughly 1 full second !
My original final best solution was this (with 0.5 in R0, Sqrt(Pi) in R1):
01 LBL A
which differs from Csaba's implementation of Stefan's idea not only in that
the division is performed first, like in the previous solution, but also in
that it uses R0 and R1 instead of Csaba's R8 and R9, the rationale being
that R0 and R1 are *permanent* registers, which will *always* exist whatever
the allocation of memory might be, while R8 and R9 could easily be unavailable,
allocated for program steps or as part of the matrix/advanced functions pool. It may seem a minor point, but when you're all set for the very optimum solution, every detail counts.
IMHO, this is the best solution possible. It's very short (only 12 steps in
machines without recall arithmetic, such as the HP-11C and HP-34C), has the
range (up to N = 60.555+), works even for non-integer arguments, uses the least
resources, and in my Brazilian-made HP-15C runs in 2.7 seconds for N = 52.
I acquired this HP-15C recently (for US $100 !) and it came with no batteries
so I fitted three new ones. Perhaps this accounts for its speed, or else
perhaps Brazilian units feature a newer, slightly faster CPU, but it's certainly
the fastest HP-15C among all my five units, and a pleasure to use !
Finally, some assorted values of f(N) as computed by my solution above:
f(Pi) = 19.40741413
f(10) = 654,729,075
f(52) = 2.835225443 E+82
f(60) = 6.972993462 E+98
f(60.5557) = 9.998554144 E+99