Re: What should we get, Part 2 Message #3 Posted by Rodger Rosenbaum on 1 Dec 2006, 2:18 a.m., in response to message #2 by John Smitherman
John,
Did you read the first thread, "What should we get?", (without a part no.)?
The idea here is that if a 12-digit calculator carries out this "benchmark", it isn't going to get exactly 2500, and you shouldn't expect that it would. And, in fact, you shouldn't give more credit to an n-digit calculator that gets closer to 2500 than to another n-digit one that doesn't. You should give points to the n-digit calculator that gets closest to what an n-digit calculator should get, and that is never 2500 exactly.
Of course, the result obtained by a calculator with more digits is going to be closer to 2500 than that obtained by a calculator with fewer digits, but neither will be exactly 2500 if it does proper BCD floating point arithmetic.
The philosophy that HP's floating point arithmetic embodies is that every individual calculation should return what one would get if it were done with infinite precision and then properly rounded to an n-digit result.
And, if this is done, there will be inevitable errors caused by the fact that the result of one calculation is only n digits wide, and then will be used as the input argument of the next calculation.
But, there will be only one "correct" result at every step of the way if the aforementioned philosophy is followed.
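A minimal sketch of that philosophy, using Python's decimal module as a stand-in for a 12-digit BCD machine (the stand-in and the name calc are assumptions for illustration, not anything from a real calculator). Each operation is correctly rounded on its own, yet feeding the rounded square root into the next operation does not give back exactly 2:

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Hypothetical 12-digit machine: every operation is rounded to 12 digits,
# ties going to the even neighbour.
calc = Context(prec=12, rounding=ROUND_HALF_EVEN)

root = calc.sqrt(Decimal(2))        # correctly rounded 12-digit square root
square = calc.multiply(root, root)  # correctly rounded product of the rounded root

print(root)    # 1.41421356237
print(square)  # 1.99999999999, the one "correct" 12-digit answer for this sequence
```

Both steps are as good as 12-digit arithmetic can be; the shortfall from 2 comes only from the 12-digit width of the intermediate result.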
So, the way to evaluate a calculator is to first see if it's doing what it should do (for an n-digit calculator), before comparing it to other calculators.
In fact, if all calculators did their arithmetic properly (that is, according to the philosophy above), all you would have to do to know how well a given calculator performs would be to find out how many digits it carried; all n-digit calculators would behave the same if only they all did their arithmetic correctly. I'm referring to those mathematical operations required to be done properly by the IEEE Floating Point standard, such as +, -, *, /, SQRT, etc. The performances of higher functions such as trig and other transcendentals are not specified, and may have larger errors. When analyzing the performance of a particular calculator on the Savage benchmark, if the errors involve those unqualified functions, that is more excusable than if an error occurs with SQRT or *.
In the case of this Savage benchmark, 12-digit calculators that round to even should get 2499.99942402 as a final result. I am astounded that the 33S does as well as it does, especially in light of the errors in the TAN function recently discussed.
The HP-71 performance is almost as good. It makes 3 very tiny mistakes. The first occurs when the loop variable I is 1470. At that point, the arctangent of 1469.99992223 must be calculated. The exact result is 1.5701160547549999736+; the HP-71 returns 1.57011605476. It shouldn't have rounded up the trailing ...475 to ...476. But, it would have to have calculated the unrounded result to more than its internal 15 digits to have made the proper decision, so I excuse it!
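The effect can be sketched as a double rounding. The snippet below takes the exact digits quoted above and rounds them to 12 digits directly, then again by way of a 15-digit intermediate. It assumes, purely for illustration, that the internal 15-digit value is itself correctly rounded; the helper round_to is hypothetical and says nothing about the actual firmware.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Leading digits of ATN(1469.99992223) as quoted in the post.
exact = Decimal("1.5701160547549999736")

def round_to(value, digits):
    """Round value to the given number of significant digits, ties to even."""
    return Context(prec=digits, rounding=ROUND_HALF_EVEN).plus(value)

direct = round_to(exact, 12)               # one rounding from the exact value
via_15 = round_to(round_to(exact, 15), 12) # 15-digit intermediate, then 12 digits

print(direct)  # 1.57011605475
print(via_15)  # 1.57011605476
```

At 15 digits the value becomes 1.57011605475500, which looks like an exact tie, so the second rounding goes up. Only digits beyond the fifteenth reveal that the true value lies just below the halfway point.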
What does the 33S return for ATN(1469.99992223)?
The HP-71 makes similar mistakes when I = 1648 and when I = 1690. The first two recover after more iterations, but the last one doesn't, and leads to a final result of 2499.99948647. If you trap the program when I = 1690 and substitute the correct result for the ATN, then let the program resume, the final result is 2499.99942402. Knowing what the final result should be for an n-digit calculator allows one to make an initial assessment of a calculator. If the calculator gets what it should, you have evidence that the calculator is doing very good arithmetic. If it doesn't, then you can compare the intermediate results with some known good results (use a PC-based mathematics program to get known accurate values), and find out where and how often the calculator makes a mistake. If they aren't the excusable sort the HP-71 makes, then you can evaluate the quality of the arithmetic done by the calculator.
In Wlodek's article, he gives a result for the HP-41 of 2499.970322; is this what we should expect?
For the HP-75C, he gives a result of 2499.99942403, the same result you got for the 33S. However, I think the result should be 2499.99942402. I wonder where that 1 count difference in the LSD is coming from. I might have to break out the HP-75C.
No n-digit calculator, 8 <= n <= 16, should get exactly 2500 as a final result from this benchmark. If it does, then it is committing "avoidable error".
The HP-71 has the property that its rounding modes can be varied, and you get different results from the Savage benchmark in each case. Mini challenge: Determine what a 12-digit calculator should get for the 4 rounding modes available in the HP-71. Does the HP-71 get those results?
And, as I said in the first post:
"For the true masochists, what should a calculator get that does n digit arithmetic, 8 <= n <= 16?"
Of course, I meant to make the assumption that proper round-to-even is done, and the philosophy discussed above is followed.
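One way to attack the question on a PC is sketched below. It assumes the usual form of the Savage loop, A = TAN(ATN(EXP(LN(SQR(A*A))))) + 1 in radians, repeated 2499 times from A = 1, and it imitates "infinite precision, then round" by evaluating each function to 40 digits before rounding to n digits. The names savage, WORK_DIGITS, _pi, _atan and _tan are hypothetical helpers written for this sketch, and the 40-digit guard precision is an assumption that could in principle misround a value lying extremely close to a rounding boundary.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN, localcontext

WORK_DIGITS = 40   # assumed guard precision, well beyond any n <= 16
ONE = Decimal(1)

def _pi():
    """pi at the current context precision (series from the decimal docs recipes)."""
    lasts, t, s, n, na, d, da = 0, Decimal(3), Decimal(3), 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    return s

def _atan(x, half_pi):
    """arctan for x >= 0: reflect large arguments, halve the angle twice, sum Taylor terms."""
    if x > 1:
        return half_pi - _atan(1 / x, half_pi)
    for _ in range(2):                       # atan(x) = 2*atan(x / (1 + sqrt(1 + x*x)))
        x = x / (1 + (1 + x * x).sqrt())
    term, total, k, x2 = x, x, 1, x * x
    while True:
        term = -term * x2
        k += 2
        new = total + term / k
        if new == total:
            return 4 * total                 # undo the two halvings
        total = new

def _tan(x):
    """tan as sin/cos from their Taylor series; adequate for 0 < x < pi/2."""
    x2 = x * x
    sin_t, cos_t, sin_s, cos_s, k = x, ONE, x, ONE, 1
    while True:
        cos_t = -cos_t * x2 / ((2 * k - 1) * (2 * k))
        sin_t = -sin_t * x2 / ((2 * k) * (2 * k + 1))
        new_sin, new_cos = sin_s + sin_t, cos_s + cos_t
        if new_sin == sin_s and new_cos == cos_s:
            return sin_s / cos_s
        sin_s, cos_s, k = new_sin, new_cos, k + 1

def savage(digits=12, rounding=ROUND_HALF_EVEN, iterations=2499):
    """Run the Savage loop, rounding every operation to `digits` significant digits."""
    calc = Context(prec=digits, rounding=rounding)   # the simulated n-digit machine
    with localcontext(Context(prec=WORK_DIGITS)):    # helpers run at guard precision
        half_pi = _pi() / 2
        a = ONE
        for _ in range(iterations):
            x = calc.sqrt(calc.multiply(a, a))       # SQR(A*A), each step rounded
            x = calc.plus(x.ln())                    # LN, rounded to n digits
            x = calc.plus(x.exp())                   # EXP
            x = calc.plus(_atan(x, half_pi))         # ATN
            x = calc.plus(_tan(x))                   # TAN
            a = calc.add(x, ONE)                     # + 1
    return a

result = savage(12)
print(result)   # compare with the 12-digit round-to-even figure quoted in the post
```

Calling savage(n) for n from 8 to 16 addresses the n-digit question, and passing decimal.ROUND_DOWN, decimal.ROUND_CEILING or decimal.ROUND_FLOOR as the rounding argument gives one possible handle on the rounding-mode challenge, on the assumption that these correspond to the HP-71's other modes.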
