The Museum of HP Calculators

HP Forum Archive 16


What should we get, Part 2
Message #1 Posted by Rodger Rosenbaum on 29 Nov 2006, 3:00 p.m.

Suppose we continue to embrace the same philosophy I described in the first post on this topic, "don't commit avoidable error".

My discussion is limited to calculators that do finite precision BCD floating point arithmetic.

In the V8N1 issue of Datafile, Wlodek Mier-Jedrzejowicz had an article about the "Savage" benchmark. The benchmark is called "Savage" not because it is vicious, but because Bill Savage presented it in Byte magazine.

It is, in HP71 Basic:

5 RADIANS
10 A=1
20 FOR I=1 TO 2499
30 A=TAN(ATN(EXP(LOG(SQR(A*A)))))+1
40 NEXT I
50 PRINT A

Upon first examination, one thinks that the calculator should get a result of 2500.00000000 if it did everything as it should. But this is not so. The HP71 actually gets 2499.99948647. Should we be disappointed by this result? (All the other Saturn-based machines, such as the HP48, get this result also.) What result should we expect, if not 2500.00000000?

For the true masochists, what should a calculator get that does n digit arithmetic, 8 <= n <= 16?

      
Re: What should we get, Part 2
Message #2 Posted by John Smitherman on 29 Nov 2006, 7:17 p.m.,
in response to message #1 by Rodger Rosenbaum

Hi Rodger. This is an interesting exercise. On the 33s I got 2,499.99942403 - not exactly 2,500 but very close. Is it close enough? Can someone verify this result?

Thanks,

John

            
Re: What should we get, Part 2
Message #3 Posted by Rodger Rosenbaum on 1 Dec 2006, 2:18 a.m.,
in response to message #2 by John Smitherman

John, Did you read the first thread, "What should we get?", (without a part no.)?

The idea here is that if a 12 digit calculator carries out this "benchmark", it isn't going to get exactly 2500, and you shouldn't expect that it would. And, in fact, you shouldn't give more credit to an (n-digit) calculator that gets closer to 2500 than to (another n-digit) one that doesn't. You should give points to the n-digit calculator that gets closest to what an n-digit calculator should get, and that is never 2500 exactly.

Of course, the result obtained by a calculator with more digits is going to be closer to 2500 than that obtained by a calculator with fewer digits, but neither will be exactly 2500 if it does proper BCD floating point arithmetic.

The philosophy that HP's floating point arithmetic embodies is that every individual calculation should return what one would get if it were done with infinite precision and then properly rounded to an n-digit result.

And, if this is done, there will be inevitable errors caused by the fact that the result of one calculation is only n digits wide, and then will be used as the input argument of the next calculation.

But, there will be only one "correct" result at every step of the way if the aforementioned philosophy is followed.

So, the way to evaluate a calculator is to first see if it's doing what it should do (for an n-digit calculator), before comparing it to other calculators.

In fact, if all calculators did their arithmetic properly (that is, according to the philosophy above), all you would have to do to know how well a given calculator performs would be to find out how many digits it carried; all n-digit calculators would behave the same if only they all did their arithmetic correctly. I'm referring to those mathematical operations required to be done properly by the IEEE Floating Point standard, such as +, -, *, /, SQRT, etc. The performances of higher functions such as trig and other transcendentals are not specified, and may have larger errors. When analyzing the performance of a particular calculator on the Savage benchmark, if the errors involve those unqualified functions, that is more excusable than if an error occurs with SQRT or *.

In the case of this Savage benchmark, 12-digit calculators that round to even should get 2499.99942402 as a final result. I am astounded that the 33S does as well as it does, especially in light of the errors in the TAN function recently discussed.

The HP71 performance is almost as good. It makes 3 very tiny mistakes. The first occurs when the loop variable I is 1470. At that point, the arctangent of 1469.99992223 must be calculated. The exact result is 1.5701160547549999736+; the HP71 returns 1.57011605476. It shouldn't have rounded up the trailing ...475 to ...476. But, it would have to have calculated the unrounded result to more than its internal 15 digits to have made the proper decision, so I excuse it!

What does the 33S return for ATN(1469.99992223)?

The HP71 makes similar mistakes when I = 1648 and when I = 1690. The first two recover after more iterations, but the last one doesn't, and leads to a final result of 2499.99948647. If you trap the program when I = 1690 and substitute the correct result for the ATN, then let the program resume, the final result is 2499.99942402.

Knowing what the final result should be for an n-digit calculator allows one to make an initial assessment of a calculator. If the calculator gets what it should, you have evidence that the calculator is doing very good arithmetic. If it doesn't, then you can compare the intermediate results with some known good results (use a PC-based mathematics program to get known accurate values), and find out where and how often the calculator makes a mistake. If the mistakes aren't the excusable sort the HP71 makes, then you can evaluate the quality of the arithmetic done by the calculator.

In Wlodek's article, he gives a result for the HP41 of 2499.970322; is this what we should expect?

For the HP75C, he gives a result of 2499.99942403, the same result you got for the 33S. However, I think the result should be 2499.99942402. I wonder where that 1 count difference in the LSD is coming from. I might have to break out the HP75C.

No n-digit calculator, 8 <= n <= 16, should get exactly 2500 as a final result from this benchmark. If it does, then it is committing "avoidable error".

The HP71 has the property that its rounding modes can be varied, and you get different results from the Savage benchmark in each case. Mini challenge: Determine what a 12-digit calculator should get for the 4 rounding modes available in the HP71. Does the HP71 get those results?

And, as I said in the first post:

"For the true masochists, what should a calculator get that does n digit arithmetic, 8 <= n <= 16?"

Of course, I meant to make the assumption that proper round-to-even is done, and the philosophy discussed above is followed.
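
The philosophy described above (compute each operation as if exactly, then round once to n digits) can be sketched with Python's decimal module. This is only a rough model, not HP's algorithm: the helper names (atan_hi, tan_hi, savage), the 30 guard digits, and the mapping of the HP71's four rounding modes onto decimal's constants are all assumptions made for illustration, and a finite number of guard digits cannot guarantee correct rounding in every case. The thread gives 2499.99942402 as the expected 12-digit round-to-even result, which is the value to compare against.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN, getcontext, localcontext)

GUARD = 30  # extra working digits (assumed adequate; keep n <= 16 so PI suffices)
PI = Decimal("3.14159265358979323846264338327950288419716939937510")

# Plausible counterparts of the HP71's four rounding modes (labels are assumptions).
MODES = {"near": ROUND_HALF_EVEN, "zero": ROUND_DOWN,
         "pos": ROUND_CEILING, "neg": ROUND_FLOOR}


def _eps():
    # Series cutoff a little below the current working precision.
    return Decimal(10) ** -(getcontext().prec + 2)


def atan_hi(x):
    """Arctangent (radians) of x >= 0, evaluated in the current wide context."""
    if x > 1:
        return PI / 2 - atan_hi(1 / x)
    doublings = 0
    while x > Decimal("0.2"):  # atan(x) = 2*atan(x / (1 + sqrt(1 + x*x)))
        x = x / (1 + (1 + x * x).sqrt())
        doublings += 1
    eps = _eps()
    total = term = x
    k = 1
    while abs(term) > eps:     # Taylor series x - x^3/3 + x^5/5 - ...
        term = -term * x * x
        k += 2
        total += term / k
    return total * 2 ** doublings


def tan_hi(y):
    """Tangent of y (0 <= y < pi/2) as sin/cos from their Taylor series."""
    eps = _eps()
    sin = cos = Decimal(0)
    term = Decimal(1)          # term holds y**k / k!
    k = 0
    while abs(term) > eps:
        if k % 2 == 0:
            cos += term if k % 4 == 0 else -term
        else:
            sin += term if k % 4 == 1 else -term
        k += 1
        term = term * y / k
    return sin / cos


# One pass of line 30 of the BASIC listing, split into single operations.
STEPS = (lambda a: a * a, Decimal.sqrt, Decimal.ln, Decimal.exp,
         atan_hi, tan_hi, lambda a: a + 1)


def savage(n, rounding=ROUND_HALF_EVEN, iterations=2499):
    """Savage loop with every operation rounded once to n digits."""
    narrow = Context(prec=n, rounding=rounding)
    wide = Context(prec=n + GUARD)
    a = Decimal(1)
    for _ in range(iterations):
        for step in STEPS:
            with localcontext(wide):
                exact = step(a)     # nearly exact intermediate value
            a = narrow.plus(exact)  # single rounding to n digits
    return a
```

Calling savage(12) models the 12-digit round-to-even case, savage(n) for 8 <= n <= 16 covers the "masochist" question, and savage(12, MODES["zero"]) and the other entries of MODES give candidates for the rounding-mode mini challenge.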

                  
Re: What should we get, Part 2
Message #4 Posted by John Smitherman on 2 Dec 2006, 9:38 a.m.,
in response to message #3 by Rodger Rosenbaum

Quote:

What does the 33S return for ATN(1469.99992223)?


The 33s returns 1.57011605476.

Regards,

John

                        
Re: What should we get, Part 2
Message #5 Posted by Rodger Rosenbaum on 2 Dec 2006, 5:56 p.m.,
in response to message #4 by John Smitherman

So it does what the Saturn machines do.

But, as I mentioned earlier, the calculator can recover from this error. The error made by the Saturn machines, and from which they don't recover is the calculation of ATN(1689.99993538).

What does the 33S get for this calculation?

If it gets 1.57020461086 rather than the 1.57020461087 which the Saturn machines get, then that would explain why it gets almost exactly the proper result for the whole benchmark.
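
For anyone who wants to check the two arctangent values discussed here without a computer algebra system, the following is a minimal sketch using Python's decimal module. The function names and the choice of 40 working digits are assumptions for illustration; the series used converges quickly only for large arguments like the ones in this thread.

```python
from decimal import Context, Decimal, localcontext

PI = Decimal("3.14159265358979323846264338327950288419716939937510")


def atan_large(x, digits=40):
    """atan(x) in radians for large x, via pi/2 - atan(1/x) and a Taylor series."""
    with localcontext(Context(prec=digits)):
        t = 1 / Decimal(x)
        eps = Decimal(10) ** -(digits + 2)
        total = term = t
        k = 1
        while abs(term) > eps:   # t - t^3/3 + t^5/5 - ...
            term = -term * t * t
            k += 2
            total += term / k
        return PI / 2 - total


def round_to(value, n=12):
    """Round a wide result to n significant digits (round-half-even by default)."""
    return Context(prec=n).plus(value)


if __name__ == "__main__":
    for arg in ("1469.99992223", "1689.99993538"):
        wide = atan_large(arg)
        print(arg, wide, round_to(wide))
```

Passing the arguments as strings keeps them exact decimal values, which matters here because both results sit very close to a rounding boundary in the 12th digit.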

                              
Re: What should we get, Part 2
Message #6 Posted by John Smitherman on 2 Dec 2006, 8:52 p.m.,
in response to message #5 by Rodger Rosenbaum

Quote:

But, as I mentioned earlier, the calculator can recover from this error. The error made by the Saturn machines, and from which they don't recover is the calculation of ATN(1689.99993538).

What does the 33S get for this calculation?

If it gets 1.57020461086 rather than the 1.57020461087 which the Saturn machines get, then that would explain why it gets almost exactly the proper result for the whole benchmark.


Rodger the 33s returns 1.57020461086.

Regards,

John

      
Re: What should we get, Part 2
Message #7 Posted by Chris W on 30 Nov 2006, 1:21 p.m.,
in response to message #1 by Rodger Rosenbaum

I would be willing to bet that the number of devices that can distinguish between 2499.99948647 and 2500 (if any exist at all) can be counted on one hand.

In the real world, if you can build something accurate to 3 significant digits, you are doing good. To 4 significant digits, you are doing very good. 5 significant digits is what the highest quality ball bearings are made to. Anything over 5 is amazing.

In other words, 2499.99948647 is plenty close enough to 2500.

However, it is easy to get round off errors that are much greater than that and would be cause for concern. Try the quadratic formula when b is very large and a and c are very small.

Chris W
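
The quadratic-formula cancellation mentioned above can be reproduced in ordinary binary double precision (not the BCD arithmetic this thread is about). A minimal sketch follows; the coefficient values are chosen purely for illustration, and the "stable" variant is the common textbook rearrangement, not something taken from this thread.

```python
import math


def roots_naive(a, b, c):
    """Textbook formula; the root near zero suffers cancellation when b*b >> 4*a*c."""
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)


def roots_stable(a, b, c):
    """Avoid subtracting nearly equal numbers: form q first, then use c/q and q/a."""
    d = math.sqrt(b * b - 4 * a * c)
    q = -(b + math.copysign(d, b)) / 2
    return c / q, q / a


if __name__ == "__main__":
    a, b, c = 1e-8, 1e8, 1e-8   # illustrative values: b large, a and c small
    print(roots_naive(a, b, c))
    print(roots_stable(a, b, c))
```

With these inputs the naive version loses the small root entirely, because 4*a*c vanishes next to b*b and the numerator -b + d cancels to zero, while the rearranged version keeps it.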

            
Re: What should we get, Part 2
Message #8 Posted by Dave Shaffer on 30 Nov 2006, 6:11 p.m.,
in response to message #7 by Chris W

"I would be willing to bet that the number of devices that can distiguish betwee 2499.99948647 and 2500 (if any exist at all) can be counted on one hand"

I'll take your bet! There are numerous brands and models (and thousands of units!) of frequency counters that are precise to 9 or 10 digits (and that accurate, too, if they are being fed a proper frequency reference: a cesium beam frequency standard is good to parts in 10E12 absolute, and a hydrogen maser is relatively stable to parts in 10E14 on tens of minutes time scales).

      
Re: What should we get, Part 2
Message #9 Posted by John Smitherman on 30 Nov 2006, 9:25 p.m.,
in response to message #1 by Rodger Rosenbaum

Rodger, it would be interesting to see how a 30s / 9g would perform with this test. I have a 30s but it's not programmable and I don't relish hitting a combination of keys 2,500 times. Is anyone who owns a 9g interested in trying this?

Regards,

John

            
Re: What should we get, Part 2
Message #10 Posted by Mark A. Ordal on 1 Dec 2006, 1:28 p.m.,
in response to message #9 by John Smitherman

Note that the iterative part of the Savage Benchmark is for testing the system performance of the various functions involved.

The "2500" is simply the exact mathematical result for the calculation performed by the last iteration of the loop (when A=2499). So no programming is required to see how closely your 30S or 9G comes to 2500.

More details on the Savage Benchmark can be found here:

<http://www.technicalc.org/tiplist/en/files/pdf/tips/tip6_50.pdf>

--Mark

                  
Re: What should we get, Part 2
Message #11 Posted by Dave Shaffer on 1 Dec 2006, 3:48 p.m.,
in response to message #10 by Mark A. Ordal

That's not right, at least as given by Rodger's BASIC listing.

This line: 30 A=TAN(ATN(EXP(LOG(SQR(A*A)))))+1

makes the value of A depend on all that has happened previously in the loop. If A*A is replaced by I, then your statement is correct.

                        
Re: What should we get, Part 2 - OOOOPS
Message #12 Posted by Dave Shaffer on 1 Dec 2006, 3:50 p.m.,
in response to message #11 by Dave Shaffer

That should be I*I , not just I, in my last sentence.
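
The difference between the two readings can be seen by running both loops side by side. The sketch below uses Python's binary double-precision floats rather than BCD arithmetic, so the numbers will not match any calculator in this thread; the function names are made up for illustration.

```python
import math


def step(v):
    """One evaluation of the right-hand side of line 30 for the value v."""
    return math.tan(math.atan(math.exp(math.log(math.sqrt(v * v))))) + 1


def chained(n=2499):
    """Rodger's listing: each pass feeds on the previous A, so errors accumulate."""
    a = 1.0
    for _ in range(n):
        a = step(a)
    return a


def independent(n=2499):
    """Dave's variant with I*I: each pass starts from the exact integer I."""
    a = 1.0
    for i in range(1, n + 1):
        a = step(float(i))
    return a


if __name__ == "__main__":
    print(chained(), independent())
```

In the independent version only the last pass matters, which is the situation Mark describes; in the chained version the final value carries the rounding history of all 2499 passes.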

                        
Re: What should we get, Part 2
Message #13 Posted by John Smitherman on 2 Dec 2006, 10:00 a.m.,
in response to message #11 by Dave Shaffer

Quote:
That's not right, at least as given by Rodger's BASIC listing.

This line: 30 A=TAN(ATN(EXP(LOG(SQR(A*A)))))+1

makes the value of A depend on all that has happened previously in the loop. If A*A is replaced by I, then your statement is correct.


I agree Dave, because it is iterative the error accumulates.

For anyone who is interested, here is the code I wrote for the 33s to solve this:

LBL A
1
STO A
STO I
LBL B
RCL A
x^2
\|x (Square root of x)
LN
e^x
ATAN
TAN
1
+
STO A
1
STO+ I
RCL I
2,499
x>=y?
GTO B
RCL A
RTN

Comments are welcome.

Regards,

John

                              
Re: What should we get, Part 2
Message #14 Posted by John Smitherman on 2 Dec 2006, 10:27 a.m.,
in response to message #13 by John Smitherman

Out of curiosity I coded this challenge on the TI 83+ SE and the result was 2,499.999992.

Here's my code:

1->A
For(I,1,2499)
tan(tan-1(e^(ln(\|(A^2)))))+1->A
End
Disp A

Comments are welcome.

Regards,

John

                                    
Re: What should we get, Part 2
Message #15 Posted by Gerson W. Barbosa on 2 Dec 2006, 11:46 a.m.,
in response to message #14 by John Smitherman

From the aforementioned Wlodek Mier-Jedrzejowicz's article:

"...on older HP calculators trig is accurate to almost 12 digits, but the policy of removing the last two digits, instead of leaving them hidden, can lead to less accuracy too. Newer HP calculators with a Saturn CPU work to 15 digits precision (and accuracy) internally, but round results to only 12 digits at the end of each calculation - the TI-74 works to 14 digits accuracy and precision, and leaves all 14 digits on the stack (with 4 hidden digits), so the final accuracy of the TI-74 result is 2500.0000291436."

Truncating the extra internal digits as HP does increases cumulative errors needlessly. Consider, for instance, Free42 which returns 2500.000000000000729718363 in this benchmark. Had it followed HP's philosophy the result would have been 2499.99948647, which is much less accurate. This feature could be left as an option to the user, though.

Also out of curiosity, I ran the benchmark on my first (actually on an emulator: http://codigolivre.org.br/frs/?group_id=1367&release_id=1598) and my second computer. The results were 2500.00009 and 2500.0002592983, respectively.

By the way, the original benchmark program can be found here (Page 12, Column 1) :

http://www.amigau.com/68K/dg/dg25.htm

Regards,

Gerson.

                                          
Re: What should we get, Part 2
Message #16 Posted by Rodger Rosenbaum on 2 Dec 2006, 5:49 p.m.,
in response to message #15 by Gerson W. Barbosa

Quote:
Truncating the extra internal digits as HP does increases cumulative errors needlessly. Consider, for instance, Free42 which returns 2500.000000000000729718363 in this benchmark. Had it followed HP's philosophy the result would have been 2499.99948647, which is much less accurate. This feature could be left as an option to the user, though.

HP doesn't truncate the internal digits; it rounds to even.

HP's philosophy is to perform each calculation as if it were done with infinite precision (where possible, and it is possible with the 4 basic arithmetic operations and square root, etc.), and then round (to even) the result to the (maximum) n digit number displayed by the calculator.

This is the best result that can be had with an n digit calculator.

There seem to be 3 basic techniques to be found in calculators.

1. Do the calculations with n+k digits, keep n+k digits for subsequent calculations, but only display n digits. This is the TI way, with n=12 and k=2 for the TI-86, for example.

2. Do the calculations with n+k digits, but round the result to n digits for display and subsequent calculations. This is the HP way, with n=12 and k=3 in Saturn machines.

3. Do everything with n digits, and don't bother with extra digits to get properly rounded results. Many low-end calculators seem to use this method.
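
A small sketch of how methods 1 and 2 can diverge, using Python's decimal module and the chain 1/3*3 as a toy example. The function name and parameters are hypothetical; method 2 is modelled by simply working at 12 digits, which matches it only for operations that are correctly rounded (as the four basic operations are here). Method 3 is not modelled.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN


def third_times_three(work_digits, display_digits=12):
    """Compute 1/3*3 carrying work_digits between steps, then round for display."""
    work = Context(prec=work_digits, rounding=ROUND_HALF_EVEN)
    display = Context(prec=display_digits, rounding=ROUND_HALF_EVEN)
    x = work.divide(Decimal(1), Decimal(3))   # intermediate kept at work_digits
    y = work.multiply(x, Decimal(3))
    return display.plus(y)                    # what the user sees


if __name__ == "__main__":
    print(third_times_three(14))  # method 1 style: n=12 displayed, k=2 hidden
    print(third_times_three(12))  # method 2 style: every result rounded to 12
```

With two hidden digits the displayed value rounds to 1.00000000000, while rounding every step to 12 digits gives 0.999999999999; the second is the correctly rounded product of the 12-digit value the user was actually shown.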

I don't know that I would say that method 2 "...increases cumulative errors needlessly". Any method accumulates error, simply because finite precision n digit BCD floating point arithmetic is only finite precision.

The merits and demerits of the first two methods have been argued to death over the years, and I'm not going to get into it again here.

2500.000000000000729718363 is a 25 digit result. Apparently Free42 does at least 25 digit calculations, but I don't know if it uses extra internal digits or not (or how many). How could following HP's philosophy (which isn't to truncate the extra internal digits, if there are any), reduce a 25 digit result to a 12 digit result? The HP philosophy is to round the internally calculated result to n digits for an n digit calculator, which is apparently 25 digits on Free42.

But, if Free42 were to round each calculation to 12 digits (the better to emulate a real HP42) before displaying and before using the result in a subsequent calculation, (in effect an HP42 which had 13 extra internal digits), then it would give a result as though the emulated HP42 didn't make the mistake I mentioned in an earlier post:

"The HP71 performance is almost as good. It makes 3 very tiny mistakes. The first occurs when the loop variable I is 1470. At that point, the arctangent of 1469.99992223 must be calculated. The exact result is 1.5701160547549999736+; the HP71 returns 1.57011605476. It shouldn't have rounded up the trailing ...475 to ...476."

If Free42 did this, then it would get 2499.99942402 rather than 2499.99948647, because the extra 13 internal digits would be more than sufficient to avoid the tiny mistakes that lead to a result of 2499.99948647 on real Saturn based machines.

This demonstrates one way in which simulators/emulators may not get the same result as the actual calculator they are emulating. The HP42 is a 12 digit machine, but Free42 carries more digits than that, and therefore returns a more precise (and probably more accurate) result than a real HP42.

The old HP71 Forth emulator for the HP41 had the same characteristic. Since the HP71 used 12 digits, you would often get a different result from a calculation than a real HP41, with its 10 digits.

If Free42 carries only 25 digits for the user accessible results (has no hidden digits as TI calculators are wont to have), it should have gotten 2500.00000 00000 00000 01104 9 rather than 2500.00000 00000 00729 71836 3, had it followed HP's philosophy (round to even).

If Free42 performed each separate calculation as though with infinite precision, but then applied different rounding rules, the results should have been:

Rounding mode             Final result

Round up                  2500.00000 00000 00002 59023 5
Truncate                  2499.99999 99999 99997 46933 3

It appears that Free42's actual result, 2500.00000 00000 00729 71836 3, is worse than any of these, so I wonder what it's doing for arithmetic? At best, it's doing what the HP30S does. Rather than do proper arithmetic, just overwhelm the problem by carrying many extra digits, and throw a lot of them away in the final results.

This is easier to do nowadays given the state of the art in microelectronics. But in the early days of calculators, carrying 25 digits to get a final result of 12 digits was a (not very elegant) shortcut the designers of the day couldn't afford.

                                                
Re: What should we get, Part 2
Message #17 Posted by Gerson W. Barbosa on 2 Dec 2006, 6:53 p.m.,
in response to message #16 by Rodger Rosenbaum

Thanks for your convincing explanation and sorry for my confusion between truncation and rounding. I was fixing a program this afternoon because of a truncation problem and had kept the word in my mind.

Being a layman in this matter I may have been impressed by the nice 2500 showing in the Free42 display after twenty-five hundred iterations, the same way people got impressed with the HP-30S returning a perfect 9 in the forensic test.

            
Re: What should we get, Part 2
Message #18 Posted by Marcus von Cube, Germany on 7 Dec 2006, 4:49 a.m.,
in response to message #9 by John Smitherman

Here is the program for the 9G:

INPUT N;
A=1;
FOR(I=1;I<N;++I){
A=tan(tan-1(e^(ln(|/(A*A)))))+1;
}
PRINT A;
END;
|/ denotes the square root operator. ++ is a special operator found in the INST menu.

The program editor is horrible, since you can easily (and accidentally) split lines but I didn't find a way to join the fragments again!

The result is interesting: exactly 2500. A-N returns zero.

Marcus

Edited: 7 Dec 2006, 5:06 a.m.

                  
Re: What should we get, Part 2
Message #19 Posted by Rodger Rosenbaum on 8 Dec 2006, 7:06 a.m.,
in response to message #18 by Marcus von Cube, Germany

Have you tried to determine the characteristics of the 9G?

How many digits does it carry, and are any of them hidden?

Is it BCD or binary internally?

                        
Re: What should we get, Part 2
Message #20 Posted by Marcus von Cube, Germany on 8 Dec 2006, 8:04 a.m.,
in response to message #19 by Rodger Rosenbaum

The 9g seems to work in binary:

1/3 returns 0.333333333 to the display.

Ans*1E9-INT(Ans*1E9) returns the following sequence if applied repeatedly:

0.333333333
0.333333037

You can press Enter several more times before you get a zero in the display.

So it has 22 significant digits but returns garbage thereafter.

Marcus
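
The digit-peeling test used above (multiply by 1E9, discard the integer part, repeat) behaves differently on decimal and binary arithmetic, which is what makes it a useful probe. Here is a rough sketch contrasting the two in Python; the 24-digit decimal machine is a hypothetical stand-in, not a model of the 9G, and the binary case uses ordinary IEEE doubles.

```python
import math
from decimal import Context, Decimal, ROUND_DOWN


def peel_decimal(digits, rounds=4):
    """Repeat Ans*1E9 - INT(Ans*1E9) on 1/3 in a decimal machine with 'digits' digits."""
    ctx = Context(prec=digits)
    ans = ctx.divide(Decimal(1), Decimal(3))
    seen = []
    for _ in range(rounds):
        scaled = ctx.multiply(ans, Decimal("1E9"))
        ans = ctx.subtract(scaled, scaled.to_integral_value(rounding=ROUND_DOWN))
        seen.append(ans)
    return seen


def peel_binary(rounds=4):
    """The same test on binary double-precision floats."""
    ans = 1 / 3
    seen = []
    for _ in range(rounds):
        scaled = ans * 1e9
        ans = scaled - math.floor(scaled)
        seen.append(ans)
    return seen


if __name__ == "__main__":
    print(peel_decimal(24))
    print(peel_binary())
```

On the decimal model every peeled value consists only of 3s until the digits run out and the result drops to zero; on binary floats digits other than 3 show up early, as they do in the sequence reported above.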

                              
Re: What should we get, Part 2
Message #21 Posted by Rodger Rosenbaum on 8 Dec 2006, 3:43 p.m.,
in response to message #20 by Marcus von Cube, Germany

It would be interesting if you would apply some of the tests I applied to the HP30S in this thread:

http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/archv015.cgi?read=85973#85973

                                    
Re: What should we get, Part 2
Message #22 Posted by Marcus von Cube, Germany on 9 Dec 2006, 4:08 a.m.,
in response to message #21 by Rodger Rosenbaum

I'm too lazy to test it all, but a quick check for the standard forensics expression returns an exact 9. So I assume, the 9G is using the same algorithms as the 30s.

Marcus

                                          
Re: What should we get, Part 2
Message #23 Posted by Rodger Rosenbaum on 9 Dec 2006, 4:19 a.m.,
in response to message #22 by Marcus von Cube, Germany

Could you try the last test in that thread:

"So finally, this is why the HP30S gets exactly 9.00000000000000000 when running the calculator forensics test. To see the real error, one should subtract the input argument of the test from the final result. That is, do:

arcsin(arcos(arctan(tan(cos(sin(9))))))-9

On the HP30S, the result is 0, leading one to believe that the HP30S is *really* accurate. But redo the test as:

arcsin(arcos(arctan(tan(cos(sin(9.1))))))-9.1

to foil its rounding near integers and see a result: 4.833288903E-11"

If you get the same error as the HP30S for the forensics test with a starting value of 9.1 instead of 9, then I would consider that good evidence that the 9G is using the same algorithms, and probably even the same firmware engine.
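
For comparison with a machine that makes no attempt to tidy results near integers, the forensics expression can be evaluated in binary double precision. This is a minimal sketch in Python, working in degrees as the forensics test does; the function name is made up, and the output is not comparable digit for digit with any BCD calculator.

```python
import math


def forensics(x):
    """arcsin(arccos(arctan(tan(cos(sin(x)))))) with all angles in degrees."""
    v = math.sin(math.radians(x))
    v = math.cos(math.radians(v))
    v = math.tan(math.radians(v))
    v = math.degrees(math.atan(v))
    v = math.degrees(math.acos(v))
    v = math.degrees(math.asin(v))
    return v


if __name__ == "__main__":
    for start in (9.0, 9.1):
        print(start, forensics(start) - start)
```

Subtracting the starting value, as suggested in the quoted text, exposes the residual error directly instead of leaving it hidden behind the display rounding.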

                                                
Re: What should we get, Part 2
Message #24 Posted by Marcus von Cube, Germany on 9 Dec 2006, 4:57 a.m.,
in response to message #23 by Rodger Rosenbaum

Quote:
arcsin(arcos(arctan(tan(cos(sin(9.1))))))-9.1

I get -1.056805951 x 10^-18

So the results are, in fact, different.

                  
Re: What should we get, Part 2
Message #25 Posted by John Smitherman on 8 Dec 2006, 7:52 a.m.,
in response to message #18 by Marcus von Cube, Germany

Quote:

The result is interesting: exactly 2500. A-N returns zero.


Thanks Marcus. What about (A-N)*1e6?

Regards,

John

                        
Re: What should we get, Part 2
Message #26 Posted by Marcus von Cube, Germany on 8 Dec 2006, 8:07 a.m.,
in response to message #25 by John Smitherman

Quote:
What about (A-N)*1e6?

If the result of A-N were nonzero, the calc should have indicated it, shouldn't it? The result looks very much like an exact integer. This is probably a "special rounding" because all input is integer, too.

Marcus

