# HP Forums

Full Version: HP-71B internal summation weakness in Math ROM
The Math ROM 1A has a weakness related to the internal 15-digit summation operation used in matrix operations.
The issue is that the Math ROM is using a 15-digit truncating addition for the internal summation with a final 12-digit rounding, instead of rounding each addition result.

I already mentioned the problem in another thread, but here is a more demonstrative example:
Computing DOT([-10000 (1/3) 10000],[1 1 1]):
> DIM X(3),Y(3) (assuming OPTION BASE 1)
> MAT INPUT X
X(1)? -10000,1/3,10000
> MAT Y=(1)
> DOT(X,Y)

This operation computes (-10000)+(1/3)+(10000) with 15 digits
and should return .33333333333 if correctly rounded, as do the HP-28S and the HP-42S,
but the Math ROM 1A is returning .33333333334 .
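A rough way to see where the last digit goes is to model both accumulation styles with Python's decimal module. This is only a sketch under assumptions: the helper name dot15, the exactly computed products, and half-up rounding for the rounded 15-digit additions are illustrative choices, not the actual ROM code.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN

def dot15(x, y, rounding):
    """Dot product with a 15-digit running sum and one final 12-digit rounding."""
    wide = Context(prec=40)                               # products kept exact (assumption)
    acc_ctx = Context(prec=15, rounding=rounding)         # internal 15-digit addition
    out_ctx = Context(prec=12, rounding=ROUND_HALF_EVEN)  # final 12-digit result
    acc = Decimal(0)
    for a, b in zip(x, y):
        acc = acc_ctx.add(acc, wide.multiply(a, b))
    return out_ctx.plus(acc)

# 1/3 as it is stored in a 12-digit variable
third = Context(prec=12).divide(Decimal(1), Decimal(3))
x = [Decimal(-10000), third, Decimal(10000)]
y = [Decimal(1)] * 3

truncated = dot15(x, y, ROUND_DOWN)     # each 15-digit addition truncated
rounded = dot15(x, y, ROUND_HALF_UP)    # each 15-digit addition rounded

print(truncated)  # 0.33333333334
print(rounded)    # 0.33333333333
```

In this model the truncating sum loses the trailing 7 of -9999.666666666667 before 10000 is added back, which is where the final ...34 comes from.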

This weakness is causing the HP-71B Math ROM 1A to deliver answers that may be slightly different from later Saturn-based machines for several matrix operations using summations such as dot product, norms, matrix product and inversion, determinant and system solving.

For example, computing the determinant of the matrix [[69 58 96][51 43 71][32 55 54]]:
> DIM A(3,3)
> MAT INPUT A
A(1,1)? 69,58,96
A(2,1)? 51,43,71
A(3,1)? 32,55,54
> DET(A)

The operation returns 1.00000038313 on the HP-71B with Math ROM 1A.
The 28S and 42S return 1.00000038333 instead.
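For reference, the exact determinant of this matrix can be checked with integer arithmetic. The sketch below uses a plain cofactor expansion (the helper name det3 is just for illustration) and compares the two reported values against the exact result.

```python
from decimal import Decimal

def det3(m):
    """Exact 3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[69, 58, 96], [51, 43, 71], [32, 55, 54]]
exact = det3(A)
print(exact)  # 1

# Absolute errors of the two reported results
err_71b = Decimal("1.00000038313") - exact
err_28s = Decimal("1.00000038333") - exact
print(err_71b, err_28s)
```

Both errors are of the same order (about 3.8E-7), so neither result is clearly better than the other; they are simply different.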

This is even more visible on the very ill-conditioned matrices built some years ago by Valentin Albillo (see his great article here).
For instance with the AM1 matrix:
Det = .970950561960 on the HP-71B Math ROM 1A,
Det = .970960198039 on the HP-28S, 42S.

I don't consider this weakness a bug, since the HP-71B results are not particularly wrong (in the sense of: not worse than the 28S/42S - the exact determinants of the above examples are 1), but the Math ROM 1A is not always consistent with later machines.

The weakness was probably identified quite early by HP, because the 1987 HP-28C (first scientific RPL machine) correctly manages the rounded 15-digit summations, at a time when the HP-71B and its Math ROM were still in production.
Unfortunately, the HP-71B Math ROM has never been updated, ... until today :-)

I'm particularly pleased to have solved the issue in the latest version of the new "Math Pac 2", which now gives *exactly* the same results as the HP-28S and 42S, and I'm also very happy to know the source of the discrepancy.

J-F
Due to the lack of a "Like" button here are my personal kudos:-)
.
Hi, J-F:

(06-03-2020 06:35 PM)J-F Garnier Wrote: [ -> ]The Math ROM 1A has a weakness related to the internal 15-digit summation operation used in matrix operations. [...] I'm particularly pleased to have solved the issue in the latest version of the new "Math Pac 2", which now gives *exactly* the same results as the HP-28S and 42S, and I'm also very happy to know the source of the discrepancy.

My most sincere congratulations and heartfelt thanks, and thanks a lot as well for mentioning and kindly praising my article, much appreciated.

Let me assure you that I would join you in improving the already-awesome Math ROM with both ideas and code if I were able to run it in my current hardware/OS combination but alas, I can't as your Emu71 doesn't run in Windows 64-bit and I have no other options available right now.

Now a few questions if you don't mind:

1) I know that the HP-71B uses 15 digits for internal computations, but I've long noticed that an evaluation like this one in the command line (and in a running program, of course):

1/3 + 1/3 +1/3

gives .999999999999. Same thing with 3*1/3 which also gives the 9's.

Why is this ? Why aren't all expressions internally evaluated to full 15-digit internal precision, then rounded to 12 digit for output to the user (or storage in some variable) ?

I can't understand the rationale for evaluating internally the temporary result of 1/3 with 15 digits, then instead of continuing using this value in the still internal computation, it actually gets rounded to 12-digit and the computation continues, still internally. Thus you get 3 unnecessary, precision-losing roundings while computing the expression, plus a 4th rounding to 12 decimals when performing the sum of the 3 intermediate results and reporting the rounded result to the user.

I've seen lots of discussion over the years about whether HP's 10 digits of precision with no guard digits (HP-41C, say) were preferable or not to TI's 12-13 digits with 2-3 guard digits. Frankly, I much prefer the TI approach, as it can be exploited to get additional accuracy in many cases, while HP does the precision-losing nonsense just described.

2) This has nothing to do with your Emu71 as I can't test it but I'd be interested in your personal opinion:

I've noticed that go71b, another (very buggy and unreliable, unlike your excellent, reliable Emu71) HP-71B emulator using actual HP-71B ROM code is more than 60 times slower than Free42 for the very same program (BASIC and RPN, respectively) and running in the very same hardware.

Might the reason be that simulated RPN code is that much faster than emulated HP-71B BASIC/Assembler code ? 60 times ?. What do you think ?

As a side note, go71b downright lies about the timing. It reports times (using the TIME function) that are 2.5x shorter than real time (i.e., it says something took 1 min. when it actually took 2.5 min.).

Again, thanks and best regards.
V.
This is a common problem in numerical analysis. Similar problems arise with the fused multiply-add operations on number-crunching chips. It's a funny problem from one point of view; the accuracy can be shown to be better (it's like carrying guard digits) but the analysis is much harder. The guys who wrote LINPACK and its successors may have some publications on the subject.
(06-03-2020 11:51 PM)Valentin Albillo Wrote: [ -> ]Let me assure you that I would join you in improving the already-awesome Math ROM with both ideas and code if I were able to run it in my current hardware/OS combination but alas, I can't as your Emu71 doesn't run in Windows 64-bit and I have no other options available right now.

Emu71 may not run on Windows 64-bit but Emu71 Windows certainly does if that's an option.

(06-03-2020 11:51 PM)Valentin Albillo Wrote: [ -> ]Hi, J-F:
Let me assure you that I would join you in improving the already-awesome Math ROM with both ideas and code if I were able to run it in my current hardware/OS combination but alas, I can't as your Emu71 doesn't run in Windows 64-bit and I have no other options available right now.
Hi Valentin !

Valentin, I strongly encourage you to use DOSBox to continue to use Emu71/DOS !
DOSBox is easy to install and use, and it nicely shares the host OS file system, contrary to other solutions like VirtualBox.
Even if the performance is reduced due to the two emulation layers, Emu71 still runs at about x8 speed on my quite old core-i3 machine (Win10-64).
So it's enough for short tests or to quickly try new ideas. I'm using it very often, and go to VirtualBox (or my very old 32-bit W2K system) when I really need speed.

Quote:1) I know that the HP-71B uses 15 digits for internal computations but I've noticed since ever that something as this evaluation in the command line (and in a running program, of course):

1/3 + 1/3 +1/3

gives .999999999999. Same thing with 3*1/3 which also gives the 9's.

Why is this ? Why aren't all expressions internally evaluated to full 15-digit internal precision, then rounded to 12 digit for output to the user (or storage in some variable) ?

The reason is that each elementary math operation is made on 15 digits and rounded to 12 digits, so the process is
1/3= .333333333333333 (15-dig). rounded to .333333333333
then .333333333333 +.333333333333 is computed as .333333333333000 + .333333333333000 and rounded again
and so on.
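The process above can be sketched with Python's decimal module. This is a simplified model, not the HP-71B code: each operation is rounded directly to 12 digits (which gives the same values here as 15-digit evaluation followed by 12-digit rounding), and the second half shows the hypothetical alternative of keeping 15 digits until the end.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

c12 = Context(prec=12, rounding=ROUND_HALF_EVEN)
c15 = Context(prec=15, rounding=ROUND_HALF_EVEN)
one, three = Decimal(1), Decimal(3)

# Actual behaviour as described: every elementary operation ends with a 12-digit rounding
t = c12.divide(one, three)           # 0.333333333333
per_op = c12.add(c12.add(t, t), t)
print(per_op)                        # 0.999999999999

# Hypothetical alternative: carry 15 digits through the expression, round once at the end
t15 = c15.divide(one, three)         # 0.333333333333333
kept = c12.plus(c15.add(c15.add(t15, t15), t15))
print(kept)                          # 1.00000000000
```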

Keeping the 15-digit value all along the expression evaluation will create consistency issues, for instance doing A=1/3 @ A+A+A would return a different result, since variables are holding the packed 12-digit forms only.

I believe the HP-71B math core system was designed with this principle in mind: each elementary math operation must provide the best result (that is the closest approximation to the exact value). This was probably inspired by Prof. Kahan (who was involved in the HP-15C and HP-71B Math ROM developments). This explains the use of "infinite precision" 15-digit truncating then 12-digit rounding.

It is possible to compute expressions on 15 digits for a few cases with the resources of the Math ROM such as complex numbers or matrices.
For instance, the product of two complex numbers (x1,y1)*(x2,y2) provides (x1*x2-y1*y2,x1*y2+x2*y1), with each part completely evaluated on 15 digits. It can be combined with the DOT function to sum several terms.

Quote:I've noticed that go71b, another (very buggy and unreliable, unlike your excellent, reliable Emu71) HP-71B emulator using actual HP-71B ROM code is more than 60 times slower than Free42 for the very same program (BASIC and RPN, respectively) and running in the very same hardware.

Might the reason be that simulated RPN code is that much faster than emulated HP-71B BASIC/Assembler code ? 60 times ?. What do you think ?

An emulator adds the overhead of the CPU/hardware simulation; Free42 is a native application. All depends on the performance of the emulation engine, I can't really judge the x60 ratio but it's the order of magnitude we can expect.
As a comparison, Emu71/DOS runs at about x300 speed natively on my W2K 32-bit system, and at x8 speed in DOSbox on my W10 64-bit system, so a ratio of 40 due to the DOSbox emulation layer.

J-F
Hello JF.
What does your new version return for DOT([1E15 5 -1E15],[1 1 1]) ?
Werner
(06-04-2020 09:17 AM)Werner Wrote: [ -> ]Hello JF.
What does your new version return for DOT([1E15 5 -1E15],[1 1 1]) ?
Werner

Good question !
It returns 10, as do the 28S and 42S (and series 48/49/50 I believe).

This is a good test, because it reveals another aspect related to half-way rounding.
With real numbers, 12-digit half way rounding (that is numbers ending exactly in between two possible rounded forms) is done with the round to even rule.
For the 15-digit rounded addition (as introduced in the 28C), half way rounding is always done upwards, probably for consistency.
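Werner's test can be modelled in a few lines of Python to show why the tie-breaking rule matters. This is a sketch under assumptions: sum15 is a hypothetical helper that applies one rounding mode to every 15-digit addition, and the two results of 0 are outputs of this model only, not confirmed readings from any machine.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN

def sum15(terms, rounding):
    """Sum the terms left to right, limiting every addition to 15 digits."""
    ctx = Context(prec=15, rounding=rounding)
    acc = Decimal(0)
    for t in terms:
        acc = ctx.add(acc, t)
    return acc

# 1E15 + 5 needs 16 digits and sits exactly half-way between two 15-digit values
terms = [Decimal("1E15"), Decimal(5), Decimal("-1E15")]

half_up = sum15(terms, ROUND_HALF_UP)      # ties rounded upwards
half_even = sum15(terms, ROUND_HALF_EVEN)  # ties rounded to even
truncated = sum15(terms, ROUND_DOWN)       # extra digit simply dropped

print(int(half_up))    # 10
print(int(half_even))  # 0
print(int(truncated))  # 0
```

Only the half-up rule reproduces the 10 reported above for the 28S, 42S and the new version.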

To be clear, I'm not arguing about which system is better; I just wanted to make the HP-71B Math ROM consistent with the math routines used from the 28C up to the 50G.

J-F
I should've known I didn't need to ask ;-)
Werner
(06-04-2020 08:18 AM)J-F Garnier Wrote: [ -> ]As a comparison, Emu71/DOS runs at about x300 speed natively on my W2K 32-bit system, and at x8 speed in DOSbox on my W10 64-bit system, so a ratio of 40 due to the DOSbox emulation layer.

This has nothing to do with the DOSbox emulation layer. It's the setting "cycles=auto" in the dosbox-0.74-3.conf file. This reduces the CPU speed to a reasonable value for old DOS games.

BTW, years ago I compiled a larger HP48 project with the original DOS HPTools package v1.56, measured the compile time, and then compiled the same project with HPTools v3.0.9. Compiling with v3.0.9 was much faster. Then I changed the DOSBox setting to "cycles=max", and the compile speed with v1.56 became nearly equal to that of v3.0.9.
.
Hi, grsbanks:

(06-04-2020 07:10 AM)grsbanks Wrote: [ -> ]Emu71 may not run on Windows 64-bit but Emu71 Windows certainly does if that's an option.

Thanks for your input but as I understand it Emu71 Win tries to mimic the appearance of a physical HP-71B as closely as possible, as seen in the image you posted, including the inadequate single-line 22-char display (what were HP thinking !?) and the keyboard.

That's not what I need. I'm used to J-F's Emu71, which uses the PC's keyboard and an integrated virtual multi-line video display, with copy-paste capabilities. I understand that Emu71 Win doesn't support any of this unless you include HP-IL functionalities, which are laborious to install and configure, and even then I'm not sure it supports copy-paste. It also doesn't include any HP ROMs.

Regards.
V.
.
Hi again, J-F:

(06-04-2020 08:18 AM)J-F Garnier Wrote: [ -> ]Valentin, I strongly encourage you to use DOSBox to continue to use Emu71/DOS ! [...] Even if the performance is reduced due to the two emulation layers, Emu71 still runs at about x8 speed on my quite old core-i3 machine (Win10-64).

Thanks for your advice, J-F, I appreciate it but we've discussed this over PMs a number of times. x8 speed is simply too slow for me, I'm now using go71b which says it runs at 128x speed (probably 60x at most) and it's slow as molasses, so 8x would be simply unbearable (can you imagine, a FOR/NEXT loop counting up to 800 in a second, while anything now counts in the hundreds of thousands at the very very least ?)

Also, you told me that DOSBox doesn't support copy-paste of text and that's a big no-no for me.

Anyway, as you can't/won't issue a 64-bit compatible version of your Emu71 I've decided that I'll simply buy an old XP system and that's that. I'd have already done it were it not for this dreaded confinement but will do it eventually, end of problem.

Quote:Keeping the 15-digit value all along the expression evaluation will create consistency issues, for instance doing A=1/3 @ A+A+A would return a different result, since variables are holding the packed 12-digit forms only.

Your example has nothing to do with mine, as it consists of two separate evaluations, not one, and the first one assigns the 12-digit rounded result to variable A, so it's no surprise that A+A+A returns a 12d+12d+12d = 12d result.

What I'm saying is that a single evaluation that proceeds internally in full till it returns the result rounded to 12 digits should never round every partial subexpression while still internally computing the full expression, that's retarded and serves no purpose, even consistency. It should really evaluate the whole expression to 15 decimals internally, then return the final result rounded to 12 digits once it's fully done and no sooner, period.

If not, would you consider it acceptable that while the system is evaluating, say, the value of a sine or exponential, it would round to 12 digits each term, internal sum or multiplication or division or whatever ? Surely not, right ? You'd agree that the whole sin(x) should be evaluated using 15-digit precision wholesale. Same with my 1/3+1/3+1/3.

Quote:An emulator adds the overhead of the CPU/hardware simulation; Free42 is a native application. All depends on the performance of the emulation engine, I can't really judge the x60 ratio but it's the order of magnitude we can expect

I don't concur. A factor of 60x is just too much, no emulation should be that inefficient. Say 10x would be acceptable, if slow, but 60x ? Really ? Converting a 10 seconds running time to 10 minutes ? A 10 minutes running time to 10 hours ? That would be a horribly inefficient emulation, direct-to-garbage-bin class.

I can imagine the poor retarded emulation engine saying to itself: "What's that thingy ahead ? Oh, my, it's a byte ! ... and look, what's this other thing ? Hey, nice, another byte ! ..." and so on and so forth.

Thank you very much for your comments. By the way, I've read your supplementary manual for "Math ROM The Sequel" and have some hopefully useful comments for you, to be sent in a PM or e-mail in the next days.

Best regards.
V.
(06-05-2020 12:39 AM)Valentin Albillo Wrote: [ -> ]By the way, I've read your supplementary manual for "Math ROM The Sequel" and have some hopefully useful comments for you, to be sent in a PM or e-mail in the next days.

Please do, I will appreciate it very much. Keep in mind that the Math 2 is still a work in progress and the current version is preliminary (including the manual). I chose to release intermediate test versions because I thought it might take me a while to reach a kind of final version. But the project progressed much faster than I expected...

J-F
(06-04-2020 09:07 PM)Christoph Giesselink Wrote: [ -> ]BTW, years ago I compiled a larger HP48 project with the original DOS HPTools package v1.56, measured the compiling time and then the same project with the HPTOOLS v3.0.9. Compiling with v3.0.9 was much faster. Then modified the DOSBOX setting to "cycles=max" and then I had a compile speed with v1.56 nearly equal to v3.0.9.

(06-05-2020 12:39 AM)Valentin Albillo Wrote: [ -> ]
Quote:An emulator adds the overhead of the CPU/hardware simulation; Free42 is a native application. All depends on the performance of the emulation engine, I can't really judge the x60 ratio but it's the order of magnitude we can expect

I don't concur. A factor of 60x is just too much, no emulation should be that inefficient. Say 10x would be acceptable, if slow, but 60x ? Really ? Converting a 10 seconds running time to 10 minutes ? A 10 minutes running time to 10 hours ? That would be a horribly inefficient emulation, direct-to-garbage-bin class.

I can imagine the poor retarded emulation engine saying to itself: "What's that thingy ahead ? Oh, my, it's a byte ! ... and look, what's this other thing ? Hey, nice, another byte ! ..." and so on and so forth.

This is an interesting discussion, but a bit off topic so I will open a new discussion, please feel free to comment.

J-F
(06-05-2020 02:15 PM)J-F Garnier Wrote: [ -> ]This is an interesting discussion, but a bit off topic so I will open a new discussion, please feel free to comment

Yes, you're right, my emulation comments were utterly off-topic, honestly I didn't realize at the time but now I can clearly see it. My sincere apologies.

On the other hand, my comments on the suboptimal accuracy of internal computations such as 1/3+1/3+1/3 are pretty much on-topic now that you're intent on improving the suboptimal accuracy of some Math ROM internal computations.

Thanks and Best regards.
V.
I need to correct my previous statement. In fact, there is no rounding mechanism that can take a result rounded to 15 digits and then correctly round that to 12 digits. The expected error may be better, but there are always cases where fewer than 12 digits will be correct.

This has been known since the earliest computations of trig tables.
(06-05-2020 05:43 PM)ttw Wrote: [ -> ]I need to correct my previous statement. In fact, there is no rounding mechanism that can take a result rounded to 15 digits and then correctly round that to 12 digits. The expected error may be better, but there are always cases where fewer than 12 digits will be correct.

This has been known since the earliest computations of trig tables.

Yes, it's the problem of the double rounding.
An example with complex numbers on a 28S or 42S:
(2,4.995E-12)*(1,1) returns (2, 2.00000000001)
also with the new HP71 Math 2b,
but NOT on the original Math 1A that returns (2,2) as do the HP-32S/SII.
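The imaginary part of this product makes the double rounding easy to follow in a small Python model. It is only a sketch: half-up rounding at both the 15-digit and 12-digit stages is my assumption, chosen because it reproduces the 28S/42S value quoted above, and the variable names are illustrative.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP

wide = Context(prec=40)                              # exact reference arithmetic
c15_round = Context(prec=15, rounding=ROUND_HALF_UP)
c15_trunc = Context(prec=15, rounding=ROUND_DOWN)
c12 = Context(prec=12, rounding=ROUND_HALF_UP)

x1, y1 = Decimal(2), Decimal("4.995E-12")
x2, y2 = Decimal(1), Decimal(1)

# Imaginary part x1*y2 + x2*y1, exact value 2.000000000004995
im_exact = wide.add(wide.multiply(x1, y2), wide.multiply(x2, y1))

single = c12.plus(im_exact)                   # one rounding straight to 12 digits
double = c12.plus(c15_round.plus(im_exact))   # rounded to 15 digits, then to 12
trunc = c12.plus(c15_trunc.plus(im_exact))    # truncated to 15 digits, then rounded to 12

print(single)  # 2.00000000000
print(double)  # 2.00000000001
print(trunc)   # 2.00000000000
```

The first rounding turns ...004995 into ...00500, which the second rounding then pushes up by one more unit in the 12th digit; the truncating path happens to land on the correctly rounded value in this case.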

I think HP considered that these marginal cases were less important than the (visible) incorrect rounding such as in the DOT example I gave above.

J-F
(edited: it's easy to find examples when you know where to look :-)
(06-05-2020 12:39 AM)Valentin Albillo Wrote: [ -> ][...] x8 speed is simply too slow for me [...] A factor of 60x is just too much, no emulation should be that inefficient. [...]

When I run either an emulation or simulation, I want it to run as identically to the original as possible, including speed. Nostalgia is why I run them in the first place and having them run hundreds or thousands of times faster than the original would ruin the feel of them. When I want speed, I run modern compilers on modern hardware.
My 2¢. YMMV.