100-99.99-0.01 in a spreadsheet

The Museum of HP Calculators

HP Forum Archive 16

100-99.99-0.01 in a spreadsheet
Message #1 Posted by Karl Schneider on 7 Feb 2007, 3:16 a.m.

All --
Digging through some old threads, I found one from late 2002 in which a contributor pointed out that the formula 100.00-99.99-0.01 would equal not zero, but 5.1156995306556 x 10^-15 in a Microsoft Excel spreadsheet cell. Excel 2003 and Quattro Pro 12 (2004) return this result, which is due to roundoff error in floating-point calculations, as the first responder stated. In this case, IEEE 64-bit double-precision values are apparently used.
The thread:
http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/archv009.cgi?read=24977
The example illustrates an advantage of binary-coded decimal (BCD) -- exactitude eliminates the need for rounding and tolerance-checking. It's probably also why the HP-30S, which uses 80-bit floating-point representations, performs rounding to achieve "pleasing" results, which are not always strictly correct.
Just as an exercise of curiosity, here are the computational steps in the floating-point calculation of 100.00-99.99-0.01:
-- KS

100.00 - 99.99 - 0.01 = 5.1156995306556 x 10^-15 using spreadsheet software
Conversions: IEEE 64-bit double-precision format at
http://babbage.cs.qc.edu/IEEE-754/Decimal.html
number offset exponent value of mantissa
99.99 10000000101 (+6) 1.1000111111110101110000101000111101011100001010001111 0.01 01111111000 (-7) 1.0100011110101110000101000111101011100001010001111011
100.00 10000000101 (+6) 1.1001000000000000000000000000000000000000000000000000
number base-2 representation with aligned radix point
99.99 1100011.1111110101110000101000111101011100001010001111
0.01 0.00000010100011110101110000101000111101011100001010001111011
100.00 1100100.0000000000000000000000000000000000000000000000
Now, the subtractions:
100.00 1100100.0000000000000000000000000000000000000000000000 -99.99 1100011.1111110101110000101000111101011100001010001111
= 1100011.1111111111111111111111111111111111111111111111 - 1100011.1111110101110000101000111101011100001010001111 ------------------------------------------------------ 0000000.0000001010001111010111000010100011110101110000 + 0000000.0000000000000000000000000000000000000000000001
= 0.0000001010001111010111000010100011110101110001
= 1.0100011110101110000101000111101011100010000000------ x 2^-07 -0.01 1.0100011110101110000101000111101011100001010001111011 x 2^-07
= 0.0000000000000000000000000000000000000000101110000101 x 2^-07
= 2949 X 2^-59
= 5.1156995306556 x 10^-15

Edited: 10 Feb 2007, 5:01 p.m. after one or more responses were posted

Re: 100-99.99-0.01 in a spreadsheet
Message #2 Posted by hugh steers on 7 Feb 2007, 4:54 a.m.,
in response to message #1 by Karl Schneider

here's my excel example,
=IF(4.6-.2-.2-.2-4=0,"YES","NO")
unfortunately, this is the case for IEEE-754 binary. you always get the same answer 4.6-.2-.2-.4-4 = -8.8817841970013e-016
it's one reason why i've changed hplua to now use an underlying decimal system.

more floating-point math in a spreadsheet
Message #3 Posted by Karl Schneider on 7 Feb 2007, 11:10 p.m.,
in response to message #2 by hugh steers

Hi, Hugh --
That's an even more curious example! While 100 and 0.01 are 13 orders of base-2 magnitude apart, 4.6 and 0.2 are only 5 orders of base-2 magnitude apart. Of course, there are four subtractions involved in total, instead of two.
MS Excel 2003 computes zero as the result of the formula "=4.6-.2-.2-.2-4.0", but not if the expression is within parentheses, such as

=IF(4.6-.2-.2-.2-4=0,"YES","NO")
=(4.6-.2-.2-.2-4)
In such cases, the -8.8817841970013e-016 is returned.
How can ya trust that?
Quattro Pro seems to be a bit more consistent with results in that respect. However, I never warmed to using spreadsheet programs for straightforward calculations of limited volume -- too much frustration.
-- KS


Re: more floating-point math in a spreadsheet
Message #4 Posted by Gunnar Degnbol on 8 Feb 2007, 5:07 a.m.,
in response to message #3 by Karl Schneider

Try =(0.3-0.1-0.1-0.1). I get -2.77556E-17.
The reason for Excel's odd behavior is described in William Kahan's paper How Futile are Mindless Assessments of Roundoff in Floating-Point Computation ?:

Quote:
Apparently Excel rounds Cosmetically in a futile attempt to make Binary floating-point appear to be Decimal. This is why Excel confers supernatural powers upon some (not all) parentheses.
He recommends they re-design Excel to use decimal arithmetic:

Quote:
Decimal has the great advantage that, if enough digits are displayed, What You See is What You Get. Some day, perhaps, IBM’s LOTUS 123 spreadsheet may come out with decimal floating-point carrying 34 sig. dec.; if then Microsoft’s Excel imitates (instead of “innovates”), its mysteries will become vastly fewer.
Note that the WYSIWYG principle is lost not just when using binary arithmetic, but also when using hidden digits as the TIs do. Apparent precision is gained at the cost of predictability.

Re: more floating-point math in a spreadsheet
Message #5 Posted by Karl Schneider on 9 Feb 2007, 1:52 a.m.,
in response to message #4 by Gunnar Degnbol

Gunnar --
Thank you for the link and response. This particular article by Kahan is fairly new (just more than a year old), and I had not seen it before. I didn't look at the date until I saw the quote from the article, "What's in your spreadsheet?" This is a twist on the catchphrase of a contemporary US television advertising campaign for a credit card: "What's in your wallet?"
Professor Kahan really ought to be more concise; this article runs 56 pages. Section 2 of the article describes the strange behavior of MS Excel 2000, which probably hasn't changed in Excel 2003, and perhaps not even in the latest version.
Section 5 tabulates "J-M. Muller’s Recurrence":
x_n = 5 – 2/(1 + (5/3)ⁿ)
for which practically any BCD calculator gives accurate and best-possible results to the limits of its display, to n = 75 and beyond, long after the results from software utilizing floating-point math with 64-bit and 53-bit mantissas have diverged from the realm of numerical correctness.
-- KS

Edited: 10 Feb 2007, 5:04 p.m.

Other numbers in the Excel spreadsheet
Message #6 Posted by unspellable on 12 Feb 2007, 1:08 p.m.,
in response to message #5 by Karl Schneider

Try filling a short column with -5, -4,-3,-2,-1, 0, 1, 2, 3, 4, 5.
Find the standard deviation = 3.02765... So far so good.
Now if you have an older version, up to about '98 and I think 2000 as well, try this with 14,999,999,998 through 15,000,000,005. The correct answer should be the same but the spreadsheet will come up with nonsense. This not due to the internal arithmetic so much as it is due to poor choice of algorithm. The algorithm is mathematically correct but when doing numerical computation all the significant data is rounded off.
BTW: This came up as a real world problem. Whipped out my HP-48SX and it had no trouble with this calculation.

Re: Other numbers in the Excel spreadsheet
Message #7 Posted by bill platt on 12 Feb 2007, 1:35 p.m.,
in response to message #6 by unspellable

I get 3.317

Standard deviation calculation in Excel
Message #8 Posted by Karl Schneider on 12 Feb 2007, 11:41 p.m.,
in response to message #6 by unspellable

Quote:
Try filling a short column with -5, -4,-3,-2,-1, 0, 1, 2, 3, 4, 5.
Find the standard deviation = 3.02765... So far so good.
This looks like the sample standard deviation for the first ten values or the last ten. When you copied a range of cells into the formula, you may have missed one.
The correct population standard deviation is 3.16228
The correct sample standard deviation is 3.31662
MS Excel 2003 gives the correct answers with 14,999,999,995 through 15,000,000,005. Quattro Pro v12 calculates zero as either standard deviation for the large input data, as do calculators that use the summation method, instead of retaining each input datum and first calculating the mean therefrom (e.g., HP-17B/II, HP-27S, HP-28/48/49/50).
This was discussed in a thread from several years ago; Tom Sherman's post elaborated in detail.
http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/archv014.cgi?read=53787#53787
I'm somewhat surprised that not all modern spreadsheet software can give the correct answer to this problem. The PC-based programs are already storing all the input data, and there's no shortage of processing power.
-- KS

Edited: 13 Feb 2007, 8:59 p.m.

Re: Standard deviation calculation in Excel
Message #9 Posted by unspellable on 14 Feb 2007, 1:55 p.m.,
in response to message #8 by Karl Schneider

I probably mistyped something. But the main point is that Excel versions up through 98 or 2000 give totally bogus answers due to rounding off the significant data. It's more evidence that Micro Soft's primary objective is a new version with new window dressing rather than actually making software that works.
Another cute trick is finding the square root of -1.
I get: 6.12303176911189E-017+i
Come on, be serious! A rounding error for sqrt(-1)?
This was in Excel 2003 although I think all previous versions do the same.

Re: Standard deviation calculation in Excel
Message #10 Posted by Ken Shaw on 19 Feb 2007, 5:54 p.m.,
in response to message #8 by Karl Schneider

MS Excel 2002 (version 10.2614.2625) failed this test with:
Sample std dev = 228.973360895978
Population std dev = 186.181818181818
Stunning.

Re: 100-99.99-0.01 in a spreadsheet
Message #11 Posted by Garth Wilson on 8 Feb 2007, 2:24 a.m.,
in response to message #1 by Karl Schneider

Quote:
5.1156995306556 x 10E-15
This is a little O.T., but it looks like you're using the "E" incorrectly. I think you mean 5.1156995306556E-15. 1,000 = 1E3 = 10^3 = 10E2. "E" already means "times ten to the ___ power," so to say "x 10E__" puts you off by a power of ten. Also, 2E0=2, 2E1=20, 2E2=200, etc. Some engineering students and technicians at a company I did some consulting for years ago called me up with some simple calculations where things weren't coming out right. They had the presence of mind to know their results were way off, but they couldn't figure out why. This "E" thing is what messed them up.

"E" notation
Message #12 Posted by Karl Schneider on 8 Feb 2007, 11:37 p.m.,
in response to message #11 by Garth Wilson

Hi, Garth --
Yes, you are correct -- I should stick with orthodox notation to prevent any possible confusion. It's just that
"5.1156995306556 x 10E-15"
looked better than the cluttered
"5.1156995306556 x 10^(-15)
or the run-together
"5.1156995306556E-15"
I was too lazy to superscript each value:
"5.1156995306556 x 10^-15
which can also be easily corrupted if responders quote without manually restoring the formatting.
-- KS

Re: "E" notation
Message #13 Posted by Rodger Rosenbaum on 9 Feb 2007, 5:04 a.m.,
in response to message #12 by Karl Schneider

Quote:
or the run-together
"5.1156995306556E-15"
-- KS
What's wrong with that? That's just how it looks or our beloved HP calculators.
As long as we're picking on you, I would point that some would call these:

Quote:

number base-2 representation with aligned decimal points
99.99 1100011.1111110101110000101000111101011100001010001111 .01 0.00000010100011110101110000101000111101011100001010001111011

"binary" points, not decimal points.

Re: "E" notation
Message #14 Posted by Karl Schneider on 9 Feb 2007, 11:45 p.m.,
in response to message #13 by Rodger Rosenbaum

Hi, Roger --

Quote:
"5.1156995306556E-15"
What's wrong with that? That's just how it looks or our beloved HP calculators.
True, but it's a shorthand notation -- blank-free by necessity -- developed to represent "extreme" numbers within limited displays.
Another problem with superscripting using preformatting is that the superscripted numbers look small.
Clarifications will be applied to my original post.
-- KS

Re: "E" notation
Message #15 Posted by Rodger Rosenbaum on 10 Feb 2007, 5:22 a.m.,
in response to message #14 by Karl Schneider

Quote:
Hi, Roger --

True, but it's a shorthand notation -- blank-free by necessity -- developed to represent "extreme" numbers within limited displays.
Another problem with superscripting using preformatting is that the superscripted numbers look small.
Clarifications will be applied to my original post.
-- KS
Why is it "blank-free" by necessity?
It wasn't developed to represent numbers within limited displays as far as I know. The notation has been used at least since Fortran when outputs were printed on 135 column wide printers. There was no space limitation. Calculators inherited the notation from those early computer languages.
The calculators with truly limited display space, such as the HP15 and HP41 didn't even have an "E"; they had a space.
The HP71's "PRINT USING" statement allows one to put several spaces before the "E", so space limitations aren't involved. But the default is the old Fortran style, 12.3456E78. Space limitations aren't a problem in the HP28, HP48, etc., but again the default is the Fortran notation.
So, I don't think space limitations had anything to do with its development; rather it was the lack of superscripts on the printers of the time.
--------------------------------
In your first post in this thread where your revision now says:
"base-2 representation with aligned radix"
I think you should say:
"base-2 representation with aligned radix points"
See:
http://en.wikipedia.org/wiki/Radix
http://en.wikipedia.org/wiki/Radix_point


Re: "E" notation and radix point
Message #16 Posted by Karl Schneider on 10 Feb 2007, 4:39 p.m.,
in response to message #15 by Rodger Rosenbaum

Quote:
Why is ("E"-notation) "blank-free" by necessity?
It wasn't developed to represent numbers within limited displays as far as I know. The notation has been used at least since Fortran when outputs were printed on 135 column wide printers. There was no space limitation. Calculators inherited the notation from those early computer languages.
That depends on the application. I was referring primarily to the entry of exponentiated numbers in computer-language code, not the display or printing of them. If a space were inserted betewen the mantissa and the "E" (or "D" for double precision), the Fortran compiler would unsuccessfully try to parse the number into an exponent and a variable.
I'm an old Fortran guy (too?), and I remember offhand that 132 characters was the standard limit for line-printer output.

Quote:
The calculators with truly limited display space, such as the HP15 and HP41 didn't even have an "E"; they had a space.
More specifically, the exponents on these models were right-justified to provide one or more spaces as the display setting permitted. Of course, with spaces, the "E" isn't even necessary, and would waste a valuable display position. I prefer the spaces to the full-size "E" used in the HP-20S, HP-21S, HP-32S, and others.
Apparently, so did other people at H-P; the HP-32SII uses a reduced-size "E" (as well as an elevated "-" for negative exponents) in the display to help distinguish exponentiated numbers. This was more important in the HP-32SII, with its Equation capabilities that utilized variables denoted by single capital letters. That's attention to detail.
BTW, when I stated "limited displays", I was also referring to lack of sub- and superscripting as well as length of display windows.

Quote:
I think you should say:
"base-2 representation with aligned radix points"
Right you are. I knew of "radix" only from the HP-42S RDX functions which set the style of radix point. By golly, there's a Wikipedia entry for everything!
-- KS


Re: "E" notation and radix point
Message #17 Posted by Rodger Rosenbaum on 10 Feb 2007, 9:41 p.m.,
in response to message #16 by Karl Schneider

Quote:
That depends on the application. I was referring primarily to the entry of exponentiated numbers in computer-language code, not the display or printing of them.
Too bad you didn't say that. Then my comments could have been relevant to your intent.

Quote:
If a space were inserted betewen the mantissa and the "E" (or "D" for double precision), the Fortran compiler would unsuccessfully try to parse the number into an exponent and a variable.
This isn't "necessity"; this is convention. The parser could easily accept free-form input, including spaces, and use comma, semicolon or CRLF as the terminator. I don't have any Fortran manuals anymore, and I don't know what free-form capability may have been available in any particular Fortran compiler.
But I like to refer to the HP71's BASIC because it is one of the most capable machines I've ever seen as far as input/output formatting goes. It has a built-in free-form input capability for HPIL input. For ordinary input, numbers can be input as strings and there are string functions that allow the user to remove superfluous spaces, or any other character, and then convert to a number in a variable with the VAL function. The user can do their own free-form input with this capability.
Therefore, I don't agree that the "E" format is "blank-free by necessity".

Quote:
I'm an old Fortran guy (too?), and I remember offhand that 132 characters was the standard limit for line-printer output.
Must have been a different printer.

Quote:
More specifically, the exponents on these models were right-justified to provide one or more spaces as the display setting permitted. Of course, with spaces, the "E" isn't even necessary, and would waste a valuable display position.
Why would an "E" in an exponential format number waste a display position, but a space wouldn't? Wouldn't the "E" be in the same position as the space, occupying one character position, just as the space does?

Quote:
I prefer the spaces to the full-size "E" used in the HP-20S, HP-21S, HP-32S, and others.
Apparently, so did other people at H-P; the HP-32SII uses a reduced-size "E" (as well as an elevated "-" for negative exponents) in the display to help distinguish exponentiated numbers.
I prefer the "E" of whatever size. The space is ok on a small size calculator display like the HP-41, but if it is printed on paper, the space rather than "E" will make it look like two non-exponential numbers rather than one "E" format number, e.g.:
12.345E26 with spaces becomes 12.345 26
I like to have the same format on the calculator display as when it's printed on paper. I'm used to the "E" format on both paper and a calculator display; it looks the same both places.
My TI-86 uses a small size "E" and elevated "-"; it's ok. Maybe those "other people" at H-P are an invasion force from TI (or some Japanese manufacturer of calculators). :-)

Quote:
BTW, when I stated "limited displays", I was also referring to lack of sub- and superscripting as well as length of display windows.
What you said earlier was that the "E" notation was "...developed to represent "extreme" numbers within limited displays."
To speak of "displays" like that sure sounds like you were referring to something like a calculator display such as found on a hand-held calculator. When the "E" format was developed, there weren't any such "displays"; there was printed output. There was no "length" limitation on printed "E" format numbers such as there is on a calculator display.
I said in my reply:

Quote:
So, I don't think space limitations had anything to do with its development; rather it was the lack of superscripts on the printers of the time.
To be perfectly clear, I think that to the extent you were suggesting that length limitations had anything to do with the development of the "E" format, you were mistaken.
I think it was solely the lack of superscripts on the printers of the time. On that point, you and I are in agreement.
As far as the use of the compact "E" format nowadays, it seems that you don't like it, but I do; I'm used to it.
We all have our preferences.


Re: 100-99.99-0.01 in a spreadsheet
Message #18 Posted by Rodger Rosenbaum on 10 Feb 2007, 9:49 p.m.,
in response to message #1 by Karl Schneider

A while back when the HP30S came out, there was some discussion about what its internal workings were. I posted a long item on the newsgroup. You can use Microsoft's BASIC to play around with binary representations. Here is what I posted:

Quote:
I think I can show that the 30S is indeed using binary arithmetic internally.
Fire up one of the old Microsoft GWBASIC versions and use the following little program to see the behavior of internal binary number representations. Some of the behavior which would otherwise seem peculiar derives directly from conversions from binary to decimal, and vice versa.

10 a$ = MKD$(.1) 15 GOSUB 100 20 FOR I = LEN(a$) TO 1 STEP -1 30 PRINT RIGHT$("0" + HEX$(ASC(MID$(a$, I))), 2); 40 NEXT I 50 END 100 GOTO 130 110 MID$(a$, 1) = CHR$(0) 120 MID$(a$, 2) = CHR$(0) 130 PRINT 140 PRINT CVD(a$); 150 PRINT " "; 200 RETURN
The Microsoft Basics use binary arithmetic internally, and there are two floating point precisions available, single precision and double precision. The single precision uses 4 bytes, 1 for the exponent and its sign, and 3 for the mantissa and its sign. There were some changes when Quick Basic came out, but I believe the GWBASIC assumed that since the mantissa is normalized, the most significant bit (MSB) of the mantissa need not be stored (since in a normalized mantissa, it is always a one; the special case of floating point zero is represented as an all zero exponent and all zero mantissa). The bit actually stored in the place of the MSB of the mantissa is the sign of the mantissa. Double precision uses 8 bytes, 1 for the exponent and its sign, and 7 for the mantissa and its sign. Thus, single precision means 24 bit precision, and double precision means 56 bit precision (mantissa precision).
Run the program and see the following result:

.1000000014901161 7D4CCCCD00000000
The first value is the result of putting a single precision value .1 in a double precision variable. The second item, a hex string, is the internal representation of the number. Notice that the last 4 bytes are all zeroes, which is what happens when you store a single precision value in a double precision variable.
The quantity in parentheses in line 10 is .1 as a single precision quantity. Now place a # after the .1, so line 10 becomes:
10 a$ = MKD$(.1#)
Now we are putting a double precision .1 in the double precision variable. By the way, there is no explicit double precision variable being used here; the MKD$ function creates a temporary double precision variable and then converts it to a string which is moved to the string variable a$ for display.
Run the program and see the following:

.1 7D4CCCCCCCCCCCCD
Now we see the internal double precision (56 bit mantissa) representation of .1--notice the value of .1 converted to binary has a repeating mantissa, CCCCCCCCC..., and that the last nib has been rounded up.
Now put the value Joe Horn mentions in another posting as the value the 30S gets for SQRT(2), namely: 1. 41421 35623 71514 85402 13689, in line 10, thus:
10 a$=MKD$(1.4142135623715148540213689#)
Run the program and get the following:

1.414213562371515 813504333F8FFFF
Notice all the trailing F's. If this binary quantity were rounded up by adding only one bit in the least significant place, we would have 813504333F90000. Let's see if we can coax the program into showing us what the floating point value of that binary (represented as hex here for space saving) string would be. For this we need lines 110 and 120. Replace line 100 with: 100 REM
Run the program and see:

1.414213562371515 813504F333F90000
GWBASIC converts the 40 bit string 813504F333F90000 (we have 56 bits in the mantissa, but the last 16 are all zeroes) into the numeric value 1.414213562371515
If the y to the x function on the 30S is used to calculate 2 to the .5, which is also equal to SQRT(2), we get 1.41421356237309503445232. At this point, I can't use the GWBASIC program any longer for the next step, since BASIC only has 56 bits in double precision. Fortunately, Mathematica comes to the rescue. Converting this floating point value to hexadecimal, we get: B504F333F9DE640000 (without exponent here).
Plainly, this is a mantissa which has been truncated to 56 bits.
If two 12 digit numbers are multiplied on the 30S, such as:
123456789123*987654321999, we get: 121932631357450082816877 which is exactly the correct result. Similarly for division, addition, and subtraction.
Having reached this point, I remembered the properties of the math coprocessors that used to be available as separate devices before they were incorporated into the Pentium CPU's. There were 3 levels of precision available; 24 bit single precision, 56 bit double precision and 80 bit double extended precision. It seemed possible that the 30S has a modern CPU with built-in math coprocessor. Those processors provided math functions such as sqr, sin, cos, log, exp, etc., as well as the 4 basic arithmetic functions +, -, *, /. The basic four could be done in all 3 precisions, but the higher functions were available only to a maximum of 56 bits.
So, I thought, let's check some of the other functions. Uh-oh, Ln(2) returns a result correct to 24 digits, more than is possible with 56 bit arithmetic.
But, Exp(10) gives 22026.465794806716075982, correct to 17 digits, which converts to a hexadecimal mantissa of 2B053B9F2A0ABF000000 which is a truncated 56 bit value.
(By the way, a binary mantissa of N bits is equivalent to a decimal number of:
N Ln(2) -------- or N/3.32193 digits, so 80 bits give 80/3.32193= 24.08 digits Ln(10)
40 bits give 12.04 decimal digits, and 56 bits give 16.85 digits)
Sin(60) (degrees, that is) gives 19 correct digits. Darn.
So, to summarize:
We can often tell how many bits of binary arithmetic are being used by converting a decimal result back to binary (hex), and seeing if all the bits past a certain point are truncated. This shows that some of the error seen in decimal results is directly attributable to truncation error in the binary result, which when converted to decimal is not recognizable by simple inspection as truncation any more.
The basic four arithmetic functions seem to use 80 bit arithmetic.
SQRT seems to truncate to 40 bits (12 digits)
The built-in keyboard constant Pi is 24 digits.
Other functions don't seem to consistently give a 56 bit truncated result.
Joe Horn is quite right that the calculator will accept 13 digits on input.

[ Return to Index | Top of Index ]

Go back to the main exhibit hall