|Re: Precision of HP 30S|
Message #4 Posted by hugh steers on 18 Nov 2004, 2:16 p.m.,
in response to message #3 by Markus Sigg
It's because 0.11 is roughly ten times bigger than 0.01.
Here's how I think it works (really I'm guessing). It doesn't just look at the final answer 'a'; it compares 'a' to one of the input terms, 'b'. This suppression only applies to ADD and SUB at the top level, not to the adds and subs used internally.
Define ADD(x, y) to be 0 if |(x + y)/x| < eps, where eps = 1e-12, say,
and ADD(x, y) to be x + y otherwise.
Define SUB(x, y) to be 0 if |(x - y)/x| < eps, where eps = 1e-12, say,
and SUB(x, y) to be x - y otherwise.
In the above, "+" means the result of the floating point binary add (i.e. not true real number addition).
So, in your examples:
ex1: 1e10 + 0.11 - 1e10 = 0.11, because
SUB(ADD(1e10, 0.11), 1e10) = SUB(10000000000.11, 1e10) = 0.11, since
(10000000000.11 - 1e10)/10000000000.11 = 1.0999999..e-11 > eps
But ex2: 1e10 + 0.01 - 1e10 = 0, because
SUB(ADD(1e10, 0.01), 1e10) = SUB(10000000000.01, 1e10) = 0, since
(10000000000.01 - 1e10)/10000000000.01 = 9.9999999..e-13 < eps
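
If anyone wants to play with the guess, here is a minimal Python sketch of the ADD/SUB rule as described above. It is only a model of my guess, not the calculator's actual firmware. Assumptions that are mine: decimal arithmetic with 24 digits of working precision (binary doubles can't hold 10000000000.01 exactly, which would blur the comparison against eps), the names `add`, `sub` and `EPS`, and the guard for x = 0, which the definitions above don't cover.

```python
from decimal import Decimal, getcontext

# Assumed working precision; the real calculator's internal precision is unknown.
getcontext().prec = 24

# eps = 1e-12 is the guessed threshold from the definitions above.
EPS = Decimal("1e-12")


def add(x, y):
    """Top-level ADD: return 0 if the sum is negligible relative to x."""
    s = x + y
    # Guard for x == 0 is an assumption; the definition does not say what happens there.
    if x != 0 and abs(s / x) < EPS:
        return Decimal(0)
    return s


def sub(x, y):
    """Top-level SUB: return 0 if the difference is negligible relative to x."""
    d = x - y
    if x != 0 and abs(d / x) < EPS:
        return Decimal(0)
    return d


big = Decimal("1e10")

# ex1: 1e10 + 0.11 - 1e10, relative size about 1.1e-11, above eps, so it survives
ex1 = sub(add(big, Decimal("0.11")), big)

# ex2: 1e10 + 0.01 - 1e10, relative size just under 1e-12, so it is suppressed
ex2 = sub(add(big, Decimal("0.01")), big)

print("ex1 =", ex1)
print("ex2 =", ex2)
```

Changing `EPS` or the inputs shows where the cutoff falls under this model: anything whose relative size against the first operand drops below eps gets flushed to 0.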