AriCalculator is a home made pocket calculator.
07-27-2017, 03:38 AM
Post: #41
 Dan Junior Member Posts: 31 Joined: Jan 2017
RE: AriCalculator is a home made pocket calculator.
I was able to significantly reduce execution time for the square root algorithm by using the "Friden" method instead of Newton's method. This algorithm was used in the Friden electro-mechanical calculator and is described in an HP journal from the 70's.

I will do some more timing and post the results.
07-27-2017, 04:20 AM
Post: #42
 Paul Dale Senior Member Posts: 1,202 Joined: Dec 2013
RE: AriCalculator is a home made pocket calculator.
That's a bit surprising.

What initial estimate did you use to start Newton's method? If this is moderately close, Newton's method converges very rapidly which generally compensates for the division involved.

Pauli
07-28-2017, 05:10 AM
Post: #43
 Dan Junior Member Posts: 31 Joined: Jan 2017
RE: AriCalculator is a home made pocket calculator.
I was very surprised - the time dropped from 0.72 sec to 0.013 sec! Perhaps my multiplication and division algorithms need (major) reworking.

However I read in the HP journals that they decided polynomial (Taylor, Chebyshev) methods would be too slow for the HP-35 and went with this square root algorithm and CORDIC methods for the transcendental functions. We are using an 8-bit microcontroller so perhaps these methods are better for our architecture too. I'll flowchart the algorithms and post them, which should help determine what's going on.
07-28-2017, 07:05 AM
Post: #44
 Paul Dale Senior Member Posts: 1,202 Joined: Dec 2013
RE: AriCalculator is a home made pocket calculator.
I thought the choice of CORDIC was more to do with it being tiny to implement and reusable between many transcendental functions. Polynomial and rational approximations tend to require a lot of constants. They also require repeated multiplications which could hurt performance, so it is believable that this was involved in the decision.

Newton's method for square root requires a division per iteration, but each iteration should double the number of correct digits. In other words, it converges very quickly if the initial estimate is close enough.

Pauli
07-30-2017, 07:05 AM
Post: #45
 Dan Junior Member Posts: 31 Joined: Jan 2017
RE: AriCalculator is a home made pocket calculator.
You are spot on about CORDIC being tiny and reusable. I've just completed a CORDIC implementation of tan x for the 1st quadrant that I found on a TI forum post from the 90's. This then gives me sin and cos using identities, as described in the HP journal. Amazingly the structure of the CORDIC arctan x algorithm is identical, with even the same special angles (not implemented yet).

I've also implemented e^x for x>=0 using a CORDIC algorithm from the same TI post, and again the algorithm for 2^x, 10^x etc. is identical, the only difference being the special values. I'm a big fan of CORDIC, but more on that later. Back to Newton's method for square roots:

I determined the initial value as follows:

Suppose we want to find sqrt(A), where A = m * 10^n, 1 <= m < 10. If

1. n is even, set the initial value to 2 * 10^(n/2).

2. n is odd, A = 10m * 10^(n-1), 10 <= m < 100 and set the initial value to 7 * 10^((n-1)/2).

e.g. A = 8 -> initial value = 2, A = 56 -> initial value = 7.

I chose 2 and 7 because they are approximately equal to the average value of sqrt(x) on [1,10] and [10,100], respectively. They also ensure convergence of the method.

On the calculator “A”, being the most recent value entered by the user, is stored in stack memory called FPN_1 (“floating-point number 1”, 9 bytes long. 8 bytes for the mantissa and 1 byte for the exponent).
The initial value found as above is stored in memory called FPN_b. I use FPN_b and another area of memory, FPN_a, for arithmetic operations on the stack values. The algorithm now proceeds as follows (comments after the semicolon):

1. 6 -> counter ;number of iterations = 6

2. FPN_1 -> FPN_a ;copy FPN_1 to FPN_a, i.e. FPN_a = A

3. call subroutine divFPN ;divFPN divides FPN_a by FPN_b and stores the result in FPN_a

4. call subroutine cmpExp_ab ;cmpExp_ab compares the exponents of FPN_a and FPN_b and stores the floating-point number with the larger exponent to FPN_a and the other to FPN_b

6. 0.5 -> FPN_b

7. call subroutine multFPN ;multFPN multiplies FPN_a with FPN_b and stores the result in FPN_a

8. FPN_a -> FPN_b ;FPN_b equals the new value

9. decrement counter ;i.e. counter – 1 -> counter

10. if counter > 0 go to step 2, otherwise continue

When I wrote this algorithm I wanted to use pre-existing subroutines as much as possible as I was concerned about running out of FLASH memory but I now see this is not an issue, so I can create customised subroutines for it. Some of the things that come to mind:

The subroutines divFPN and multFPN first convert BCD values to binary, perform the operation, then convert the result back to BCD in preparation for addition. This shifting from BCD to binary and back again is time consuming. When you mentioned a division per iteration, it made me think: instead of using x(n+1) = 0.5[x(n) + A/x(n)] (as in the above algorithm) try x(n+1) = x(n)/2 + A/2x(n).

Originally I needed to do 5 conversions: A to binary, x(n) to binary, divide and convert result to BCD, add x(n), convert result to binary, multiply by 0.5 and convert the result to BCD.

Now the value “A” remains constant in the algorithm so it only has to be converted to binary once at the start. So there are only 3 conversions necessary with the new form: x(n) to binary, arithmetic shift left to obtain 2x(n) (much faster than multiplying), divide A by 2x(n) and convert result to BCD, then arithmetic shift right x(n) to obtain x(n)/2 (much faster than dividing by 2) convert to BCD and add the results.

When x(n) is odd shifting 1 bit to the right causes a loss in precision – hopefully this won't cause problems. It will be interesting to see what effect these changes will have.
07-31-2017, 07:16 AM
Post: #46
 Paul Dale Senior Member Posts: 1,202 Joined: Dec 2013
RE: AriCalculator is a home made pocket calculator.
You should avoid converting from BCD to binary and back. This guarantees that you'll not get correct results for things you should. Arithmetic directly in decimal should be used. It will save a lot of multiplications and divisions in the conversion.

As for the initial estimate, you don't need to handle such a wide range. You only need to handle a mantissa between 0.1 and 1 and normalisation gives this automatically. For an even exponent, the exponent of the result is half that of the original and this doesn't alter the mantissa of the result. For an odd exponent, it is more troublesome.

This is the method described in Properly Rounded Variable Precision Square Root by T. Hull and A. Abrham from ACM Transactions on Mathematical Software, Vol 11 #3, September 1985:

Code:
f = mantissa of x  (which must be normalised 0.1 <= x < 1) e = exponent of x if e is even then     approximation = 0.259 + 0.819 * f else     f = f / 10  (i.e. a one digit shift, in BCD four bits)     e = e + 1     approximation = 0.0819 + 2.59 * f endif

Apply Newton's method: $$approximation = 0.5 \left( approximation + \frac{f}{approximation} \right)$$ until convergence. The final result is $$approximation \times 10^{\frac{e}{2}}$$. I.e. the approximation with the exponent set to e / 2. Correct rounding requires an extra step: you need to square the result and the result ±1 ULP to see which value is indeed most accurate.

There are others approaches if this isn't suitable.

Pauli
08-01-2017, 02:59 AM (This post was last modified: 08-01-2017 03:19 AM by Dan.)
Post: #47
 Dan Junior Member Posts: 31 Joined: Jan 2017
RE: AriCalculator is a home made pocket calculator.
Converting to binary is unavoidable - my multiplication and division algorithms are based on "shift and add" methods . Also I haven't observed any incorrect results arising from converting between BCD and binary. All arithmetic is integer so there are no issues with numbers that cannot be expressed exactly in binary.

e.g. 1.2 * 0.3 =

0001 0010 FF * 0000 0011 FF

The BCD values in the mantissas are converted to binary, giving:

0000 1100 FF * 0000 0011 FF.

The "shift and add" algorithm is applied to find the product of the mantissas, and the exponents are added, giving:

0010 0100 FE = 0.36

This is then converted back to BCD:

0011 0110 FE

If the conversion to binary were not made, then:

1.2 * 0.3 = 0001 0010 FF * 0000 0011 FF = 0011 0110 FE = 0.54
08-01-2017, 04:07 AM
Post: #48
 Paul Dale Senior Member Posts: 1,202 Joined: Dec 2013
RE: AriCalculator is a home made pocket calculator.
Ahhh, that makes sense. Don't worry about it -- I assumed you used a binary floating point format which would have problems.

Pauli
08-08-2017, 04:00 AM (This post was last modified: 08-08-2017 04:50 AM by Dan.)
Post: #49
 Dan Junior Member Posts: 31 Joined: Jan 2017
RE: AriCalculator is a home made pocket calculator.
I will stay with the Friden algorithm for square roots. My division routines take about 0.04s so there is no way Newton's method can even come close to the Friden algorithm's time of about 0.01s. There is a CORDIC method for square roots, and Friden present another algorithm for square roots in a paper on their EC-130 calculator from the mid 60's. It would be interesting to see how they compare.
10-09-2017, 09:13 AM
Post: #50
 Dan Junior Member Posts: 31 Joined: Jan 2017
RE: AriCalculator is a home made pocket calculator.
I have finally finished coding the scientific functions - all that remains is to include logic to handle x^y when x <= 0 since I used x^y = e^(ylnx). I want to test the algorithms. One test is:

arcsin (arccos (arctan (tan (cos (sin 9)))))

Any other tests people would recommend?
10-09-2017, 09:22 AM
Post: #51
 Paul Dale Senior Member Posts: 1,202 Joined: Dec 2013
RE: AriCalculator is a home made pocket calculator.
I wouldn't bother with the forensic test unless you are aiming for correct rounding. It isn't that interesting.

Try things like:
• Trig functions as their argument approaches multiples of pi/2.
• That sin(x) = sin(-x) for all x.
• Trig functions for huge arguments (to validate your modulo reduction).
• Log as its argument approaches zero and one.
• Square root as its argument approaches zero and one.

I'm sure a lot of other cases will come to mind once I post

Pauli
10-10-2017, 12:52 PM
Post: #52
 Eddie W. Shore Senior Member Posts: 572 Joined: Dec 2013
RE: AriCalculator is a home made pocket calculator.
Powers of large numbers. Are the mantissa correct and the correct exponent displayed?
10-11-2017, 05:04 AM (This post was last modified: 10-11-2017 06:34 AM by Dan.)
Post: #53
 Dan Junior Member Posts: 31 Joined: Jan 2017
RE: AriCalculator is a home made pocket calculator.
Thanks Paul and Eddie,

Some results:

tan 89.999 = 57295.7694563882 (actual 57295.7795072646)
tan 89.9999 = 572956.753487699 (actual 572957.795130241)
tan 90 = -1 x 10^13

sin 90 = 1
sin 0 = 0

cos 90 = -1 x 10^-13
cos 0 = 1

sin -20 = -0.342020143322619
sin 20 = 0.342020143322619
sin 200 = -0.342020143326293

cos (5 x 10^11) = 0.766001588909574
cos (10^12) gives an error (require |x| < ~5x10^11)

ln 1.001 = 0.00099950033295
ln 1.0001 = 0.00009999500015 (actual 0.00009999500033)

ln 0.001 = -6.9077552789823 (actual -6.9077552789821)
ln 0.0001 = -9.21034037197635 (actual -9.21034037197618)

sqrt (1.001) = 1.0004998750620 (actual 1.0004998750625)
sqrt (1.0001) = 1.0000499987500 (actual 1.0000499987501)
sqrt (1.00001) = 1.0000049999870 (actual 1.0000049999875)

1234567891 ^ 9.876543219 = 6.20498308133987 x 10^89 (actual 6.20498308125632 x 10^89)

I have attached several articles I used to develop the algorithms. The TI article proved very valuable. It gives easy to follow CORDIC algorithms for tan, arctan and e^x. After coding tan I was able to implement sin and cos using the given identities. I then coded arctan, and arccos and arcsin again followed from identities. Finally I coded e^x. Once this was done 2^x and 10^x were surprisingly easy; all that was required was a change to the special values used in the e^x algorithm.

The natural logarithm was more challenging. I couldn’t find an easy to follow algorithm, so I had to read the HP article and learn the mathematical background to the algorithm! Once ln was done, log and ld were again relatively easy.

Coding the Friden square root algorithm was also quite a challenge. See the HP article for further information. Also see http://www.hpmuseum.org/root.htm

The square root algorithm also raises the issue of execution time, which doesn't seem to be talked about as much. As mentioned previously, the Friden algorithm is much faster than the Newton's method algorithm on this machine (~0.7s v ~0.01s), but you lose 1 digit of accuracy.

Attached File(s)