Threaded Mode | Linear Mode

Paul Dale · 06-06-2014, 08:33 AM

(06-06-2014 08:21 AM)Thomas Klemm Wrote: Or do you calculate $e^x-1$ directly without extinction for small x?

Yes. The EXP-1 function already exists to do this. I actually calculate:

sinh(x) = ((z * 0.5) / (z+1)) * (z+1+1), where z = $e^x-1$

The order of operations should avoid overflow issues which is why I think this ought to work for larger x. Still, the obvious approach works for large x and will be faster.

- Pauli

Claudio L. · 06-06-2014, 04:07 PM

(06-06-2014 01:51 AM)Paul Dale Wrote:
(06-06-2014 01:31 AM)Claudio L. Wrote: There's other cases where this can't be avoided: cosh() for example.

Cosh is easily worked out from the definition: (e^x + e^-x)/2.
For x large (positive or negative), one term dominates. For x small, there are both relevant but of similar magnitude.

Sinh is a completely different beast.

- Pauli

Yeah, I made a mistake, it was acosh() not cosh the one that gave me trouble (the rotation mode in CORDIC is quite stable, vectoring mode is much worse), since the argument close to 1 leaves you very few significant digits to work with.
Also ln() is very tricky (it's also in rotation mode), because the input to the CORDIC algorithm is (1-x) and (1+x). When x is close to one you lose a lot of digits in the first one, and when it's close to zero you lose them in the other one. Can't win.

Claudio

Claudio L. · (This post was last modified: 06-06-2014 04:36 PM by Claudio L..)

(06-06-2014 06:07 AM)Thomas Klemm Wrote:
(06-06-2014 01:31 AM)Claudio L. Wrote: However, CORDIC accumulate angles from large to small, adding smaller and smaller numbers to bring a target to zero. This is really bad for precision, because it forces you to start adding large terms first that cancel each other.

This depends on how CORDIC is implemented. It's not a problem with the method used in this article.

Cheers
Thomas

Of course it depends on how you implement it, but the method is the same and has the same drawbacks.

In your article you start with 0.1 radians for the angle rotation, and go down 0.1, 0.01, 0.001. First of all, that sequence will not converge for all numbers.

EDIT: I just noticed your article does repetition. You can indeed achieve convergence by repeating rotations with the same angle. But it's bad for speed!

There is a maximum ratio between a rotation angle and the next that can't be exceeded, and is a requisite for convergence. Binary cordic uses a ratio of 2, where each angle is half of the previous, and that guarantees convergence. a 1/10 ratio does not converge.
There's multiple papers (I already added a link to a very good one on a previous post) where researchers propose different sequences of angles and prove that 'theirs is better'. newRPL uses 1.0, 0.5,0.2,0.2,0.1, 0.05,0.02,0.02,0.01... known as a 5,2,2,1
This sequence is proven on that paper to converge for both normal an hyperbolics.

Now what if you are trying to calculate sin(3e-10)?

If you start your rotations with 0.1 radians, and go smaller, you will get a lot of errors. That's why I said it's bad for small angles.
However, you can avoid that by starting the loop with 1e-9 radians, and that would give you all the digits you need (and yes, it's 1e-9 because you always start with an angle larger than your initial argument, otherwise does not converge).
But starting the loop with smaller angles means the constants K=Product(cos(Alphai)) changes for each case, hence you need a whole lot of constants to handle this case properly.

EDIT: With the method you described, you don't know how many repetitions so you have to compute the constants every time, alongside your other calculations. This is bad for speed and not how you'd normally implement CORDIC.

Claudio

Dieter · 06-06-2014, 05:27 PM

(06-06-2014 04:07 PM)Claudio L. Wrote: Also ln() is very tricky (it's also in rotation mode), because the input to the CORDIC algorithm is (1-x) and (1+x). When x is close to one you lose a lot of digits in the first one, and when it's close to zero you lose them in the other one. Can't win.

Since we are talking about hyperbolic functions I assume you refer to artahn x = 1/2 ln[(1+x)/(1–x)]. Numerically this is not much of a problem as long as a ln1plusx function is available. This way artahn x can be written as 1/2 ln1plusx[2x/(1-x)]. Since already the 41-series offered such a function (as well as e^x–1 for that matter) there might be a way to realize this function in CORDIC as well. I am sure you will have a definitive answer here. ;-)

Dieter

Thomas Klemm · 06-06-2014, 05:42 PM

(06-06-2014 04:24 PM)Claudio L. Wrote: Now what if you are trying to calculate sin(3e-10)?

Compare it to all the angles in the list until one is smaller than 3e-10.
That will be $\tan^{-1}(10^{-10})$ which is a little bit smaller than 1e-10.
Thus we can subtract $3\cdot\tan^{-1}(10^{-10})$. And then we can go on with the remainder ...
As you may notice there will be no cancellation due to adding or subtracting big values.

Quote:However, you can avoid that by starting the loop with 1e-9 radians

In fact this is very similar to what you do.

Quote:But starting the loop with smaller angles means the constants K=Product(cos(Alphai)) changes for each case, hence you need a whole lot of constants to handle this case properly.

EDIT: With the method you described, you don't know how many repetitions so you have to compute the constants every time, alongside your other calculations. This is bad for speed and not how you'd normally implement CORDIC.

You can do without using the constant K. Calculate:
\[\tan(\theta)=\frac{K\cdot\sin(\theta)}{K\cdot\cos(\theta)}\]
From this you can calculate:
\[\sin(\theta)=\frac{\tan(\theta)}{\sqrt{1+\tan(\theta)^2}}\]

How normal do you consider the implementation used in most HP-calculators?

Cheers
Thomas

Thomas Klemm · (This post was last modified: 12-27-2023 01:03 AM by Thomas Klemm.)

(06-06-2014 05:27 PM)Dieter Wrote: Since we are talking about hyperbolic functions I assume you refer to artahn x = 1/2 ln[(1+x)/(1–x)].

Nah, I think it's formula (20) of J. S. Walther's paper:

$\ln w = 2 \tanh^{-1}(\frac{y}{x})$ where $x=w+1$ and $y=w-1$.

So it's just the other way round.

Claudio L. · 06-06-2014, 07:53 PM

(06-06-2014 05:42 PM)Thomas Klemm Wrote: You can do without using the constant K. Calculate:
\[\tan(\theta)=\frac{K\cdot\sin(\theta)}{K\cdot\cos(\theta)}\]
From this you can calculate:
\[\sin(\theta)=\frac{\tan(\theta)}{\sqrt{1+\tan(\theta)^2}}\]

How normal do you consider the implementation used in most HP-calculators?

Cheers
Thomas

That seems easy to do, but then you have to do a division, a multiplication and a square root, on top of your sin/cos which both come together from CORDIC.
The square root alone will take about half the time of the sin, so you'd increase your time by 50%.
Yes, this will be faster than having to compute the constants alongside but still much worse than having a precomputed constant ready.

Also, when x is close to zero, doing sqrt(1+x^2) is really bad for precision. The x^2 "spreads" your useful digits throughout your exponent range, then the square root compresses them back, and you lost about half of them at the end (see what I meant with "small angles are tougher on precision loss").
For hyperbolics, I'm using the double-angle formulae with much better results since you don't have the square roots.

For now, I'd rather have the constants precalculated, so after CORDIC, one multiplication and I got the final result. However, if we run out of space in ROM and I have to remove my constants, we'll have no choice but to use that trick (thanks, BTW).
I'm not sure a similar trick will work well for hyperbolics, exp() and ln(), though.

Claudio

Paul Dale · 06-06-2014, 10:27 PM

(06-06-2014 04:24 PM)Claudio L. Wrote: You can indeed achieve convergence by repeating rotations with the same angle. But it's bad for speed!

If you want speed on a modern CPU, don't use CORDIC. Multiplications and additions are similar speed on the 34S -- software floating point is slow but hardware integer operations help a lot.

If I were chasing speed, I'd likely use polynomial or rational approximations. Probably piecewise ones. This will result in larger code sizes. The Intel decimal library recently discussed here did exactly this. Lots of approximations over small intervals meant lots of tables of coefficients but the code is fast and accurate.

Of course, there are other options available. Elementary Functions by Jean-Michel Muller is a good book giving details of many method.

- Pauli

Paul Dale · 06-06-2014, 10:53 PM

(06-06-2014 07:53 PM)Claudio L. Wrote: Also, when x is close to zero, doing sqrt(1+x^2) is really bad for precision. The x^2 "spreads" your useful digits throughout your exponent range, then the square root compresses them back, and you lost about half of them at the end (see what I meant with "small angles are tougher on precision loss").

Huh??

For z near one, $\sqrt{z}$ is always closer to one than z is. So yes, you will lose some least significant digits but the result is accurate within your working precision. Hence, $\sqrt{1+x^2}$ for small x has no accuracy problems by itself, the answer will approach unity much faster than x approaches zero.

If, on the other hand, you have something along the lines of $\sqrt{1+x^2}-1$, then the story is completely different. Here, there is a cancellation of digits and loss of precision due to the $-1$. However, these situations are rarely insurmountable. In this case, I'd use the transformation (assuming my algebra is good):

$$\sqrt{1+x^2}-1 = \frac{x^2}{\sqrt{1+x^2}+1}$$

which doesn't involve any cancellation. The actual transformation required will depend on the exact formula being evaluated of course.

- Pauli

Claudio L. · (This post was last modified: 06-07-2014 01:49 AM by Claudio L..)

(06-06-2014 10:53 PM)Paul Dale Wrote:
(06-06-2014 07:53 PM)Claudio L. Wrote: Also, when x is close to zero, doing sqrt(1+x^2) is really bad for precision. The x^2 "spreads" your useful digits throughout your exponent range, then the square root compresses them back, and you lost about half of them at the end (see what I meant with "small angles are tougher on precision loss").

Huh??

For z near one, $\sqrt{z}$ is always closer to one than z is. So yes, you will lose some least significant digits but the result is accurate within your working precision. Hence, $\sqrt{1+x^2}$ for small x has no accuracy problems by itself, the answer will approach unity much faster than x approaches zero.

If, on the other hand, you have something along the lines of $\sqrt{1+x^2}-1$, then the story is completely different. Here, there is a cancellation of digits and loss of precision due to the $-1$. However, these situations are rarely insurmountable. In this case, I'd use the transformation (assuming my algebra is good):

$$\sqrt{1+x^2}-1 = \frac{x^2}{\sqrt{1+x^2}+1}$$

which doesn't involve any cancellation. The actual transformation required will depend on the exact formula being evaluated of course.

- Pauli

You'r right, that term won't lose precision.

I looked at my code, and here's where I was losing digits:

// COMPUTE ACOS(X) = ATAN2(SQRT(1-X^2),X)

when x~1 (ACOS(X)~0) I couldn't get all the digits right (this is when the sqrt() brings your digits back).

I should double check my posts, but I don't always have my devel machine with me to look at the code.

Claudio

Claudio L. · 06-07-2014, 01:55 AM

(06-06-2014 10:27 PM)Paul Dale Wrote:
(06-06-2014 04:24 PM)Claudio L. Wrote: You can indeed achieve convergence by repeating rotations with the same angle. But it's bad for speed!

If you want speed on a modern CPU, don't use CORDIC. Multiplications and additions are similar speed on the 34S -- software floating point is slow but hardware integer operations help a lot.

If I were chasing speed, I'd likely use polynomial or rational approximations. Probably piecewise ones. This will result in larger code sizes. The Intel decimal library recently discussed here did exactly this. Lots of approximations over small intervals meant lots of tables of coefficients but the code is fast and accurate.

Of course, there are other options available. Elementary Functions by Jean-Michel Muller is a good book giving details of many method.

- Pauli

More than speed of execution, I was looking at speed of coding. I got all transcendental functions with the same loop, and only a few variations around the same idea over the course of a few weeks. I got a lot of ground to cover, can't spend a whole year doing just trig.

Once newRPL is finished, then I might attempt to do a piecewise-optimized version.

Claudio

Thomas Klemm · 06-07-2014, 03:38 AM

(06-07-2014 01:46 AM)Claudio L. Wrote: You're right, that term won't lose precision.

As Pauli already pointed out there's no cancellation since:

\[\frac{1}{\sqrt{1+x^2}}=1-\frac{x^2}{2}+\ldots\]

For a small x this will be close to 1 and thus $\sin(x)\approx\tan(x)$.

Quote:I looked at my code, and here's where I was losing digits:

// COMPUTE ACOS(X) = ATAN2(SQRT(1-X^2),X)

when x~1 (ACOS(X)~0) I couldn't get all the digits right

You might have a look at how the HP-48 calculates the complex arccos.

Cheers
Thomas

Claudio L. · 06-07-2014, 11:08 AM

@Thomas:
Thanks! Complex numbers are not done yet, so this article sure will be of great help!

Claudio

pito · (This post was last modified: 06-07-2014 02:31 PM by pito.)

(06-05-2014 12:08 AM)pito Wrote: With WP34S emulator v3.3.3605 and 9degree test (double) you get with DEG mode
..

A comparison 9 degree DEG/RAD "test" with WMat (34 digits precision requested):

Code:

"STANDARD DEG MODE TEST"

x = 9.0`34

pii = N[Pi, 34]

torad = N[pii / 180.0`34, 34]

todeg = N[180.0`34 / pii, 34]

s = N[Sin[x * torad], 34]

c = N[Cos[s * torad], 34]

t = N[Tan[c * torad], 34]

at = N[todeg * ArcTan[t], 34]

ac = N[todeg * ArcCos[at],  34]

as = N[todeg * ArcSin[ac],34]

d = N[as - 9.0`34, 34]

9.`34.

3.141592653589793238462643383279502884197169399375`34.

0.017453292519943295769236907684886127134428718885`33.69897000433602

57.295779513082320876798154814105170332405472466611`33.69897000433602

0.156434465040230869010105319467166892313899892082`33.526471382542105

0.999996272742885024117516205011350248686695280675`34.

0.017454999855488660791394140928348469199263126371`33.5227905465791

0.999996272742885024117516205011350248686695280612`33.30102999566398

0.15643446504023086901010531946716689231`28.173449397729776

9.00000000000000000000000000000000000016`28.169855476156346

0``27.215612324562617

Code:

"EXPERIMENTAL RAD MODE TEST"

x = 9.0`34

pii = N[Pi, 34]

y = N[x * pii / 180.0`34, 34]

s = N[Sin[y], 34]

c = N[Cos[s], 34]

t = N[Tan[c], 34]

at = N[ArcTan[t], 34]

ac = N[ArcCos[at], 34]

as = N[ArcSin[ac], 34]

r = N[as * 180.0`39 / pii, 34]

d = N[r - 9.0`39, 34]

9.`34.

3.141592653589793238462643383279502884197169399375`34.

0.157079632679489661923132169163975144209858469965`33.52287874528034

0.156434465040230869010105319467166892313899892082`33.526471382542105

0.987789061484321026592245971856886355356443043866`34.

1.516357552591207213695274523631412761647072139176`33.66771030391201

0.987789061484321026592245971856886355356443043874`34.

0.156434465040230869010105319467166892313899892038`32.39222789922116

0.157079632679489661923132169163975144209858469919`32.38863526195939

8.999999999999999999999999999999999999999999997149`32.378135882061486

0``31.4238932688884

UPDATE: with 39 digits precision requested:

Code:

"STANDARD DEG MODE TEST"

x = 9.0`39

pii = N[Pi, 39]

torad = N[pii / 180.0`39, 39]

todeg = N[180.0`39 / pii, 39]

s = N[Sin[x * torad], 39]

c = N[Cos[s * torad], 39]

t = N[Tan[c * torad], 39]

at = N[todeg * ArcTan[t], 39]

ac = N[todeg * ArcCos[at], 39]

as = N[todeg * ArcSin[ac], 39]

d = N[as - 9.0`39, 39]

[9.0]  9.`39.

3.141592653589793238462643383279502884197169399375`39.

0.017453292519943295769236907684886127134428718885416636531`38.69897000433602

57.295779513082320876798154814105170332405472466566`38.69897000433602

[SIN]  0.156434465040230869010105319467166892313899892085655293893`38.526471382542​105

[COS]  0.999996272742885024117516205011350248686695280675540360027`39.

[TAN]  0.017454999855488660791394140928348469199263126372188621527`38.522790546579​1

[ATAN] 0.999996272742885024117516205011350248686695280675`38.30102999566398

[ACOS] 0.156434465040230869010105319467166892313899897994`33.17344939772978

[ASIN] 9.000000000000000000000000000000000000000000342759`33.16985547615635

0``32.21561232456263

This is the result I would expect Smile