Threaded Mode | Linear Mode

Gerson W. Barbosa · 02-10-2014, 09:25 PM

This is an RPL adaptation of Valentin Albillo's FFTMULT program for the HP-71B:

.pdf

DatafileVA013. (HP-71B Fantastic FOUR).pdf (Size: 109.47 KB / Downloads: 50)

That's a straightforward adaptation, except that I've used IFFT which the HP-71B lacks.

Code:

%%HP: T(3)A(D)F(.);

\<< DUP2 SIZE SWAP

SIZE + ROT ROT DUP2

SIZE SWAP SIZE DUP2

MAX LN 2 LN / DUP

FP NOT NOT SWAP IP

+ 2 SWAP ^ DUP +

ROT OVER - NEG ROT

ROT - NEG ROT ROT 1

SWAP

  START 0 +

  NEXT SWAP ROT 1

ROT

  START 0 +

  NEXT OBJ\-> 1 \->LIST

\->ARRY FFT OBJ\-> 1

GET \->LIST SWAP OBJ\->

1 \->LIST \->ARRY FFT

OBJ\-> 1 GET \->LIST

  \<< *

  \>> DOLIST OBJ\-> 1

\->LIST \->ARRY IFFT

OBJ\-> 1 GET \->LIST 1

ROT SUB RE IP

REVLIST DUP SIZE 1

- 1 SWAP

  FOR i i GETI ROT

ROT GETI SWAP DROP

ROT OVER 10000 MOD

+ OVER 10000 / IP

OVER 10000 / IP +

SWAP 10000 MOD ROT

DROP SWAP ROT ROT i

SWAP PUTI ROT PUTI

DROP

  NEXT REVLIST

\>>

Here is an example from Valentin's article:

{ 5476 1407 } { 8613 2724 } FFTMULT -->

{ 4716 7491 5498 2668 }

This matches Valentin's result. However, I've found a discrepancy in this example:

{ 1234 5678 9012 } { 3456 7890 } FFTMULT -->

{ 426 7639 7023 2002 4680 }

This should have been { 426 7640 7023 2002 4680 }, as we would get using Valentin's algorithm:

10 OPTION BASE 1 @ DIM X(3),Y(2),Z(5)
15 READ X,Y
20 CALL FFTMULT(X,Y,Z) @ MAT DISP Z
100 END
200 DATA 1234,5678,9012,3456,7890
900 SUB FFTMULT(A(),B(),C()) @ I=UBND(A,1) @ J=UBND(B,1)
910 K=I+J @ COMPLEX X(I),Y(J) @ MAT X=A @ MAT Y=B
920 N=LOG2(MAX(I,J)) @ E=10000 @ N=2^(INT(N)+(FP(N)#0))
930 M=2*N @ COMPLEX X(M),Y(M),Z(M) @ MAT X=FOUR(X)
940 MAT Y=FOUR(Y) @ FOR I=1 TO M @ Z(I)=X(I)*Y(I) @ NEXT I
950 COMPLEX Z(M,1) @ MAT Z=TRN(Z) @ MAT Z=FOUR(Z)
960 MAT Z=TRN(Z) @ MAT Z=(M*M)*Z @ COMPLEX Z(M)
965 MAT DISP Z @ DISP
970 FOR I=1 TO K @ C(I)=IROUND(REPT(Z(I))) @ NEXT I
980 FOR I=K-1 TO 1 STEP -1 @ J=I+1 @ C(J)=C(J)+MOD(C(I),E)
990 C(I)=C(I) DIV E+C(J) DIV E @ C(J)=MOD(C(J),E) @ NEXT I

>RUN
(4264704.00002,-0)
(29359428.0003,.000000128)
(75944892,-0)
(71104680,.000000512)
(-.000016,-0)
(-.00025024,-.000000128)
(0,-0)
(-.0002352,-.000000512)

426
7640
7023
2002
4680

As a comparison, the first element of the inverse Fourier transform is computed as (4264703.99998, 0) by the HP-48 calculator, which suggests is should be actually (4264704, 0). Perhaps the addition of a suitable quantity to the real part of these values should fix the problem and allow for chunks of 5 digits, but I haven't tried it yet.
Another issue is speed. My HP-48GX is about 50% slower than the HP-71B when multiplying two 512-digit numbers (173 seconds versus 117 seconds). But of course the RPL program can be improved, as you can see by the code.
Anyway, when these issues are solved (all of you who are interested are invited to try) this can make for a nice tool to compute pi to a couple of thousand digits, for instance.

Gerson W. Barbosa · (This post was last modified: 02-11-2014 08:16 AM by Gerson W. Barbosa.)

(02-10-2014 09:25 PM)Gerson W. Barbosa Wrote: . My HP-48GX is about 50% slower than the HP-71B when multiplying two 512-digit numbers (173 seconds versus 117 seconds).

Two 256-digit numbers, actually, giving a 512-digit result. Sorry for the mistake.

(02-10-2014 09:25 PM)Gerson W. Barbosa Wrote: However, I've found a discrepancy in this example:

{ 1234 5678 9012 } { 3456 7890 } FFTMULT -->

{ 426 7639 7023 2002 4680 }

This should be fixed by inserting the following instructions between RE and IP in the code above, at the end of the second line after the IFFT command:

Code:

1

\<< .00005 +

\>> DOSUBS

More tests have to be made, though.

P.S.: .05 is better. No problem with 640-digit results (128 groups of 5 digits, constants replaced with 100,000), at least for the few examples I tried.

P.P.S.: Replacing RE IP with RE 0 RND is even better. Actually that's what should've been done in the first place if somehow I hadn't mistaken IROUND with IP. Since the HP-48 built-in IFFTis more accurate than the implementation using FOUR on the HP-71B it is possible groups of 5 digits rather than 4 can be used for up to 1000-digit results. A few empirical tests, as described in the last page of Valentin's article, should be made before we can be sure, however. Here is the fixed code:

Code:

%%HP: T(3)A(D)F(.);

\<< DUP2 SIZE SWAP

SIZE + ROT ROT DUP2

SIZE SWAP SIZE DUP2

MAX LN 2 LN / DUP

FP NOT NOT SWAP IP

+ 2 SWAP ^ DUP +

ROT OVER - NEG ROT

ROT - NEG ROT ROT 1

SWAP

  START 0 +

  NEXT SWAP ROT 1

ROT

  START 0 +

  NEXT OBJ\-> 1 \->LIST

\->ARRY FFT OBJ\-> 1

GET \->LIST SWAP OBJ\->

1 \->LIST \->ARRY FFT

OBJ\-> 1 GET \->LIST

  \<< *

  \>> DOLIST OBJ\-> 1

\->LIST \->ARRY IFFT

OBJ\-> 1 GET \->LIST 1

ROT SUB RE 0 RND

REVLIST DUP SIZE 1

- 1 SWAP

  FOR i i GETI ROT

ROT GETI SWAP DROP

ROT OVER 10000 MOD

+ OVER 10000 / IP

OVER 10000 / IP +

SWAP 10000 MOD ROT

DROP SWAP ROT ROT i

SWAP PUTI ROT PUTI

DROP

  NEXT REVLIST

\>>

Raymond Del Tondo · 02-11-2014, 09:04 AM

Hi,

(02-10-2014 10:02 PM)Gerson W. Barbosa Wrote:
(02-10-2014 09:25 PM)Gerson W. Barbosa Wrote: . My HP-48GX is about 50% slower than the HP-71B when multiplying two 512-digit numbers (173 seconds versus 117 seconds).

Two 256-digit numbers, actually, giving a 512-digit result. Sorry for the mistake.

You could speed up things by trying to avoid object type changes where possible.

The snippet below shows some time-consuming actions from your code.

(02-10-2014 10:02 PM)Gerson W. Barbosa Wrote:

Code:

START 0 + NEXT SWAP ROT 1 ROT START 0 + NEXT OBJ\-> 1 \->LIST \->ARRY

If you work with arrays anyway, you could try to avoid building lists. Creating and decomposing lists is very time-consuming.
If possible, meta obs are recommended.

I didn't dive into the code too deeply, but I'm sure the functionality can be made faster on an HP 48GX than any code giving the same results on an HP-71B;-)

Gerson W. Barbosa · 02-11-2014, 03:34 PM

(02-11-2014 09:04 AM)Raymond Del Tondo Wrote: Hi,

(02-10-2014 10:02 PM)Gerson W. Barbosa Wrote: Two 256-digit numbers, actually, giving a 512-digit result. Sorry for the mistake.

You could speed up things by trying to avoid object type changes where possible.

The snippet below shows some time-consuming actions from your code.

(02-10-2014 10:02 PM)Gerson W. Barbosa Wrote:

Code:

START 0 + NEXT SWAP ROT 1 ROT START 0 + NEXT OBJ\-> 1 \->LIST \->ARRY

If you work with arrays anyway, you could try to avoid building lists. Creating and decomposing lists is very time-consuming.
If possible, meta obs are recommended.

Thanks for your tips!

I'd rather use vectors only, but lists are more convenient for certain tasks (or I just don't know how to do them using vectors).

(02-11-2014 09:04 AM)Raymond Del Tondo Wrote: I didn't dive into the code too deeply, but I'm sure the functionality can be made faster on an HP 48GX than any code giving the same results on an HP-71B;-)

I was disappointed with the HP-48GX, but now I see I can blame it on my sloppy adaptation. It surely can do better than this:

# Groups Digits (results) Time 71B Time 48GX
16 128 24.11 14.39
32 256 53.41 40.95
64 512 116.74 199.96

Also, I suspect the performance decrease as the number of digits grow is due to lack of memory, as I have only 12 K left (and only one HP-48GX).

Regards,

Gerson.

Christoph Giesselink · 02-11-2014, 09:22 PM

I agree with Raymond, that using other data types improves the speed significantly.

But what about using SystemRPL? Using SystemRPL throwing away the OOP overhead and protection stuff improving also the execution speed. It's never too late to learn SystemRPL. ;-)

Moreover, without SystemRPL we may not have emulators like Emu28, Emu42 and Emu71/Win. I began working on Emu48 in 1997 because I wrote SystemRPL programs since 1995 and tested them on my real HP48SX. After each of such a debugging session I always had to restore the memory content. With Kermit this took ~30 minutes. So I was looking for a simple way to restore data quickly. The problem was, that Emu48 0.99 and Emu48 1.0 wasn't working very well on Windows NT4, the story began...

Gerson W. Barbosa · 02-11-2014, 10:48 PM

(02-11-2014 09:22 PM)Christoph Giesselink Wrote: I agree with Raymond, that using other data types improves the speed significantly.

But what about using SystemRPL? Using SystemRPL throwing away the OOP overhead and protection stuff improving also the execution speed. It's never too late to learn SystemRPL. ;-)

Hey, I like that one, "It's never too late to learn XXXX". Much better than "Life is too short to learn German" (who said that?) :-)
I regret never having tried to learn Saturn Assembly or SysRPL back in the day. Now that the HP-48 has been discontinued and the HP-50g is fast enough I don't feel motivated. But I have a notion of what I missed...
Sticking to User-RPL I have started to rewrite the program following Raymond's advice. Indeed the delay is due to switching to and from lists and vectors. FFT on the HP-48GX takes about 1/6 of the time it takes FOUR on the HP-71B, exactly what one should expect just considering the clock rates of both devices. I've only just begun and running time has dropped from 200 to 175 seconds for 256-digit multiplications. The final array to vector conversion alone takes 15 seconds:

Code:

%%HP: T(3)A(D)F(,);

\<< DUP2 SIZE OBJ\->

DROP SWAP SIZE OBJ\->

DROP DUP2 MAX ROT

ROT + 4 ROLLD LN 2

LN / DUP FP NOT NOT

SWAP IP + 2 SWAP ^

DUP + DUP ROT SWAP

1 \->LIST RDM ROT ROT

1 \->LIST RDM FFT

OBJ\-> 1 GET \->LIST

SWAP FFT OBJ\-> 1 GET

\->LIST

  \<< *

  \>> DOLIST OBJ\-> 1

\->LIST \->ARRY IFFT

OBJ\-> 1 GET \->LIST 1

ROT SUB RE 0 RND

REVLIST DUP SIZE 1

- 1 SWAP

  FOR i i GETI ROT

ROT GETI SWAP DROP

ROT OVER 10000 MOD

+ OVER 10000 / IP

OVER 10000 / IP +

SWAP 10000 MOD ROT

DROP SWAP ROT ROT i

SWAP PUTI ROT PUTI

DROP

  NEXT REVLIST OBJ\->

1 \->LIST \->ARRY

\>>

I wasn't aware there was an array re-dimension command, RDM. Sometimes is pays to read the manual :-)

(02-11-2014 09:22 PM)Christoph Giesselink Wrote: Moreover, without SystemRPL we may not have emulators like Emu28, Emu42 and Emu71/Win. I began working on Emu48 in 1997 because I wrote SystemRPL programs since 1995 and tested them on my real HP48SX. After each of such a debugging session I always had to restore the memory content. With Kermit this took ~30 minutes. So I was looking for a simple way to restore data quickly. The problem was, that Emu48 0.99 and Emu48 1.0 wasn't working very well on Windows NT4, the story began...

I should thank you again for all those emulators, especially Emu48. Too used to RPN, I would make mistakes all the time when using the Windows calculator at work. I even started to write an RPN calculators in Delphi (my only attempt at the language) and almost succeeded, only the EEX key was left not implemented. But then I found Emu48 which became my only calculator on the PC :-)

Best regards,

Gerson.

Gerson W. Barbosa · 02-12-2014, 02:41 AM

HP 50g version:

Code:

%%HP: T(3)A(R)F(.);

\<< DUP2 SIZE OBJ\-> DROP SWAP SIZE OBJ\-> DROP DUP2 MAX UNROT + 4 ROLLD LN 2 LN / DUP FP NOT NOT SWAP IP + 2 SWAP ^ DUP + DUP ROT SWAP 1 \->LIST RDM UNROT 1 \->LIST RDM FFT SWAP FFT HADAMARD IFFT AXL 1 ROT SUB RE 0 RND REVLIST DUP SIZE 1 - 1 SWAP

  FOR i i GETI ROT ROT GETI NIP ROT OVER 10000 MOD + OVER 10000 / IP OVER 10000 / IP + SWAP 10000 MOD ROT DROP SWAP UNROT i SWAP PUTI ROT PUTI DROP

  NEXT REVLIST AXL

\>>

[ 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. ]

[ 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. ]

<< FFTMULT>> TEVAL -->

[ 152. 4157. 8753. 2388. 2031. 7024. 9644. 8712. 3910. 9892. 536. 5036. 5790. 2759. 1428. 1360. 7669. 5626. 2319. 7684. 9548. 8493. 3211. 4009. 1428. 1360. 4103. 333. 3307. 4227. 4994. 6657. 5186. 7094. 5886. 2981. 7065. 9961. 6777. 9305. 8945. 2828. 7669. 5630. 824. 5695. 8561. 1954. 2703. 8562. 9452. 8278. 4583. 1430. 344. 4602. 6462. 4297. 1236. 926. 8341. 7164. 2127. 7250. 9916. 1715. 5512. 8798. 8036. 8848. 4621. 2474. 6157. 5981. 3729. 6150. 4278. 3114. 2837. 9826. 2399. 247. 1946. 3502. 519. 7380. 1054. 7177. 8640. 4513. 163. 853. 6761. 1645. 9271. 4529. 4881. 8778. 8379. 8205. 3002. 5911. 7488. 1881. 1123. 3044. 6596. 5556. 9244. 177. 5704. 9232. 7364. 7310. 4813. 2908. 5485. 4443. 3921. 6584. 3606. 1576. 3030. 260. 1726. 8709. 2138. 3936. ]

s:41.0499

The version using <<*>> DOLIST instead of HADAMARD for element by element vector multiplication is about 2 seconds faster for numbers this size, despite vector to list to vector conversions. At this point double input sizes imply in triple running times, which is strange. Sure the final part needs more optimization, but I don't see why this behavior occurs.

Ángel Martin · 02-12-2014, 12:51 PM

Quote:Hey, I like that one, "It's never too late to learn XXXX". Much better than "Life is too short to learn German" (who said that?) :-)

Here's a flippant off-subject quote that may shed some light to this issue :-)
One of my favorites essays from Mark Twain, if I may add - Sam Clemens at his best.

http://en.wikipedia.org/wiki/The_Awful_German_Language

Auf wieder-lessen!

Han · (This post was last modified: 02-12-2014 07:11 PM by Han.)

Gerson's post already alluded to this, but there is a HUGE difference between computing with integers and non-integers on the HP50G. Since the HP49G, integers were considered a new object. That is, 2 is very different from 2. from an object-type point of view. So you will see a dramatic increase in speed just by switching your input from integer values to "integers-with-a-trailing-decimal-point."

The code Gerson posted could be made even faster by changing all the integers to decimals. 1 -> 1. and 2 -> 2. etc.

Gerson W. Barbosa · 02-12-2014, 07:38 PM

(02-12-2014 07:09 PM)Han Wrote: Gerson's post already alluded to this, but there is a HUGE difference between computing with integers and non-integers on the HP50G. Since the HP49G, integers were considered a new object. That is, 2 is very different from 2. from an object-type point of view. So you will see a dramatic increase in speed just by switching your input from integer values to "integers-with-a-trailing-decimal-point."

You're right! From 41 down to 30 seconds in the above example, no matter the 50g is in exact or approximate mode. Thanks for reminding!

(02-12-2014 07:09 PM)Han Wrote: The code Gerson posted could be made even faster by changing all the integers to decimals. 1 -> 1. and 2 -> 2. etc.

The easier way to do this is copying the program to the stack and saving it again in approximate mode. All integer constants will automatically change to float.

Regards,

Gerson.

Gerson W. Barbosa · (This post was last modified: 02-12-2014 08:16 PM by Gerson W. Barbosa.)

(02-12-2014 12:51 PM)Ángel Martin Wrote:
Quote:Hey, I like that one, "It's never too late to learn XXXX". Much better than "Life is too short to learn German" (who said that?) :-)

Here's a flippant off-subject quote that may shed some light to this issue :-)
One of my favorites essays from Mark Twain, if I may add - Sam Clemens at his best.

http://en.wikipedia.org/wiki/The_Awful_German_Language

Auf wieder-lessen!

Hola, Ángel!

I knew the story already, but thanks for the link! I don't think Mark Twain's complaints are fair, I am sure he tried to learn German through a completely wrong method, that is, paying too much attention to grammar matters. I learned English in five months when I was 15, attending 1-hour classes twice a week (ok, two more semesters to consolidate it and still learning it to date :-) During those 18 months no word about Grammar from my Belgian teacher, no books, no copybooks -- just talking. I only regret not having attended his French and German classes, but then again that would have been too expensive for my parents.

¡Hasta la leída!

Gerson.

Raymond Del Tondo · 02-13-2014, 01:16 AM

(02-12-2014 02:41 AM)Gerson W. Barbosa Wrote: [ 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. ]

[ 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. ]

<< FFTMULT>> TEVAL -->

[ 152. 4157. 8753. 2388. 2031. 7024. 9644. 8712. 3910. 9892. 536. 5036. 5790. 2759. 1428. 1360. 7669. 5626. 2319. 7684. 9548. 8493. 3211. 4009. 1428. 1360. 4103. 333. 3307. 4227. 4994. 6657. 5186. 7094. 5886. 2981. 7065. 9961. 6777. 9305. 8945. 2828. 7669. 5630. 824. 5695. 8561. 1954. 2703. 8562. 9452. 8278. 4583. 1430. 344. 4602. 6462. 4297. 1236. 926. 8341. 7164. 2127. 7250. 9916. 1715. 5512. 8798. 8036. 8848. 4621. 2474. 6157. 5981. 3729. 6150. 4278. 3114. 2837. 9826. 2399. 247. 1946. 3502. 519. 7380. 1054. 7177. 8640. 4513. 163. 853. 6761. 1645. 9271. 4529. 4881. 8778. 8379. 8205. 3002. 5911. 7488. 1881. 1123. 3044. 6596. 5556. 9244. 177. 5704. 9232. 7364. 7310. 4813. 2908. 5485. 4443. 3921. 6584. 3606. 1576. 3030. 260. 1726. 8709. 2138. 3936. ]

s:41.0499

Hi, I just made a non-optimized SysRPL version derived from your previous prog version, which runs the above arrays in about 67 seconds on a normal HP 48GX.

There's _much_ room for streamlining, which I may do tomorrow;-)

Cheers

Raymond

Gerson W. Barbosa · (This post was last modified: 02-13-2014 03:19 AM by Gerson W. Barbosa.)

(02-13-2014 01:16 AM)Raymond Del Tondo Wrote:
(02-12-2014 02:41 AM)Gerson W. Barbosa Wrote: [ 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. ]

[ 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. 1234. 5678. 9012. 3456. ]

<< FFTMULT>> TEVAL -->

[ 152. 4157. 8753. 2388. 2031. 7024. 9644. 8712. 3910. 9892. 536. 5036. 5790. 2759. 1428. 1360. 7669. 5626. 2319. 7684. 9548. 8493. 3211. 4009. 1428. 1360. 4103. 333. 3307. 4227. 4994. 6657. 5186. 7094. 5886. 2981. 7065. 9961. 6777. 9305. 8945. 2828. 7669. 5630. 824. 5695. 8561. 1954. 2703. 8562. 9452. 8278. 4583. 1430. 344. 4602. 6462. 4297. 1236. 926. 8341. 7164. 2127. 7250. 9916. 1715. 5512. 8798. 8036. 8848. 4621. 2474. 6157. 5981. 3729. 6150. 4278. 3114. 2837. 9826. 2399. 247. 1946. 3502. 519. 7380. 1054. 7177. 8640. 4513. 163. 853. 6761. 1645. 9271. 4529. 4881. 8778. 8379. 8205. 3002. 5911. 7488. 1881. 1123. 3044. 6596. 5556. 9244. 177. 5704. 9232. 7364. 7310. 4813. 2908. 5485. 4443. 3921. 6584. 3606. 1576. 3030. 260. 1726. 8709. 2138. 3936. ]

s:41.0499
Hi, I just made a non-optimized SysRPL version derived from your previous prog version, which runs the above arrays in about 67 seconds on a normal HP 48GX.

There's _much_ room for streamlining, which I may do tomorrow;-)

Cheers

Raymond

In order to avoid lists I tried the following in the latest HP-48GX program:

Code:

DUP SIZE OBJ\-> DROP 

1 SWAP

  FOR i OVER i GET 

OVER i GET * i SWAP

PUT

  NEXT SWAP DROP

This replaces the code between the second FFT and IFFT commands and is equivalent to HADAMARD in the hp 50g program, but takes much longer (230 versus 175 seconds in the example I was using), so I kept ... \<< * \>> DOLIST ...

67 seconds is pretty good considering the hp 50g does is 30 seconds.

Cheers,

Gerson.

Gerson W. Barbosa · 02-14-2014, 12:07 AM

(02-13-2014 01:16 AM)Raymond Del Tondo Wrote: Hi, I just made a non-optimized SysRPL version derived from your previous prog version, which runs the above arrays in about 67 seconds on a normal HP 48GX.

There's _much_ room for streamlining, which I may do tomorrow;-)

Cheers

Raymond

Hello Raymond,

You might want to try the latest UserRPL version:

Code:

%%HP: T(3)A(D)F(,);

\<< DUP2 SIZE OBJ\->

DROP SWAP SIZE OBJ\->

DROP DUP2 MAX ROT

ROT + 4 ROLLD LN 2

LN / DUP FP NOT NOT

SWAP IP + 2 SWAP ^

DUP + DUP ROT SWAP

1 \->LIST RDM ROT ROT

1 \->LIST RDM FFT

OBJ\-> 1 GET \->LIST

SWAP FFT OBJ\-> 1 GET

\->LIST

  \<< *

  \>> DOLIST OBJ\-> 1

\->LIST \->ARRY IFFT 1

ROT SUB RE 0 RND

DUP SIZE OBJ\-> DROP

1 - 1

  FOR i i GETI

10000 MOD 3 PICK

ROT DUP ROT ROT GET

ROT + PUT i GETI

10000 / IP 3 PICK

ROT GET 10000 / IP +

i SWAP PUTI OVER

OVER GET 10000 MOD

PUT -1

  STEP

\>>

It does the same in about 52 seconds, timed with this program:

Code:

%%HP: T(3)A(D)F(,);

\<< TICKS ROT ROT

FFTM TICKS ROT -

B\->R 8192 /

\>>

The previous version took about 175 seconds. No significant gain on the hp 50g with this new version, though (29 s --> 26.5 s).

Code:

%%HP: T(3)A(D)F(.);

\<< DUP2 SIZE OBJ\-> DROP SWAP SIZE OBJ\-> DROP DUP2 MAX UNROT + 4. ROLLD LN 2. LN / DUP FP NOT NOT SWAP IP + 2. SWAP ^ DUP + DUP ROT SWAP 1. \->LIST RDM UNROT 1. \->LIST RDM FFT SWAP FFT HADAMARD IFFT 1. ROT SUB RE 0. RND DUP SIZE OBJ\-> DROP 1. - 1.

  FOR i i GETI 10000. MOD PICK3 ROT DUP UNROT GET ROT + PUT i GETI 10000. / IP PICK3 ROT GET 10000. / IP + i SWAP PUTI OVER OVER GET 10000. MOD PUT -1.

  STEP

\>>

Thanks for you suggestions and advice!

Gerson.

Raymond Del Tondo · 02-14-2014, 12:34 AM

(02-13-2014 03:16 AM)Gerson W. Barbosa Wrote: 67 seconds is pretty good considering the hp 50g does is 30 seconds.

Hi,

now my SysRPL version runs in about 22.9 seconds for the two 128 element arrays on a normal HP 48GX.

The most time-consuming parts are now the two FFT and the IFFT calls.
The final loop, which was the slowest part in the original program, now needs about 3 seconds only;-)

One could squeeze out another 2 seconds by using integer arithmetic in the final loop if applicable...

I'll post the commented SysRPL source tonight or tomorrow.

Cheers

Raymond

Gerson W. Barbosa · (This post was last modified: 02-14-2014 01:45 AM by Gerson W. Barbosa.)

(02-14-2014 12:34 AM)Raymond Del Tondo Wrote: The most time-consuming parts are now the two FFT and the IFFT calls.
The final loop, which was the slowest part in the original program, now needs about 3 seconds only;-)

I think this has been fixed in my lastest version above, but optimization is still possible in the final loop, I admit. It appears the use of the lists, especially in the final loop, was causing the lack of linearity in the size vs. running times table we would expect from FFT multiplication. The previous hp 50g program would not complete a 128-element size multiplication, at least not under 30 minutes, when I quit waiting. Now it takes only 65 s. Here is a more complete table:

Code:

# elements     HP-71B     HP-48GX     hp 50g    

     16         24.11      10.72        5.35

     32         53.41      23.97       12.45      

     64        116.74      52.11       26.47

    128        269.89     119.82 (*)   65.35       

    256                                148.84

(*) estimated

Perhaps 5-digit groups are possible up to that size, considering IFFT is performed with full-accuracy. If not in UserRPL at least in your SysRPL version. This is worth checking.

Cheers,

Gerson.

Raymond Del Tondo · (This post was last modified: 02-15-2014 11:27 PM by Raymond Del Tondo.)

Hi,

as promised, here's the beef:-)

As written, the code runs in ~22.9 seconds for the two arrays with 64 elements each.

Have fun;-)

Code:

*****************************************************************************

* Name         :    FFTMULT - FFT Multiplicator

* Modultype  :    Secondary

* Dest.Comp. :    HP-48

* Language   :    System RPL

* Author     :    Raymond Del Tondo

* Interface  :    [%] [%]'  -->  [%]''

*

* Notes         :    It will run on all versions of the HP48 G Series (tested on rev P-R).

*

*        This program has been inspired by a UserRPL program by Gerson W. Barbosa,

*        which he published here: http://www.hpmuseum.org/forum/thread-649.html

*

* [ 5476 1407 ] [ 8613 2724 ] FFTMULT    -->    [ 4716 7491 5498 2668 ]

*

* [ 1234 5678 9012 ] [ 3456 7890 ]    -->    [ 426 7640 7023 2002 4680 ]

*

* The timings below are given for 2 arrays with 64 real numbers each.

* See forum thread for actual arrays.

*

* Ed.Hist.    Date        Who    What

*        13.02.2014    RDT    Start

*        14.02.2014    RDT    Speed up everything:-)

*****************************************************************************

    INCLUDE    D:\RPL\TL\HEADER.H

ASSEMBLE

=ARRYLP_DO    EQU    #37BCB

=PULLEL[S]    EQU    #3558E

=RNDX#        EQU    #2B551

=XEQ>VEC    EQU    #1D02C

=XEQNOT        EQU    #1E8D9

RPL

DEFINE    fft        ROMPTR 0C4 000

DEFINE    ifft        ROMPTR 0C4 002

::  2DUP ARSIZE SWAP ARSIZE        ( *A B SizeB SizeA* )

    2DUP #MAX                ( *A B SizeB SizeA MAX[SizeB SizeA]* )

    UNROT                ( *A B MAX[SizeB SizeA] SizeB SizeA* )

    #+                    ( *A B MAX[SizeB SizeA] SizeB+A* )

    1LAMBIND                ( *Save away SizeB+A* )

    UNCOERCE %LN %2 %LN %/        ( *A B LN[MxAB]/LN[2]* )

    DUP %FP                ( *A B LN[MxAB]/LN[2] FP[LN[MxAB]/LN[2]]* )

    XEQNOT XEQNOT            ( *A B LN[MxAB]/LN[2] FP[LN[MxAB]/LN[2]]'* )

    SWAP %IP                ( *A B FP[LN[MxAB]/LN[2]]' IP[LN[MxAB]/LN[2]]* )

    %+                    ( *A B [FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]* )

    %2 SWAP %^                ( *A B 2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]* )

    DUP %+                ( *A B 2_x_2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]* )

    %ABSCOERCE

* From here: 2xDim := 2_x_2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]

    DUP ROT SWAP            ( *A 2xDim B 2xDim* )

    ONE{}N                ( *A 2xDim B 2xDim* )

    MATREDIM                ( *A 2xDim B'* )

    UNROT                ( *B' A 2xDim* )

    ONE{}N                ( *B' A 2xDim* )

    MATREDIM                ( *B' A'* )

    fft

    SWAP fft                ( *fftA' fftB'* )

* Up to here the code runs about 11 seconds...

    ARRYLP_DO (DO)

    INDEX@ PULLEL[S]        ( *fftA' fftB' C%* )

    ROT                ( *fftB' C% fftA'* )

    INDEX@ PULLEL[S]        ( *fftB' C% fftA' C%* )

    ROT                ( *fftB' fftA' C% C%* )

    x*                ( *fftB' fftA' C%'* )

    UNROT                ( *C%' fftB' fftA'* )

    LOOP

    DROP

    ARSIZE

* Up to here the code runs about 14 seconds...

    UNCOERCE XEQ>VEC

    ifft

* Up to here the code runs about 19.8 seconds for new loop

    ARRYLP_DO (DO)

    INDEX@ PULLEL[S]

    DUPTYPECMP? IT

        C>Re%            ( *Complex to real* )

    ONE RNDX#            ( *Cut off fractional part* )

    SWAP

    LOOP

    ARSIZE

*Here: %n..%1 #n

    1GETLAM                ( *%n..%1 #n SizeB+A* )

    #-                    ( *%n..%1 #n-[SizeB+A]* )

    NDROP                ( *%n..%1 Dropped superfluous elements* )

    1GETLAM                ( *%n..%1' SizeB+A* )

    ONE_DO (DO)

    OVER                ( *ob2 ob1 ob2* )

    % 10000    %MOD %+            ( *ob2 ob1+MOD[ob2,10k]* )

    SWAP % 10000 %/ %IP        ( *ob1+MOD[ob2,10k] IP[ob2/10k]* )

    OVER                ( *ob1+MOD[ob2,10k] IP[ob2/10k] ob1+MOD[ob2,10k]* )

    % 10000 %/ %IP %+        ( *ob1+MOD[ob2,10k] IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k]* )

    SWAP                ( *IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k] ob1+MOD[ob2,10k]* )

    % 10000 %MOD            ( *IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k] MOD[ob1+MOD[ob2,10k],10k]* )

    ISTOP@ UNROLL

    LOOP

    1GETLAM UNROLL            ( *%n'..%1'* )

    1GETABND                ( *%n'..%1' SizeB+A* )

    UNCOERCE XEQ>VEC

;

[Edited to correct binary (no debug info) and element number (64 instead of 128)]

Cheers

Raymond

Gerson W. Barbosa · 02-15-2014, 01:22 AM

(02-14-2014 10:37 PM)Raymond Del Tondo Wrote: Hi,

as promised, here's the beef:-)

As written, the code runs in ~22.9 seconds for the two arrays with 128 elements each.

Have fun;-)

Code:

***************************************************************************** * Name : FFTMULT - FFT Multiplicator * Modultype : Secondary * Dest.Comp. : HP-48 * Language : System RPL * Author : Raymond Del Tondo * Interface : [%] [%]' --> [%]'' * * Notes : It will run on all versions of the HP48 G Series (tested on rev P-R). * * This program has been inspired by a UserRPL program by Gerson W. Barbosa, * which he published here: http://www.hpmuseum.org/forum/thread-649.html * * [ 5476 1407 ] [ 8613 2724 ] FFTMULT --> [ 4716 7491 5498 2668 ] * * [ 1234 5678 9012 ] [ 3456 7890 ] --> [ 426 7640 7023 2002 4680 ] * * The timings below are given for 2 arrays with 128 real numbers each. * See forum thread for actual arrays. * * Ed.Hist. Date Who What * 13.02.2014 RDT Start * 14.02.2014 RDT Speed up everything:-) ***************************************************************************** INCLUDE D:\RPL\TL\HEADER.H ASSEMBLE =ARRYLP_DO EQU #37BCB =PULLEL[S] EQU #3558E =RNDX# EQU #2B551 =XEQ>VEC EQU #1D02C =XEQNOT EQU #1E8D9 RPL DEFINE fft ROMPTR 0C4 000 DEFINE ifft ROMPTR 0C4 002 :: 2DUP ARSIZE SWAP ARSIZE ( *A B SizeB SizeA* ) 2DUP #MAX ( *A B SizeB SizeA MAX[SizeB SizeA]* ) UNROT ( *A B MAX[SizeB SizeA] SizeB SizeA* ) #+ ( *A B MAX[SizeB SizeA] SizeB+A* ) 1LAMBIND ( *Save away SizeB+A* ) UNCOERCE %LN %2 %LN %/ ( *A B LN[MxAB]/LN[2]* ) DUP %FP ( *A B LN[MxAB]/LN[2] FP[LN[MxAB]/LN[2]]* ) XEQNOT XEQNOT ( *A B LN[MxAB]/LN[2] FP[LN[MxAB]/LN[2]]'* ) SWAP %IP ( *A B FP[LN[MxAB]/LN[2]]' IP[LN[MxAB]/LN[2]]* ) %+ ( *A B [FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]* ) %2 SWAP %^ ( *A B 2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]* ) DUP %+ ( *A B 2_x_2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]* ) %ABSCOERCE * From here: 2xDim := 2_x_2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]] DUP ROT SWAP ( *A 2xDim B 2xDim* ) ONE{}N ( *A 2xDim B 2xDim* ) MATREDIM ( *A 2xDim B'* ) UNROT ( *B' A 2xDim* ) ONE{}N ( *B' A 2xDim* ) MATREDIM ( *B' A'* ) fft SWAP fft ( *fftA' fftB'* ) * Up to here the code runs about 11 seconds... ARRYLP_DO (DO) INDEX@ PULLEL[S] ( *fftA' fftB' C%* ) ROT ( *fftB' C% fftA'* ) INDEX@ PULLEL[S] ( *fftB' C% fftA' C%* ) ROT ( *fftB' fftA' C% C%* ) x* ( *fftB' fftA' C%'* ) UNROT ( *C%' fftB' fftA'* ) LOOP DROP ARSIZE * Up to here the code runs about 14 seconds... UNCOERCE XEQ>VEC ifft * Up to here the code runs about 19.8 seconds for new loop ARRYLP_DO (DO) INDEX@ PULLEL[S] DUPTYPECMP? IT C>Re% ( *Complex to real* ) ONE RNDX# ( *Cut off fractional part* ) SWAP LOOP ARSIZE *Here: %n..%1 #n 1GETLAM ( *%n..%1 #n SizeB+A* ) #- ( *%n..%1 #n-[SizeB+A]* ) NDROP ( *%n..%1 Dropped superfluous elements* ) 1GETLAM ( *%n..%1' SizeB+A* ) ONE_DO (DO) OVER ( *ob2 ob1 ob2* ) % 10000 %MOD %+ ( *ob2 ob1+MOD[ob2,10k]* ) SWAP % 10000 %/ %IP ( *ob1+MOD[ob2,10k] IP[ob2/10k]* ) OVER ( *ob1+MOD[ob2,10k] IP[ob2/10k] ob1+MOD[ob2,10k]* ) % 10000 %/ %IP %+ ( *ob1+MOD[ob2,10k] IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k]* ) SWAP ( *IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k] ob1+MOD[ob2,10k]* ) % 10000 %MOD ( *IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k] MOD[ob1+MOD[ob2,10k],10k]* ) ISTOP@ UNROLL LOOP 1GETLAM UNROLL ( *%n'..%1'* ) 1GETABND ( *%n'..%1' SizeB+A* ) UNCOERCE XEQ>VEC ;

Cheers

Raymond

Thanks for the UserRPL version. Now, the question is: how can those of us not familiar with UserRPL compile this? I tried Debug4x but HEADER.H is missing (assuming this is a valid tool).

Cheers,

Gerson.

Raymond Del Tondo · (This post was last modified: 02-15-2014 11:17 PM by Raymond Del Tondo.)

Hi Gerson,

you can comment out the reference to Header.h .

I used a personalised version of the standard header file during development, but the "release" SysRPL source is compilable without the header ref.

I only use the HP CLI tools in the 32 bit version.

Attached is the original source file and the binary for the HP 48G series.
The source can be viewed best using an editor which supports mono-spaced fonts, like WordPad, (better) UltraEdit or (best, my choice) TSE Pro 32 .

Cheers

Gerson W. Barbosa · (This post was last modified: 02-15-2014 08:31 PM by Gerson W. Barbosa.)

(02-14-2014 10:37 PM)Raymond Del Tondo Wrote:

Code:

* Interface : [%] [%]' --> [%]''

Hello, Raymond!

On my HP-48GX (and on the emulator) things go like this: [%] [%]' --> ENTER --> [%]". Is this behavior normal?

Here is the running times table, times in seconds:

Code:

# elements     HP-71B     hp 49g      HP-48GX     hp 50g   48GX(SysRPL)

     16         24.11      10.42         9.96       5.57        4.4  (*)

     32         53.41      22.58        21.91      11.90        9.2

     64        116.74      47.86        47.64      25.16       20.6

    128        269.89     110.14       106.45      47.54       43.6

    256          --       255.87       254.00     104.42       95.4

(*) SysRPL times have been measured with a chronometer because I wasn't able to do it programmatically.

The HP-48GX times have decreased because now I have 54 K RAM free (previously I had only 11 K). The 49/50g programs are faster because I am using ...DOLIST... instead of HADAMARD:

Code:

%%HP: T(3)A(R)F(.);

\<< DUP2 SIZE OBJ\-> DROP SWAP SIZE OBJ\-> DROP DUP2 MAX UNROT + 4. ROLLD LN 2. LN / DUP FP NOT NOT SWAP IP + 2. SWAP ^ DUP + DUP ROT SWAP 1. \->LIST RDM UNROT 1. \->LIST RDM FFT OBJ\-> 1. GET \->LIST SWAP FFT OBJ\-> 1. GET \->LIST

  \<< *

  \>> DOLIST OBJ\-> 1. \->LIST \->ARRY IFFT 1. ROT SUB RE 0. RND DUP SIZE OBJ\-> DROP 1. - 1.

  FOR i i GETI 10000. MOD PICK3 ROT DUP UNROT GET ROT + PUT i GETI 10000. / IP PICK3 ROT GET 10000. / IP + i SWAP PUTI OVER OVER GET 10000. MOD PUT -1.

  STEP

\>>

The previous version using HADAMARD is more compact (299.5 bytes), but gets slower as the arrays grow larger:

Code:

# elements      hp 49g     hp 50g

     16          11.45       5.35

     32          25.04      12.45

     64          54.01      26.47

    128         131.35      65.35

    256         367.35     148.84

Cheers,

Gerson.