Hi,
as promised, here's the beef:-)
As written, the code runs in ~22.9 seconds for the two arrays with 64 elements each.
Have fun;-)
Code:
*****************************************************************************
* Name : FFTMULT - FFT Multiplicator
* Modultype : Secondary
* Dest.Comp. : HP-48
* Language : System RPL
* Author : Raymond Del Tondo
* Interface : [%] [%]' --> [%]''
*
* Notes : It will run on all versions of the HP48 G Series (tested on rev P-R).
*
* This program has been inspired by a UserRPL program by Gerson W. Barbosa,
* which he published here: http://www.hpmuseum.org/forum/thread-649.html
*
* [ 5476 1407 ] [ 8613 2724 ] FFTMULT --> [ 4716 7491 5498 2668 ]
*
* [ 1234 5678 9012 ] [ 3456 7890 ] --> [ 426 7640 7023 2002 4680 ]
*
* The timings below are given for 2 arrays with 64 real numbers each.
* See forum thread for actual arrays.
*
* Ed.Hist. Date Who What
* 13.02.2014 RDT Start
* 14.02.2014 RDT Speed up everything:-)
*****************************************************************************
INCLUDE D:\RPL\TL\HEADER.H
ASSEMBLE
=ARRYLP_DO EQU #37BCB
=PULLEL[S] EQU #3558E
=RNDX# EQU #2B551
=XEQ>VEC EQU #1D02C
=XEQNOT EQU #1E8D9
RPL
DEFINE fft ROMPTR 0C4 000
DEFINE ifft ROMPTR 0C4 002
:: 2DUP ARSIZE SWAP ARSIZE ( *A B SizeB SizeA* )
2DUP #MAX ( *A B SizeB SizeA MAX[SizeB SizeA]* )
UNROT ( *A B MAX[SizeB SizeA] SizeB SizeA* )
#+ ( *A B MAX[SizeB SizeA] SizeB+A* )
1LAMBIND ( *Save away SizeB+A* )
UNCOERCE %LN %2 %LN %/ ( *A B LN[MxAB]/LN[2]* )
DUP %FP ( *A B LN[MxAB]/LN[2] FP[LN[MxAB]/LN[2]]* )
XEQNOT XEQNOT ( *A B LN[MxAB]/LN[2] FP[LN[MxAB]/LN[2]]'* )
SWAP %IP ( *A B FP[LN[MxAB]/LN[2]]' IP[LN[MxAB]/LN[2]]* )
%+ ( *A B [FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]* )
%2 SWAP %^ ( *A B 2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]* )
DUP %+ ( *A B 2_x_2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]* )
%ABSCOERCE
* From here: 2xDim := 2_x_2^[[FP[LN[MxAB]/LN[2]]']+[IP[LN[MxAB]/LN[2]]]]
DUP ROT SWAP ( *A 2xDim B 2xDim* )
ONE{}N ( *A 2xDim B 2xDim* )
MATREDIM ( *A 2xDim B'* )
UNROT ( *B' A 2xDim* )
ONE{}N ( *B' A 2xDim* )
MATREDIM ( *B' A'* )
fft
SWAP fft ( *fftA' fftB'* )
* Up to here the code runs about 11 seconds...
ARRYLP_DO (DO)
INDEX@ PULLEL[S] ( *fftA' fftB' C%* )
ROT ( *fftB' C% fftA'* )
INDEX@ PULLEL[S] ( *fftB' C% fftA' C%* )
ROT ( *fftB' fftA' C% C%* )
x* ( *fftB' fftA' C%'* )
UNROT ( *C%' fftB' fftA'* )
LOOP
DROP
ARSIZE
* Up to here the code runs about 14 seconds...
UNCOERCE XEQ>VEC
ifft
* Up to here the code runs about 19.8 seconds for new loop
ARRYLP_DO (DO)
INDEX@ PULLEL[S]
DUPTYPECMP? IT
C>Re% ( *Complex to real* )
ONE RNDX# ( *Cut off fractional part* )
SWAP
LOOP
ARSIZE
*Here: %n..%1 #n
1GETLAM ( *%n..%1 #n SizeB+A* )
#- ( *%n..%1 #n-[SizeB+A]* )
NDROP ( *%n..%1 Dropped superfluous elements* )
1GETLAM ( *%n..%1' SizeB+A* )
ONE_DO (DO)
OVER ( *ob2 ob1 ob2* )
% 10000 %MOD %+ ( *ob2 ob1+MOD[ob2,10k]* )
SWAP % 10000 %/ %IP ( *ob1+MOD[ob2,10k] IP[ob2/10k]* )
OVER ( *ob1+MOD[ob2,10k] IP[ob2/10k] ob1+MOD[ob2,10k]* )
% 10000 %/ %IP %+ ( *ob1+MOD[ob2,10k] IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k]* )
SWAP ( *IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k] ob1+MOD[ob2,10k]* )
% 10000 %MOD ( *IP[ob2/10k]+IP[[ob1+MOD[ob2,10k]]/10k] MOD[ob1+MOD[ob2,10k],10k]* )
ISTOP@ UNROLL
LOOP
1GETLAM UNROLL ( *%n'..%1'* )
1GETABND ( *%n'..%1' SizeB+A* )
UNCOERCE XEQ>VEC
;
[Edited to correct binary (no debug info) and element number (64 instead of 128)]
Cheers
Raymond