FFT Multiplication (HP-48G/GX/G+)
02-15-2014, 09:41 PM
Post: #21
 Raymond Del Tondo Member Posts: 283 Joined: Dec 2013
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 08:26 PM)Gerson W. Barbosa Wrote:
(02-14-2014 10:37 PM)Raymond Del Tondo Wrote:
Code:
 * Interface  :    [%] [%]'  -->  [%]''
On my HP-48GX (and on the emulator) things go like this: [%] [%]' --> ENTER --> [%]". Is this behavior normal?
No, the stack diagram usually lists the inputs, the way and the output.
The program expects two arrays, and returns another array.

(02-15-2014 08:26 PM)Gerson W. Barbosa Wrote:  Here is the running times table, times in seconds:

Code:
 # elements     HP-71B     hp 49g      HP-48GX     hp 50g   48GX(SysRPL)      16         24.11      10.42         9.96       5.57        4.4  (*)      32         53.41      22.58        21.91      11.90        9.2      64        116.74      47.86        47.64      25.16       20.6     128        269.89     110.14       106.45      47.54       43.6     256          --       255.87       254.00     104.42       95.4

(*) SysRPL times have been measured with a chronometer because I wasn't able to do it programmatically.

The HP-48GX times have decreased because now I have 54 K RAM free (previously I had only 11 K).
You could use a simple TICKS wrapper, very similar to the tEVAL I use.

Thanks for the correction about the element count. For some reason I thought the example arrays consisted of 128 elements, but it was 64 only. My fault.

So the timings in my last posts actually were for two arrays of 64 elements each.
Still not bad for the brave old HP 48, and still slightly faster than the 50g;-)

Cheers

-- Ray
02-15-2014, 10:19 PM (This post was last modified: 02-15-2014 10:20 PM by Gerson W. Barbosa.)
Post: #22
 Gerson W. Barbosa Senior Member Posts: 1,410 Joined: Dec 2013
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 09:41 PM)Raymond Del Tondo Wrote:
(02-15-2014 08:26 PM)Gerson W. Barbosa Wrote:  On my HP-48GX (and on the emulator) things go like this: [%] [%]' --> ENTER --> [%]". Is this behavior normal?
No, the stack diagram usually lists the inputs, the way and the output.
The program expects two arrays, and returns another array.

This is what I get, when I enter two arrays, for instance:

[1234]
[5678]
FFTMLTS2 -->

3: 7006652
2: 0
1: <2d>

Upon pressing ENTER, I finally get

[ 700 6652 ]

Even calling FFTMLTS2 inside a program, I always get a screen like that. That's why I can't time it by using TICKS or by any other means. Something wrong? Thanks!

Gerson.
02-15-2014, 11:23 PM
Post: #23
 Raymond Del Tondo Member Posts: 283 Joined: Dec 2013
RE: FFT Multiplication (HP-48G/GX/G+)
Hi Gerson,

(02-15-2014 10:19 PM)Gerson W. Barbosa Wrote:  Something wrong?
Thanks for the hint. The binary in the attachment still has some debug info (like the stack view included). I updated the attachment by including a binary without debug info. Seems it was too late in the morning:-)

Cheers

-- Ray
02-15-2014, 11:53 PM (This post was last modified: 02-16-2014 12:31 AM by Gerson W. Barbosa.)
Post: #24
 Gerson W. Barbosa Senior Member Posts: 1,410 Joined: Dec 2013
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 11:23 PM)Raymond Del Tondo Wrote:  Hi Gerson,

(02-15-2014 10:19 PM)Gerson W. Barbosa Wrote:  Something wrong?
Thanks for the hint. The binary in the attachment still has some debug info (like the stack view included). I updated the attachment by including a binary without debug info. Seems it was too late in the morning:-)

Cheers

It's ok now, thanks! Now I get 22.25 seconds for two 64-element arrays, close to what you've reported. Just one more question: does your SysRPL do a List->Array conversion at the end? (I don't understand the code, but it appears to do that from the behavior of your previous compiled program. My latest HP-48GX version, as of yesterday, uses lists only once, to do the Hadamard multiplication. This has improved performance significantly. Would it do the same in SysRPL? (Assuming of course your SysRPL version corresponds to the older User-RPL version).

Cheers,

Gerson.

P.S.: Here is the updated table:

Code:
 # elements     HP-71B     hp 49g      HP-48GX     hp 50g   48GX(SysRPL)      16         24.11      10.42         9.96       5.57       5.44      32         53.41      22.58        21.91      11.90       9.89      64        116.74      47.86        47.64      25.16      22.22     128        269.89     110.14       106.45      47.54      49.18     256          --       255.87       254.00     104.42     110.11

The difference is most likely due to the final conversion to array, which wasn't being taken into account in the manual timing.
02-16-2014, 01:51 AM
Post: #25
 Raymond Del Tondo Member Posts: 283 Joined: Dec 2013
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 11:53 PM)Gerson W. Barbosa Wrote:  It's ok now, thanks! Now I get 22.25 seconds for two 64-element arrays, close to what you've reported. Just one more question: does your SysRPL do a List->Array conversion at the end? (I don't understand the code, but it appears to do that from the behavior of your previous compiled program. My latest HP-48GX version, as of yesterday, uses lists only once, to do the Hadamard multiplication. This has improved performance significantly. Would it do the same in SysRPL? (Assuming of course your SysRPL version corresponds to the older User-RPL version).

The difference is most likely due to the final conversion to vector, which wasn't being taken into account in the manual timing.
The SySRPL version doesn't use lists for the array/vector at all.
Starting after the 2nd fft call the result vector is dissolved during the multiplication loop.
After that loop the input vector for the ifft call is generated via XEQ>VEC.
After the ifft call the result vector is dissolved again during the rounding/cutting loop.

Following that loop, the stack contains a meta object of sizeA+sizeB entries (SysRPL version).

The final loop always works on the first two stack levels (elements n and n+1) of the meta object, and finally rotates out the level 1 ob. Since the meta ob will be worked on from the end to the front, one final rotation is needed after the loop to retain the correct order of elements.

The conversion to a vector at the end of the program takes about 81ms for a 256 elements meta ob, so that's rather negligible.

Cheers

-- Ray
02-16-2014, 02:25 AM
Post: #26
 Gerson W. Barbosa Senior Member Posts: 1,410 Joined: Dec 2013
RE: FFT Multiplication (HP-48G/GX/G+)
(02-16-2014 01:51 AM)Raymond Del Tondo Wrote:
(02-15-2014 11:53 PM)Gerson W. Barbosa Wrote:  It's ok now, thanks! Now I get 22.25 seconds for two 64-element arrays, close to what you've reported. Just one more question: does your SysRPL do a List->Array conversion at the end? (I don't understand the code, but it appears to do that from the behavior of your previous compiled program. My latest HP-48GX version, as of yesterday, uses lists only once, to do the Hadamard multiplication. This has improved performance significantly. Would it do the same in SysRPL? (Assuming of course your SysRPL version corresponds to the older User-RPL version).

The difference is most likely due to the final conversion to vector, which wasn't being taken into account in the manual timing.
The SySRPL version doesn't use lists for the array/vector at all.
Starting after the 2nd fft call the result vector is dissolved during the multiplication loop.
After that loop the input vector for the ifft call is generated via XEQ>VEC.
After the ifft call the result vector is dissolved again during the rounding/cutting loop.

Following that loop, the stack contains a meta object of sizeA+sizeB entries (SysRPL version).

The final loop always works on the first two stack levels (elements n and n+1) of the meta object, and finally rotates out the level 1 ob. Since the meta ob will be worked on from the end to the front, one final rotation is needed after the loop to retain the correct order of elements.

Thanks for the detailed explanations. Although

(02-16-2014 01:51 AM)Raymond Del Tondo Wrote:  The conversion to a vector at the end of the program takes about 81ms for a 256 elements meta ob, so that's rather negligible.

Yes, I remember now the conversion was immediate after pressing ENTER in your debug version. I can't explain the 15 seconds difference compared to my manual timing, though. I'm sure I didn't make any mistake while taking time measurements, also my Casio wristwatch is accurate enough: only 133 seconds in advance since I bought it in Jul/2008 :-) (Time to adjust it for the first time now that our DST has just ended)

Thanks again for your fast SysRPL version (Finally the sixfold increase in speed when compared to the HP-71B !). Only thing left is checking whether 5-digit groups will yield accurate results -- will do that later.

Best regards,

Gerson.
02-16-2014, 05:21 AM (This post was last modified: 02-19-2014 12:39 PM by Gerson W. Barbosa.)
Post: #27
 Gerson W. Barbosa Senior Member Posts: 1,410 Joined: Dec 2013
RE: FFT Multiplication (HP-48G/GX/G+)
Ever been curious about all digits of 253! ? Out of 500, the HP-48GX gives the first 12 only: 517346099264. Just run the program below with single argument 253 and wait for about one hour and 45 minutes. This is by no means an efficient way to do this, but might be a way to check the accuracy of the algorithm after 253 multiplications, even though one on the operands is small. (Better use Emu48 and get the result under 60 seconds or less).

Code:
 %%HP: T(3)A(D)F(,); \<< 1 - [ 1 ] 1 ROT ROT 1 ROT   START OVER 1 + DUP 1 \->ARRY ROT FFTMLTS2 TRIM ROT DROP   NEXT SWAP DROP \>>

where FFTMLTS2 is Raymond del Tondo's SysRPL program above and TRIM is the following program, slightly adapted from TRIM, by Wayne H. Scott, part of his POLY package:
Code:
 %%HP: T(3)A(D)F(,);                 ;   \<< OBJ\-> OBJ\-> DROP \->          ; This trims off the leading zeros of the array. n                                   ; For example,   \<< n                             ;     WHILE ROLL DUP                  ; [ 0 0 0 1234 0 5789 ]  -->  [ 1234 0 5789 ] ABS NOT n 1 - AND                   ;     REPEAT DROP 'n' DECR     END n ROLLD n 1 \->LIST \->ARRY   \>> \>>

-->

[ 5173 4609 9264 78 9218 433 899 7295 2746 9542 3561 2720 6639 9607 4846 3616 3134 3029 313 41 2383 1443 7828 1112 1374 4932 5428 7661 7296 3169 484 977 8527 4435 4743 3640 9654 4589 6311 9980 576 3521 219 7345 934 790 1685 4446 6163 7384 4451 7144 4589 2498 2615 9309 2898 1062 2514 4818 9873 9824 3499 6567 2944 9381 9909 5203 1087 3152 8570 9655 6175 4517 6766 2603 4976 5427 6777 1987 6267 959 7099 9373 2257 7683 9082 7849 7279 3284 6880 6763 5727 3110 3332 7966 9572 6049 2114 9638 6749 6804 5622 1513 5307 5201 4396 1440 1249 2800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

As far as I have checked, all digits are correct, considering the hidden leading zeros in some 4-digit elements.

-----

Update: I've run the User-RPL program on the emulator (50g) and formatted the output to integer. The result was exactly the same I got when computing 253! on the hp 50g in exact mode.

Also, an exact match was obtained for 997! (2559 digits). This was computed using arrays with 5-digit elements. Here they are, in case someone wants to check them out :-)

Code:
 40359724461645390234292652765290740211090335246139389143030797373519663190106842​37263758833853587102137006631984661977193944113181265519616824478081989249680516​43330792790545975658652366984953410102994729193397927862707860663312211428139657​33919921028883982924596589308458677218884794980135443761645075224506666559889800​94175577967376951675213432494794136314145342021847264214793926157307811731645263​93982880263279118925406206180689438683308644696334133955184235540077242451165903​81101827719832180031595827989994138156615149091737998105454985248322329275243898​11980802708882543991975745364605704734303715958724031694867571661542949412580453​11241382930836862005052391967478429035362983199050663230586866257612402804942403​44233166394434168335073220412356534986944621623211159899567872446218256850113174​63838577067904001075072667390026316129311241122279096729357421049685332780747960​00335855930432060517447195226436187301231195091058916141500005034486568847599649​00494067769318523221837865944485464570390882493401514455003570460531797737862031​18550953567694888922171302000112504911516415310901200837651592219697553144378802​09281708574493693840125338722070514029362985801732618715060934298236579096167095​85950405331060872571119845720022654435044594115773486340742853243112648568667878​84661486819750191740104532976390040068195207044638407735286912244552652299854897​64356909675383800245969276679872407757924211918488179598530382266647547907226165​47980297654789693965688881325682653906791569527887851625739692098351138902956310​11123253723954647397831433613628798725785501475711681360833913542427351428039887​35616917749898060073075542403509536490539404444972668319521415425667918323473675​96656633239099325959195904942407038086186468220698646372928155733874746654662785​92062875719964916067979790641428194695892008126790265612881240871363598309598670​34513441434850212864818601504529520195828528045600869420646442863720485414968365​31269052383502650854577265971210516113769359526291937135884001947338380202834453​11816794177165630135012424772911390424228141663696011522232935969575275309346520​46662174154235850073391729650007182794396630407081318880947107940245036774649857​42937922077663735689021159654000934909225598804790941759477837570572384191816766​30262770090339396547856717150451221853157302493936160447379021701169807360000000​00000000000000000000000000000000000000000000000000000000000000000000000000000000​00000000000000000000000000000000000000000000000000000000000000000000000000000000​0000000000000000000000000000000000000000000000000000000000000000000000000000000

Update:

It appears 5-digit group arrays yield reliable results up to 128 elements only (1280-digit results, 5*256). 4^2^11 has 1234 digits and is correctly computed using the program below. The next doubling (4^2^12) is not.

Code:
 11  %%HP: T(3)A(D)F(.); \<< [ 4 ] 1 ROT   START DUP FFTMULT TRIM   NEXT A2I \>> EVAL '4^2^11'               EVAL  - -->   0  FFTMULT: %%HP: T(3)A(D)F(.); \<< DUP2 SIZE OBJ\-> DROP SWAP SIZE OBJ\-> DROP DUP2 MAX UNROT + 4. ROLLD LN 2. LN / DUP FP NOT NOT SWAP IP + 2. SWAP ^ DUP + DUP ROT SWAP 1. \->LIST RDM UNROT 1. \->LIST RDM FFT OBJ\-> 1. GET \->LIST SWAP FFT OBJ\-> 1. GET \->LIST   \<< *   \>> DOLIST OBJ\-> 1. \->LIST \->ARRY IFFT 1. ROT SUB RE 0. RND DUP SIZE OBJ\-> DROP 1. - 1.   FOR i i GETI 100000. MOD PICK3 ROT DUP UNROT GET ROT + PUT i GETI 100000. / IP PICK3 ROT GET 100000. / IP + i SWAP PUTI OVER OVER GET 100000. MOD PUT -1.   STEP \>> TRIM: %%HP: T(3)A(D)F(.); \<< OBJ\-> OBJ\-> DROP \-> n   \<< n     WHILE ROLL DUP ABS NOT n 1. - AND     REPEAT DROP 'n' DECR     END n ROLLD n 1. \->LIST \->ARRY   \>> \>> A2I: %%HP: T(3)A(D)F(.); \<< DUP "" UNROT SIZE OBJ\-> SWAP   FOR i DUP i GET R\->I \->STR FMT ROT SWAP + SWAP   NEXT DROP OBJ\-> \>> FMT: %%HP: T(3)A(D)F(.); \<< DUP SIZE 5 - NOT NOT   IF   THEN 5. OVER SIZE - 1. SWAP     START "0" SWAP +     NEXT   END \>>
 « Next Oldest | Next Newest »

User(s) browsing this thread: 1 Guest(s)