RE: (WP-34S) Prime Factors - performance challenge
With the subroutine-driven MOD30 loop shown here I felt challenged to find some optimal between speed, resources used and rubbish left on the SBR-stack, but it's quite hard to decide...
From looking through the 'PF' in the library I decided to remain with the RPN stack registers. Since the WP 34S offers so much enhanced TEST commands, you just have to remember, when the stack lifts or drops and when not. It reminded me a little an ancient predecessor of the Rubik's cube called "Schiebefax" in german - 4x4 squares less one in a square frame which were to be put into a certain order.
Here's the very shortest (but also slowest) I found:
Code:
WP 34S (3.3 3678) 13.Feb.2015
Example: 7 x 38447 x 62119 - 1.-2.Factor 1'35.3"
Find prime factors for (unknown)
- Version using stack (X, Y, L) only
- the shortest but slowest - trying every number from 2
usage: (unknown) XEQ 'Pr' - R/S [- R/S] ... '1'
- Registers:
RPN-stack
--------------------------------------------------------------------------------
001 LBL 'Pr' ; ( L ); X , Y
002 2 ;for even test ; ( ); 2 , 'X'
003 x><y ; ( ); 'X', 'T'
004 SKIP 002
;main loop: 6(7) lines per 1(1) candidates
;i.e. 180(210) per 30(30)
005 INC Y ;next candidate ; ('X'); 'X/T0', 'T1'
006 x>< L ; ( ); 'X', 'T'
007 RCL/ Y ; ('X'); 'X/T', 'T'
008 x<y? ;'X/T'<'T' i.e. 'T'^2>'X'
009 SKIP 005 ; -> end
010 FP? ;remainder != 0 ?
011 BACK 006 ; -> next test factor
012 VIEW Y
013 STOP ;show factor
014 BACK 007
015 x>< L ; ( ); 'X', 'T'
016 STOP
017 RCL/ X ; ('X'); 1 , 'T'
018 END
Adding one line to the loop and a few for filtering out the even the required time lowers by a third:
Code:
WP 34S (3.3 3678) 13.Feb.2015
Example: 7 x 38447 x 62119 - 1.-2.Factor 53.2"
Find prime factors for (unknown)
- Version using stack (X, Y, L) only, testing 2 and every odd number
usage: (unknown) XEQ 'Pr' - R/S [- R/S] ... '1'
- Registers:
RPN-stack
--------------------------------------------------------------------------------
001 LBL 'Pr' ; ( L ); X , Y
002 2 ;for even test ; ( ); 2 , 'X'
003 x><y ; ( ); 'X', 'T'
004 RCL/ Y ; ('X'); 'X/T', 'T'
005 FP?
006 SKIP 004 ;remainder != 0 -> try '3'
007 VIEW Y
008 STOP ;show factor 2
009 BACK 005 ;try factor again
;main loop: 7(8) lines per 1(2) candidates
;i.e. 105(120) per 15(30)
010 INC Y ; ('X'); 'X/T0', 'Tx'
011 INC Y ;next odd candidate ; ('X'); 'X/T0', 'T1'
012 x>< L ; ( ); 'X', 'T'
013 RCL/ Y ; ('X'); 'X/T', 'T'
014 x<y? ;'X/T'<'T' i.e. 'T'^2>'X'
015 SKIP 005 ; -> end
016 FP? ;remainder != 0 ?
017 BACK 007 ; -> next test factor
018 VIEW Y
019 STOP ;show factor 2n+1
020 BACK 007
021 x>< L ; ( ); 'X', 'T'
022 STOP
023 RCL/ X ; ('X'); 1 , 'T'
024 END
Still using only x, y and L, but leaving out divisions by multiples of 3, even more speed is gained, but on the cost of almost doubled line count:
Code:
WP 34S (3.3 3678) 13.Feb.2015
Example: 7 x 38447 x 62119 - 1.-2.Factor 34"
Find prime factors for (unknown)
- Version using stack (X, Y, L) only, trying 2, 3, excluding their multiples
usage: (unknown) XEQ 'Prm' - R/S [- R/S] ... '1'
- Registers:
RPN-stack
--------------------------------------------------------------------------------
001 LBL 'Prm' ; ( L ); X , Y
002 2 ;for even test ; ( ); 2 , 'X'
003 x><y ; ( ); 'X', 'T'
004 RCL/ Y ; ('X'); 'X/T', 'T'
005 FP?
006 SKIP 003 ;remainder != 0
007 VIEW Y
008 STOP ;show factor
009 BACK 005 ;try factor again
010 CLx ; ('X'); 0 , 'T'
011 3 ; ('X'); 3 , 'T'
012 x=? Y
013 SKIP 003
014 x><y ; ('X'); 'T', 3
015 x>< L ; ('T'); 'X', 3
016 BACK 012
017 STO/ Y ; ('X'); 3 , 1
;main Loop: 13(15) lines for 2(6) candidates
;i.e. 65(75) for 10(30)
018 CLx ; ('X'); 0 , 'T'
019 4 ; ('X'); 4 , 'T0'
020 STO+ Y ; ('X'); 4 , 'T1'
021 x>< L ; ( 4 ); 'X', 'T'
022 RCL/ Y ; ('X'); 'X/T', 'T'
023 INT?
024 SKIP 011
025 INC Y ; ('X'); , 'Tx'
026 INC Y ; ('X'); , 'T1'
027 x>< L ; ( ); 'X', 'T'
028 RCL/ Y ; ('X'); 'X/T', 'T'
029 x<? Y ; 'X/T'<'T' i.e. 'T'^2>'X'
030 SKIP 008 ; goto End
031 FP?
032 BACK 014
033 VIEW Y
034 STOP ;show factor
035 BACK 007
036 VIEW Y
037 STOP ;show factor
038 BACK 016
039 x>< L ; ('X/T'); 'X', 'T'
040 STOP
041 RCL/ X ; ('X'); 1 , 'T'
042 END
Now I became curious whether the MOD 30 algorithm, i.e. leaving out the multiples of 5, too, would be worth the effort. Formerly I was always convinced, the overhead to organize the loop would eat up the speed. Now I tried - and found the way that Dave presented here is definitely the fastest.
Here it is, adapted to use the RPN stack x, y, z, t and L only. Since for a faster end condition test the root of the 'unknown' is also maintained and integer division employed, it uses a few lines and registers more:
Code:
WP 34S (3.3 3678) 17.Feb.2015
Example: 7 x 38447 x 62119 - 1.-2.Factor 28.2"
Find prime factors for (unknown)
- Version using stack (X, Y, Z, T, L) only,
trying mod 30 chain via a subroutine, holding sqrt(x) for end condition
usage: (unknown) XEQ 'Pr' - R/S [- R/S] ... '1'
- Registers: RPN-stack (ssize4 dbloff recommended)
--------------------------------------------------------------------------------
001 LBL 'Pr' ; ( L ); X , Y , Z , T
002 SKIP 021
003 LBL 10 ; ('T0'); 'S', 'X', 'X', [sqrt('X')]
004 RCL+ L ; ('S'); 'T1', 'X', 'X', [sqrt('X')]
005 RMDR ; ('T'); ? , 'X', [sqrt('X')]
006 x=0? ;factor found?
007 SKIP 003
008 CLx ; ('T'); 0 , 'X', [sqrt('X')]
009 RCL Y ;duplicate 'X' ; ('T'); 'X', 'X', [sqrt('X')]
010 RTN ;-> try next
011 x>< L ; ( 0 ); 'T', 'X0', [sqrt('X')]
012 STO/ Y ; ( ); 'T', 'X1', [sqrt('X0')]
013 STOP ;show factor
014 XEQ 11 ; ( ); 'T', 'X', 'X', [sqrt('X')]
015 BACK 010 ;try factor again
016 LBL 11
017 x>< Z ; ( ); [sqrt('X0')], 'X1', 'T'
018 RCL Y ; ( ); 'X1', [sqrt('X0')], 'X1', 'T'
019 STO Y ;duplicate 'X' ; ( ); 'X1', 'X1, 'X1', 'T'
020 SQRT() ; ( ); sqrt('X'), 'X', 'X', 'T'
021 IP ;end condition value ;(sqrt('X')); [sqrt('X')], 'X', 'X', 'T'
022 x>< T ; ( );, 'T', 'X', 'X', [sqrt('X')]
023 RTN
024 RCL X ; ( ); 'X', 'X'
025 1 ; ( ); 1 , 'X', 'X'
026 XEQ 11 ; ( ); 1 , 'X', 'X', [sqrt('X')]
027 STO L ; ( 1 ); 1 , 'X', 'X', [sqrt('X')]
028 XEQ 10
029 1 ; ('T'); 1 , 'X', 'X', [sqrt('X')]
030 XEQ 10
031 2 ; ('T'); 2 , 'X', 'X', [sqrt('X')]
032 XEQ 10
033 2 ; ('T'); 2 , 'X', 'X', [sqrt('X')]
034 XEQ 10 ; ('T'); 'X', 'X', [sqrt('X')]
035 ENTER^ ; ('T'); 'X', 'X', 'X', [sqrt('X')]
;main loop 8x 8(10) + 4 lines
;i.e. 68 (84 w.'LBL'+skipped) for 8 ("10"/30) tests
036 CLx ; ('T'); 0 , 'X', 'X', [sqrt('X')]
037 4 ; ('T'); 4 , 'X', 'X', [sqrt('X')]
038 XEQ 10
039 2 ; ('T'); 2 , 'X', 'X', [sqrt('X')]
040 XEQ 10
041 4 ; ('T'); 4 , 'X', 'X', [sqrt('X')]
042 XEQ 10
043 2 ; ('T'); 2 , 'X', 'X', [sqrt('X')]
044 XEQ 10
045 4 ; ('T'); 4 , 'X', 'X', [sqrt('X')]
046 XEQ 10
047 6 ; ('T'); 6 , 'X', 'X', [sqrt('X')]
048 XEQ 10
049 2 ; ('T'); 2 , 'X', 'X', [sqrt('X')]
050 XEQ 10
051 6 ; ('T'); 6 , 'X', 'X', [sqrt('X')]
052 XEQ 10 ; ('T'); 'X', 'X', [sqrt('X')]
053 RCL L ; ('T'); 'T', 'X', 'X', [sqrt('X')]
054 x<? T ;'X'>'T'^2 -> continue
055 BACK 019
056 x>< Y ; ('T'); 'X', 'T', 'X', [sqrt('X')]
057 STOP
058 RCL/ X ; ('X'); 1 , 'T', 'X', [sqrt('X')]
059 END
The time used by the equivalent routine, using global registers instead, is the same.
While testing the speed I usually terminated after the second factor and started over from the beginning - soon I found out that the WP 34S keeps it's subroutine stack, differently from HP 35s and HP 42s that purge it upon a manual XEQ, GTO or OFF.
Thus I thought it would be preferrable to avoid subroutines during which the program could be interrupted and terminated - I tried to use indirect addressing for a very short subroutine returning just the step between the division 'candidates' 1 - 2 - 2 - 4 - 2 - 4 - 2 - 4 - 6 - 2 - 6 (then continuing repeating the 8 elements starting from the first '4'). And I tried another way doing just an indirect branch, since the step value would be needed always at the same position in the main loop. Both were much slower than the following MOD 6 approach, that saves some time by employing integer arithmetics and keeping the root for the end test:
Code:
WP 34S (3.3 3678) 17.Feb.2015
Examples: 7 x 38447 x 62119 - 1.-2.Factor 29.5"
or 33013x33013 1.Factor 25.7 s
Find prime factors for (unknown)
- Version holding root(x) for end condition
using stack (X, Y, Z, T, L) only
usage: (unknown) XEQ 'Prm' - R/S [- R/S] ... '1'
- Registers: RPN-stack (ssize4 dbloff recommended)
--------------------------------------------------------------------------------
001 LBL 'Prm'
002 RCL X ;push stack up ; ( ), 'X', 'X'
003 2 ;test-factor 'T' ; ( ), 'T', 'X', 'X'
004 XEQ 01 ; ( ), 'T', 'x', 'x', sqrt('x')
005 RMDR ; ('T'), ? , 'X', sqrt('X')
006 x!=0?
007 SKIP 012
008 x>< L ; ( 0 ), 'T', 'X0', sqrt('x0")
009 STO/ Y ; ( ), 'T', 'x1', sqrt('x1')
010 STOP ;show factor 2 or 3
011 BACK 007
012 LBL 01 ; ( ), 'T', 'X1', sqrt('X0')
013 x>< Z ; ( ), sqrt('X0'), 'X1', 'T'
014 RCL Y ; ( ), 'X1', sqrt('X0'), 'X1', 'T'
015 STO Y ; ( ), 'X1', 'X1', 'X1', 'T'
016 SQRT() ; ('x1'), sqrt('x1'), 'x1', 'x1', 'T'
017 IP ; (sqrt'x1'), [sqrt('x1')], 'x1', 'x1', 'T'
018 x>< T ; ( ), 'T', 'x1', 'x1', [sqrt('x1')]
019 RTN
020 CLx ; ('T'), 0 , 'X', sqrt('X')
021 3 ; ('T0'), 3 , 'X', sqrt('X')
022 x!=? L
023 BACK 019
024 STO/ L ; ( 1 ), 3 , 'X', sqrt('X')
;main Loop: 14(16) lines for 2(6) candidates
;i.e. 70(80) for 10(30)
025 CLx ; ('T'), 0 , 'X', sqrt('X')
026 RCL Y ;duplicate ; ('T'), 'X', 'X', sqrt('X')
027 4 ; ('T0'), 4 , 'X', 'X', sqrt('X')
028 RCL+ L ;test-factor ; ( 4 ), 'T1', 'X', 'X', sqrt('X')
029 RMDR ; ('T'), ? , 'x', sqrt('x')
030 x=0?
031 SKIP 014
032 CLx ; ('T'), 0 , 'x', sqrt('x')
033 RCL Y ;duplicate ; ('T'), 'x', 'x', sqrt('x')
034 2 ; ('T0'), 2 , 'x', 'x', sqrt('x')
035 RCL+ L ;test-factor ; ( 2 ), 'T1', 'X', 'X', sqrt('X')
036 x>? T ;'T'>sqrt('X')?
037 SKIP 013 ;-> End
038 RMDR ; ('T'), ? , 'x', sqrt('x'), sqrt('x')
039 x!=0?
040 BACK 015
041 x>< L ; ( 0 ), 'T', 'X0', sqrt('X0')
042 STO/ Y ; ( 0 ), 'T', 'X1', sqrt('X0')
043 STOP ;factor 6n-1
044 XEC 01 ; ( ), 'T', 'x1', 'x1', sqrt('x1')
045 BACK 009
046 x>< L ; ( 0 ), 'T', 'X0', sqrt('X0')
047 STO/ Y ; ( 0 ), 'T', 'X1', sqrt('X0')
048 STOP ;factor 6n+1
049 XEQ 01 ; ( ), 'T', 'x1', 'x1', sqrt('x1')
050 BACK 021
051 x><y ;END ; ( ), 'X', 'T', 'X', sqrt('X')
052 STOP
053 RCL/ X ; ('X'), 1 , 'T', 'X', sqrt('X')
054 END
This is almost as fast as the MOD 30 above and even 5 lines shorter. It also demonstrates that despite the main loop running one line more than the MOD 6 program with RPN stack x, y, L only cited above, the integer arithmetics RMDR and comparison with an integer gain a measurable amount of time.
For the use of indirectly addressed local registers can be reviewed here, I'd like to show here my experiment using another indirect addressing for the factor chain step.
I found it not too clear in the manual, but I understand that the global registers are indirect 00-99, the 12 stack and helper registers indirect 100-111, and local registers starting .00 become indirect 112...
Here it is, the local registers are released before showing a factor that was found, then re-allocated; anyway, if the program is interrupted, they'd need to be freed manually:
Code:
WP 34S (3.3 3678) 8.Feb.2015
Example: 7 x 38447 x 62119 - 1.-2.Factor 32"
Find prime factors for (unknown)
- Version using indirect local registers for MOD 30 chain
and holding root(x) for end condition
usage: (unknown) XEQ 'PR' - R/S [- R/S] ... '1'
- Registers:
00 - unknown 'X', 01 - tester 'T', 02 - sqrt('X'), 03 - pointer
RPN-stack, Local registers .00-.10 - factor sequence steps
--------------------------------------------------------------------------------
001 LBL 'PR'
002 STO 00
003 # 111 ;loop end value
004 STO 03 ;counter/pointer
005 3
006 10^x
007 STO/ 03
008 # 122 ;loop start value
009 STO+ 03
010 2
011 STO 01 ;first test candidate
012 SKIP 004
013 RCL 01
014 PopLR ;free local registers
015 STO/ 00 ;leave remaining nnn
016 STOP ;show found factor
017 RCL 00
018 SQRT()
019 IP
020 STO 02 ;root(nnn) for end check
021 LocR 011 ;build factor sequence steps
022 1
023 STO .10
024 4
025 STO .07
026 STO .05
027 STO .03
028 2
029 STO .09
030 STO .08
031 STO .06
032 STO .04
033 STO .01
034 6
035 STO .02
036 STO .00
037 RCL 00 ;main loop 7x8(9)+12(14) lines
038 RCL 01 ;i.e. 68 for 8(10/30) candidates
039 RMDR
040 x=0?
041 BACK 028 ;factor found
042 RCL ->03
043 STO+ 01 ;next candidate
044 DSE 03
045 BACK 008
046 8
047 STO+ 03
048 RCL 01
049 x<=? 02 ;factor<=squrt(nnn)?
050 BACK 013 ;-> continue
051 RCL 00 ;last Factor 'X'
052 BACK 038
053 END
Maybe, this reply is a bit late, but I received my WP 34S only these weeks, and it was also my first program to do my own prime factorization - thank you for the interesting challenge with the 'PF' and the MOD 30 extention!
Hopefully someone finds something interesting in it, too!
On my somewhat slower android mobile on free42 I started exploring the WP34S's integer space and factorized 2^50+3 = 7 x 3 435 997 x 46 811 113.
This ran about 3 minutes(!) till the second factor on the mobile, while the test 7 x 38447 x 62119 ran rather 3 seconds on it. Thus I'd be a little afraid running out of batteries or patience factorizing any double precision integer on the WP 34S.
The 'PF' from the library ran more than 5 minutes factorizing 33013x33013 when I lost patience and stopped it before it found the first factor...
Cheers!
Dirk
|