The HP 50g program << ! DIVIS SIZE >> returns the number of positive divisors of the input factorial. However, this brute-force program becomes impossibly slow for medium-sized inputs, and runs out of memory for any input that's interestingly large. Just look how this slows down:

<< 12 ! DIVIS SIZE >> returns 792 in 6.1 seconds

<< 13 ! DIVIS SIZE >> returns 1584 in 17.5 seconds

<< 14 ! DIVIS SIZE >> returns 2592 in 39.3 seconds

<< 15 ! DIVIS SIZE >> returns 4032 in 86.2 seconds!

(Unrelated note: Prime's CAS returns 4032 for size(idivis(15!)) in 0.7 seconds)
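For readers following the numbers off-calculator, here is a rough Python sketch of naive divisor counting by trial division — an illustration only, not necessarily what DIVIS does internally. It shows why the run time explodes: the loop length grows with sqrt(n!), which roughly triples with each increment of n in this range.

```python
import math

def divisor_count_naive(m):
    """Count divisors of m by trial division up to sqrt(m)."""
    count = 0
    r = math.isqrt(m)
    for d in range(1, r + 1):
        if m % d == 0:
            count += 2          # counts both d and m // d
    if r * r == m:
        count -= 1              # perfect square: sqrt(m) was counted twice
    return count

print(divisor_count_naive(math.factorial(12)))  # → 792
```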

The winner of this challenge will be the 49G/49g+/50g RPL program that returns the exact number of positive divisors of X! the fastest. Testing will focus on large inputs.

Although half the fun of this challenge will consist in thinking up different ways of doing it (obviously avoiding the factorial and DIVIS functions), some algorithmic hints can be found here if you totally get stuck:

https://oeis.org/A027423
Many thanks to Gerald H for his many OEIS-related postings and programs which inspired this mini-challenge.

I realize that this doesn’t qualify since it’s a 42S program, but I couldn’t resist:

Code:

00 { 67-Byte Prgm }
01▸LBL "PF"
02 STO 00
03 1
04 STO 01
05 STO 04
06 2
07 GTO 02
08▸LBL 00
09 2
10 STO+ 01
11 1
12 STO 02
13▸LBL 01
14 2
15 STO+ 02
16 RCL 02
17 X↑2
18 RCL 01
19 X<Y?
20 GTO 02
21 LASTX
22 MOD
23 X=0?
24 GTO 00
25 GTO 01
26▸LBL 02
27 STO 03
28 RCL 00
29 X<Y?
30 GTO 04
31 1
32 X<>Y
33▸LBL 03
34 RCL÷ 03
35 IP
36 STO+ ST Y
37 X≠0?
38 GTO 03
39 R↓
40 STO× 04
41 GTO 00
42▸LBL 04
43 RCL 04
44 END

Constant memory use; run time something like O(n*sqrt(n)), I think, but that analysis is harder than writing the program. :-)

(09-30-2017 01:48 AM)Thomas Okken Wrote: [ -> ]I realize that this doesn’t qualify since it’s a 42S program, but I couldn’t resist...

Wow, that's not only very fast, but Free42 (decimal version) even gets the full correct answers up to an input of 289 (of course, the SHOW key must be pressed to see the full answer). Good job!

Very nice Thomas.

On a real 33s, input 66 correctly processed in < 4s.

HP-67: 52 → 8087040000 in 69 s.

(I really should get that card reader restored. It’s been 40 years since the last time I used a calculator that would forget everything when you turned it off!)

UPDATE:

HP-25: 52 → 8087040000 in 54 s. :-)

Code:

01 23 00 STO 0
02 01 1
03 23 01 STO 1
04 23 04 STO 4
05 02 2
06 13 24 GTO 24
07 02 2
08 23 51 01 STO + 1
09 01 1
10 23 02 STO 2
11 02 2
12 23 51 02 STO + 2
13 24 02 RCL 2
14 15 02 x^2
15 24 01 RCL 1
16 14 41 x<y
17 13 24 GTO 24
18 14 73 LASTx
19 71 ÷
20 15 01 FRAC
21 15 71 x=0
22 13 07 GTO 07
23 13 11 GTO 11
24 23 03 STO 3
25 24 00 RCL 0
26 14 41 x<y
27 13 40 GTO 40
28 01 1
29 23 05 STO 5
30 22 R↓
31 24 03 RCL 3
32 71 ÷
33 14 01 INT
34 23 51 05 STO + 5
35 15 61 x≠0
36 13 31 GTO 31
37 24 05 RCL 5
38 23 61 04 STO × 4
39 13 07 GTO 07
40 24 04 RCL 4
41 13 00 GTO 00

I have a 50g SysRPL programme that does 100 in 1.4s & bestfit gives the input time relationship as

t=0.0020*N^1.5191

I would love to see a 50g version of Thomas' programme but I can't get my head around the 42S version.

Thomas, could you please explain your algorithm?

Code:

::
CK1&Dispatch
# FF
::
PTR 2EF44
{
ROTDROPSWAP
%1+
FPTR2 ^R>Z
FPTR2 ^QMul
}
1LAMBIND
::
FPTR2 ^ZAbs
FPTR2 ^DupQIsZero?
caseSIZEERR
FPTR2 ^DupZIsOne?
?SEMI
FPTR2 ^MZSQFF
#2/
ZINT 1
SWAP
ZERO_DO
1GETLAM
COMPEVAL
LOOP
;
ABND
;
;

My algorithm is based on the prime factorization of n!. When a number has prime factorization Π(i=1,j,p[i]^k[i]), i.e. it is the product of j distinct primes and each prime p[i] occurs k[i] times in the factorization, then the number of divisors is Π(i=1,j,k[i]+1).

To find this prime factorization of n!, realize that it can only contain primes that are less than or equal to n, so the search space is pleasantly small. And how often does a prime p occur in n!? Answer: floor(n/p) of the factors 1, 2, 3, ... , n are divisible by p; floor(n/p^2) are divisible by p^2, etc. Keep dividing n by p and rounding toward zero, until you reach zero; the sum of all the intermediate results equals the exponent k[i] of the prime p[i], and the number of possible contributions of p[i] to divisors of n! is thus k[i]+1. Multiply all those k[i]+1 factors together, and you end up with the total number of divisors.

The first part of the program finds primes using a simple but reasonably efficient algorithm, only checking divisors <= sqrt(p). Hence the O(n*sqrt(n)) part of my time estimate. Once a prime has been found, the repeated divisions take O(log(n)). Working this out more accurately would require dusting off my rusty calculus skills... Maybe later. :-)
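Thomas's description translates almost line-for-line into Python; here is a minimal sketch (the function name is mine, and the trial-division primality test mirrors the "divisors <= sqrt(p)" check above):

```python
import math

def factorial_divisor_count(n):
    """Number of divisors of n!, via the prime factorization of n!.

    For each prime p <= n, the exponent of p in n! is
    floor(n/p) + floor(n/p^2) + ...; the divisor count is the
    product of (exponent + 1) over all such primes.
    """
    result = 1
    for p in range(2, n + 1):
        # simple primality test: trial divisors up to sqrt(p)
        if any(p % d == 0 for d in range(2, math.isqrt(p) + 1)):
            continue
        # sum the repeated integer divisions n//p, n//p^2, ...
        exponent, q = 0, n // p
        while q:
            exponent += q
            q //= p
        result *= exponent + 1
    return result

print(factorial_divisor_count(15))  # → 4032
```

The same inputs that crawled under the brute-force DIVIS approach are instantaneous here, since the work is bounded by the primes up to n rather than by the divisors of n!.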

Thank you, Thomas.

Here a SysRPL programme following Thomas' suggestion which correctly processes 100 in 0.5604s, a definite improvement on my first attempt.

Size: 147.0000

CkSum: # FDBAh

Code:

::
CK1&Dispatch
# FF
::
ZINT 1
SWAP
ZINT 0
BEGIN
FPTR2 ^Prime+
2DUP
Z>=
WHILE
::
2DUP
ZINT 0
OVER
2SWAP
BEGIN
2DUP
FPTR2 ^ZQUOText
DUP
ZINT 0
Z>
WHILE
::
5ROLL
FPTR2 ^RADDext
4UNROLL
3PICK
FPTR2 ^RMULText
;
REPEAT
4DROP
ZINT 1
FPTR2 ^RADDext
4ROLL
FPTR2 ^RMULText
3UNROLL
;
REPEAT
2DROP
;
;

Also the time characteristic is better; bestfit claims a linear relationship to the input.

(09-30-2017 04:29 PM)Gerald H Wrote: [ -> ]Here a SysRPL programme following Thomas' suggestion which correctly processes 100 in 0.5604s, a definite improvement on my first attempt.

Size: 147.0000

CkSum: # FDBAh

Wow, very impressive! Two quick questions:

(1) FPTR 6 119 (^RADDext) and FPTR 6 118 (^QAdd) both resolve to address 6:5253B, so they are really identical functions. Is there a reason that you prefer using ^RADDext instead of ^QAdd in this program? Same question for the identical functions ^RMULText and ^QMul. (It might help if I knew what the "Q" and the "R" stand for in those function names.)

(2) Your second System RPL program evaluates an input of 100 in half a second, so when I keyed up the same program in User RPL (exactly the same, step-by-step, just in URPL), I expected it to run slower, but it ran MUCH slower than I anticipated. URPL is always slower than SysRPL, but this difference seems crazy:

320 SysRPL --> 1.5 seconds

320 URPL --> 18.6 seconds!

Why is this SysRPL program so much faster? Is it simply because SysRPL is always faster, especially when looping is involved?

Always eager to learn! Thanks in advance for your insights!

The User programme below took 14.9141s to process 320.

To question 1: ^RMULText etc. is just habit. To question 2, I surely have less idea than you.

How great a speed improvement occurs is variable, loops do seem to go much faster in Sys.

Code:

« 1 SWAP 0
WHILE NEXTPRIME
DUP2 ≥
REPEAT DUP2 0
OVER 4 ROLL 4 ROLL
WHILE DUP2
IQUOT DUP 0 >
REPEAT 5 ROLL +
4 ROLLD PICK3 *
END 4 DROPN 1 +
4 ROLL * UNROT
END DROP2
»

So for now I wrote my own in URPL: 15 takes 1.7614 seconds, 100 needs 15.2s, and 320 nearly 75 seconds. Seems I need to improve my code a bit.

Code:

%%HP: T(3)A(R)F(.);
\<< 1 OVER NDUPN \->LIST \-> L
\<< 2
FOR K K FACTORS REVLIST OBJ\-> 2 / 1 SWAP
START L OVER GET ROT + L UNROT PUT 'L' STO
NEXT -1
STEP L DUP \PILIST
\>>
\>>

Moving the list to memory and back is the main slowdown, I think.

Arno

(09-30-2017 08:17 PM)Joe Horn Wrote: [ -> ](2) Your second System RPL program evaluates an input of 100 in half a second, so when I keyed up the same program in User RPL (exactly the same, step-by-step, just in URPL), I expected it to run slower, but it ran MUCH slower than I anticipated. URPL is always slower than SysRPL, but this difference seems crazy:

320 SysRPL --> 1.5 seconds

320 URPL --> 18.6 seconds!

Why is this SysRPL program so much faster? Is it simply because SysRPL is always faster, especially when looping is involved?

In addition to the inherent loop structure speedup that SysRPL gives, I think if you look at what's inside those loops you'll see another reason for the speed advantage that Gerald's SysRPL version has: lots of "combo" commands that directly manipulate the stack without having to validate the arguments given. 5ROLL, 4UNROLL, 2DROP, etc. go about their business without bothering to check that the stack can accommodate their action or that the numeric constant applies. So in many of the "equivalent" UserRPL steps, you've not only got the usual overhead for argument count/type checking (and LASTARG/cached stack saving), you've also got additional RPL stream maintenance and stack validation occurring that aren't present when executing the SysRPL code.

Those combo commands for manipulating the stack can be a pain to remember in SysRPL but they really pay off sometimes in inner loops.

This one is cheating but it is very very fast:

<< [ 1 2 4 8 16 30 60 96 160 270 540 792 1584 2592 4032 5376 10752 14688 29376 41040 60800 96000 192000 242880 340032 532224 677376 917280 1834560 2332800 4665600 5529600 7864320 12165120 16422912 19595520 39191040 60466176 85100544 102435840 204871680 258048000 516096000 677376000 819624960 1258709760 2517419520 2876670720 3698576640 4464046080 6210846720 8087040000 16174080000 18559756800 23984916480 28217548800 39016857600 59609088000 119218176000 137106432000 274212864000 418535424000 492139929600 543050956800 695105224704 850195906560 1700391813120 2190889451520 3012472995840 3543845437440 7087690874880 7848891187200 15697782374400 23878316851200 ] SWAP GET >>

And it works on a 28S. The next value in the sequence cannot be represented exactly, so I stopped.

Pauli

(09-30-2017 09:31 PM)Gerald H Wrote: [ -> ]The User programme below took 14.9141s to process 320.

Code:

« 1 SWAP 0
WHILE NEXTPRIME
DUP2 ≥
REPEAT DUP2 0
OVER 4 ROLL 4 ROLL
WHILE DUP2
IQUOT DUP 0 >
REPEAT 5 ROLL +
4 ROLLD PICK3 *
END 4 DROPN 1 +
4 ROLL * UNROT
END DROP2
»

Exactly the same program* on my 50g takes 18.6 seconds for an input of 320, so your 50g must be faster than mine. *"Exactly the same" ... well almost; I always use reals for the arguments to stack commands (e.g. 4. ROLL instead of 4 ROLL) because it's a tad faster that way. With integers in their place, it takes 0.3 seconds longer for an input of 320.

(09-30-2017 10:52 PM)DavidM Wrote: [ -> ]In addition to the inherent loop structure speedup that SysRPL gives, I think if you look at what's inside those loops you'll see another reason for the speed advantage that Gerald's SysRPL version has: lots of "combo" commands that directly manipulate the stack without having to validate the arguments given. 5ROLL, 4UNROLL, 2DROP, etc. go about their business without bothering to check that the stack can accommodate their action or that the numeric constant applies. So in many of the "equivalent" UserRPL steps, you've not only got the usual overhead for argument count/type checking (and LASTARG/cached stack saving), you've also got additional RPL stream maintenance and stack validation occurring that aren't present when executing the SysRPL code.

Those combo commands for manipulating the stack can be a pain to remember in SysRPL but they really pay off sometimes in inner loops.

Good point. In his System RPL book, Jim Donnelly wrote, "The System-RPL example will run faster than the User-RPL program, because all the argument checking code has been bypassed." Nice to know that the speed increase can be this dramatic!

(10-01-2017 02:48 AM)Joe Horn Wrote: [ -> ]Exactly the same program* on my 50g takes 18.6 seconds for an input of 320, so your 50g must be faster than mine. *"Exactly the same" ... well almost; I always use reals for the arguments to stack commands (e.g. 4. ROLL instead of 4 ROLL) because it's a tad faster that way. With integers in their place, it takes 0.3 seconds longer for an input of 320.

Yes, down from 14.9s to 14.6s with real arguments for the stack commands.

(09-30-2017 10:52 PM)DavidM Wrote: [ -> ]In addition to the inherent loop structure speedup that SysRPL gives, I think if you look at what's inside those loops you'll see another reason for the speed advantage that Gerald's SysRPL version has: lots of "combo" commands that directly manipulate the stack without having to validate the arguments given. 5ROLL, 4UNROLL, 2DROP, etc. go about their business without bothering to check that the stack can accommodate their action or that the numeric constant applies. So in many of the "equivalent" UserRPL steps, you've not only got the usual overhead for argument count/type checking (and LASTARG/cached stack saving), you've also got additional RPL stream maintenance and stack validation occurring that aren't present when executing the SysRPL code.

Those combo commands for manipulating the stack can be a pain to remember in SysRPL but they really pay off sometimes in inner loops.

A related issue is that User RPL stack commands that take a numeric argument (PICK, ROLL, etc) take about 7 times as long as simple stack commands such as SWAP or ROT. I go to great lengths to avoid using those slow commands inside loops.

Also, WHILE and UNTIL loops are slower than START loops in User RPL whereas WHILE is very fast in SysRPL.

John

(10-01-2017 02:09 PM)John Keith Wrote: [ -> ]A related issue is that User RPL stack commands that take a numeric argument (PICK, ROLL, etc) take about 7 times as long as simple stack commands such as SWAP or ROT. I go to great lengths to avoid using those slow commands inside loops.

Also, WHILE and UNTIL loops are slower than START loops in User RPL whereas WHILE is very fast in SysRPL.

John

John, it seems that you have some knowledge of the speed of various commands (considering also the other thread, where you wrote today that CEIL is faster than 1 + IP). Do you by chance have timing measurements for the commands you have tested?

(10-01-2017 03:47 PM)pier4r Wrote: [ -> ]John, it seems that you have some knowledge of the speed of various commands (considering also the other thread, where you wrote today that CEIL is faster than 1 + IP). Do you by chance have timing measurements for the commands you have tested?

Note that you can determine such timings yourself using the 50g's TEVAL command. For example, put << 1. 1000. START 1.5 DROP NEXT >> on the stack and execute TEVAL to time it. Now insert either CEIL or 1. + IP between the 1.5 and the DROP, and use TEVAL to time each. Divide the times by 1000 to see the difference. I keep << -40 CF MEM DROP 0.5 WAIT TEVAL -40 SF >> assigned to a key for accurate timings.
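The subtract-the-baseline idea behind this TEVAL recipe works in any environment: time a loop over a dummy body, time the same loop with the operation under test added, and divide the difference by the iteration count. A Python analogue using the standard-library timeit module (the 1.5 literal and the loop-count idea mirror the 50g example; the figures printed will of course be machine-dependent and noisy):

```python
import timeit

# Baseline: a loop body that only pushes/assigns the test literal.
N = 1_000_000
baseline = timeit.timeit("x = 1.5", number=N)

# Same loop with the operation under test inserted.
with_op = timeit.timeit("x = math.ceil(1.5)", setup="import math", number=N)

# The difference, per iteration, estimates the operation's own cost.
per_call = (with_op - baseline) / N
print(f"ceil costs roughly {per_call * 1e9:.0f} ns per call")
```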

(10-02-2017 05:30 AM)Joe Horn Wrote: [ -> ] (10-01-2017 03:47 PM)pier4r Wrote: [ -> ]John, it seems that you have some knowledge of the speed of various commands (considering also the other thread, where you wrote today that CEIL is faster than 1 + IP). Do you by chance have timing measurements for the commands you have tested?

Note that you can determine such timings yourself using the 50g's TEVAL command. For example, put << 1. 1000. START 1.5 DROP NEXT >> on the stack and execute TEVAL to time it. Now insert either CEIL or 1. + IP between the 1.5 and the DROP, and use TEVAL to time each. Divide the times by 1000 to see the difference. I keep << -40 CF MEM DROP 0.5 WAIT TEVAL -40 SF >> assigned to a key for accurate timings.

Why do you have MEM in the programme? TEVAL does garbage collection.

What's the point of 0.5 WAIT?