10-31-2014, 11:21 PM

Inspired by Patrice's relentless search for speed, I created the following versions of the N-Queens problem, solved initially for the Calculator's Benchmark

1) The original unmodified code was tested as per the article.

2) The same algorithm, but using PICK/UNPICK on the stack instead of lists:

3) The original algorithm, but removing the list access from the inner loop:

4) The algorithm in 2) removing the indirection from the inner loop:

5) A new recursive algorithm:

I previously posted the benchmark results, but I will repost here for completeness.

Benchmark results:

These results are at the normal speed for both environments (userRPL=75 MHz, newRPL=192 MHz), both running on different 50g machines.

The exact same code was executed on both machines. The only difference was that the trailing dots were removed from the numbers on newRPL. The change is justified since they are a quirk of the 50g, to force the use of faster real numbers instead of variable precision integers. newRPL has automatic integer to real promotion and variable precision reals, so there is no need for the explicit use of reals. Leaving the dots would actually cause a slow-down on newRPL due to the forced use of reals.

newRPL produced the wrong result when running the recursive method, which I will investigate, fix it and report back.

UPDATE: after fixing a bug in the compiler, newRPL runs the recursive version properly. The result was added to the list, as well as Werner's algorithm running unmodified on newRPL.

Claudio

1) The original unmodified code was tested as per the article.

2) The same algorithm, but using PICK/UNPICK on the stack instead of lists:

Code:

« 8. 0. 0. 0. \-> R S X Y

« CLEAR TICKS 1. R

START 0.

NEXT

DO R 'X' INCR UNPICK

DO 'S' INCR DROP X 'Y' STO

WHILE Y 1. >

REPEAT X PICK 'Y' DECR 1. + PICK -

IF DUP 0. == SWAP ABS X Y - == OR

THEN 0. 'Y' STO X PICK 1. - X UNPICK

WHILE X PICK 0. ==

REPEAT 'X' DECR PICK 1. - X UNPICK

END

END

END

UNTIL Y 1. ==

END

UNTIL X R ==

END 8. DROPN TICKS SWAP - B\->R 8192. / S

»

»

3) The original algorithm, but removing the list access from the inner loop:

Code:

« 8. 0. 0. 0. \-> R S X Y

« CLEAR TICKS 1. R

START 0.

NEXT

DO R 'X' INCR UNPICK

DO 'S' INCR DROP X 'Y' STO

WHILE Y 1. >

REPEAT X PICK 'Y' DECR 1. + PICK -

IF DUP 0. == SWAP ABS X Y - == OR

THEN 0. 'Y' STO X PICK 1. - X UNPICK

WHILE X PICK 0. ==

REPEAT 'X' DECR PICK 1. - X UNPICK

END

END

END

UNTIL Y 1. ==

END

UNTIL X R ==

END 8. DROPN TICKS SWAP - B\->R 8192. / S

»

»

4) The algorithm in 2) removing the indirection from the inner loop:

Code:

« 8. 0. 0. 0. 0. \-> R S X Y Ax

« CLEAR TICKS 1. R

START 0.

NEXT

DO 'X' INCR DROP R 'Ax' STO

DO 'S' INCR DROP X 'Y' STO

WHILE Y 1. >

REPEAT Ax Y PICK - 'Y' DECR DROP

IF DUP 0. == SWAP ABS X Y - == OR

THEN 0. 'Y' STO 'Ax' DECR DROP

WHILE Ax 0. ==

REPEAT 'X' DECR PICK 1. - 'Ax' STO

END

END

END

UNTIL Y 1. ==

END Ax X UNPICK

UNTIL X R ==

END TICKS 10. ROLL - B\->R 8192. / S

»

»

5) A new recursive algorithm:

Code:

ROUTINE CHECKQUEEN:

--------------------------

« 1. \-> X RES «

IF X 1 > THEN

X PICK

1 X 1 - FOR I

DUP I 2 + PICK - ABS X I - ABS

IF == THEN

0. 'RES' STO X 'I' STO

END

NEXT

DROP RES

ELSE

1.

END

»

»

ROUTINE DOLEVEL:

---------------------

« 9 OVER - \-> X LIMIT

«

1 8 START LIMIT 8 + ROLLD NEXT

LIMIT DUPN 1 8 START LIMIT LIMIT 8 + + ROLL NEXT

DO 9. ROLL X UNPICK

IF X CHECKQUEEN

THEN X 1 + DUP

IF 8 \<= THEN IF DOLEVEL THEN 0. 9 ROLLD 0. 'LIMIT' STO ELSE X PICK 17 X - ROLLD END

ELSE 9 ROLLD 0 'LIMIT' STO END

ELSE X PICK 17 X - ROLLD

END 'LIMIT' DECR

UNTIL 0. \<=

END

1 9 X - START 9 ROLL DROP NEXT

IF LIMIT 0. ==

THEN 0 ELSE 1 END

»

»

MAIN PROGRAM NQUEENS:

------------------------------

« TICKS

1. 2. 3. 4. 5. 6. 7. 8.

0. 0. 0. 0. 0. 0. 0. 0.

1. DOLEVEL DROP

1. 8. START 9. ROLL DROP NEXT

TICKS

10. ROLL - B\->R 8192. /

»

I previously posted the benchmark results, but I will repost here for completeness.

Benchmark results:

- Original: userRPL = 90 seconds. newRPL = 197 ms
- Stack: userRPL = 60.6 seconds. newRPL = 154 ms
- Original w/optimized inner loop: userRPL = 67.6 seconds. newRPL = 154 ms
- Stack w/Optimized inner loop: userRPL = 57.7 seconds. newRPL = 143 ms
- Recursive stack: userRPL = 29.7 seconds. newRPL = 101 ms
- Werner's stack (see below): userRPL = 22.6 seconds. newRPL = 106 ms

These results are at the normal speed for both environments (userRPL=75 MHz, newRPL=192 MHz), both running on different 50g machines.

The exact same code was executed on both machines. The only difference was that the trailing dots were removed from the numbers on newRPL. The change is justified since they are a quirk of the 50g, to force the use of faster real numbers instead of variable precision integers. newRPL has automatic integer to real promotion and variable precision reals, so there is no need for the explicit use of reals. Leaving the dots would actually cause a slow-down on newRPL due to the forced use of reals.

newRPL produced the wrong result when running the recursive method, which I will investigate, fix it and report back.

UPDATE: after fixing a bug in the compiler, newRPL runs the recursive version properly. The result was added to the list, as well as Werner's algorithm running unmodified on newRPL.

Claudio