10-31-2014, 11:21 PM
Inspired by Patrice's relentless search for speed, I created the following versions of the N-Queens problem, solved initially for the Calculator's Benchmark
1) The original unmodified code was tested as per the article.
2) The same algorithm, but using PICK/UNPICK on the stack instead of lists:
3) The original algorithm, but removing the list access from the inner loop:
4) The algorithm in 2) removing the indirection from the inner loop:
5) A new recursive algorithm:
I previously posted the benchmark results, but I will repost here for completeness.
Benchmark results:
These results are at the normal speed for both environments (userRPL=75 MHz, newRPL=192 MHz), both running on different 50g machines.
The exact same code was executed on both machines. The only difference was that the trailing dots were removed from the numbers on newRPL. The change is justified since they are a quirk of the 50g, to force the use of faster real numbers instead of variable precision integers. newRPL has automatic integer to real promotion and variable precision reals, so there is no need for the explicit use of reals. Leaving the dots would actually cause a slow-down on newRPL due to the forced use of reals.
newRPL produced the wrong result when running the recursive method, which I will investigate, fix it and report back.
UPDATE: after fixing a bug in the compiler, newRPL runs the recursive version properly. The result was added to the list, as well as Werner's algorithm running unmodified on newRPL.
Claudio
1) The original unmodified code was tested as per the article.
2) The same algorithm, but using PICK/UNPICK on the stack instead of lists:
Code:
« 8. 0. 0. 0. \-> R S X Y
« CLEAR TICKS 1. R
START 0.
NEXT
DO R 'X' INCR UNPICK
DO 'S' INCR DROP X 'Y' STO
WHILE Y 1. >
REPEAT X PICK 'Y' DECR 1. + PICK -
IF DUP 0. == SWAP ABS X Y - == OR
THEN 0. 'Y' STO X PICK 1. - X UNPICK
WHILE X PICK 0. ==
REPEAT 'X' DECR PICK 1. - X UNPICK
END
END
END
UNTIL Y 1. ==
END
UNTIL X R ==
END 8. DROPN TICKS SWAP - B\->R 8192. / S
»
»
3) The original algorithm, but removing the list access from the inner loop:
Code:
« 8. 0. 0. 0. \-> R S X Y
« CLEAR TICKS 1. R
START 0.
NEXT
DO R 'X' INCR UNPICK
DO 'S' INCR DROP X 'Y' STO
WHILE Y 1. >
REPEAT X PICK 'Y' DECR 1. + PICK -
IF DUP 0. == SWAP ABS X Y - == OR
THEN 0. 'Y' STO X PICK 1. - X UNPICK
WHILE X PICK 0. ==
REPEAT 'X' DECR PICK 1. - X UNPICK
END
END
END
UNTIL Y 1. ==
END
UNTIL X R ==
END 8. DROPN TICKS SWAP - B\->R 8192. / S
»
»
4) The algorithm in 2) removing the indirection from the inner loop:
Code:
« 8. 0. 0. 0. 0. \-> R S X Y Ax
« CLEAR TICKS 1. R
START 0.
NEXT
DO 'X' INCR DROP R 'Ax' STO
DO 'S' INCR DROP X 'Y' STO
WHILE Y 1. >
REPEAT Ax Y PICK - 'Y' DECR DROP
IF DUP 0. == SWAP ABS X Y - == OR
THEN 0. 'Y' STO 'Ax' DECR DROP
WHILE Ax 0. ==
REPEAT 'X' DECR PICK 1. - 'Ax' STO
END
END
END
UNTIL Y 1. ==
END Ax X UNPICK
UNTIL X R ==
END TICKS 10. ROLL - B\->R 8192. / S
»
»
5) A new recursive algorithm:
Code:
ROUTINE CHECKQUEEN:
--------------------------
« 1. \-> X RES «
IF X 1 > THEN
X PICK
1 X 1 - FOR I
DUP I 2 + PICK - ABS X I - ABS
IF == THEN
0. 'RES' STO X 'I' STO
END
NEXT
DROP RES
ELSE
1.
END
»
»
ROUTINE DOLEVEL:
---------------------
« 9 OVER - \-> X LIMIT
«
1 8 START LIMIT 8 + ROLLD NEXT
LIMIT DUPN 1 8 START LIMIT LIMIT 8 + + ROLL NEXT
DO 9. ROLL X UNPICK
IF X CHECKQUEEN
THEN X 1 + DUP
IF 8 \<= THEN IF DOLEVEL THEN 0. 9 ROLLD 0. 'LIMIT' STO ELSE X PICK 17 X - ROLLD END
ELSE 9 ROLLD 0 'LIMIT' STO END
ELSE X PICK 17 X - ROLLD
END 'LIMIT' DECR
UNTIL 0. \<=
END
1 9 X - START 9 ROLL DROP NEXT
IF LIMIT 0. ==
THEN 0 ELSE 1 END
»
»
MAIN PROGRAM NQUEENS:
------------------------------
« TICKS
1. 2. 3. 4. 5. 6. 7. 8.
0. 0. 0. 0. 0. 0. 0. 0.
1. DOLEVEL DROP
1. 8. START 9. ROLL DROP NEXT
TICKS
10. ROLL - B\->R 8192. /
»
I previously posted the benchmark results, but I will repost here for completeness.
Benchmark results:
- Original: userRPL = 90 seconds. newRPL = 197 ms
- Stack: userRPL = 60.6 seconds. newRPL = 154 ms
- Original w/optimized inner loop: userRPL = 67.6 seconds. newRPL = 154 ms
- Stack w/Optimized inner loop: userRPL = 57.7 seconds. newRPL = 143 ms
- Recursive stack: userRPL = 29.7 seconds. newRPL = 101 ms
- Werner's stack (see below): userRPL = 22.6 seconds. newRPL = 106 ms
These results are at the normal speed for both environments (userRPL=75 MHz, newRPL=192 MHz), both running on different 50g machines.
The exact same code was executed on both machines. The only difference was that the trailing dots were removed from the numbers on newRPL. The change is justified since they are a quirk of the 50g, to force the use of faster real numbers instead of variable precision integers. newRPL has automatic integer to real promotion and variable precision reals, so there is no need for the explicit use of reals. Leaving the dots would actually cause a slow-down on newRPL due to the forced use of reals.
newRPL produced the wrong result when running the recursive method, which I will investigate, fix it and report back.
UPDATE: after fixing a bug in the compiler, newRPL runs the recursive version properly. The result was added to the list, as well as Werner's algorithm running unmodified on newRPL.
Claudio