Post Reply 
N-Queens on 50g (RPL language)
10-31-2014, 11:21 PM (This post was last modified: 11-06-2014 12:33 AM by Claudio L..)
Post: #1
N-Queens on 50g (RPL language)
Inspired by Patrice's relentless search for speed, I created the following versions of the N-Queens problem, solved initially for the Calculator's Benchmark

1) The original unmodified code was tested as per the article.
2) The same algorithm, but using PICK/UNPICK on the stack instead of lists:

Code:

« 8. 0. 0. 0. \-> R S X Y
  « CLEAR TICKS 1. R
    START 0.
    NEXT
    DO R 'X' INCR UNPICK
      DO 'S' INCR DROP X 'Y' STO
        WHILE Y 1. >
        REPEAT X PICK 'Y' DECR 1. + PICK -
          IF DUP 0. == SWAP ABS X Y - == OR
          THEN 0. 'Y' STO X PICK 1. - X UNPICK
            WHILE X PICK 0. ==
            REPEAT 'X' DECR PICK 1. - X UNPICK
            END
          END
        END
      UNTIL Y 1. ==
      END
    UNTIL X R ==
    END 8. DROPN TICKS SWAP - B\->R 8192. / S
  »
»

3) The original algorithm, but removing the list access from the inner loop:

Code:

« 8. 0. 0. 0. \-> R S X Y
  « CLEAR TICKS 1. R
    START 0.
    NEXT
    DO R 'X' INCR UNPICK
      DO 'S' INCR DROP X 'Y' STO
        WHILE Y 1. >
        REPEAT X PICK 'Y' DECR 1. + PICK -
          IF DUP 0. == SWAP ABS X Y - == OR
          THEN 0. 'Y' STO X PICK 1. - X UNPICK
            WHILE X PICK 0. ==
            REPEAT 'X' DECR PICK 1. - X UNPICK
            END
          END
        END
      UNTIL Y 1. ==
      END
    UNTIL X R ==
    END 8. DROPN TICKS SWAP - B\->R 8192. / S
  »
»

4) The algorithm in 2) removing the indirection from the inner loop:

Code:

« 8. 0. 0. 0. 0. \-> R S X Y Ax
  « CLEAR TICKS 1. R
    START 0.
    NEXT
    DO 'X' INCR DROP R 'Ax' STO
      DO 'S' INCR DROP X 'Y' STO
        WHILE Y 1. >
        REPEAT Ax Y PICK - 'Y' DECR DROP
          IF DUP 0. == SWAP ABS X Y - == OR
          THEN 0. 'Y' STO 'Ax' DECR DROP
            WHILE Ax 0. ==
            REPEAT 'X' DECR PICK 1. - 'Ax' STO
            END
          END
        END
      UNTIL Y 1. ==
      END Ax X UNPICK
    UNTIL X R ==
    END TICKS 10. ROLL - B\->R 8192. / S
  »
»

5) A new recursive algorithm:

Code:

ROUTINE CHECKQUEEN:
--------------------------
« 1. \-> X RES «
IF X 1 > THEN
X PICK
 1 X 1 - FOR I
  DUP I 2 + PICK - ABS X I - ABS
   IF == THEN
     0. 'RES' STO X 'I' STO
   END
   NEXT
   DROP RES
 ELSE
   1.
 END
»
» 

ROUTINE DOLEVEL:
---------------------
«  9 OVER - \-> X LIMIT 
«
 
 1 8 START LIMIT 8 + ROLLD NEXT
 LIMIT DUPN 1 8 START LIMIT LIMIT 8 + + ROLL NEXT
  
  DO 9. ROLL X UNPICK
        IF X CHECKQUEEN
        THEN X 1 + DUP 
         IF 8 \<= THEN IF DOLEVEL THEN 0. 9 ROLLD 0. 'LIMIT' STO ELSE X PICK 17 X - ROLLD END
          ELSE 9 ROLLD 0 'LIMIT' STO END
          
        ELSE X PICK 17 X - ROLLD
        END 'LIMIT' DECR
      UNTIL 0. \<=
      END
      
      1 9 X - START 9 ROLL DROP NEXT
    
      
      IF LIMIT 0. ==
      THEN 0 ELSE 1 END


  »
»
MAIN PROGRAM NQUEENS:
------------------------------
« TICKS
  1. 2. 3. 4. 5. 6. 7. 8.
  0. 0. 0. 0. 0. 0. 0. 0.
  1. DOLEVEL DROP
  
  1. 8. START 9. ROLL DROP NEXT
   
  TICKS
  
  10. ROLL - B\->R 8192. /
»

I previously posted the benchmark results, but I will repost here for completeness.

Benchmark results:

  1. Original: userRPL = 90 seconds. newRPL = 197 ms
  2. Stack: userRPL = 60.6 seconds. newRPL = 154 ms
  3. Original w/optimized inner loop: userRPL = 67.6 seconds. newRPL = 154 ms
  4. Stack w/Optimized inner loop: userRPL = 57.7 seconds. newRPL = 143 ms
  5. Recursive stack: userRPL = 29.7 seconds. newRPL = 101 ms
  6. Werner's stack (see below): userRPL = 22.6 seconds. newRPL = 106 ms


These results are at the normal speed for both environments (userRPL=75 MHz, newRPL=192 MHz), both running on different 50g machines.
The exact same code was executed on both machines. The only difference was that the trailing dots were removed from the numbers on newRPL. The change is justified since they are a quirk of the 50g, to force the use of faster real numbers instead of variable precision integers. newRPL has automatic integer to real promotion and variable precision reals, so there is no need for the explicit use of reals. Leaving the dots would actually cause a slow-down on newRPL due to the forced use of reals.

newRPL produced the wrong result when running the recursive method, which I will investigate, fix it and report back.

UPDATE: after fixing a bug in the compiler, newRPL runs the recursive version properly. The result was added to the list, as well as Werner's algorithm running unmodified on newRPL.

Claudio
Find all posts by this user
Quote this message in a reply
11-01-2014, 10:03 AM
Post: #2
RE: N-Queens on 50g (RPL language)
Here's my entry, still using the same algorithm:
[49G] 125 bytes #A38Ch

Code:
\<<
  0.
  DO
    8. SWAP 1. +              @ Add Queen
    WHILE
                              @ Test if Qn can be caught by Q1..Qn-1
      DUP2                    @ Q1 .. Qn n Qn n
      DO 1. -                 @ Q1 .. Qn n Qn n-i
      UNTIL
        DUP2 5. + PICK - ABS  @ n-i |Qn-Qi|
        DUP2 - * NOT          @ n-i T/F
      END
    REPEAT                    @ Qn can be caught
      DROP
                              @ Next square for Queen
      WHILE SWAP DUP 1. SAME
      REPEAT -                @ Drop Queen - will not remove the first
      END
      1. - SWAP
    END
    DROP
  UNTIL DUP 8. SAME           @ Done
  END
  \->LIST
\>>

Compatible with all RPL models, I think.
timing on a real 49G: 54_s

Cheers, Werner
Find all posts by this user
Quote this message in a reply
11-01-2014, 01:34 PM
Post: #3
RE: N-Queens on 50g (RPL language)
(11-01-2014 10:03 AM)Werner Wrote:  timing on a real 49G: 54_s

Excellent! Very clean and efficient. This is a proper implementation of the original algorithm on-stack (mine was just taking the same code and changing GET/PUT with PICK/UNPICK, so not really well thought out). No lists, not even local variables. I think we have a winner here. It will be very difficult for anybody to beat this.
I ran it on the 50g for proper comparison: 22.6 seconds, so it smokes my recursive implementation and then some.
Great work! I'll edit the original post to report your results as well.

Thanks for the entry.
Claudio
Find all posts by this user
Quote this message in a reply
11-02-2014, 02:24 AM
Post: #4
RE: N-Queens on 50g (RPL language)
(10-31-2014 11:21 PM)Claudio L. Wrote:  
  • Recursive stack: userRPL = 29.7 seconds. newRPL = 111 ms
  • Werner's stack (see below): userRPL = 22.6 seconds. newRPL = 141 ms

I was able to run things on newRPL and updated the original post with the new results.
Interestingly, the recursive algorithm runs faster than Werner's stack algorithm on newRPL, while the opposite is true for userRPL.
This shows how different optimizations work on different platforms. I believe the difference is in the use of local variables: userRPL does a search by name within the environment, while the newRPL compiler is capable of optimizing into PUTLAM/GETLAM style operations at compile time, making the use of named local variables just as efficient as using the stack.

Then, the recursive algorithm is theoretically supposed to be faster, since it only checks 301 cases versus the 876 of the general algorithm. Not 3 times faster as the number of checks would suggest, since the recursive algorithm wastes some time shuffling numbers in the stack, and the discarded cases are so trivial the other algorithm doesn't take too much time on them, but should be a little faster. When running on userRPL it takes a big hit for using variables, enough to overturn the results.

Another interesting observation is that the speedup ratio between the slowest algorithm (the original one) vs the fastest on each platform is very different: 3.98x for userRPL and 2.04x for newRPL. This is also likely due to the smarter compiler in newRPL speeding up things.
Find all posts by this user
Quote this message in a reply
11-02-2014, 03:00 AM
Post: #5
RE: N-Queens on 50g (RPL language)
(11-02-2014 02:24 AM)Claudio L. Wrote:  ...I believe the difference is in the use of local variables: userRPL does a search by name within the environment, while the newRPL compiler is capable of optimizing into PUTLAM/GETLAM style operations at compile time, making the use of named local variables just as efficient as using the stack.

This is a Very Good Thing.

Keep up the good work!
Find all posts by this user
Quote this message in a reply
11-02-2014, 08:17 AM
Post: #6
RE: N-Queens on 50g (RPL language)
Very interesting! As soon as I'm back from my journey, I'll have a closer look.
Find all posts by this user
Quote this message in a reply
11-02-2014, 10:36 AM
Post: #7
RE: N-Queens on 50g (RPL language)
Well done guys.

Nice to see your efforts to make yours calcs more "Fast and Furious"

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
11-05-2014, 11:57 PM
Post: #8
RE: N-Queens on 50g (RPL language)
Applying two simple improvements, one in the instruction execution loop of newRPL (moved some unnecessary things out of the loop), and then replacing memmove() with a version that moves words instead of bytes (which I should've done from the beginning), produced a nice speed improvement between 10 to 25%.
These improvements are nothing special and apply to all programs running on newRPL.
The original went from 226 ms to 198 ms, while Werner's went from 141 ms to 106 ms, being the one most impacted by the improvements.
I'll update the times in the original post with the new marks for recordkeeping.
Find all posts by this user
Quote this message in a reply
11-06-2014, 03:29 AM
Post: #9
RE: N-Queens on 50g (RPL language)
Awesome!

Graph 3D | QPI | SolveSys
Find all posts by this user
Quote this message in a reply
11-13-2014, 06:08 PM
Post: #10
RE: N-Queens on 50g (RPL language)
Thanks to Claudio for the effort and comparisons.

And special thanks to Werner for this impressive solution and the new test code for userRPL.

Besides the 49G and 50G, I've updated the results of the 28C und 28S (normal and turbo speed).

Unfortunately I had to remove the userRPL results of the 48SX and 48GX and also the sysRPL result
of the 48GX, because these are not comparable any more.
Find all posts by this user
Quote this message in a reply
11-18-2014, 07:39 AM (This post was last modified: 11-18-2014 10:39 AM by Werner.)
Post: #11
RE: N-Queens on 50g (RPL language)
Xerxes,
the RPL is compatible with the 48 (S,G and GX) as well.
The dots after the numbers are only needed for the 49 and 50, as these have the integer type, and I want reals (they are much faster). Depending on the state of flag 51, the dot must be a period (clear) or a comma (set).
As for SysRPL, here's a quick transcript of the RPL:

Code:
::
  ZERO
  BEGIN
    EIGHT SWAP#1+
    ::
    BEGIN
      2DUP
      BEGIN
        #1-
        2DUP FIVE #+PICK 2DUP#> IT SWAP #-
        2DUP#+ 2#0=OR 
      UNTIL
      #0=case AGAIN
      DROPSWAP
      BEGIN DUP#1=
      WHILE #-SWAP
      REPEAT
      #1-SWAP
    AGAIN
    ;
    DROPRDROP
    EIGHT OVER#=
  UNTIL
  ZERO_DO
    8ROLL UNCOERCE
  LOOP
  EIGHT {}N
;
@

Cheers, Werner
Find all posts by this user
Quote this message in a reply
11-18-2014, 09:45 PM
Post: #12
RE: N-Queens on 50g (RPL language)
It would be very nice to have the results of the 48SX/GX with your userRPL and sysRPL test codes in the list, but I haven't the possibility to make these tests.
Find all posts by this user
Quote this message in a reply
11-19-2014, 06:22 AM
Post: #13
RE: N-Queens on 50g (RPL language)
I have a 48GX, but no SX. Will test at first opportunity, that would be tonight if I don't forget ;-)
Werner
Find all posts by this user
Quote this message in a reply
11-19-2014, 03:19 PM
Post: #14
RE: N-Queens on 50g (RPL language)
(11-18-2014 07:39 AM)Werner Wrote:  As for SysRPL, here's a quick transcript of the RPL:
Interesting.
On the 50g, your sysRPL version clocked 2.318 seconds (average of 5 runs doing MEM right before). Not bad at all!
In the list, it fits right in line with the WP-34S (2.1/2.3 seconds).

Comparing this result with your algorithm written in userRPL shows a speedup of sysRPL vs userRPL of almost 10x. Then from sysRPL to newRPL the speedup is 16x, but since it's running at 192 MHz, the real gain due to bypassing the Saturn layer is around 6.4x (at the same clock speed).

Normalizing results, for language efficiency comparison only:

userRPL = 1.0x (by definition)
sysRPL = 9.75x (speedup due to BINT use, lost argument safety checks)
newRPL = 62.6x (speedup due to emulation layer, at 75 MHz)

Notice that sysRPL loses all safety checks to increase efficiency, newRPL and userRPL have all safety checks in place. newRPL uses BINTs automatically, so it's closer to sysRPL than userRPL in this aspect.
N-Queens does not make heavy use of basic arithmetic operations on reals, so it's more oriented towards loop efficiency and integer math/counters. I've been toying with other tests that do some number crunching (mainly some PI finding algorithms, some series expansions, etc). The idea is to measure real computing speed (real as in real numbers), but I can't find a good algorithm, so any ideas are welcome.

Here's a couple of examples, but none of them work for what I want to measure:

Simple product that converges to PI/2:
Code:

<< 1. .5 √ 1. 100. START
           DUP ROT * SWAP .5 * .5 + √ NEXT
 * INV 2. *
 >>

Archimedes's method of finding PI:
Code:

<<
4. 3. √ * 6. -> a b
<< 1. 100. START 
2. a b * * a b + / 'a' STO b a * √ 'b' STO
NEXT a 2 /
>>
>>

Both examples are not good because products that converge tend to 1, and after a few terms, it all becomes "multiply by 1". It's also very fast in userRPL, which makes it difficult to measure in newRPL.

What other problems could we use to benchmark real numbers?
Find all posts by this user
Quote this message in a reply
11-20-2014, 08:53 AM
Post: #15
RE: N-Queens on 50g (RPL language)
Fast Fourier transforms or pretty much anything involving matrices would probably be reasonable speed tests. Linear or polynomial regressions on large data sets using vandermonde matrices come to mind for me since I have been working with them recently.

Graph 3D | QPI | SolveSys
Find all posts by this user
Quote this message in a reply
11-20-2014, 09:27 PM
Post: #16
RE: N-Queens on 50g (RPL language)
Claudio, I've updated the list with the sysRPL und newRPL results. Thanks for testing.

(11-19-2014 03:19 PM)Claudio L. Wrote:  N-Queens does not make heavy use of basic arithmetic operations on reals, so it's more oriented towards loop efficiency and integer math/counters.

At that time I had also difficulties to find a real numbers problem for benchmarking. But as soon as I decided to test assembly languages too,
I switched to an integer problem. A new real number benchmark would be interesting.
Find all posts by this user
Quote this message in a reply
11-20-2014, 10:57 PM
Post: #17
RE: N-Queens on 50g (RPL language)
@Han: Using matrices uses more memory, so memory management starts playing a bigger role. I like the FFT, which is a lot of multiplications but it needs a lot of storage as well.

How about this one: Is the sum of the reciprocal of triangular numbers, which very slowly converges to 2.
The slow convergence is good so we can do thousands of operations without going out of range.
\[ \sum { 2 \over n (n+1) } \]
Here's the code:
Code:

<< 0. 10000. 1. FOR n
   2 n DUP 1. + * / +
   -1 STEP
>>

It has one multiplication, one division and two additions at each step. I chose 10000 terms quite arbitrarily. The last term is in the order 1e-6, so it is still significant even with the 12-digit precision. It takes about 87 seconds on the 50g, so it can be measured comfortably.
Notice that the loop runs backwards, so there's less loss of significant digits.

One drawback I see is that the multiplication is always between two integers...

Any objections or other proposals?
Find all posts by this user
Quote this message in a reply
11-21-2014, 04:27 PM
Post: #18
RE: N-Queens on 50g (RPL language)
Easy solution: replace n=10*k and work with the integers divided by 10, then compensate with the numeric factor, and no more integers anywhere:

Code:

<< 0. 1000. 0.1 FOR n
   0.02 n DUP 0.1 + * / +
   -0.1 STEP
>>

The exact result is:
1 followed by 'x' 9's, then an 8, followed by 'x' 0's, then repeats the sequence with a 1 followed by...
'x' is the exponent of the end of the loop. If the loop starts from 1e3, there's 3 9's and 3 zeroes:
1.999 8 000 1 999 8 000 1 999 8 000 1 999 8 000 ...

So it's easy to check for rounding errors even at higher precisions (which I want to benchmark too).
Find all posts by this user
Quote this message in a reply
11-26-2014, 05:32 AM (This post was last modified: 11-26-2014 01:32 PM by Claudio L..)
Post: #19
RE: N-Queens on 50g (RPL language)
I finally had the time to get some results in.
The following code was tested on a 50g under userRPL and newRPL under various precision settings.
Code:

<< 0. 1000. 0.1 FOR n
   0.02 n DUP 0.1 + * / +
   -0.1 STEP
>>

The times resulted as follows:

userRPL = 87.4 seconds
newRPL = 0.893 seconds (at 36 digits precision)
newRPL = 1.615 seconds (at 108 digits precision)
newRPL = 10.843 seconds (at 1008 digits precision)

A few observations from these results:
a) newRPL vs userRPL shows a speedup of 98x. Not as impressive as the 200x obtained in integer math, but still good, especially considering it's using 36-digit precision vs. 12-digits in userRPL.
b) Execution time seems to be growing fairly linear with precision, which is somewhat unexpected (at least for multiplication and division, which should have a big influence on this test).
c) Since the multiplication involves numbers of known precision (variable n), the time used in the multiplication in this test should be independent of the precision. In this case, the division and the last addition are the only operations affected by the selected precision (not sure if that is enough to justify the linear behavior).

Claudio
Find all posts by this user
Quote this message in a reply
09-08-2015, 10:41 AM
Post: #20
RE: N-Queens on 50g (RPL language)
(11-18-2014 09:45 PM)xerxes Wrote:  It would be very nice to have the results of the 48SX/GX with your userRPL and sysRPL test codes in the list, but I haven't the possibility to make these tests.

Hi,

below are the results using the Werner's Algorithm on my HP48(s) :

UserRPL (125 bytes ; #5987h):

HP48S ROM revJ: 76.97s
HP48SX ROM revE: 78.94s
HP48GX ROM revM (@3.6Mhz): 54.46s

SysRPL (105 bytes ; #D802h):

HP48S ROM revJ: 15.06s
HP48SX ROM revE: 15.98s
HP48GX ROM revM (@3.6Mhz): 10.19s
Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 1 Guest(s)