NQueens on 50g (RPL language)

10312014, 11:21 PM
(This post was last modified: 11062014 12:33 AM by Claudio L..)
Post: #1




NQueens on 50g (RPL language)
Inspired by Patrice's relentless search for speed, I created the following versions of the NQueens problem, solved initially for the Calculator's Benchmark
1) The original unmodified code was tested as per the article. 2) The same algorithm, but using PICK/UNPICK on the stack instead of lists: Code:
3) The original algorithm, but removing the list access from the inner loop: Code:
4) The algorithm in 2) removing the indirection from the inner loop: Code:
5) A new recursive algorithm: Code:
I previously posted the benchmark results, but I will repost here for completeness. Benchmark results:
These results are at the normal speed for both environments (userRPL=75 MHz, newRPL=192 MHz), both running on different 50g machines. The exact same code was executed on both machines. The only difference was that the trailing dots were removed from the numbers on newRPL. The change is justified since they are a quirk of the 50g, to force the use of faster real numbers instead of variable precision integers. newRPL has automatic integer to real promotion and variable precision reals, so there is no need for the explicit use of reals. Leaving the dots would actually cause a slowdown on newRPL due to the forced use of reals. newRPL produced the wrong result when running the recursive method, which I will investigate, fix it and report back. UPDATE: after fixing a bug in the compiler, newRPL runs the recursive version properly. The result was added to the list, as well as Werner's algorithm running unmodified on newRPL. Claudio 

11012014, 10:03 AM
Post: #2




RE: NQueens on 50g (RPL language)
Here's my entry, still using the same algorithm:
[49G] 125 bytes #A38Ch Code: \<< Compatible with all RPL models, I think. timing on a real 49G: 54_s Cheers, Werner 

11012014, 01:34 PM
Post: #3




RE: NQueens on 50g (RPL language)
(11012014 10:03 AM)Werner Wrote: timing on a real 49G: 54_s Excellent! Very clean and efficient. This is a proper implementation of the original algorithm onstack (mine was just taking the same code and changing GET/PUT with PICK/UNPICK, so not really well thought out). No lists, not even local variables. I think we have a winner here. It will be very difficult for anybody to beat this. I ran it on the 50g for proper comparison: 22.6 seconds, so it smokes my recursive implementation and then some. Great work! I'll edit the original post to report your results as well. Thanks for the entry. Claudio 

11022014, 02:24 AM
Post: #4




RE: NQueens on 50g (RPL language)
(10312014 11:21 PM)Claudio L. Wrote: I was able to run things on newRPL and updated the original post with the new results. Interestingly, the recursive algorithm runs faster than Werner's stack algorithm on newRPL, while the opposite is true for userRPL. This shows how different optimizations work on different platforms. I believe the difference is in the use of local variables: userRPL does a search by name within the environment, while the newRPL compiler is capable of optimizing into PUTLAM/GETLAM style operations at compile time, making the use of named local variables just as efficient as using the stack. Then, the recursive algorithm is theoretically supposed to be faster, since it only checks 301 cases versus the 876 of the general algorithm. Not 3 times faster as the number of checks would suggest, since the recursive algorithm wastes some time shuffling numbers in the stack, and the discarded cases are so trivial the other algorithm doesn't take too much time on them, but should be a little faster. When running on userRPL it takes a big hit for using variables, enough to overturn the results. Another interesting observation is that the speedup ratio between the slowest algorithm (the original one) vs the fastest on each platform is very different: 3.98x for userRPL and 2.04x for newRPL. This is also likely due to the smarter compiler in newRPL speeding up things. 

11022014, 03:00 AM
Post: #5




RE: NQueens on 50g (RPL language)
(11022014 02:24 AM)Claudio L. Wrote: ...I believe the difference is in the use of local variables: userRPL does a search by name within the environment, while the newRPL compiler is capable of optimizing into PUTLAM/GETLAM style operations at compile time, making the use of named local variables just as efficient as using the stack. This is a Very Good Thing. Keep up the good work! 

11022014, 08:17 AM
Post: #6




RE: NQueens on 50g (RPL language)
Very interesting! As soon as I'm back from my journey, I'll have a closer look.


11022014, 10:36 AM
Post: #7




RE: NQueens on 50g (RPL language)
Well done guys.
Nice to see your efforts to make yours calcs more "Fast and Furious" Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein 

11052014, 11:57 PM
Post: #8




RE: NQueens on 50g (RPL language)
Applying two simple improvements, one in the instruction execution loop of newRPL (moved some unnecessary things out of the loop), and then replacing memmove() with a version that moves words instead of bytes (which I should've done from the beginning), produced a nice speed improvement between 10 to 25%.
These improvements are nothing special and apply to all programs running on newRPL. The original went from 226 ms to 198 ms, while Werner's went from 141 ms to 106 ms, being the one most impacted by the improvements. I'll update the times in the original post with the new marks for recordkeeping. 

11062014, 03:29 AM
Post: #9




RE: NQueens on 50g (RPL language)
Awesome!
Graph 3D  QPI  SolveSys 

11132014, 06:08 PM
Post: #10




RE: NQueens on 50g (RPL language)
Thanks to Claudio for the effort and comparisons.
And special thanks to Werner for this impressive solution and the new test code for userRPL. Besides the 49G and 50G, I've updated the results of the 28C und 28S (normal and turbo speed). Unfortunately I had to remove the userRPL results of the 48SX and 48GX and also the sysRPL result of the 48GX, because these are not comparable any more. 

11182014, 07:39 AM
(This post was last modified: 11182014 10:39 AM by Werner.)
Post: #11




RE: NQueens on 50g (RPL language)
Xerxes,
the RPL is compatible with the 48 (S,G and GX) as well. The dots after the numbers are only needed for the 49 and 50, as these have the integer type, and I want reals (they are much faster). Depending on the state of flag 51, the dot must be a period (clear) or a comma (set). As for SysRPL, here's a quick transcript of the RPL: Code: :: Cheers, Werner 

11182014, 09:45 PM
Post: #12




RE: NQueens on 50g (RPL language)
It would be very nice to have the results of the 48SX/GX with your userRPL and sysRPL test codes in the list, but I haven't the possibility to make these tests.


11192014, 06:22 AM
Post: #13




RE: NQueens on 50g (RPL language)
I have a 48GX, but no SX. Will test at first opportunity, that would be tonight if I don't forget ;)
Werner 

11192014, 03:19 PM
Post: #14




RE: NQueens on 50g (RPL language)
(11182014 07:39 AM)Werner Wrote: As for SysRPL, here's a quick transcript of the RPL:Interesting. On the 50g, your sysRPL version clocked 2.318 seconds (average of 5 runs doing MEM right before). Not bad at all! In the list, it fits right in line with the WP34S (2.1/2.3 seconds). Comparing this result with your algorithm written in userRPL shows a speedup of sysRPL vs userRPL of almost 10x. Then from sysRPL to newRPL the speedup is 16x, but since it's running at 192 MHz, the real gain due to bypassing the Saturn layer is around 6.4x (at the same clock speed). Normalizing results, for language efficiency comparison only: userRPL = 1.0x (by definition) sysRPL = 9.75x (speedup due to BINT use, lost argument safety checks) newRPL = 62.6x (speedup due to emulation layer, at 75 MHz) Notice that sysRPL loses all safety checks to increase efficiency, newRPL and userRPL have all safety checks in place. newRPL uses BINTs automatically, so it's closer to sysRPL than userRPL in this aspect. NQueens does not make heavy use of basic arithmetic operations on reals, so it's more oriented towards loop efficiency and integer math/counters. I've been toying with other tests that do some number crunching (mainly some PI finding algorithms, some series expansions, etc). The idea is to measure real computing speed (real as in real numbers), but I can't find a good algorithm, so any ideas are welcome. Here's a couple of examples, but none of them work for what I want to measure: Simple product that converges to PI/2: Code:
Archimedes's method of finding PI: Code:
Both examples are not good because products that converge tend to 1, and after a few terms, it all becomes "multiply by 1". It's also very fast in userRPL, which makes it difficult to measure in newRPL. What other problems could we use to benchmark real numbers? 

11202014, 08:53 AM
Post: #15




RE: NQueens on 50g (RPL language)
Fast Fourier transforms or pretty much anything involving matrices would probably be reasonable speed tests. Linear or polynomial regressions on large data sets using vandermonde matrices come to mind for me since I have been working with them recently.
Graph 3D  QPI  SolveSys 

11202014, 09:27 PM
Post: #16




RE: NQueens on 50g (RPL language)
Claudio, I've updated the list with the sysRPL und newRPL results. Thanks for testing.
(11192014 03:19 PM)Claudio L. Wrote: NQueens does not make heavy use of basic arithmetic operations on reals, so it's more oriented towards loop efficiency and integer math/counters. At that time I had also difficulties to find a real numbers problem for benchmarking. But as soon as I decided to test assembly languages too, I switched to an integer problem. A new real number benchmark would be interesting. 

11202014, 10:57 PM
Post: #17




RE: NQueens on 50g (RPL language)
@Han: Using matrices uses more memory, so memory management starts playing a bigger role. I like the FFT, which is a lot of multiplications but it needs a lot of storage as well.
How about this one: Is the sum of the reciprocal of triangular numbers, which very slowly converges to 2. The slow convergence is good so we can do thousands of operations without going out of range. \[ \sum { 2 \over n (n+1) } \] Here's the code: Code:
It has one multiplication, one division and two additions at each step. I chose 10000 terms quite arbitrarily. The last term is in the order 1e6, so it is still significant even with the 12digit precision. It takes about 87 seconds on the 50g, so it can be measured comfortably. Notice that the loop runs backwards, so there's less loss of significant digits. One drawback I see is that the multiplication is always between two integers... Any objections or other proposals? 

11212014, 04:27 PM
Post: #18




RE: NQueens on 50g (RPL language)
Easy solution: replace n=10*k and work with the integers divided by 10, then compensate with the numeric factor, and no more integers anywhere:
Code:
The exact result is: 1 followed by 'x' 9's, then an 8, followed by 'x' 0's, then repeats the sequence with a 1 followed by... 'x' is the exponent of the end of the loop. If the loop starts from 1e3, there's 3 9's and 3 zeroes: 1.999 8 000 1 999 8 000 1 999 8 000 1 999 8 000 ... So it's easy to check for rounding errors even at higher precisions (which I want to benchmark too). 

11262014, 05:32 AM
(This post was last modified: 11262014 01:32 PM by Claudio L..)
Post: #19




RE: NQueens on 50g (RPL language)
I finally had the time to get some results in.
The following code was tested on a 50g under userRPL and newRPL under various precision settings. Code:
The times resulted as follows: userRPL = 87.4 seconds newRPL = 0.893 seconds (at 36 digits precision) newRPL = 1.615 seconds (at 108 digits precision) newRPL = 10.843 seconds (at 1008 digits precision) A few observations from these results: a) newRPL vs userRPL shows a speedup of 98x. Not as impressive as the 200x obtained in integer math, but still good, especially considering it's using 36digit precision vs. 12digits in userRPL. b) Execution time seems to be growing fairly linear with precision, which is somewhat unexpected (at least for multiplication and division, which should have a big influence on this test). c) Since the multiplication involves numbers of known precision (variable n), the time used in the multiplication in this test should be independent of the precision. In this case, the division and the last addition are the only operations affected by the selected precision (not sure if that is enough to justify the linear behavior). Claudio 

09082015, 10:41 AM
Post: #20




RE: NQueens on 50g (RPL language)
(11182014 09:45 PM)xerxes Wrote: It would be very nice to have the results of the 48SX/GX with your userRPL and sysRPL test codes in the list, but I haven't the possibility to make these tests. Hi, below are the results using the Werner's Algorithm on my HP48(s) : UserRPL (125 bytes ; #5987h): HP48S ROM revJ: 76.97s HP48SX ROM revE: 78.94s HP48GX ROM revM (@3.6Mhz): 54.46s SysRPL (105 bytes ; #D802h): HP48S ROM revJ: 15.06s HP48SX ROM revE: 15.98s HP48GX ROM revM (@3.6Mhz): 10.19s 

« Next Oldest  Next Newest »

User(s) browsing this thread: 2 Guest(s)