WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
|
10-12-2014, 11:38 PM
(This post was last modified: 10-15-2014 11:09 PM by iceman.)
Post: #1
|
|||
|
|||
WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
First of all, thanks to forum member Patrice for sharing a work in progress (His source code for the HP-16C N-Queen's Benchmark).
I have ported this code to the bit commands of the WP-34S which has resulted in doubling the speed of the N-Queens solution over the traditional WP-34S N-Queens benchmark. Set Wordsize to 32. Here is the code, starts at memory location 32. I also attach a .dat file so that this code can be loaded into the emulator: Code: WP-34S |
|||
10-14-2014, 05:17 AM
Post: #2
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Hi Tim,
Thank you for your request about this program, it was fun to comeback on it. Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-15-2014, 06:39 PM
(This post was last modified: 10-15-2014 11:07 PM by iceman.)
Post: #3
|
|||
|
|||
IMPROVED 30% over previous post
Patrice Torchet has optimized the WP-34S benchmark posted above.
Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the "Traditional" WP-34S NQueens Benchmark (See Xerxes Article) WP-34S "Traditional" mode: [f] H.d (DECM):------------------------>68 Ticks WP-34S "Traditional" mode: BASE 10 : -----------------------------> 61 Ticks Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the code below are as follows: Torchet WP-34S Optimized:-------------------------------------------->21 Ticks There are approximately 10 Ticks per second Here is the optimized code: Code: 140 [f] LBL C Merci Patrice, it is a work of beauty! |
|||
10-16-2014, 01:01 PM
Post: #4
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Just for completeness.
The Queens positions are in registers 2, 4, 6, 8 .... Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-16-2014, 04:30 PM
Post: #5
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-15-2014 06:39 PM)iceman Wrote: Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the "Traditional" WP-34S NQueens Benchmark (See Xerxes Article) Thank you for sharing your test results. I've updated the list. Calculator Benchmark |
|||
10-16-2014, 06:59 PM
Post: #6
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Is it still the same algorithm?
I thought the idea was to have the same algorithm ported in all platforms, so they can be directly compared. I'm sure other calculators will also get a boost if you change the algorithm. I didn't look at the new code in detail, but looks to me like a completely different algorithm. Is this sufficiently close to the original algorithm so it can be included in the original list? Otherwise will introduce noise into an otherwise very neat collection of results. |
|||
10-16-2014, 07:05 PM
Post: #7
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Greetings Xerxes!
Thank-you so much for monitoring this discussion and incorporating the findings into your excellent benchmark article. May I respectfully recommend that the new BITMAP based N-Queens benchmark for WP-34S also be included in article 700, along with the most remarkable test result of 2.1 seconds on F/W version 3.3? As you can determine from the attached photo, this benchmark developed by forum member Patrice correctly yields the Queen's positions in registers 4, 6, 8, 10, 12, 14, 16 and 18. This benchmark uses the WP-34S Binary operations commands (a-la HP-16C) and so represents a second legitimate example of the WP-34S N-Queens benchmark against the HP-16C. The original method is still relevant of course when comparing calculators that do not support bit functions, so I recommend including both in your article. Here is the code for Patrice Torchet's BITMAP version: Code: WP-34S BITMAP Based N-Queens (Patrice Torchet) P.S. There is also an HP-16C version of this benchmark if you are interested, I am sure that Patrice would be willing to share it. |
|||
10-16-2014, 08:53 PM
Post: #8
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-16-2014 06:59 PM)Claudio L. Wrote: Is it still the same algorithm?It is the same in the sense that it apply the same rules and find a solution. It is close enough to end by displaying the loop counter with the expected value. Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-17-2014, 08:02 PM
Post: #9
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-16-2014 06:59 PM)Claudio L. Wrote: Is it still the same algorithm?There is a problem with the benchmark, the way it is done. It is biaised in more than one way from the beginning. 1) Algorithm chosen: is it not the most powerful. It prevent some calculators from showing their full potential. Ex: hp16C wp34s 2)THe reference source code (BASIC) is poorly written. It use a double array access in the inner loop when a single array access is enough. - Problem, C code is allowed and C compilers are smart enough to detect that one of the array access is a constant inside the loop and optimize it. Is it fair ? - Problem, C/RegVars code is directly using a single array access, it is an advantage. Is it fair ? - Problem, calculators are features poor. hp15c, hp16C and wp34S and many others show poor performance because they stick to the BASIC code and spend their time swaping array pointers in the innier loop. Is it fair ? You have seen that the wp34s can be 3 times faster, that is huge. Is it fair not to show it in the benchmark? Every program using the double array access should be rewriten to single array access, just to be fair with C/RegVars code. Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-18-2014, 01:36 AM
Post: #10
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Accurate benchmarking seems to be a very hard task. I choosed the n-queens problem with a simple algorithm, to be able
to compare the basic programming elements of the languages of as many calculators as possible, even if the programming capabilities are limited. Of course there are much better algorithms to solve this puzzle, but these are not suited for many calculators. I think that in most cases the translation of the original BASIC code is accurate enough, because the speed difference between one array access or two is very small, but as Patrice mentioned, I also see a disadvantage for some RPN versions, because there is only one register for indirect addressing unlike the WP-34S, that makes the inner loop more complicated than neccessary. So a RPN friendly translation of the code would be more fair. If I remember correctly Patrice already tried this with about 30% speed advantage compared to the actual RPN versions. IMHO using this version would be really fair, because it's RPN friendly but also close to the original algorithm. Too bad that we haven't used this version from the beginning. I like the BITMAP solution a lot, but how can I use it in the benchmark list without putting many other calculators at a disadvantage? Please take into consideration that the BITMAP version would be also much faster on some other non HP calculators and the comparability of the results will suffer, due to the use of different algorithms. The BITMAP solution has earned an own article. Calculator Benchmark |
|||
10-19-2014, 05:35 PM
Post: #11
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Hi XerXes,
I didn't want to bother you with all this, as I have no time to conduct a benchmark by myself. The simple knowledge of what was possible was enough for me. At best I can give a hand to help optimize a program. For thoses interested in seeing how bad I speak english , here is a présentation I gave in HHC 2011 https://www.youtube.com/watch?v=g0sQA0H6...SU30gorNFw about what should nit be done and about bitmap. and here is the slide I used HHC 2011 HP15C Benchmark.ppt Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-20-2014, 01:07 PM
Post: #12
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-17-2014 08:02 PM)patrice Wrote: There is a problem with the benchmark, the way it is done. It is biaised in more than one way from the beginning.I think the array access issue is minor, and doesn't fundamentally change the algorithm (you still have to explore the same number of nodes and do exactly the same tasks. And if a particular language has the ability to do optimization and use register variables with a simple statement or compiler switch, I personally think it's fair to include it as long as it is marked as such (Xerxes did a great job at that). Same thing with the CPU clock, turbo modes, etc. Many implementations are optimized for the particular language/calculator, but they don't change the algorithm. They solve the same problem through the same path to a solution. (10-17-2014 08:02 PM)patrice Wrote: - Problem, calculators are features poor. hp15c, hp16C and wp34S and many others show poor performance because they stick to the BASIC code and spend their time swaping array pointers in the innier loop. Is it fair ? Perhaps they should all use it, but that will have a relatively small impact, and requires extra storage to "cache" that value. I'd say if a calculator has extra registers and can do it, why not. Having extra registers available is a feature of the calculator and is representative of what the calculator can do. Bit manipulations, on the other hand, is a very specialized subset of features that not all calculators have and is not representative of what the calculator can do at solving problems. I'd think a lot of problems will require arrays and indexes, but not so many can be solved using bit manipulations, so it gives a better idea of what to expect from a particular calculator/language combination in the real world. For fun, I coded a non-optimized bitmap solution (no idea if it's similar to yours, I made it up from scratch) in userRPL on the 50g, and it runs in 75 seconds instead of the 90 from the list. It's split in several routines and does heavy use of subroutine calls done by name (ugly and slow), so it's not exactly the fastest it could be, but enough to prove my point: many other calculators could benefit greatly from a bitmap approach, even the assembly and C versions could get a boost from that. Including your solution in the list would mean we have to throw away all other results and run a bitmap solution on all of them. It's not really about being fair/unfair, it's a question of it being comparable or not to the others, and of it being representative of other problems. I think you should open a new thread and call it the "N-Queens Calculator Competition", (emphasis on "competition" versus "benchmark") where we don't care about comparison or real-world usage, just get the job done with the fastest algorithm optimized the best you can. I think a lot of people will start coding their own algorithms and optimizing them for different machines. It could be fun. Claudio |
|||
10-20-2014, 10:05 PM
Post: #13
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-18-2014 01:36 AM)xerxes Wrote: ... I also see a disadvantage for some RPN versions, And don't forget that the userRPL version for the HP28/48/49/50 is using lists, which is known to be very slow. It could very well leave all 8 values on the stack and use PICK/UNPICK instead of GET/PUT, taking advantage of the unlimited stack that is the most distinctive feature of this calculator series. But people trying to solve a problem would probably use lists, so it is representative of the normal use. For the same reason people wouldn't normally use a bitmap solution, or be concerned with the number of indirect addressing done in a loop (making the BASIC version and its port for all targets also representative of normal use). On the other hand, people using C would instantly get the C/regvars optimization without knowing, so it is representative as well. I think your article is the most serious attempt ever made to have a benchmark for calculators, and wouldn't call it flawed in any way. For a cross-language, cross-platform benchmark with so many different hardware targets, it's nearly impossible to track every optimization on every target. The fact that in 2014 people are still talking about it means overall it's a great benchmark. I used it myself to compare newRPL speed to traditional RPL, so it's actually a development tool too. |
|||
10-21-2014, 09:38 PM
Post: #14
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-20-2014 10:05 PM)Claudio L. Wrote: And don't forget that the userRPL version for the HP28/48/49/50 is using lists, which is known to be very slow. It could very well leave all 8 values on the stack and use PICK/UNPICK instead of GET/PUT, taking advantage of the unlimited stack that is the most distinctive feature of this calculator series. I didn't noticed that before, thank you for your advice and kind words. Calculator Benchmark |
|||
10-21-2014, 09:49 PM
Post: #15
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-18-2014 01:36 AM)xerxes Wrote: Accurate benchmarking seems to be a very hard task.Indeed , That's why I didn't try to do a benchmark by myself. (10-18-2014 01:36 AM)xerxes Wrote: I like the BITMAP solution a lot, but how can I use it in the benchmark list without putting many other calculators atIt's like doing a benchmark/contest with vehicles from point A to point B, most of the cars will take the road. And when you test a 4WD, why take the bumpy track shortcut should not be allowed? And finally when you put you hands on an ugly amphibious vehicle, why shouldn't it just cross the river ? It is slow and ugly but it can do things that a Ferrari can't. (10-18-2014 01:36 AM)xerxes Wrote: The BITMAP solution has earned an own article.Is it on line already ? URL ? I propose a set of rules where I take into account that calculators can be poor in resources. - make a program that build a solution from the rules. You are free to choose any algorithm as long as it solve the problem. - the result is the solution found as a number of 8 figures. it is valid as long as it is a solution. - Ideally, after displaying a solution, a simple branch in the program will lead to a second solution - do not count the number of steps - the goal is to be as fast as possible. Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-21-2014, 10:51 PM
Post: #16
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Hi Claudio,
(10-20-2014 01:07 PM)Claudio L. Wrote: I think the array access issue is minor, and doesn't fundamentally change the algorithm (you still have to explore the same number of nodes and do exactly the same tasks.Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR. (10-20-2014 01:07 PM)Claudio L. Wrote: I think you should open a new thread and call it the "N-Queens Calculator Competition", (emphasis on "competition" versus "benchmark") where we don't care about comparison or real-world usage, just get the job done with the fastest algorithm optimized the best you can.It just append that a benchmark IS a competition about being the fastest Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently. Code: 42,21,11 LBL A Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-22-2014, 01:10 PM
Post: #17
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-21-2014 10:51 PM)patrice Wrote: Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection. (10-21-2014 10:51 PM)patrice Wrote: It just append that a benchmark IS a competition about being the fastestWhat I meant is the goals are different. The objective of the benchmark is not to compete, but to measure and compare. People try to compete (naturally), but that's not what the benchmark is about. On the benchmark you try to eliminate the user factor (to compare machines). On the competition the user factor is all that matters (it's who does it faster). For newRPL, for example, I'm not trying to "beat" any particular measures or compete against anything or anybody, yet I'm using this same benchmark just to see where it falls in the "market". (10-21-2014 10:51 PM)patrice Wrote: Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently.And that one should go on the list, as long as it represents what a user would do to solve that problem with the machine and as long as it's still the same algorithm. If your optimizations are too obscure (like the bitmap method) they are not representative of normal use, but using "decrease and skip if not zero" style instructions is OK in my opinion (it's common sense). Claudio |
|||
10-22-2014, 09:15 PM
Post: #18
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-22-2014 01:10 PM)Claudio L. Wrote:The inner loop is 20 steps, when optimized, it is only 15 steps.(10-21-2014 10:51 PM)patrice Wrote: Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. 15/20= 0.75= 1 -25% : it saves 25% in the inner loop 20/15=1.33= 1 +33% : this is 33% overhead for the inner loop (10-22-2014 01:10 PM)Claudio L. Wrote: But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.All RPN models are using the same kind of technology and the impact will be essentially the same. What are you comparing the C/Regvars with ? (10-22-2014 01:10 PM)Claudio L. Wrote:What can be the meaning of a comparison when one of the compared thing can be 40% faster? I think the only thing it show is that both can solve the problem.(10-21-2014 10:51 PM)patrice Wrote: It just append that a benchmark IS a competition about being the fastestWhat I meant is the goals are different. The objective of the benchmark is not to compete, but to measure and compare. People try to compete (naturally), but that's not what the benchmark is about. I think that it is by showing people efficient programing technics that you will teach to use them If you have the knowledge that lists are slow on RPL, you are free to find another way to solve the problem. Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein |
|||
10-22-2014, 10:36 PM
Post: #19
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-22-2014 09:15 PM)patrice Wrote: All RPN models are using the same kind of technology and the impact will be essentially the same. What are you comparing the C/Regvars with ?With the "structured" C version that's on the benchmark, with no compiler optimizations. (10-22-2014 09:15 PM)patrice Wrote: What can be the meaning of a comparison when one of the compared thing can be 40% faster? I think the only thing it show is that both can solve the problem. I think it still does offer a decent comparison, but as I said previously, since your version uses the same algorithm it should be on the list. I wouldn't put the bitmap solution on the list, but your optimizations seem reasonable and are applicable to other problems. (10-22-2014 09:15 PM)patrice Wrote: I think that it is by showing people efficient programing technics that you will teach to use themDifference is that I don't think it's a competition, so RPL with lists is perfectly fine with me, and I think it represents how people use RPL. If somebody else wants to optimize it to use the stack, that's fine too. My observation was not criticism, it went to show that RPN machines are not being unfairly treated, as other models are also in the same position. I'm sure a careful review will reveal that many other platforms/languages could also get a fair amount of optimization, but that doesn't necessarily invalidate the benchmark. And please notice I have nothing to do with Xerxes, or this benchmark. I never submitted an entry (although somebody else submitted results using my code for the C/Regvars and ARM assembly for the 50g, so you could say I'm the author of those two). I just think its fine the way it is. Perfect? No, but still very useful and informative. PS: On second thought, if I accept that you are right and this is a competition, would you say I "won"? Should I get at least a T shirt or something for authoring the fastest times on the list, undefeated since 2008? Just joking, of course... (L size, please) |
|||
10-23-2014, 12:29 PM
Post: #20
|
|||
|
|||
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-21-2014 10:51 PM)patrice Wrote: Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently. Hi Patrice, IMHO this version fulfils the requirements of being fair and accurate enough in sense of comparing the same algorithm, whats the goal of the n-queens benchmark. Thank you for your effort and showing whats possible. Calculator Benchmark |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 1 Guest(s)