Threaded Mode | Linear Mode

iceman · (This post was last modified: 10-15-2014 11:09 PM by iceman.)

First of all, thanks to forum member Patrice for sharing a work in progress (His source code for the HP-16C N-Queen's Benchmark).

I have ported this code to the bit commands of the WP-34S which has resulted in doubling the speed of the N-Queens solution over the traditional WP-34S N-Queens benchmark.

Set Wordsize to 32.

Here is the code, starts at memory location 32. I also attach a .dat file so that this code can be loaded into the emulator:

Code:

    WP-34S    

============    

32    [f] LBL C        

33    [f] BASE 16        

34    3        

35    STO I        

36    [h] CLx        

37    STO 00    //Count    

38    1        

39    0        

40    1        

41    0        

42    1        

43    0        

44    1        

45    STO 02    //Queen    

46    8        

47    STO 01    //Line    

48    [h] MASKR->X        

49    X~Y        //Mask width specifier got pushed onto stack above the mask

50    R Dn        //Get rid of it

51    [h] SL 01        

52    STO -> I    //Mask    

53    8        

54    [h] X~I        

55    X~Y        

56    [f] LBL 02    //Byte to be tested is in Y.  Bit to test ins in I

57    INC  00    //Count    First increment the line counter        

58    [h] BS?->I        

59    [h] GTO 05                    

60    [f] LBL 03        

61    [h] DSZ I        

62    [h] GTO 02        

63    X~Y        

64    [h] X~I                    

65    INC  01    //Line    

66    [h] DSZ I        

67    RCL->I        

68    [h] DSZ I        

69    RCL->I        

70    X~Y        

71    [h] X~I        

72    X~Y        

73    [h] GTO 03                

74    [f] LBL 05        

75    X~Y        

76    [h] X~I        

77    RCL 02            //Queen    

78    X~Y        

79    [h] ISZ I        

80    STO->I        

81    STO J        //Save the rotation count in the "TEST" register

82    R Dn                //Get rid of it, and pull the rotation candidate into X

83    [h] RL->J        //Rotate by the rotation count. TODO might be better way to do this: RL -> -> I Then could drop two lines above

84    [h] OR                

85    DEC 01             //Line.    Note: Might be better way to do this like: DSZ

86    RCL 01        

87    [f] x=0?            //Are we done?    

88    [h] GTO 04    //Yes Done: Terminate    

89    R Dn                    //No, Not done    

90    RCL 03              //Mask    

91    [h] NOT        

92    [h] AND                

93    [h] RR 09                

94    [h] RRC 08        

95    [h] RL 16        

96    [h] RLC 08            

97    [h] RR 07        

98    ENTER        

99    ENTER        

100    ENTER                

101    [h] RL 08        

102    [h] OR        

103    ENTER                    

104    [h] RL 16        

105    [h] OR        

106    RCL 03    //Mask    

107    [h] AND        

108    RCL L        

109    [h] XOR        

110    [f] X=0?        

111    [h] GTO 06        

112    [h] OR        

113    [h] ISZ I        

114    STO->I        

115    8        

116    [h] X~I        

117    X~Y        

118    [h] GTO 02                

119    [f] LBL 06                

120    8                

121    STO + 00    //Count.    Increment count by 8

122    INC 01    //Line    Increment Line count by 1

123    RCL->I        

124    [h] DSZ I        

125    RCL->I        

126    X~Y        

127    [h] X~I        

128    X~Y        

129    [h] GTO 03                

130    [f] LBL 04        

131    RCL 00    //Count    

132    [g] RTN    //Solution: 8 4 1 3 6 2 7 5 In every second register above 3

patrice · 10-14-2014, 05:17 AM

Hi Tim,

Thank you for your request about this program, it was fun to comeback on it.

iceman · (This post was last modified: 10-15-2014 11:07 PM by iceman.)

Patrice Torchet has optimized the WP-34S benchmark posted above.

Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the "Traditional" WP-34S NQueens Benchmark (See Xerxes Article)

WP-34S "Traditional" mode: [f] H.d (DECM):------------------------>68 Ticks
WP-34S "Traditional" mode: BASE 10 : -----------------------------> 61 Ticks

Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the code below are as follows:

Torchet WP-34S Optimized:-------------------------------------------->21 Ticks

There are approximately 10 Ticks per second

Here is the optimized code:

Code:

140    [f] LBL C        

141    TICKS        

142    STO K        

143    [f] BASE 16        

144    WSIZE 32        

145    [h] CLx        

146    STO C    //loop counter    

147    STO I    //Stack pointer    

148    1        

149    0        

150    1        

151    0        

152    1        

153    0        

154    1        

155    STO A    //0x1010101    

156    8        

157    STO B    //remaining Q    

158    [h] MASKR 08        

159    [h] SL 01        

160    STO 00    //0x1FE    

161    [f] LBL 02    //x=bitmap y=Q position    

162    INC C        

163    [h] BS?->Y    //test Q position    

164    [h] GTO 05    //Q OK, next line    

165    [f] LBL 03    //x=bitmap y=Q position    

166    [h] DSZ Y    //SHIFT QUEEN    

167    [h] GTO 02    //RETRY    

168    INC B    //ootherwise backtrack    

169    RCL->I    //pop a Q    

170    DEC I    //pop a Q    

171    RCL->I    //pop a Q    

172    DEC I    //pop a Q    

173    [h] GTO 03        

174    [f] LBL 05    //x=bitmap y=Q position    

175    INC I        

176    STO->I    //Save bitmap    

177    X<>Y        

178    INC I        

179    STO->I    //save Q    

180    X<>Y        

181    DSZ B    //Finish if 0    

182    [h] GTO 01        

183    [h] GTO 04        

184    LBL 01        

185    RCL A    //get Q mask    

186    [h] RL->Z    //shift    

187    [h] OR    //merge with bitmap    

188    RCL 00        

189    [h] NOT        

190    [h] AND    //clear combination    

191    [h] RR 09        

192    [h] RRC 08    //shift diagonal    

193    RL 16        

194    [h] RLC 08    //shift diagonal    

195    [h] RR 07        

196    ENTER        

197    ENTER        

198    [h] RL 08        

199    [h] OR    //ORing L V R    

200    ENTER        

201    [h] RL 16        

202    [h] OR    //ORing L V R    

203    RCL 00        

204    [h] AND    //mask Combination    

205    RCL L        

206    [h] XOR    //negate Combination    

207    [f] X=0?    //if 0    

208    [h] GTO 06    //deadend    

209    [h] OR    //merge Combination to bitmap    

210    8        

211    X<>Y        

212    [h] GTO 02        

213    [f] LBL 06    //Shortcut    

214    8        

215    STO+C        

216    INC B        

217    RCL->I        

218    DEC I        

219    RCL->I        

220    DEC I        

221    [h] GTO 03        

222    [f] LBL 04        

223    RCL C        

224    BASE 10        

225    TICKS        

226    RCL K        

227    -        

228    STO K        

229    R Dn        

230    [g] RTN        

bitmap principle            

the bitmap contains 4 parts of 8 bits            

L    Left diagonal locks        

V    Vertical locks        

R    Right diagonal locks        

C    Combination    = L or V or R    This is the part tested (BS?)

LLLLLLLVVVVVVVVRRRRRRRRCCCCCCCCL            

register A            

00000001000000010000000100000001            

bitmap is rotated to have combination on bits 1 to 8            

register A is rotated too to match bitmap

Merci Patrice, it is a work of beauty!

patrice · 10-16-2014, 01:01 PM

Just for completeness.
The Queens positions are in registers 2, 4, 6, 8 ....

xerxes · 10-16-2014, 04:30 PM

(10-15-2014 06:39 PM)iceman Wrote: Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the "Traditional" WP-34S NQueens Benchmark (See Xerxes Article)

WP-34S "Traditional" mode: [f] H.d (DECM):------------------------>68 Ticks
WP-34S "Traditional" mode: BASE 10 : -----------------------------> 61 Ticks

Thank you for sharing your test results. I've updated the list.

Claudio L. · 10-16-2014, 06:59 PM

Is it still the same algorithm?
I thought the idea was to have the same algorithm ported in all platforms, so they can be directly compared.
I'm sure other calculators will also get a boost if you change the algorithm. I didn't look at the new code in detail, but looks to me like a completely different algorithm.

Is this sufficiently close to the original algorithm so it can be included in the original list? Otherwise will introduce noise into an otherwise very neat collection of results.

iceman · 10-16-2014, 07:05 PM

Greetings Xerxes!

Thank-you so much for monitoring this discussion and incorporating the findings into your excellent benchmark article.

May I respectfully recommend that the new BITMAP based N-Queens benchmark for WP-34S also be included in article 700, along with the most remarkable test result of 2.1 seconds on F/W version 3.3?

As you can determine from the attached photo, this benchmark developed by forum member Patrice correctly yields the Queen's positions in registers 4, 6, 8, 10, 12, 14, 16 and 18. This benchmark uses the WP-34S Binary operations commands (a-la HP-16C) and so represents a second legitimate example of the WP-34S N-Queens benchmark against the HP-16C. The original method is still relevant of course when comparing calculators that do not support bit functions, so I recommend including both in your article. Smile

Here is the code for Patrice Torchet's BITMAP version:

Code:

WP-34S BITMAP Based N-Queens  (Patrice Torchet)

140    [f] LBL C        

141    TICKS        

142    STO K        

143    [f] BASE 16        

144    WSIZE 32        

145    [h] CLx        

146    STO C    //loop counter    

147    STO I    //Stack pointer    

148    1        

149    0        

150    1        

151    0        

152    1        

153    0        

154    1        

155    STO A    //0x1010101    

156    8        

157    STO B    //remaining Q    

158    [h] MASKR 08        

159    [h] SL 01        

160    STO 00    //0x1FE    

161    [f] LBL 02    //x=bitmap y=Q position    

162    INC C        

163    [h] BS?->Y    //test Q position    

164    [h] GTO 05    //Q OK, next line    

165    [f] LBL 03    //x=bitmap y=Q position    

166    [h] DSZ Y    //SHIFT QUEEN    

167    [h] GTO 02    //RETRY    

168    INC B    //ootherwise backtrack    

169    RCL->I    //pop a Q    

170    DEC I    //pop a Q    

171    RCL->I    //pop a Q    

172    DEC I    //pop a Q    

173    [h] GTO 03        

174    [f] LBL 05    //x=bitmap y=Q position    

175    INC I        

176    STO->I    //Save bitmap    

177    X<>Y        

178    INC I        

179    STO->I    //save Q    

180    X<>Y        

181    DSZ B    //Finish if 0    

182    [h] GTO 01        

183    [h] GTO 04        

184    LBL 01        

185    RCL A    //get Q mask    

186    [h] RL->Z    //shift    

187    [h] OR    //merge with bitmap    

188    RCL 00        

189    [h] NOT        

190    [h] AND    //clear combination    

191    [h] RR 09        

192    [h] RRC 08    //shift diagonal    

193    RL 16        

194    [h] RLC 08    //shift diagonal    

195    [h] RR 07        

196    ENTER        

197    ENTER        

198    [h] RL 08        

199    [h] OR    //ORing L V R    

200    ENTER        

201    [h] RL 16        

202    [h] OR    //ORing L V R    

203    RCL 00        

204    [h] AND    //mask Combination    

205    RCL L        

206    [h] XOR    //negate Combination    

207    [f] X=0?    //if 0    

208    [h] GTO 06    //deadend    

209    [h] OR    //merge Combination to bitmap    

210    8        

211    X<>Y        

212    [h] GTO 02        

213    [f] LBL 06    //Shortcut    

214    8        

215    STO+C        

216    INC B        

217    RCL->I        

218    DEC I        

219    RCL->I        

220    DEC I        

221    [h] GTO 03        

222    [f] LBL 04        

223    RCL C        

224    BASE 10        

225    TICKS        

226    RCL K        

227    -        

228    STO K        

229    R Dn        

230    [g] RTN        

bitmap principle            

the bitmap contains 4 parts of 8 bits            

L    Left diagonal locks        

V    Vertical locks        

R    Right diagonal locks        

C    Combination    = L or V or R    This is the part tested (BS?)

LLLLLLLVVVVVVVVRRRRRRRRCCCCCCCCL            

register A            

00000001000000010000000100000001            

bitmap is rotated to have combination on bits 1 to 8            

register A is rotated too to match bitmap

Queen's positions in registers 4, 6, 8, 10, 12, 14, 16 and 18

P.S. There is also an HP-16C version of this benchmark if you are interested, I am sure that Patrice would be willing to share it.

patrice · 10-16-2014, 08:53 PM

(10-16-2014 06:59 PM)Claudio L. Wrote: Is it still the same algorithm?

It is the same in the sense that it apply the same rules and find a solution.
It is close enough to end by displaying the loop counter with the expected value.

patrice · 10-17-2014, 08:02 PM

(10-16-2014 06:59 PM)Claudio L. Wrote: Is it still the same algorithm?
I thought the idea was to have the same algorithm ported in all platforms, so they can be directly compared.
I'm sure other calculators will also get a boost if you change the algorithm. I didn't look at the new code in detail, but looks to me like a completely different algorithm.

Is this sufficiently close to the original algorithm so it can be included in the original list? Otherwise will introduce noise into an otherwise very neat collection of results.

There is a problem with the benchmark, the way it is done. It is biaised in more than one way from the beginning.

1) Algorithm chosen: is it not the most powerful. It prevent some calculators from showing their full potential. Ex: hp16C wp34s

2)THe reference source code (BASIC) is poorly written. It use a double array access in the inner loop when a single array access is enough.
- Problem, C code is allowed and C compilers are smart enough to detect that one of the array access is a constant inside the loop and optimize it. Is it fair ?
- Problem, C/RegVars code is directly using a single array access, it is an advantage. Is it fair ?
- Problem, calculators are features poor. hp15c, hp16C and wp34S and many others show poor performance because they stick to the BASIC code and spend their time swaping array pointers in the innier loop. Is it fair ?

You have seen that the wp34s can be 3 times faster, that is huge. Is it fair not to show it in the benchmark?

Every program using the double array access should be rewriten to single array access, just to be fair with C/RegVars code.

xerxes · 10-18-2014, 01:36 AM

Accurate benchmarking seems to be a very hard task. I choosed the n-queens problem with a simple algorithm, to be able
to compare the basic programming elements of the languages of as many calculators as possible, even if the programming
capabilities are limited. Of course there are much better algorithms to solve this puzzle, but these are not suited
for many calculators.

I think that in most cases the translation of the original BASIC code is accurate enough, because the speed difference
between one array access or two is very small, but as Patrice mentioned, I also see a disadvantage for some RPN versions,
because there is only one register for indirect addressing unlike the WP-34S, that makes the inner loop more complicated
than neccessary. So a RPN friendly translation of the code would be more fair.

If I remember correctly Patrice already tried this with about 30% speed advantage compared to the actual RPN versions.
IMHO using this version would be really fair, because it's RPN friendly but also close to the original algorithm. Too
bad that we haven't used this version from the beginning.

I like the BITMAP solution a lot, but how can I use it in the benchmark list without putting many other calculators at
a disadvantage? Please take into consideration that the BITMAP version would be also much faster on some other non HP
calculators and the comparability of the results will suffer, due to the use of different algorithms.

The BITMAP solution has earned an own article.

patrice · 10-19-2014, 05:35 PM

Hi XerXes,
I didn't want to bother you with all this, as I have no time to conduct a benchmark by myself. The simple knowledge of what was possible was enough for me.

At best I can give a hand to help optimize a program.

For thoses interested in seeing how bad I speak english Smile

, here is a présentation I gave in HHC 2011 https://www.youtube.com/watch?v=g0sQA0H6...SU30gorNFw about what should nit be done and about bitmap.
and here is the slide I used HHC 2011 HP15C Benchmark.ppt

Claudio L. · 10-20-2014, 01:07 PM

(10-17-2014 08:02 PM)patrice Wrote: There is a problem with the benchmark, the way it is done. It is biaised in more than one way from the beginning.

1) Algorithm chosen: is it not the most powerful. It prevent some calculators from showing their full potential. Ex: hp16C wp34s

2)THe reference source code (BASIC) is poorly written. It use a double array access in the inner loop when a single array access is enough.
- Problem, C code is allowed and C compilers are smart enough to detect that one of the array access is a constant inside the loop and optimize it. Is it fair ?
- Problem, C/RegVars code is directly using a single array access, it is an advantage. Is it fair ?

I think the array access issue is minor, and doesn't fundamentally change the algorithm (you still have to explore the same number of nodes and do exactly the same tasks.
And if a particular language has the ability to do optimization and use register variables with a simple statement or compiler switch, I personally think it's fair to include it as long as it is marked as such (Xerxes did a great job at that). Same thing with the CPU clock, turbo modes, etc. Many implementations are optimized for the particular language/calculator, but they don't change the algorithm. They solve the same problem through the same path to a solution.

(10-17-2014 08:02 PM)patrice Wrote: - Problem, calculators are features poor. hp15c, hp16C and wp34S and many others show poor performance because they stick to the BASIC code and spend their time swaping array pointers in the innier loop. Is it fair ?

You have seen that the wp34s can be 3 times faster, that is huge. Is it fair not to show it in the benchmark?

Every program using the double array access should be rewriten to single array access, just to be fair with C/RegVars code.

Perhaps they should all use it, but that will have a relatively small impact, and requires extra storage to "cache" that value. I'd say if a calculator has extra registers and can do it, why not. Having extra registers available is a feature of the calculator and is representative of what the calculator can do.
Bit manipulations, on the other hand, is a very specialized subset of features that not all calculators have and is not representative of what the calculator can do at solving problems.
I'd think a lot of problems will require arrays and indexes, but not so many can be solved using bit manipulations, so it gives a better idea of what to expect from a particular calculator/language combination in the real world.
For fun, I coded a non-optimized bitmap solution (no idea if it's similar to yours, I made it up from scratch) in userRPL on the 50g, and it runs in 75 seconds instead of the 90 from the list. It's split in several routines and does heavy use of subroutine calls done by name (ugly and slow), so it's not exactly the fastest it could be, but enough to prove my point: many other calculators could benefit greatly from a bitmap approach, even the assembly and C versions could get a boost from that. Including your solution in the list would mean we have to throw away all other results and run a bitmap solution on all of them.
It's not really about being fair/unfair, it's a question of it being comparable or not to the others, and of it being representative of other problems.
I think you should open a new thread and call it the "N-Queens Calculator Competition", (emphasis on "competition" versus "benchmark") where we don't care about comparison or real-world usage, just get the job done with the fastest algorithm optimized the best you can.
I think a lot of people will start coding their own algorithms and optimizing them for different machines. It could be fun.

Claudio

Claudio L. · 10-20-2014, 10:05 PM

(10-18-2014 01:36 AM)xerxes Wrote: ... I also see a disadvantage for some RPN versions,
because there is only one register for indirect addressing unlike the WP-34S, that makes the inner loop more complicated
than neccessary. So a RPN friendly translation of the code would be more fair.

And don't forget that the userRPL version for the HP28/48/49/50 is using lists, which is known to be very slow. It could very well leave all 8 values on the stack and use PICK/UNPICK instead of GET/PUT, taking advantage of the unlimited stack that is the most distinctive feature of this calculator series.
But people trying to solve a problem would probably use lists, so it is representative of the normal use. For the same reason people wouldn't normally use a bitmap solution, or be concerned with the number of indirect addressing done in a loop (making the BASIC version and its port for all targets also representative of normal use).
On the other hand, people using C would instantly get the C/regvars optimization without knowing, so it is representative as well.
I think your article is the most serious attempt ever made to have a benchmark for calculators, and wouldn't call it flawed in any way. For a cross-language, cross-platform benchmark with so many different hardware targets, it's nearly impossible to track every optimization on every target.
The fact that in 2014 people are still talking about it means overall it's a great benchmark.
I used it myself to compare newRPL speed to traditional RPL, so it's actually a development tool too.

xerxes · 10-21-2014, 09:38 PM

(10-20-2014 10:05 PM)Claudio L. Wrote: And don't forget that the userRPL version for the HP28/48/49/50 is using lists, which is known to be very slow. It could very well leave all 8 values on the stack and use PICK/UNPICK instead of GET/PUT, taking advantage of the unlimited stack that is the most distinctive feature of this calculator series.

I didn't noticed that before, thank you for your advice and kind words.

patrice · 10-21-2014, 09:49 PM

(10-18-2014 01:36 AM)xerxes Wrote: Accurate benchmarking seems to be a very hard task.

Indeed

, That's why I didn't try to do a benchmark by myself.

(10-18-2014 01:36 AM)xerxes Wrote: I like the BITMAP solution a lot, but how can I use it in the benchmark list without putting many other calculators at
a disadvantage? Please take into consideration that the BITMAP version would be also much faster on some other non HP
calculators and the comparability of the results will suffer, due to the use of different algorithms.

It's like doing a benchmark/contest with vehicles from point A to point B, most of the cars will take the road.
And when you test a 4WD, why take the bumpy track shortcut should not be allowed?
And finally when you put you hands on an ugly amphibious vehicle, why shouldn't it just cross the river ?
It is slow and ugly but it can do things that a Ferrari can't.

(10-18-2014 01:36 AM)xerxes Wrote: The BITMAP solution has earned an own article.

Is it on line already ? URL ?

I propose a set of rules where I take into account that calculators can be poor in resources.

- make a program that build a solution from the rules. You are free to choose any algorithm as long as it solve the problem.
- the result is the solution found as a number of 8 figures. it is valid as long as it is a solution.
- Ideally, after displaying a solution, a simple branch in the program will lead to a second solution
- do not count the number of steps
- the goal is to be as fast as possible.

patrice · 10-21-2014, 10:51 PM

Hi Claudio,

(10-20-2014 01:07 PM)Claudio L. Wrote: I think the array access issue is minor, and doesn't fundamentally change the algorithm (you still have to explore the same number of nodes and do exactly the same tasks.

Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.

(10-20-2014 01:07 PM)Claudio L. Wrote: I think you should open a new thread and call it the "N-Queens Calculator Competition", (emphasis on "competition" versus "benchmark") where we don't care about comparison or real-world usage, just get the job done with the fastest algorithm optimized the best you can.

It just append that a benchmark IS a competition about being the fastest

Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently.

Code:

42,21,11    LBL A

   42 34    CLEAR REG

       8    8

   44 .0    STO .0        

   44  1    STO 1        

42, 6,.1    ISG .1        

   43  7    DEG

   44  8    STO 8

42, 6, 0    ISG 0        

42,21, 1    LBL 1        

   45  0    RCL 0

   44 25    STO I

           STO 9

           ISG 9

           DEG

42, 6,.1    ISG .1        

42,21, 2    LBL 2        

   45  8    RCL 8

45,30,24    RCL - (i)

   43 20    x=0?

   22  3    GTO 3        

   43 16    ABS

   45  9    RCL 9

45,30,25    RCL - I

43,30, 5    TEST 5        

   22  3    GTO 3        

42, 5,25    DSE I        

   22  2    GTO 2

42, 6, 0    ISG 0        

   43  7    DEG

   45  0    RCL 0        

   44 25    STO I

   45 .0    RCL .0

43,30, 5    TEST 5        

   22  4    GTO 4        

42, 4,24    X<> 8        

        STO (I)

   22  0    GTO 1

42,21, 3    LBL 3        

42, 5, 8    DSE 8        

   22  1    GTO 1

   45  0    RCL 0

   44 25    STO I

   45 24    RCL (I)

   44  8    STO 8

42, 5, 0    DSE 0        

   22  3    GTO 3

42, 5, 1    DSE 1        

42, 6, 0    ISG 0

   43  7    DEG

42, 6,.1    ISG .1        

   43  7    DEG

   22  0    GTO 1

42,21, 4    LBL 4        

   45 .0    RCL .0

   44 25    STO I

   43 35    CLX

42,21, 6    LBL 6        

       1    1

       0    0

      20    *

45,40,24    RCL + (I)

42, 5,25    DSE I

   22  6    GTO 6

   45 .1    RCL .1

   43 32    RTN

registres

0    line poiunter

1-8    queens

.0    Size

.1    counter

it saves about 35% of runtime. It don't think it is minor.

Claudio L. · 10-22-2014, 01:10 PM

(10-21-2014 10:51 PM)patrice Wrote: Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.

30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.

(10-21-2014 10:51 PM)patrice Wrote: It just append that a benchmark IS a competition about being the fastest

What I meant is the goals are different. The objective of the benchmark is not to compete, but to measure and compare. People try to compete (naturally), but that's not what the benchmark is about.
On the benchmark you try to eliminate the user factor (to compare machines). On the competition the user factor is all that matters (it's who does it faster).
For newRPL, for example, I'm not trying to "beat" any particular measures or compete against anything or anybody, yet I'm using this same benchmark just to see where it falls in the "market".

(10-21-2014 10:51 PM)patrice Wrote: Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently.

it saves about 35% of runtime. It don't think it is minor.

And that one should go on the list, as long as it represents what a user would do to solve that problem with the machine and as long as it's still the same algorithm. If your optimizations are too obscure (like the bitmap method) they are not representative of normal use, but using "decrease and skip if not zero" style instructions is OK in my opinion (it's common sense).

Claudio

patrice · 10-22-2014, 09:15 PM

(10-22-2014 01:10 PM)Claudio L. Wrote:
(10-21-2014 10:51 PM)patrice Wrote: Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.
30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means.

The inner loop is 20 steps, when optimized, it is only 15 steps.
15/20= 0.75= 1 -25% : it saves 25% in the inner loop
20/15=1.33= 1 +33% : this is 33% overhead for the inner loop

(10-22-2014 01:10 PM)Claudio L. Wrote: But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.

All RPN models are using the same kind of technology and the impact will be essentially the same. What are you comparing the C/Regvars with ?

(10-22-2014 01:10 PM)Claudio L. Wrote:
(10-21-2014 10:51 PM)patrice Wrote: It just append that a benchmark IS a competition about being the fastest
What I meant is the goals are different. The objective of the benchmark is not to compete, but to measure and compare. People try to compete (naturally), but that's not what the benchmark is about.
On the benchmark you try to eliminate the user factor (to compare machines). On the competition the user factor is all that matters (it's who does it faster).

What can be the meaning of a comparison when one of the compared thing can be 40% faster? I think the only thing it show is that both can solve the problem.

I think that it is by showing people efficient programing technics that you will teach to use them
If you have the knowledge that lists are slow on RPL, you are free to find another way to solve the problem.

Claudio L. · 10-22-2014, 10:36 PM

(10-22-2014 09:15 PM)patrice Wrote: All RPN models are using the same kind of technology and the impact will be essentially the same. What are you comparing the C/Regvars with ?

With the "structured" C version that's on the benchmark, with no compiler optimizations.

(10-22-2014 09:15 PM)patrice Wrote: What can be the meaning of a comparison when one of the compared thing can be 40% faster? I think the only thing it show is that both can solve the problem.

I think it still does offer a decent comparison, but as I said previously, since your version uses the same algorithm it should be on the list. I wouldn't put the bitmap solution on the list, but your optimizations seem reasonable and are applicable to other problems.

(10-22-2014 09:15 PM)patrice Wrote: I think that it is by showing people efficient programing technics that you will teach to use them
If you have the knowledge that lists are slow on RPL, you are free to find another way to solve the problem.

Difference is that I don't think it's a competition, so RPL with lists is perfectly fine with me, and I think it represents how people use RPL. If somebody else wants to optimize it to use the stack, that's fine too.
My observation was not criticism, it went to show that RPN machines are not being unfairly treated, as other models are also in the same position. I'm sure a careful review will reveal that many other platforms/languages could also get a fair amount of optimization, but that doesn't necessarily invalidate the benchmark.

And please notice I have nothing to do with Xerxes, or this benchmark. I never submitted an entry (although somebody else submitted results using my code for the C/Regvars and ARM assembly for the 50g, so you could say I'm the author of those two). I just think its fine the way it is. Perfect? No, but still very useful and informative.

PS: On second thought, if I accept that you are right and this is a competition, would you say I "won"? Should I get at least a T shirt or something for authoring the fastest times on the list, undefeated since 2008? Just joking, of course... (L size, please)

xerxes · 10-23-2014, 12:29 PM

(10-21-2014 10:51 PM)patrice Wrote: Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently.

Code:

42,21,11 LBL A 42 34 CLEAR REG 8 8 44 .0 STO .0 44 1 STO 1 42, 6,.1 ISG .1 43 7 DEG 44 8 STO 8 42, 6, 0 ISG 0 42,21, 1 LBL 1 45 0 RCL 0 44 25 STO I STO 9 ISG 9 DEG 42, 6,.1 ISG .1 42,21, 2 LBL 2 45 8 RCL 8 45,30,24 RCL - (i) 43 20 x=0? 22 3 GTO 3 43 16 ABS 45 9 RCL 9 45,30,25 RCL - I 43,30, 5 TEST 5 22 3 GTO 3 42, 5,25 DSE I 22 2 GTO 2 42, 6, 0 ISG 0 43 7 DEG 45 0 RCL 0 44 25 STO I 45 .0 RCL .0 43,30, 5 TEST 5 22 4 GTO 4 42, 4,24 X<> 8 STO (I) 22 0 GTO 1 42,21, 3 LBL 3 42, 5, 8 DSE 8 22 1 GTO 1 45 0 RCL 0 44 25 STO I 45 24 RCL (I) 44 8 STO 8 42, 5, 0 DSE 0 22 3 GTO 3 42, 5, 1 DSE 1 42, 6, 0 ISG 0 43 7 DEG 42, 6,.1 ISG .1 43 7 DEG 22 0 GTO 1 42,21, 4 LBL 4 45 .0 RCL .0 44 25 STO I 43 35 CLX 42,21, 6 LBL 6 1 1 0 0 20 * 45,40,24 RCL + (I) 42, 5,25 DSE I 22 6 GTO 6 45 .1 RCL .1 43 32 RTN registres 0 line poiunter 1-8 queens .0 Size .1 counter

Hi Patrice, IMHO this version fulfils the requirements of being fair and accurate enough in sense of comparing the same algorithm,
whats the goal of the n-queens benchmark. Thank you for your effort and showing whats possible.