The Museum of HP Calculators

HP Forum Archive 20


34s: Anyone have an instruction timing table?
Message #1 Posted by gene wright on 18 July 2011, 8:13 p.m.

This used to be done by the user community in the PPC days.

What I'm wondering is whether such a table for opcode instruction timing exists. Something like:

+ : 10 ms

- : 11 ms

etc. The reason this might be handy is if you then saw something in the table like this:

Multiply : 20 ms

Divide : 500 ms

if you understand what I mean. Without such a list, a few operations on the 34s might be out of kilter, MUCH slower than one would expect, and go unnoticed. If roll up takes 10x as long as roll down, then perhaps it would be best either to optimize the roll up code further or to use three roll down instructions instead (on a four-level stack, three roll downs do the same job as one roll up).
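The roll up/roll down trade-off is easy to check by modelling the stack as a rotating list. A minimal Python sketch, assuming the classic four-level stack listed as [X, Y, Z, T] (the function names are illustrative, not 34S commands): one roll up equals depth minus one roll downs, so three R↓ replace one R↑ on a four-level stack, while four R↓ simply bring the stack back to where it started.

```python
from collections import deque

def roll_down(stack):
    """R-down on a stack listed as [X, Y, Z, T]: X <- Y, Y <- Z, Z <- T, T <- old X."""
    s = deque(stack)
    s.rotate(-1)
    return list(s)

def roll_up(stack):
    """R-up: X <- old T, Y <- old X, Z <- old Y, T <- old Z."""
    s = deque(stack)
    s.rotate(1)
    return list(s)

def repeat(op, stack, times):
    """Apply op to the stack the given number of times."""
    for _ in range(times):
        stack = op(stack)
    return stack

start = [1, 2, 3, 4]  # X=1, Y=2, Z=3, T=4
assert roll_up(start) == repeat(roll_down, start, len(start) - 1)  # three R-down
assert repeat(roll_down, start, len(start)) == start               # four R-down: no change
print(roll_up(start))  # [4, 1, 2, 3]
```

The same identity holds for any depth, so on a deeper stack the substitute sequence grows accordingly.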

So... anyone have something like this yet?

      
Re: 34s: Anyone have an instruction timing table?
Message #2 Posted by Paul Dale on 18 July 2011, 8:54 p.m.,
in response to message #1 by gene wright

Not that I'm aware of. About the only way to figure out timings would be to write a small program that looped lots. I'd also expect some variation based on arguments.
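The "small program that loops lots" idea can be sketched on a desktop to produce a table in the format asked for above. This is only an analogy: Python's decimal module stands in for the calculator's decimal arithmetic, the operands are arbitrary, and the printed figures (which include Python's own loop overhead) say nothing about the 34S itself.

```python
import time
from decimal import Decimal

def per_call_us(fn, *args, iterations=5_000):
    """Average microseconds per call of fn(*args), measured over a tight loop."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(*args)
    return (time.perf_counter() - start) / iterations * 1e6

x, y = Decimal("2.718281828459045"), Decimal("3.141592653589793")
OPS = {
    "+": (Decimal.__add__, x, y),
    "-": (Decimal.__sub__, x, y),
    "*": (Decimal.__mul__, x, y),
    "/": (Decimal.__truediv__, x, y),
    "LN": (Decimal.ln, x),  # a logarithm, the kind of function reported as slow
}

table = {name: per_call_us(fn, *args) for name, (fn, *args) in OPS.items()}
for name, us in table.items():
    print(f"{name:>3} : {us:.3f} us")
```

Timing a cheap operation as a baseline and subtracting it from the others removes most of the shared loop overhead.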

I do know that all the logarithms (and functions that use logarithms) are slow.

- Pauli

            
Re: 34s: Anyone have an instruction timing table? - example of a "slow" function
Message #3 Posted by gene wright on 18 July 2011, 9:13 p.m.,
in response to message #2 by Paul Dale

It appears that the ROUND function is particularly slow. Any chance :-) this function could be reviewed for some optimization? :-) And, of course, please don't take this as any criticism at all!

===================

Jake found:

If I store a long decimal value in R05 and take the little program

LBL C
RCL 05
x<>X (acting as a NOP)
Roll down
1
+
GTO C

then start with zero in X and run it for roughly 5 seconds, it counts up to 5613.

If I then replace the x<>X with ROUND, things get interesting.

Setting the display to FIX 0, the count after ~5 seconds is 3463.

With FIX 1, it reached 3330.

With FIX 2, it reached only 47.

With FIX 5, the count reached 46.

With FIX 11, the count was about the same: 46.

All the settings from FIX 3 to FIX 11 reached only 45-47 each.

The ROUND function surely makes the 34S run at a much slower pace.
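Because each run lasted a fixed time, the counts convert directly into time per loop, and subtracting the x<>X baseline isolates what ROUND costs. A small Python sketch of that arithmetic, assuming exactly 5 seconds per run (the runs were only "roughly" 5 seconds, and x<>X is not a perfect stand-in for a NOP, so the results are approximate):

```python
RUN_SECONDS = 5.0  # assumption: every run lasted exactly 5 s

def ms_per_loop(count, seconds=RUN_SECONDS):
    """Milliseconds per pass through the loop for a run that reached `count`."""
    return seconds / count * 1000.0

baseline = ms_per_loop(5613)  # loop with x<>X standing in for a NOP
jake_counts = {0: 3463, 1: 3330, 2: 47, 5: 46, 11: 46}  # FIX setting -> count

for fix, count in jake_counts.items():
    round_cost = ms_per_loop(count) - baseline  # extra time ROUND adds per pass
    print(f"FIX {fix:2}: {ms_per_loop(count):7.2f} ms per loop, "
          f"ROUND costs about {round_cost:6.2f} ms")
```

With these figures the baseline loop takes about 0.89 ms, ROUND adds roughly 0.55 to 0.61 ms at FIX 0 and FIX 1, and about 105 to 108 ms from FIX 2 upward: well over a hundredfold jump.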

                  
Re: 34s: Anyone have an instruction timing table? - example of a "slow" function
Message #4 Posted by Paul Dale on 19 July 2011, 12:44 a.m.,
in response to message #3 by gene wright

It will be fast in everything except FIX mode :-)

The reason being that it does a 10^x internally, which requires logs. I'm guessing the FIX 0 and FIX 1 cases (scaling by 10^0 and 10^1) are special-cased.
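Rounding to n places in FIX n can be done in two ways: scale by 10^n, round to an integer and scale back (the 10^x route described here), or round directly to exponent -n, which is the quantize operation of the General Decimal Arithmetic specification that decNumber implements. A Python sketch of both using the standard decimal module, which follows the same specification. The function names are made up, half-away-from-zero rounding (ROUND_HALF_UP) is an assumption about how ROUND treats ties, and the sketch shows the arithmetic only, not the 34S's timing.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_fix_scaled(x: Decimal, n: int) -> Decimal:
    """Multiply by 10**n, round to an integer, then divide by 10**n again."""
    scale = Decimal(10) ** n
    return (x * scale).to_integral_value(rounding=ROUND_HALF_UP) / scale

def round_fix_quantize(x: Decimal, n: int) -> Decimal:
    """Round directly to n decimal places (exponent -n) with quantize."""
    return x.quantize(Decimal(1).scaleb(-n), rounding=ROUND_HALF_UP)

for value in ("2.718281828", "2.345", "-2.345"):
    x = Decimal(value)
    print(value, round_fix_scaled(x, 2), round_fix_quantize(x, 2))
```

Both routes give the same value; the difference lies in which library operations they exercise along the way.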

I might be able to do something with it to speed up the FIX case too.

- Pauli

                  
Re: 34s: Anyone have an instruction timing table? - example of a "slow" function
Message #5 Posted by Paul Dale on 19 July 2011, 12:50 a.m.,
in response to message #3 by gene wright

And a fix is in. Should be much faster once built (rev 1256 or later) -- probably later this evening. Let me hope it works properly still :-)

x<>X isn't a good NOP here, it goes through a very different code path than ROUND. ABS or +/- are better for the NOP.

- Pauli

Edited: 19 July 2011, 8:17 a.m. after one or more responses were posted

                        
Great! A timing table would allow focused optimization :-)
Message #6 Posted by gene wright on 19 July 2011, 8:07 a.m.,
in response to message #5 by Paul Dale

Thanks. That's great.

Perhaps some of us (yep, I know... suggest and you volunteer) should try this type of thing with other instructions?

It would help point out areas that would benefit from optimization or instructions that just seem much slower than "normal".

Will test the fix in the program shortly. Thanks again.

                              
Re: Great! A timing table would allow focused optimization :-)
Message #7 Posted by Paul Dale on 19 July 2011, 8:15 a.m.,
in response to message #6 by gene wright

Quote:
yep, I know... suggest and you volunteer

You be learning :-)

Quote:
It would help point out areas that would benefit from optimization or instructions that just seem much slower than "normal".

Anything involving logarithms (which includes powers) will be slow. Most other things seem to be fast enough. A quick scan over the source code will identify problematic functions easily enough.
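That quick scan can be automated. A minimal Python sketch that lists every call to decNumber's logarithm, exponential and power routines in a tree of C files. The list of function names is an assumption: the calculator source may reach these routines through its own wrappers, which would need adding to the pattern.

```python
import re
from pathlib import Path

# decNumber entry points built on logarithms/exponentials (assumption: extend
# this pattern with any wrapper functions the calculator source defines).
SLOW_CALLS = re.compile(r"\b(decNumberLn|decNumberLog10|decNumberExp|decNumberPower)\s*\(")

def scan_text(filename, text):
    """Return {function: [(filename, line number), ...]} for slow calls in one file."""
    hits = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SLOW_CALLS.finditer(line):
            hits.setdefault(match.group(1), []).append((filename, lineno))
    return hits

def scan_tree(source_dir):
    """Scan every *.c file below source_dir and merge the per-file results."""
    merged = {}
    for path in sorted(Path(source_dir).rglob("*.c")):
        text = path.read_text(errors="replace")
        for func, places in scan_text(path.name, text).items():
            merged.setdefault(func, []).extend(places)
    return merged
```

For example, `scan_tree("path/to/source")` with a placeholder path. Definitions and prototypes inside decNumber's own files match the pattern too, so the output still needs a quick look by eye.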

Even if slow functions are identified, there is no guarantee that they can be sped up. Some functions just can't be, and others would grow too much larger if rewritten to avoid the problematic subroutines. This one was nice and easy -- no growth in size, an acceptable portability loss and a good speed improvement.

- Pauli

Edited: 19 July 2011, 8:20 a.m.

                                    
Re: Great! A timing table would allow focused optimization :-)
Message #8 Posted by gene wright on 19 July 2011, 8:54 a.m.,
in response to message #7 by Paul Dale

I understand.

However, if we can find them through example code like the stuff for ROUND, then you may be able to adjust them. You might not, but at least it could be looked at. :-)

                        
Re: 34s: Anyone have an instruction timing table? - example of a "slow" function
Message #9 Posted by Jake Schwartz on 19 July 2011, 10:24 a.m.,
in response to message #5 by Paul Dale

Quote:
x<>X isn't a good NOP here, it goes through a very different code path than ROUND. ABS or +/- are better for the NOP.

...yeah, well, that was what came into my head at that moment :-) I think it was a sufficient measuring method anyway, since the loop with x<>X still ran very fast as compared to ROUND. I'll use ABS next time.

Jake

                              
The real NOP instruction?
Message #10 Posted by gene wright on 19 July 2011, 5:36 p.m.,
in response to message #9 by Jake Schwartz

Actually, why not just use NOP as the NOP?

:-)

                                    
Re: The real NOP instruction?
Message #11 Posted by Marcus von Cube, Germany on 19 July 2011, 5:42 p.m.,
in response to message #10 by gene wright

Because it is not a monadic operation like ABS or ROUND that modifies the X register and sets LastX. NOP is simply ignored by the execution engine and therefore carries considerably less overhead.

                                          
Re: The real NOP instruction?
Message #12 Posted by gene wright on 19 July 2011, 6:13 p.m.,
in response to message #11 by Marcus von Cube, Germany

I think what Jake was really doing with this instruction was simply seeing how the loop count was affected by replacing a NOP with the ROUND instruction at various FIX settings.

X<>X worked, since it just essentially did nothing, as would NOP since it really does do nothing. :-)

The "trouble" remains that ROUND slows down considerably if FIX is much greater than 2 or 3. :-(

I understand that it will stay that way if it has to call 10^x, which uses the log routines, and those are slow (relative to the blinding speed the 34s shows in other areas).

The reason it was a concern to us in the first place is that we have gotten used to the 34s blowing away any other machine when running a program, but the little code we were testing here was SLOWER than on existing machines.

We don't like the 34s to be slower. ;-)

(Again, just in case my text here is not clear... none of this is in any way meant to be critical of anything. We're all just trying to help find places to tweak for improvements!).

                                                
Re: The real NOP instruction?
Message #13 Posted by Paul Dale on 19 July 2011, 6:31 p.m.,
in response to message #12 by gene wright

Quote:
X<>X worked, since it just essentially did nothing, as would NOP since it really does do nothing. :-)

Both of these go via very different code paths than a monadic function like ROUND. NOP takes no arguments, does nothing with last X and doesn't even call a worker routine. x<>X goes through the commands-with-arguments path, which again doesn't bother with last X but instead decodes the argument. To get a representative idea of the timing, it is best to use as similar a function as possible. In this case, however, ROUND is so slow that it probably doesn't matter much.
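The three code paths can be pictured with a toy dispatcher. This is a hypothetical Python model, not the 34S's actual code: the state layout and all names are invented, and only the split follows the description above (NOP: no operands and no worker; x<>: operand decoding without touching LastX; monadic functions such as ABS or ROUND: save LastX, then call a worker on X).

```python
import operator
from decimal import Decimal

REGISTERS = "XYZT"  # hypothetical 4-level stack; index 0 is X

def swap_x(stack, reg):
    """x<> reg: exchange X with the named stack register."""
    i = REGISTERS.index(reg)
    stack[0], stack[i] = stack[i], stack[0]

ARGUMENT_CMDS = {"x<>": swap_x}
MONADIC_CMDS = {"ABS": abs, "+/-": operator.neg}

def execute(state, op, arg=None):
    """Run one instruction against state = {"stack": [...], "last_x": ...}."""
    stack = state["stack"]
    if op == "NOP":
        return                                 # no operands, no worker, LastX untouched
    if op in ARGUMENT_CMDS:
        ARGUMENT_CMDS[op](stack, arg)          # decode the operand; LastX untouched
        return
    if op in MONADIC_CMDS:
        state["last_x"] = stack[0]             # monadic path saves LastX first...
        stack[0] = MONADIC_CMDS[op](stack[0])  # ...then replaces X with f(X)
        return
    raise ValueError(f"unknown instruction {op!r}")

state = {"stack": [Decimal("-1.5"), Decimal(2), Decimal(3), Decimal(4)], "last_x": Decimal(0)}
execute(state, "NOP")
execute(state, "x<>", "X")  # swaps X with itself: nothing changes, LastX still 0
execute(state, "ABS")       # X becomes 1.5, LastX becomes -1.5
print(state)
```

In this model, timing ABS or +/- against ROUND compares like with like, whereas NOP and x<>X skip work that every monadic function has to do.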

As for being slower, I've not tested it, but I'll take your word for it. The 10^x/log is gone from the code path, so that clearly wasn't the big expensive operation :-( Digging a bit deeper ends up in some code in the decimal library we're using (whose inner workings I haven't bothered to figure out). Unless I suddenly get motivated to fix the library (which isn't all that likely), ROUND is going to stay slow. Better slow and correct than fast and wrong.

- Pauli

                        
Pauli, Round still seems slow.
Message #14 Posted by gene wright on 19 July 2011, 11:03 a.m.,
in response to message #5 by Paul Dale

I did Jake's test with build 1257 and it still drops from counts in the thousands at FIX settings below 3 or 4 down to counts of 50-60 at FIX 5 or higher.

                        
Pauli check ROUND yourself please...
Message #15 Posted by gene wright on 19 July 2011, 8:26 p.m.,
in response to message #5 by Paul Dale

You thought a "fix" was in that should speed up the ROUND instruction.

Can you do some checks using Jake's example above?

                              
Re: Pauli check ROUND yourself please...
Message #16 Posted by Paul Dale on 19 July 2011, 11:28 p.m.,
in response to message #15 by gene wright

I implemented what I thought would be the fix and it didn't work. This stuff happens; my idea of where the problem lay was incorrect.

I'm not planning on digging further into the underlying problem at the moment. The slowness comes from decQuantizeOp() in decNumber/decNumber.c but it isn't clear to me where.

- Pauli

