Benchmarks?

The Museum of HP Calculators

HP Forum Archive 11

Benchmarks?
Message #1 Posted by Patrick on 28 Mar 2003, 4:01 p.m.

In a recent post about the Solver differences amongst various models, it was mentioned that certain models were faster than others. That made me notice that there are no actual benchmarks available on this site.
I realize that benchmarking is a difficult science, but perhaps as a group we could come up with something reasonable and practical. Maybe Dave would consider adding the results to his features lists.
Some comments:
- The comparison would only really be useful for long running operations, either programs or solver / integrate like functions. I personally don't think the speed of keyboard operations is all that critical, although the speed with which the 48G responds to x! is breathtaking after using the 15C for a while. Actually, I take that back... it takes WAY longer to find AND execute x! on the 48G than it does on the 15C!
- Solver speed seems to be a very reasonable benchmark since it uses a bunch of the machine's capabilities, if it weren't for the fact that it does not appear on that many models.
- For programming, I'm not sure how we'd address the RPN vs RPL issue. In many ways this is apples to oranges.


Re: Benchmarks?
Message #2 Posted by Jim L on 28 Mar 2003, 5:11 p.m.,
in response to message #1 by Patrick

There are these.

Re: Benchmarks?
Message #3 Posted by Patrick on 29 Mar 2003, 12:22 a.m.,
in response to message #2 by Jim L

Oops.
Didn't see these... my bad.
However, do I believe that the HP-11C is at 6% of the 9100A? Yikes. Then again, the 41C is only at 13% so maybe...
There are a few missing machines in the list. I suppose the 15C is pretty much the same as the 11C -- no change in the processor between these two?

Re: Benchmarks?
Message #4 Posted by Chan Tran on 28 Mar 2003, 10:10 p.m.,
in response to message #1 by Patrick

While I found that the 48GX is very fast compared to older calculator. The problem with the 48 is that it can be very slow with the stack.

[Long] "Stack speed" 48 series & 49G
Message #5 Posted by James M. Prange on 30 Mar 2003, 4:59 a.m.,
in response to message #4 by Chan Tran

The problem with the 48 is that it can be very slow with the stack
Well, the RPL stack can be very deep. I don't think that I'd call it slow if I restricted myself to using only 4 levels. On a 48GX, 4 ROLL or 4 ROLLD takes only about 21 ticks (0.0026 seconds). Can we blink that fast?
I have noticed that ROLL or ROLLD over a lot of levels can be relatively slow, about 186 ticks (0.023 second) on a 48GX if I have the numbers 1 through 1000 on the stack and execute 1000 ROLL or 1000 ROLLD. How long does that take on a non-RPL calculator?
For reference, here's my general purpose timing program, which I used for the above and for the total times in the programs below. Note that the "correction value" varies by model. In these particular calculators today, I used these values for correction, but the best value varies among units and even over time (Temperature? Battery condition? Phase of the moon?). To find the value, I put the number 1 on the otherwise empty stack, temporarily disable the last stack, arguments, and commands saves, run the timer program, and adjust the value until it *usually* returns "Ticks: 0" and "0_s". Well, on a 48 series that is; on a 49G I have to settle for *usually* +/- 10 ticks (0.0012 seconds). This program isn't 100% repeatable, but I expect that it's about the best that can be done in UserRPL, and plenty good enough for my purposes. If I care to get picky about routines that are very nearly the same (or very fast), then I run them in loops. Note that this program times the evaluation of the object on the stack; if that's a name, the time will include whatever's needed for name resolution.
48SX Checksum (using 104 below): # C2F4h 48GX Checksum (using 72 below): # 5F0h 49G Checksum (using 252. below): # CF3Eh Bytes: 136 %%HP: T(3)A(R)F(.); @ Header for ASCII download. \<< MEM \-> t @ Force GC, use MEM result for dummy 't'. \<< TICKS 't' STO @ Get start time, store it in 't'. EVAL @ Evaluate the object. TICKS @ Get stop time. RCWS @ Get current wordsize. 64. STWS @ Set wordsize to 64. SWAP t - @ Elapsed time. B\->R @ Convert binary to real. @ Correction for time to do TICKS 't' STO. @104 48SX value 72 @ 48GX value @252. 49G value - "Ticks" \->TAG @ Tag result. DUP 8192. / '1_s' * @ Also show in seconds. ROT STWS @ Restore original wordsize. \>> \>>
"Garbage collection" (but I still say that it should be called "memory packing") gets slower as the stack gets deeper, and often slows down drastically if the stack (which is really a stack of pointers) has many pointers into a large composite that's in temporary memory (that is, the composite isn't stored in a global or port variable). An example of such a pointer into a composite would be to build a list on the stack and use GET to extract an element from it. It "looks like" the element is on the stack, but really the pointer is into the list, which is still in temporary memory. The original list is still referenced and kept in temporary memory for the sake of the pointer until the element is dropped, stored in a global or port variable, combined into another list or a vector, or has the NEWOB command executed on it. (This also affects how much free memory there is.) The garbage collection routine automatically runs when the unused memory gets too low. If the garbage collection routine runs and finds a pointer into a composite, then it has to check all of the following objects in the composite one at a time to find the end of the composite and its size field so that it knows exactly which block of memory to move.
Usually, when speed is the most important criterion, I recommend using the stack instead of variables, and if that gets unmanageable, then local variables in preference to global variables. But it can be a huge advantage to store a large list in a global variable (to get it out of temporary memory) before taking it apart if there's an expectation that a garbage collection may occur with the "list elements" on the "stack". For example, compare the execution times for the following programs. Note that MEM forces a garbage collection (to make it easy to determine how much memory is available), and it's the easiest way to get an idea of how long garbage collection takes. In the following programs I've included internal timing routines for the MEM command itself. All of the timings are with the stack empty and with last stack, arguments, and commands saving enabled. Wordsize is 64. The 49G is in "approximate" mode. Note that the "correction value" varies by model; they were found experimentally and may not be "best" for all calculators. Also note that the times would vary with repeated trials.
This first one is just to time MEM on a nearly empty stack.
48SX Checksum (using 32 below): # 6255h 48GX Checksum (using 16 below): # 85AFh 49G Checksum (using 189. below): # F96Dh Bytes: 84.5 %%HP: T(3)A(R)F(.); @ Header for ASCII download. \<< TICKS @ Get the start time. MEM @ Force a garbage collection. TICKS @ Get the finish time. ROT - @ Elapsed time. B\->R @ Convert binary to real number. @ Correction for time to execute TICKS. @32 48SX value 16 @ 48GX value @189. 49G value - "Ticks" \->TAG @ Tag result. DUP 8192. / '1_s' * @ Also show in seconds. \>> 48SX: 378 ticks (0.046 second) for MEM, 1510 ticks (0.18 second) total. 48GX: 254 ticks (0.031 second) for MEM, 1031 ticks (0.13 second) total. 49G: 594 ticks (0.072 second) for MEM, 1738 ticks (0.21 second) total.
Now with the reals 1 through 1000 on the stack:
48SX Checksum (using 32 below): # C99Fh 48GX Checksum (using 16 below): # 2E65h 49G Checksum (using 189. below): # 62B9h Bytes: 111.5 %%HP: T(3)A(R)F(.); @ Header for ASCII download. \<< 1. 1000. @ Place the reals 1 through 1000 on the stack. FOR n n NEXT TICKS @ Get the start time. MEM @ Force a garbage collection. TICKS @ Get the finish time. ROT - @ Elapsed time. B\->R @ Convert binary to real number. @ Correction for time to execute TICKS. @32 48SX value 16 @ 48GX value @189. 49G value - "Ticks" \->TAG @ Tag result. DUP 8192. / '1_s' * @ Also show in seconds. \>> 48SX: 12751 ticks (1.56 seconds) for MEM, 56681 ticks (6.92 seconds) total. 48GX: 8866 ticks (1.08 seconds) for MEM, 39169 ticks (4.78 seconds) total. 49G: 9052 ticks (1.10 seconds) for MEM, 39057 ticks (4.77 seconds) total.
No list involved.
Now we'll build a list and explode it onto the stack:
48SX Checksum (using 32 below): # 428Dh 48GX Checksum (using 16 below): # A577h 49G Checksum (using 189. below): # CBB8h Bytes: 121.5 %%HP: T(3)A(R)F(.); @ Header for ASCII download. \<< 1. 1000. @ Place the reals 1 through 1000 on the stack. FOR n n NEXT DUP \->LIST @ Put the numbers into a list. LIST\-> @ Explode the list onto the stack. DROP @ Discard the count. TICKS @ Get the start time. MEM @ Force a garbage collection. TICKS @ Get the finish time. ROT - @ Elapsed time. B\->R @ Convert binary to real number. @ Correction for time to execute TICKS. @32 48SX value 16 @ 48GX value @189. 49G value - "Ticks" \->TAG @ Tag result. DUP 8192. / '1_s' * @ Also show in seconds. \>> 48SX: 1818259 ticks (221.96 seconds) for MEM, 1870881 ticks (228.38 seconds) total. 48GX: 1255937 ticks (153.31 seconds) for MEM, 1292199 ticks (157.74 seconds) total. 49G: 1180957 ticks (144.15 seconds) for MEM, 1216584 ticks (148.51 seconds) total.
"Looks like" the same thing on the stack for MEM, but because of the list, it's much slower. Oh well, that gives us time to start a fresh pot of coffee.
But if the list is stored in a global variable before we explode it:
48SX Checksum (using 32 below): # C0E2h 48GX Checksum (using 16 below): # 5AB7h 49G Checksum (using 189. below): # 4C07h Bytes: 156.5 %%HP: T(3)A(R)F(.); @ Header for ASCII download. @ Note well! This will purge a global variable named 'TEMP'. If you @ already have something named 'TEMP', then choose a different name! \<< 'TEMP' @ Place the name on the stack. 1. 1000. @ Place the reals 1 through 1000 on the stack. FOR n n NEXT DUP \->LIST @ Put the numbers into a list. OVER STO @ Store the list in a global variable. RCL @ Put the list back on the stack. LIST\-> @ Explode the list onto the stack. DROP @ Discard the count. TICKS @ Get the start time. MEM @ Force a garbage collection. TICKS @ Get the finish time. ROT - @ Elapsed time. B\->R @ Convert binary to real number. @ Correction for time to execute TICKS. @32 48SX value 16 @ 48GX value @189. 49G value - "Ticks" \->TAG @ Tag result. DUP 8192. / '1_s' * @ Also show in seconds. 'TEMP' PURGE @ Purge the variable. \>> 48SX: 6556 ticks (0.80 second) for MEM, 68400 ticks (8.35 seconds) total. 48GX: 4508 ticks (0.55 second) for MEM, 67779 ticks (8.27 seconds) total. 49G: 4996 ticks (0.61 second) for MEM, 51629 ticks (6.30 seconds) total.
Much better, even with the extra overhead of handling the global variable. I'm willing to trade the extra size for speed.
And no, using a local variable doesn't have this effect; local variables are still in temporary memory. Try:
48SX Checksum (using 32 below): # C035h 48GX Checksum (using 16 below): # 5590h 49G Checksum (using 189. below): # 4178h Bytes: 144 %%HP: T(3)A(R)F(.); @ Header for ASCII download. \<< 1. 1000. @ Place the numbers 1 through 1000 on the stack. FOR n n NEXT DUP \->LIST @ Put the numbers into a list. \-> TEMP @ Store the list in a local variable. \<< TEMP @ Put the list back on the stack. LIST\-> @ Explode the list onto the stack. DROP @ Discard the count. TICKS @ Get the start time. MEM @ Force a garbage collection. TICKS @ Get the finish time. ROT - @ Elapsed time. B\->R @ Convert binary to real number. @ Correction for time to execute TICKS. @32 48SX value 16 @ 48GX value @189. 49G value - "Ticks" \->TAG @ Tag result. DUP 8192. / '1_s' * @ Also show in seconds. \>> \>> 48SX: 1820005 ticks (222.63 seconds) for MEM, 1872915 ticks (228.63 seconds) total. 48GX: 1257259 ticks (153.47 seconds) for MEM, 1293734 ticks (157.93 seconds) total. 49G: 1181800 ticks (144.26 seconds) for MEM, 1217617 ticks (148.63 seconds) total.
The local variable didn't help a bit.
These timings were done were on a 48SX Version D, a 48GX Version R, and a 49G Revision #1.19-6.
Another technique is to force a garbage collection with MEM DROP just before these conditions occur, in the hope that garbage collections won't be needed at particularly bad times.
By the way, there's supposed to be a new 49G Revision 1.19-7 flash ROM already written, just waiting for HP to allow it to be released. The plan is (was?) that this will have an improved garbage collection routine, particularly when it comes to handling this issue with composite objects.
I hope that someone (maybe just RPL users?) will find this helpful or interesting. I enjoyed playing around with the calculators, and some things that I already "sort of knew" have soaked into my brain a bit deeper. The biggest surprise for me was how well the 49G did.
Regards,
James

Re: [Long] "Stack speed" 48 series & 49G
Message #6 Posted by Karl Schneider on 1 Apr 2003, 1:24 a.m.,
in response to message #5 by James M. Prange

James --
Good Lord, what an effort!
I suggest that the post be preserved in the Articles section.

Re: [Long] "Stack speed" 48 series & 49G
Message #7 Posted by James M. Prange on 2 Apr 2003, 5:53 a.m.,
in response to message #6 by Karl Schneider

Thank you.
It really wasn't that much of an effort; as I wrote, it was playing around. All very simple, straightforward RPL programming. Probably the biggest effort was adding comments for the benefit of those not familiar with RPL whilst avoiding adding typos to the programs. Just in case I did make typos, or someone manually keying in a program in made a typo, I included the outputs from the BYTES command. As you may have noticed, the programs share a good deal of code, so the various programs were a matter of editing a previous program, not of writing an entirely new program. And of course, I didn't actually key in the programs on each calculator; I transferred them via their serial ports.
After posting, I wished that I had included the 28 series calculators. But of course, that would entail keying them in; no serial port available. And the 28 series doesn't have a built-in TICKS command, I'd have to use SYSEVAL, and with a different address for each ROM version. The 28 series also lacks the BYTES or any other built-in checksum type of command, though I do have one copied from a book that I use for verifying that a program is keyed in correctly. And of course, on the 28C, with its very limited memory, I'm pretty sure that I wouldn't be able to put so many numbers on the stack, even with the last stack, arguments, and commands saves disabled. Still, I may decide to add something to the Articles section, and if so, I'll include the 28 series, but using a shallower stack.
I wrote:
The original list is still referenced and kept in temporary memory for the sake of the pointer until the element is dropped, stored in a global or port variable, combined into another list or a vector, or has the NEWOB command executed on it. (This also affects how much free memory there is.)
I should've mentioned that there are exceptions to this. If the element is an integer value from -9 through 9, the composite isn't kept in temporary memory for the sake of the element. These numbers (and many other objects) each have their own address in ROM; I guess that what's on the stack when one of these numbers (and perhaps other objects with ROM addresses?) are on the stack as elements, is a pointer to the ROM address, rather than a pointer into the composite in temporary memory.
For example, if you replace:
1. 1000. @ Place the reals 1 through 1000 on the stack. FOR n n NEXT DUP \->LIST @ Put the numbers into a list. LIST\-> @ Explode the list onto the stack.
in one of the slow programs with:
1. 1000. @ Place the real number zero onto the stack 1000 FOR n @ times. 0. NEXT 1000. \->LIST @ Combine the zeros into a list. LIST\-> @ Explode the list onto the stack.
then you'll find that it runs much faster.
Regards,
James

[ Return to Index | Top of Index ]

Go back to the main exhibit hall