The Museum of HP Calculators

HP Forum Archive 17


35s - how fast?
Message #1 Posted by Thomas Radtke on 14 July 2007, 5:00 a.m.

Sorry for abusing this noble forum once again for a 35s question: Has this machine been benchmarked already in terms of speed?

Thank you :-)

      
Re: 35s - how fast?
Message #2 Posted by Massimo Gnerucci (Italy) on 14 July 2007, 5:29 a.m.,
in response to message #1 by Thomas Radtke

Read through Gene's review; in the last pages you'll find the answer... ;-)

Greetings,
Massimo

Edited: 14 July 2007, 5:30 a.m.

            
Re: 35s - how fast?
Message #3 Posted by Thomas Radtke on 14 July 2007, 5:36 a.m.,
in response to message #2 by Massimo Gnerucci (Italy)

Stupid me, thanks a lot, Massimo!

Edit: My 32SII does the looping test in 15 seconds. Twice as fast as the 35s? I must have overlooked something.

Edited: 14 July 2007, 5:43 a.m.

                  
Re: 35s - how fast?
Message #4 Posted by Eric Smith on 14 July 2007, 2:25 p.m.,
in response to message #3 by Thomas Radtke

The 33s and 35s use a GeneralPlus (formerly SunPlus) microcontroller with a 6502 core, which can run at up to 4 MHz but is probably running slower in the calculator.

The 32SII used an HP Saturn core at about 650 kHz, if memory serves. The Saturn core was designed to do BCD arithmetic very efficiently: fixed-point BCD addition or subtraction takes only a little more than one clock per digit, as does shifting. And of course floating point is performed in software using many fixed-point adds and shifts.
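To make the "fixed-point adds and shifts" idea concrete, here is a hypothetical C sketch (not HP's firmware and not Saturn code) that keeps 16 BCD digits in a 64-bit word, the width of a Saturn working register, and implements the two primitives digit by digit: an add with decimal carry, and a right shift used to align mantissas before the add. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: a 64-bit "register" holding 16 packed BCD digits,
   least significant digit in the low nibble. */

/* Add two 16-digit BCD values one digit (nibble) at a time, propagating a
   decimal carry. Returns the 16-digit sum; *carry_out gets the final carry. */
static uint64_t bcd_add16(uint64_t a, uint64_t b, int *carry_out)
{
    uint64_t sum = 0;
    unsigned carry = 0;
    for (int i = 0; i < 16; i++) {
        unsigned da = (unsigned)((a >> (4 * i)) & 0xF);
        unsigned db = (unsigned)((b >> (4 * i)) & 0xF);
        assert(da <= 9 && db <= 9);           /* inputs must be valid BCD */
        unsigned d = da + db + carry;
        carry = d >= 10;                      /* decimal carry into next digit */
        if (carry) d -= 10;
        sum |= (uint64_t)d << (4 * i);
    }
    if (carry_out) *carry_out = (int)carry;
    return sum;
}

/* Shift right by n decimal digits (n >= 0): aligns the mantissa with the
   smaller exponent before a fixed-point add. */
static uint64_t bcd_shift_right(uint64_t a, int n)
{
    assert(n >= 0);
    return n >= 16 ? 0 : a >> (4 * n);        /* avoid an undefined 64-bit shift */
}
```

For example, with 4-digit mantissas, 1.500 + 0.250 first aligns the second operand by one digit (`bcd_shift_right(0x2500, 1)` gives `0x0250`), then `bcd_add16(0x1500, 0x0250, &c)` gives `0x1750`. The Saturn does each step in hardware at roughly a clock per digit; in C every digit costs several masks, shifts and compares.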

The 6502 takes many more cycles to do the same thing. It takes at least 3 cycles to do a binary add of a byte in memory to the accumulator, so adding two 15-digit floating-point mantissas together (after they've been aligned) requires a code sequence something like the following (completely untested), assuming that the operands and result are stored in zero page in packed form:

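ADDM:   LDX #7          ; 8 bytes = 16 packed BCD digits, index 7 is least significant
        CLC             ; no carry into the least significant byte
L1:     LDA OP1,X       ; load byte X of operand 1 (zero page, indexed)
        ADC OP2,X       ; add byte X of operand 2 plus carry (D flag must already be set by SED)
        STA OP1,X       ; result overwrites operand 1
        DEX             ; step toward the most significant byte
        BPL L1          ; loop until X wraps below 0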

That takes about 139 cycles on the 6502, while on the Saturn the equivalent takes about 19 cycles (with operands in the 64-bit processor registers).
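As a cross-check of the 139-cycle figure, here is the count assuming standard NMOS 6502 timings, zero-page indexed operands, and a branch target that doesn't cross a page boundary. The final BPL falls through and costs 2 cycles instead of 3, hence the -1:

```latex
\begin{aligned}
T &= \underbrace{2}_{\texttt{LDX \#}} + \underbrace{2}_{\texttt{CLC}}
   + 8\,\bigl(\underbrace{4}_{\texttt{LDA zp,X}} + \underbrace{4}_{\texttt{ADC zp,X}}
   + \underbrace{4}_{\texttt{STA zp,X}} + \underbrace{2}_{\texttt{DEX}}
   + \underbrace{3}_{\texttt{BPL taken}}\bigr) - 1 \\
  &= 4 + 8 \cdot 17 - 1 = 139 \text{ cycles}
\end{aligned}
```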

Also, the 33s and 35s firmware is written mostly (or perhaps entirely) in C. The 6502 isn't a very good target architecture for C, so the compiled code isn't efficient. If the arithmetic routines are written in C, they may be much worse than hand-coded routines. In particular, the compiler is unlikely to infer the use of the 6502's decimal mode.
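Without decimal mode, C code has to spell out the decimal carries itself. The following hypothetical C rendering of the ADDM loop (names invented here, not taken from the 35s firmware) shows the nibble masking, shifting and correction each byte needs, which is the work a single ADC does in decimal mode:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C version of ADDM: op1 += op2, each operand 8 bytes of packed
   BCD (two digits per byte), most significant byte first, so index 7 holds
   the two least significant digits, as in the assembly. Returns the carry. */
static int bcd_add_packed8(uint8_t op1[8], const uint8_t op2[8])
{
    unsigned carry = 0;                              /* CLC */
    for (int x = 7; x >= 0; x--) {                   /* LDX #7 ... DEX / BPL */
        assert((op1[x] & 0x0F) <= 9 && (op1[x] >> 4) <= 9);
        assert((op2[x] & 0x0F) <= 9 && (op2[x] >> 4) <= 9);
        /* low digit: mask, add, decimal-correct */
        unsigned lo = (op1[x] & 0x0F) + (op2[x] & 0x0F) + carry;
        carry = lo > 9;
        if (carry) lo -= 10;
        /* high digit: shift down, add, decimal-correct */
        unsigned hi = (op1[x] >> 4) + (op2[x] >> 4) + carry;
        carry = hi > 9;
        if (carry) hi -= 10;
        op1[x] = (uint8_t)((hi << 4) | lo);          /* STA OP1,X */
    }
    return (int)carry;
}
```

On a 6502 each of those masks, shifts and compares becomes several instructions, so this per-byte body is far longer than the 17-cycle loop iteration above.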

                        
Re: 35s - how fast?
Message #5 Posted by Thomas Radtke on 14 July 2007, 2:55 p.m.,
in response to message #4 by Eric Smith

Thanks for giving some insights!

The museum benchmarks give more reasonable figures, so hopefully most meaningful applications won't run slower on the 35s than on the pioneer.

BTW, I have in mind implementing the error function, which I use often and have already implemented on the TI-59, PSION LZ and Sharp 1500 (the fastest!). At least the 35s shouldn't evaluate it more slowly than the TI ;-)

                        
Re: 35s - how fast?
Message #6 Posted by Andrés C. Rodríguez (Argentina) on 15 July 2007, 2:50 p.m.,
in response to message #4 by Eric Smith

There was an old saying which went more or less...

"Software becomes slower more rapidly than hardware becomes faster"

In this case, it is not only software, but also architecture (specialized vs. general purpose).

However, I like (mostly) the 35s, slow as it may be.

                        
Re: 35s - how fast?
Message #7 Posted by hugh steers on 17 July 2007, 6:44 p.m.,
in response to message #4 by Eric Smith

hi eric,

the choice of the 6502 architecture i find perplexing. i understand that hp might not have had the luxury of changing from the 33s, but that doesn't explain the 33s itself. the compiler point is important, and i think there are better choices than the 6502.

presumably there is no ARM slow, cheap or low-power enough to fit the bill. good compilers were built for the old 8086 architecture, which might make it an alternative. dust down your old copies of turbo C, and return (const char FAR*)FuManchu; well, perhaps not.

but seriously, unless they're doing something really funky, i would expect the biggest limitation to be the 64k address space. with 32k of RAM, you've only got 32k of ROM. this rules out adding a lot of extra functions, which is why some might be missing already.

so better than segments might be our old friend the 68000, e.g. the 16 MHz DragonBall the Palm had: a flat architecture with mature compilers.

one thought: is there an architecture licensing cost? for example, is the 6502 now effectively free, when all the others would require at least some license?

also, i had an idea about your ADD code.

Gene's article mentions 15-digit internal precision and an 8-byte mantissa (without signs). if i had to write the code in C for the 6502 or something like that, i'd implement a base-100 decimal system with 2 digits per byte stored in binary, since without the 6502's decimal instructions, arithmetic will be a shift-and-mask extravaganza, or else you write the math code by hand.

so with base 100, you need a spare "padding" nybble, which means 8 bytes give you only 15 digits. that's exactly what you've got, so maybe this is what they actually do.
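A minimal sketch of the base-100 idea, assuming 8 bytes, most significant first, each holding 0..99 in plain binary. The names and layout are hypothetical, not what the 35s is known to use. Compared with packed BCD in C, each byte needs one compare instead of two nibble corrections:

```c
#include <assert.h>
#include <stdint.h>

#define NBYTES 8   /* hypothetical mantissa size, as in the post */

/* a += b, where each byte is one base-100 "digit" (0..99) stored in binary,
   most significant byte first. Returns the carry out of the top byte. */
static int b100_add(uint8_t a[NBYTES], const uint8_t b[NBYTES])
{
    unsigned carry = 0;
    for (int i = NBYTES - 1; i >= 0; i--) {
        assert(a[i] < 100 && b[i] < 100);
        unsigned s = a[i] + b[i] + carry;
        carry = s >= 100;                   /* single compare per byte */
        a[i] = (uint8_t)(carry ? s - 100 : s);
    }
    return (int)carry;
}

/* Convert one base-100 digit to two ASCII decimal digits: the
   binary-to-decimal step that display and input would need. */
static void b100_to_digits(uint8_t v, char out[2])
{
    assert(v < 100);
    out[0] = (char)('0' + v / 10);
    out[1] = (char)('0' + v % 10);
}
```

The trade-off is that arithmetic gets cheaper while IO needs a divide by 10 per byte, which is usually the less frequent operation.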

i'm using the same idea in hplua but with base 10,000: each "digit" of 0 to 9999 is stored in 16 bits. the idea is that i get to use a 16x16->32 multiply and replace divisions by constants with multiplications by inverse constants, BUT the hit is that i waste 3 nybbles in this base. this isn't so bad because my floats are 16 bytes (compared to the 35s's 12 bytes). i also take a hit converting each 16-bit binary digit to and from decimal for IO, but this is not significant overall.
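hplua's source isn't quoted here, so the following is only a hypothetical C sketch of the base-10,000 technique described: a 16x16->32 product of two limbs, split into a low limb and a carry by multiplying with a precomputed reciprocal instead of dividing. The constant 0xD1B71759 is ceil(2^45 / 10000); since 0xD1B71759 × 10000 - 2^45 = 1168, which is below 2^13, multiplying and shifting right by 45 gives the exact quotient for every 32-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* floor(n / 10000) for any 32-bit n, using a multiply and shift instead of
   a divide. 0xD1B71759 = ceil(2^45 / 10000). */
static uint32_t div10000(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xD1B71759u) >> 45);
}

/* Multiply two base-10,000 limbs (0..9999 each, held in 16 bits) and add a
   carry-in limb. Returns the low limb; *carry_out gets the high limb. */
static uint16_t limb_mul(uint16_t a, uint16_t b, uint16_t carry_in,
                         uint16_t *carry_out)
{
    assert(a < 10000 && b < 10000 && carry_in < 10000);
    /* 16x16->32 product: at most 9999*9999 + 9999 = 99990000 */
    uint32_t p = (uint32_t)a * b + carry_in;
    uint32_t q = div10000(p);                /* no division instruction */
    *carry_out = (uint16_t)q;                /* q <= 9999 */
    return (uint16_t)(p - q * 10000u);       /* remainder = low limb */
}
```

For example, `limb_mul(1234, 10, 5, &c)` returns 2345 with `c` set to 1, since 1234 × 10 + 5 = 12345.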

                              
Re: 35s - how fast?
Message #8 Posted by Paul Dale on 17 July 2007, 7:07 p.m.,
in response to message #7 by hugh steers

Quote:
but seriously, unless they're doing something really funky, i would expect the biggest limitation to be the 64k address space. with 32k of RAM, you've only got 32k of ROM. this rules out adding a lot of extra functions, which is why some might be missing already.

If I'm remembering correctly, the address space is way over 64k. There was quite a bit of mask ROM in the CPU.

Quote:
Gene's article mentions 15-digit internal precision and an 8-byte mantissa (without signs). if i had to write the code in C for the 6502 or something like that, i'd implement a base-100 decimal system with 2 digits per byte stored in binary, since without the 6502's decimal instructions, arithmetic will be a shift-and-mask extravaganza, or else you write the math code by hand.

There are more space efficient packing mechanisms for decimals: http://www2.hursley.ibm.com/decimal/dbover.html. All part of the IEEE-854 compliant decNumber library http://www2.hursley.ibm.com/decimal. The actual computations are carried out in base 10^n (n defaulting to 3, again from memory), so things are still relatively fast.
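The density argument comes down to 2^10 = 1024 being just above 10^3. A quick comparison of storage cost, treating the 10-bit groups generically (this is arithmetic only, not the DPD bit mapping itself):

```latex
\begin{aligned}
&2^{10} = 1024 \ge 10^{3} = 1000 \;\Rightarrow\; 3 \text{ digits fit in } 10 \text{ bits} \\
&\text{bits per digit:}\quad \text{BCD} = 4, \quad \text{10-bit groups} = \tfrac{10}{3} \approx 3.333, \quad \text{binary limit} = \log_2 10 \approx 3.322 \\
&\text{15 digits:}\quad \text{BCD} = 60 \text{ bits}, \quad \text{10-bit groups} = 50 \text{ bits}, \quad \text{binary} = \lceil 15 \log_2 10 \rceil = 50 \text{ bits}
\end{aligned}
```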

Using 12 bytes for reals that would easily fit into 8 is bordering on criminal. Using 37 bytes for each register is just wanton wastefulness :-)

- Pauli

                                    
Re: 35s - how fast?
Message #9 Posted by hugh steers on 18 July 2007, 9:59 a.m.,
in response to message #8 by Paul Dale

the idea of using 10 bits for 3 digits is denser than Packed BCD, as you've pointed out. but i don't think things would still be relatively fast. i would expect this to be slower than a PBCD implementation because it's more complicated; for example, one of the mantissa digits is stored in the exponent.

nevertheless, it does get close to binary efficiency, ie 16 decimal digits for 8 bytes.
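For reference, the 16-digit figure matches the IEEE 754-2008 decimal64 interchange format with DPD coefficient encoding. Its 5-bit combination field carries both the two most significant exponent bits and the leading coefficient digit, which is the sense in which one digit is "stored in the exponent":

```latex
\begin{aligned}
\text{decimal64 bits} &= \underbrace{1}_{\text{sign}} + \underbrace{5}_{\text{combination}}
  + \underbrace{8}_{\text{exponent continuation}} + \underbrace{5 \times 10}_{\text{coefficient continuation}} = 64 \\
\text{coefficient digits} &= \underbrace{1}_{\text{from combination field}} + 5 \times 3 = 16
\end{aligned}
```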

                                          
Re: 35s - how fast?
Message #10 Posted by Paul Dale on 18 July 2007, 6:19 p.m.,
in response to message #9 by hugh steers

Quote:
the idea of using 10 bits for 3 digits is denser than Packed BCD, as you've pointed out. but i don't think things would still be relatively fast. i would expect this to be slower than a PBCD implementation because it's more complicated; for example, one of the mantissa digits is stored in the exponent.

Yes, of course, it must be slower than a pure PBCD implementation. In use, however, the performance loss doesn't seem to be such an issue. You unpack the numbers once, perform all your operations in what amounts to a PBCD format and repack at the end.
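A hypothetical C sketch of the unpack-once, repack-once pattern. To keep it short, it stores each 3-digit group as its plain binary value 0..999 in 10 bits, which has the same density as DPD but is not the DPD encoding (real DPD uses a bit mapping that avoids divisions when extracting digits), and it works on unpacked digits (one per byte) rather than packed BCD. All names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NDIG 15   /* 15 digits = 5 groups of 10 bits = 50 bits */

/* Pack 15 decimal digits (digits[0] most significant) into 50 bits,
   each 3-digit group stored as binary 0..999. Not real DPD. */
static uint64_t pack15(const uint8_t digits[NDIG])
{
    uint64_t w = 0;
    for (int g = 0; g < NDIG / 3; g++) {
        assert(digits[3*g] <= 9 && digits[3*g+1] <= 9 && digits[3*g+2] <= 9);
        unsigned v = digits[3*g] * 100u + digits[3*g+1] * 10u + digits[3*g+2];
        w = (w << 10) | v;
    }
    return w;
}

/* Inverse of pack15: extract the five 10-bit groups, low group last. */
static void unpack15(uint64_t w, uint8_t digits[NDIG])
{
    for (int g = NDIG / 3 - 1; g >= 0; g--) {
        unsigned v = (unsigned)(w & 0x3FF);
        assert(v < 1000);
        w >>= 10;
        digits[3*g]   = (uint8_t)(v / 100);
        digits[3*g+1] = (uint8_t)(v / 10 % 10);
        digits[3*g+2] = (uint8_t)(v % 10);
    }
}

/* Unpack both operands once, add in unpacked decimal, repack once.
   Overflow out of the top digit is returned as the carry. */
static int add_packed15(uint64_t a, uint64_t b, uint64_t *sum)
{
    uint8_t da[NDIG], db[NDIG];
    unpack15(a, da);
    unpack15(b, db);
    unsigned carry = 0;
    for (int i = NDIG - 1; i >= 0; i--) {
        unsigned d = da[i] + db[i] + carry;
        carry = d >= 10;
        da[i] = (uint8_t)(carry ? d - 10 : d);
    }
    *sum = pack15(da);
    return (int)carry;
}
```

In a longer calculation the unpacked digit arrays would be kept across several operations, so the packing cost is paid only at the boundaries.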

Quote:
nevertheless, it does get close to binary efficiency, ie 16 decimal digits for 8 bytes.

Yes, this was the bit that most surprised me. The packing is *very* efficient and not too far from a pure binary equivalent. Plus it includes all the IEEE niceties like denormalised numbers, NaNs, infinities and proper rounding.

Pauli

      
Re: 35s - how fast?
Message #11 Posted by Ed Look on 17 July 2007, 1:53 p.m.,
in response to message #1 by Thomas Radtke

Allow me to further the abuse:

I see there is no expandable memory on the 35S. So can one still assume, as was the case on the 33S, that if one has too many programs, they might not all fit?

