Extending the precision of Woodstock or Saturn based calculators

08-09-2017, 07:17 AM
(This post was last modified: 08-09-2017 07:31 AM by Alejandro Paz(Germany).)
Post: #1




Extending the precision of Woodstock or Saturn based calculators
I was wondering if anyone has made any inroads into extending the precision of the algorithms implemented in the Woodstock or Saturn (could also be Nut) based machines.
Let me explain: say the word is 32 nibbles long instead of 14 or 16. I know that it needs a new processor, so to say, and that the P register and its handling need extension, and so on. Note: I am aware of the extended precision work done for the 41; I read something about it here in the forums. But it is not implemented with extra-long working registers, nor does it have more than 13 digits. What are your thoughts about it? See it as an experiment. I am working on this parallel version of the Woodstock core, and was wondering if one could achieve better precision by extending the word and fixing the algorithms. I wanted to test with square root; I think it is quite simple and I have an overview of it, and it doesn't need extra routines (at least the Saturn version).

08-09-2017, 08:32 AM
Post: #2




RE: Extending the precision of Woodstock or Saturn based calculators
I suspect it would be better to produce new algorithms for the higher precision. Nobody is going to make an advanced NUT processor when there are a plethora of adequate processors available already.
Pauli 

08-09-2017, 10:00 AM
Post: #3




RE: Extending the precision of Woodstock or Saturn based calculators
(08-09-2017 08:32 AM)Paul Dale Wrote: I suspect it would be better to produce new algorithm for the higher precision. Nobody is going to make an advanced NUT processor when there are a plethora of adequate processors available already.
Uh, really?
Greetings, Massimo
+×÷ ↔ left is right and right is wrong

08-09-2017, 11:04 AM
Post: #4




RE: Extending the precision of Woodstock or Saturn based calculators
The Newt is a fantastic piece of work. It replicates the NUT with minimal extensions. The registers are the same size, the maths routines likewise. Sure, it's got lots of registers and heaps of memory, but the precision isn't increased.
It also isn't a commodity processor, i.e. the price is high.
Pauli

08-09-2017, 01:22 PM
Post: #5




RE: Extending the precision of Woodstock or Saturn based calculators
(08-09-2017 11:04 AM)Paul Dale Wrote: The Newt is a fantastic piece of work. It is replicating the NUT with minimal extensions. The registers are the same size, the maths routine likewise. Sure, it's got lots of registers and heaps of memory but the precision isn't increased.
It perfectly fits the scope it was developed for. I was questioning your "Nobody is going to make an advanced NUT processor" ;)
Greetings, Massimo
+×÷ ↔ left is right and right is wrong

08-10-2017, 05:58 AM
Post: #6




RE: Extending the precision of Woodstock or Saturn based calculators
Quote: I suspect it would be better to produce new algorithm for the higher precision. Nobody is going to make an advanced NUT processor when there are a plethora of adequate processors available already.
I am trying to develop such a processor, a better Saturn if you want. FPGAs are very flexible! I have other algorithms for ARM/MIPS processors.

08-10-2017, 07:42 AM
Post: #7




RE: Extending the precision of Woodstock or Saturn based calculators  
08-10-2017, 07:48 AM
Post: #8




RE: Extending the precision of Woodstock or Saturn based calculators
(08-10-2017 05:58 AM)Alejandro Paz(Germany) Wrote: Quote: I suspect it would be better to produce new algorithm for the higher precision. Nobody is going to make an advanced NUT processor when there are a plethora of adequate processors available already.
Several years ago, I made similar investigations. Not really to get higher precision, but to provide higher speed and/or higher memory capacity.
For the HP41, I considered (in my Emu41) implementing a floating-point unit (using an unused Nut opcode), with the goal of simply rewriting the system math routines. I never made it (and will not try any more), but some traces of my thoughts can still be found in my Emu41 sources (nutcpu.c).
For the HP71, I once considered increasing the width of the A address field from 5 to 6 nibbles, to increase the memory space. But the compatibility problems with the existing firmware were too significant, and I gave up. I also considered an FPU extension. All I did in my Saturn emulation was (as you know) to add the R5-R7 registers (allowed by the opcode map) and increase the stack depth, although the HP71 firmware was not changed to take advantage of it.
Feel free to reuse these ideas!
JF

08-10-2017, 12:22 PM
Post: #9




RE: Extending the precision of Woodstock or Saturn based calculators
Quote: For the HP71, I once considered to increase the width or the A address field from 5 to 6 nibbles, to increase the memory space. But the compatibility problems with the existing firmware were too important, and I gave up.
I remember you talked about the extra R registers; I also wondered about the free slots, so to say, in the opcode map. The increased stack would remove the need for that software stack implementation; it allows for 16 extra levels, if I'm not mistaken. I just wonder how often other software uses it (besides the ROM). I don't want to add too many unnecessary opcodes, but I think there are a couple of tricks in use that could benefit from an extra opcode here and there, like checking if the sign digit is a 9.

08-11-2017, 03:07 AM
Post: #10




RE: Extending the precision of Woodstock or Saturn based calculators
A single opcode to do the RPL end sequence might be worthwhile. It would save two decode and execute cycles very often.
Pauli 

08-11-2017, 06:04 AM
Post: #11




RE: Extending the precision of Woodstock or Saturn based calculators
And I'd really like a higher resolution 48, not kidding. I have this nice 160x104 LCD sitting on my desk, together with some 160x160 ones and that nice Sharp memory display at 400x240 (like what the DM42 has).
That is always the problem: how to upgrade something without breaking everything that works.
On the speed, I am talking about the 1LF2 (HP71): one of the issues with the Saturn implementation is that, with 4 bit memory width, you need as many cycles as nibbles to fetch the opcode, and then one extra cycle per calculated nibble. With, say, 16 bit wide memory one could fetch many opcodes in one cycle, most in two. Extending the width of the internal ALU to, say, 16 bits of course poses other "hurdles" for the implementation, but it is doable; one could then execute word opcodes in 4 cycles, plus fetch. A 64 bit parallel ALU could perform word opcodes in 1 cycle; it needs 881 LUTs in a MachXO2, with 4 A..D registers. Not bad.
There are a couple of possibilities for increasing the speed. Raising the clock is one of them, but not the best without addressing memory bandwidth. The HP48 uses byte accesses, with read-before-write for RAM. Lots of work, if one wants
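As a rough illustration of the cycle counts mentioned above, here is a minimal Python sketch. It only models "one cycle per ALU-width or bus-width chunk"; it is not the real timing of the 1LF2 or of any actual core, and the function names are made up for this example.

```python
import math

NIBBLE_BITS = 4
WORD_FIELD_NIBBLES = 16  # the W field covers all 16 nibbles of a Saturn register


def alu_cycles(field_nibbles: int, alu_bits: int) -> int:
    """Cycles needed to push a register field through an ALU of the given width."""
    return math.ceil(field_nibbles * NIBBLE_BITS / alu_bits)


def fetch_cycles(opcode_nibbles: int, bus_bits: int) -> int:
    """Memory cycles to fetch an opcode that starts on a bus-word boundary."""
    return math.ceil(opcode_nibbles * NIBBLE_BITS / bus_bits)


if __name__ == "__main__":
    for bits in (4, 16, 64):
        print(f"{bits:2d} bit ALU: word opcode in {alu_cycles(WORD_FIELD_NIBBLES, bits)} cycle(s)")
    for nibbles in (2, 4, 7):
        print(f"{nibbles} nibble opcode: {fetch_cycles(nibbles, 4)} fetch cycle(s) on a 4 bit bus, "
              f"{fetch_cycles(nibbles, 16)} on a 16 bit bus")
```

Under this simple model a 16 bit ALU needs 4 cycles for a word field and a 64 bit ALU needs 1, matching the figures in the post; alignment effects on fetch are ignored here.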

08-11-2017, 04:09 PM
Post: #12




RE: Extending the precision of Woodstock or Saturn based calculators
(08-11-2017 03:07 AM)Paul Dale Wrote: A single opcode to do the RPL end sequence might be worthwhile. It would save two decode and execute cycles very often.
That opcode was already done on the 49G+/50G internal Saturn emulator. Other new opcodes worth adding are MOVEDN and MOVEUP for the memory copying routines; those are responsible for most of the speedup of the 50g vs 49g.

09-14-2017, 11:38 AM
Post: #13




RE: Extending the precision of Woodstock or Saturn based calculators
I have been busy with "extending the precision" of the algorithms: I coded a simulator for an extended version of the Saturn. I did change the encoding; P now has 5 bits instead of 4. And there are some extra registers.
Calculating the square root of 5 needs, as you can guess, double the number of executed opcodes compared with the original Saturn. All the code for the experiment, called Parallel Neptune Core, can be found here. The source file of the sqrt function with the equivalent Saturn code follows: Code:


09-14-2017, 11:47 AM
(This post was last modified: 09-14-2017 11:50 AM by Alejandro Paz(Germany).)
Post: #14




RE: Extending the precision of Woodstock or Saturn based calculators
In the repository here there is a Parallel Saturn core, written in Verilog. It is still a work in progress. I just coded, after much thought, a prefetching bus controller.
The Parallel Saturn should improve the throughput in three different ways:
1. Increasing memory bandwidth for fetch: I opted for a 16 bit memory width. Some opcodes are 2 nibbles long, so 2 optimally positioned opcodes could be fetched at once. This condition is checked explicitly.
2. Increasing memory bandwidth for data access: again, 16 bit accesses should improve transfers. Alignment issues arise here and read-before-write cycles are needed.
3. Increasing the width of the ALU path: using 64 bit registers at once, while it limits the maximum frequency on the target FPGA (<10 MHz), should provide abundant improvement on long-executing opcodes.
The ALU is mostly coded; the bus controller is partially coded, with no data accesses yet. The rest is a rehash of my nibble-serial (but fully working) 1LF2 implementation. Let's see if I can fully realize this other project. Even at 2 MHz, it should be many times faster than a Yorke, let's see.
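To make the alignment point concrete, here is a small Python sketch that counts how many 16 bit memory words an opcode touches, given its start address in nibbles. It is only an illustration of the idea; the function name and the 4-nibbles-per-word default are assumptions for this example and are not taken from the Verilog source.

```python
def words_touched(start_nibble: int, length_nibbles: int, word_nibbles: int = 4) -> int:
    """Number of memory words spanned by an opcode.

    start_nibble   -- nibble address of the first opcode nibble
    length_nibbles -- opcode length in nibbles
    word_nibbles   -- nibbles per memory word (4 for a 16 bit wide memory)
    """
    first_word = start_nibble // word_nibbles
    last_word = (start_nibble + length_nibbles - 1) // word_nibbles
    return last_word - first_word + 1


if __name__ == "__main__":
    # Two 2-nibble opcodes at nibble addresses 0 and 2 share one 16 bit word.
    print(words_touched(0, 2), words_touched(2, 2))
    # The same opcode starting at nibble 3 straddles two words.
    print(words_touched(3, 2))
```

An opcode that straddles a word boundary needs a second memory access, which is one reason a prefetching bus controller helps.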

09-23-2017, 07:48 AM
Post: #15




RE: Extending the precision of Woodstock or Saturn based calculators
Oddly enough, I was looking at extended precision math just the other week, on a Z80.
Extended precision data was held in data registers, and the number of registers to use was the main variable (if set to zero, only the Stack registers are used). I appreciate you guys are talking about CPU level instructions. I'm still thinking about the higher levels (like how to display the result). Extending the precision by a single register should give proof of concept and check the new math routines, yet not slow down processing too much. Start with divide by 3, and square root of 5, before jumping to: asin(acos(atan(sin(cos(tan(1/9))))))
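For checking an extended precision square root against a reference, a schoolbook digit-by-digit method is easy to write with Python integers. This is a generic sketch and an assumption on my part, not the routine from any calculator ROM or from the code discussed in this thread; the function name is made up.

```python
def sqrt_digit_by_digit(n: int, digits: int) -> str:
    """Square root of a non-negative integer n, truncated to `digits` decimals.

    Classic pencil-and-paper method: bring down two decimal digits at a time
    and find the largest digit d with (20*root + d) * d <= remainder.
    """
    s = str(n)
    if len(s) % 2:
        s = "0" + s  # pad so the integer part splits into digit pairs
    pairs = [int(s[i:i + 2]) for i in range(0, len(s), 2)] + [0] * digits

    root = 0
    remainder = 0
    for pair in pairs:
        remainder = remainder * 100 + pair
        d = 0
        while (20 * root + d + 1) * (d + 1) <= remainder:
            d += 1
        remainder -= (20 * root + d) * d
        root = root * 10 + d

    text = str(root).rjust(digits + 1, "0")
    if digits == 0:
        return text
    return text[:-digits] + "." + text[-digits:]


if __name__ == "__main__":
    print(sqrt_digit_by_digit(5, 30))
```

Each loop iteration yields exactly one result digit, so doubling the number of digits roughly doubles the number of iterations.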

09-24-2017, 07:57 AM
(This post was last modified: 09-24-2017 08:06 AM by Alejandro Paz(Germany).)
Post: #16




RE: Extending the precision of Woodstock or Saturn based calculators
Quote: Start with divide by 3, and square root of 5, before jumping to : asin(acos(atan(sin(cos(tan(1/9))))))
That's why I used 5 as an example for the square root.
Anecdotally, some TI calculators like the TI82, TI83, TI84, TI85 and TI86, among others, use a Z80 as the main processor. They can be divided into two groups, lower and upper end. Both groups use very similar routines; the difference lies in the precision of the algorithms used and is really minimal, like 24 digits and larger exponents. But the interesting part is that the numbers are stored in memory and used in place. The Z80 has a full complement of BCD friendly opcodes: add, sub, daa (for addition and subtraction), 4 bit shifts between accumulator and memory (!), and many 16 bit pointers. All these resources are very well exploited in the mentioned models.
The whole math group of routines in the Saturn takes about 4 kbytes of memory; in the case of the Z80 (as in the TI8x but not the 89) it takes, if I'm not mistaken, about 10 kbytes, with the inflexibility of having the numbers always fixed in memory. One could tailor such routines to use BC, DE and HL, but that means only 6 digits. I think that the way the TIs handle the whole thing is quite clever. That is a point where the Z80 actually excels in comparison with other, more modern processors, always talking about packed BCD.
Nowadays one can achieve quite a bit more performance using base 100 (like I did), base 10000 (like the unix command bc) or greater bases, like newRPL and others. That is doable when you have relatively fast division and multiplication instructions, something totally lacking in the Z80 (the Z180 has a mul instruction), and memory constraints are not as severe as they once were!
Another anecdote: some 20 years ago, I decided to use the H8/300H (a then-Hitachi processor, much like but not compatible with the MC68k) as the basis for my handheld calculator.
This processor has 8 32 bit registers. As you can imagine, I used the registers to temporarily hold the fractions of the floating point registers as I performed calculations, limited to 16 packed BCD digits. My target speed was about 8 MHz and that made kind of sense. Today, I'd go with a greater base; it just makes more sense with a RISC processor. I also developed an AVR based BCD four function package with 16 digits of precision. The AVR doesn't help at all with BCD, and the unrolled routines needed quite a bit of space.
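To show what byte-wise packed BCD addition with a decimal adjust step looks like, here is a small Python model. It mimics the effect of an add-with-carry followed by a decimal adjust on each byte; it does not reproduce the exact Z80 flag behaviour, and the function names are invented for this sketch. It assumes both inputs are valid packed BCD of equal length, most significant byte first.

```python
def bcd_add_byte(a: int, b: int, carry: int = 0) -> tuple[int, int]:
    """Add two packed BCD bytes (two digits each); return (result_byte, carry_out)."""
    lo = (a & 0x0F) + (b & 0x0F) + carry
    half_carry = 0
    if lo > 9:           # low digit overflowed past 9: adjust and carry into high digit
        lo -= 10
        half_carry = 1
    hi = (a >> 4) + (b >> 4) + half_carry
    carry_out = 0
    if hi > 9:           # high digit overflowed: adjust and carry into the next byte
        hi -= 10
        carry_out = 1
    return (hi << 4) | lo, carry_out


def bcd_add(x: bytes, y: bytes) -> tuple[bytes, int]:
    """Add two equal-length packed BCD numbers, working in place from the last byte."""
    if len(x) != len(y):
        raise ValueError("operands must have the same length")
    out = bytearray(len(x))
    carry = 0
    for i in range(len(x) - 1, -1, -1):
        out[i], carry = bcd_add_byte(x[i], y[i], carry)
    return bytes(out), carry


if __name__ == "__main__":
    total, carry = bcd_add(bytes.fromhex("1234"), bytes.fromhex("0789"))
    print(total.hex(), carry)
```

With a larger base such as 100 or 10000, each loop step handles two or four decimal digits at once using ordinary binary arithmetic, which is the trade-off described in the post.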

09-28-2017, 12:40 PM
(This post was last modified: 09-28-2017 12:42 PM by Alejandro Paz(Germany).)
Post: #17




RE: Extending the precision of Woodstock or Saturn based calculators
I have updated the Parallel Saturn core. At this point many instructions have been implemented. Some instructions are still missing, like the ones dealing with device configuration, shutdown and interrupts. Memory read is partially implemented and memory write is still missing. But there is enough to let the synthesizer give us an idea of how big and fast it is. Big and slow for a MachXO2-7000ZE-1
Code:
90 % of this FPGA is pretty big, and the maximum speed is, well, slow. There are a couple of important points to consider:
The 64 bit computations (ADD, SUB, logic and a mux) are performed between two consecutive clock edges. I haven't attempted any optimization here beyond latching both source arguments (Line 241..248, saturn_alru.v). The slowest path goes through the subtract unit, as expected.
But clock speed doesn't tell the whole story if we do not know how many clocks are needed for an opcode. Small opcodes like P=n take 4 clocks to complete, but fetch can take up to 3 clocks. While the prefetcher does pretty good work, it can be greatly improved. Jumps take 2 clocks plus fetch. An unaligned opcode needs at least 2 extra clocks for fetch.
For comparison, the nibble-serial implementation on the same FPGA can achieve 16 MHz. But opcodes need many more cycles:
Code:
27 cycles @ 16 MHz = 1.68 us
7 cycles @ 3.2 MHz = 2.18 us
9 cycles @ 16 MHz = 0.5625 us
5 cycles @ 3.2 MHz = 1.56 us
I think that without improving the prefetcher it will be difficult to see any gain. Areas to look at: fetch and subsequent execute, and streamlining of the fetch state machine. At least one clock can be dropped, as ST_INIT is not really needed, imho. The ALU path can also be extended to two cycles, allowing the frequency to be doubled.
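The arithmetic behind such figures is simply cycles divided by clock frequency. A tiny Python helper (a sketch; the name is made up) makes it easy to redo the comparison for other cycle counts or clock rates:

```python
def opcode_time_us(cycles: int, clock_mhz: float) -> float:
    """Execution time in microseconds for an opcode taking `cycles` clocks."""
    return cycles / clock_mhz


if __name__ == "__main__":
    # (cycles, clock in MHz) pairs as listed in the post
    for cycles, mhz in [(27, 16.0), (7, 3.2), (9, 16.0), (5, 3.2)]:
        print(f"{cycles:2d} cycles @ {mhz} MHz = {opcode_time_us(cycles, mhz):.4f} us")
```

Seen this way, a core that needs fewer cycles per opcode only wins if its clock is not slower by a larger factor than the cycle savings.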
