FORTH for the PC-G850(V)(S)
11-06-2022, 01:56 PM (This post was last modified: 11-22-2022 05:34 PM by robve.)
Post: #1
 robve Senior Member Posts: 360 Joined: Sep 2020
FORTH for the PC-G850(V)(S)
Recently started a new Forth project. This time for the Sharp PC-850(V)(S). Perhaps this is interesting to some of you.

Repository: https://github.com/Robert-van-Engelen/Forth850

It's about 8K in size plus 8K free space or more, written from scratch in Z80 assembly and Forth itself. It's an ANS/standard compliant, directly threaded code (DTC) Forth implementation. Words are case insensitive.

No file IO and no floating point yet (no syscalls are published as far as I know), but may look into floating point later, e.g. to use a Z80 32 bit float library, if it fits.

Why write from scratch? Well, firstly well-known Z80 Forth implementations like CamelForth, eForth and Jupiter Ace aren't ANS Forth compliant and lack features. Secondly, it's more fun to write from scratch. The Moving Forth article was helpful, but I've decided to use DE as TOS, BC as IP (instruction pointer) and RP (return stack pointer) in memory. It's faster that way, since DE (TOS) and HL (scratch) are exchangeable in only 4 cycles and HL is a versatile register pair. Even faster would be to implement subroutine threaded code (STC) to execute Z80 code directly without DTC fetch+execute, but that will require 50% more space on average for Forth programs. Nice for speed, but perhaps not so much for usability on a small 32K machine.

With a bit of time tinkering, I came up with efficient Z80 code for Forth words, more efficient in some cases than CamelForth, eForth and Jupiter Ace (the latter isn't particularly fast by the way). If someone can point to more efficient Z80 Forth DTC implementations, then please let me know.

For example, my signed 16x16->16 multiplication takes only 51 cycles per bit versus 62 cycles per bit published by Zilog. I'm using the ASZ80 macro assembler with the CODE macro defining a Forth word "*" in machine code with label "star":

Code:
                CODE *,star                 push de                 ; save TOS                 exx                     ; save bc with ip                 pop de                  ; n2->de                 pop bc                  ; n1->bc                 ld hl,0                 ; 0->hl                 ld a,c                  ; c->a low order byte of n1                 ld c,b                  ; b->c save high order byte of n1                 ld b,8                  ; 8->b loop counter 1$: rra ; 4 ; loop, a>>1->a set cf jr nc,2$        ;  7    ;   if cf=1 then                 add hl,de       ; 11    ;     hl+de->hl 2$: sla e ; 8 ; rl d ; 8 ; de<<1->de djnz 1$         ; 13(51); until --b=0                 ld a,c                  ; c->a high order byte of n1                 ld b,8                  ; 8->b loop counter 3$: rra ; 4 ; loop, a>>1->a set cf jr nc,4$        ;  7    ;   if cf=1 then                 add hl,de       ; 11    ;     hl+de->hl 4$: sla e ; 8 ; rl d ; 8 ; de<<1->de djnz 3$         ; 13(51); until --b=0                 push hl                 ; save hl with product as TOS                 exx                     ; restore bc with ip                 pop de                  ; pop new TOS                 JP_NEXT                 ; continue

EDIT: even faster multiplication is possible. To calculate the high order byte we do not need to iterate over all 8 bits of the high order multiplier stored in register c, but only over the nonzero bits. We also can ignore the lower order result stored in register e. This reduces the max loop iteration cycle time to 32 and 33 per bit. Furthermore, the second loop only runs until the last bit of register c is shifted out. If register c is zero, the second loop does not execute thereby saving hundreds of cycles. We also use jp instead of jr to improve and balance the cycle time per bit:
Code:
                CODE *,star                 push de                 ; save TOS                 exx                     ; save bc with ip                 pop de                  ; n2 -> de                 pop bc                  ; n1 -> bc                 ld hl,0                 ; 0 -> hl                 ld a,c                  ; c -> a low order byte of n1                 ld c,b                  ; b -> c save high order byte of n1                 ld b,8                  ; 8 -> b loop counter 1$: rra ; 4 ; loop, a >> 1 -> a set cf jr nc,2$        ;  7    ;   if cf = 1 then                 add hl,de       ; 11    ;     hl + de -> hl 2$: sla e ; 8 ; rl d ; 8 ; de << 1 -> de djnz 1$         ; 13(51); until --b = 0                 ld a,h                  ; h -> a do high order, low order is done                 jr 5$; jump to shift c and loop 3$:             add d           ;  4    ; loop, a + d -> a 4$: sla d ; 8 ; d << 1 -> d 5$:             srl c           ;  8    ;   c >> 1 -> c set cf and z if no bits left                 jr c,3$; 12/7(32); until cf = 0 repeat with addition jp nz,4$        ;   10(33); until c = 0 repeat without addition                 ld h,a                  ; a -> h                 push hl                 ; save hl with product as TOS                 exx                     ; restore bc with ip                 pop de                  ; pop new TOS                 JP_NEXT                 ; continue

At the heart of Forth parsing are my CHOP and TRIM words in assembly that exploit special Z80 ops, which I found to be more suitable and efficient than CamelForth's approach:

Code:
; CHOP          c-addr u1 char -- c-addr u2 ;               truncate string up to matching char ;               leaves string if char not found ;               char=0x20 (bl) chops 0x00 to 0x20 (white space)                 CODE CHOP,chop                 ld a,e                  ; char->a                 exx                     ; save bc with ip                 ex af,af'               ; save a with char                 pop bc                  ; pop u1->bc                 ld e,c                  ;                 ld d,b                  ; u1->de                 ld a,c                  ;                 or b                    ; test bc=0, 0->cf                 jr z,2$; if bc=0 then not found pop hl ; push hl ; c-addr->hl ex af,af' ; restore a with char cp 0x20 ; jr z,3$                 ; if a=0x20 then find white space                 or a                    ; 0->cf not found ;               find char in string                 cpir                    ; repeat until a=[hl++] or --bc=0                 jr nz,2$; if match then 1$:             ccf                     ; complement cf when found, to correct cpi bc-- 2$: ex de,hl ; u1->hl sbc hl,bc ; u1-bc-cf->hl push hl ; save hl as TOS exx ; restore bc with ip pop de ; pop new TOS JP_NEXT ; continue ; find white space in string 3$:             cp (hl)         ;  7    ; loop to compare a to [hl]                 cpi             ; 16    ;   hl++,bc--                 jr nc,1$; 7 ; if [hl]<a then found jp pe,3$        ; 10    ; until bc=0                 jr 1$; not found ; TRIM c-addr1 u1 char -- c-addr2 u2 ; trim initial chars ; char=0x20 (bl) trims 0x00 to 0x20 (white space) CODE TRIM,trim ld a,e ; char->a exx ; save bc with ip pop bc ; u1->bc pop hl ; c-addr1->hl 1$:             ex af,af'               ; save a                 ld a,c                  ;                 or b                    ;                 jr z,3$; if bc<>0 then ex af,af' ; restore a 2$:             cpi             ; 16    ;   loop                 jr nz,4$; 7 ; while a=[hl++],--bc jp pe,2$        ; 10    ;   until b=0 3$: push hl ; save hl as 2OS push bc ; save bc as TOS exx ; restore bc with ip pop de ; pop new TOS JP_NEXT ; continue 4$:             cp 0x20                 ;                 jr nz,5$; if char=0x20 then dec hl ; cp (hl) ; inc hl ; jr nc,1$                ;   if [hl-1]<=0x20 then keep trimming 5$: inc bc ; correct bc++ for cpi match dec hl ; correct hl-- for cpi match jr 3$                   ; finalize trimming

My case-insensitive dictionary search FIND-WORD should speed up Forth compilation by quite a bit:

Code:
; FIND-WORD     c-addr u -- c-addr 0 | xt 1 | xt -1 ;               search dictionary for matching word ;               leaves execution token and 1=immediate or -1=not immediate ;               leaves c-addr and 0 when not found                 CODE FIND-WORD,findword                 ld a,d                  ;                 or a                    ; test d=0 high order byte of u                 jp nz,zero_next         ; if u is too large then set new TOS to 0                 sla e                   ; shift u to compare w/o immediate bit                 jp c,zero_next          ; if u is too large then set new TOS to 0                 jp z,zero_next          ; if u=0 then set new TOS to 0                 push de                 ; save de with 2*u                 exx                     ; save bc with ip                 pop bc                  ; pop 2*u->bc                 pop de                  ; pop c-addr->de                 ld hl,(context+3)       ; CONTEXT->hl                 jr 3$; start searching ; loop over dictionary 1$:             pop de                  ; restore de with c-addr 2$: pop hl ; 10 ; loop, restore hl with lfa 3$:             ld a,(hl)       ;  7    ;                 inc hl          ;  6    ;                 ld h,(hl)       ;  7    ;                 ld l,a          ;  4    ;   [hl]->hl follow link at hl=lfa                 or h            ;  4    ;                 jr z,6$; 7 ; if hl=0 then not found push hl ; 11 ; save hl with lfa inc hl ; 6 ; inc hl ; 6 ; hl+2->hl with nt (nfa) ld a,(hl) ; 7 ; word length add a ; 4 ; shift away immediate bit cp c ; 4 ; test a=c word length match (both shifted) jr nz,2$        ; 12(95);   if lengths differ then continue searching ;               compare string to word                 push de                 ;   save de with c-addr                 inc hl                  ;   hl++ point to nfa chars                 ld b,c                  ;   2*u->b                 srl b                   ;   u->b word length (nonzero) ;               loop over word chars 4$: ld a,(de) ; 7 ; loop cp (hl) ; 7 ; compare [de]=[hl] jr z,5$         ; 12/7  ;     if mismatch then                 and 0xdf        ;    7  ;       make upper case                 cp 'A           ;    7  ;                 jr c,1$; 7 ; if a<'A' then continue search cp 'Z+1 ; 7 ; jr nc,1$        ;    7  ;       if a>'Z' then continue search                 xor (hl)        ;    7  ;                 and 0xdf        ;    7  ;       case insensitive compare [de]=[hl]                 jr nz,1$; 7 ; if mismatch then continue search 5$:             inc de          ;  6    ;     de++ point to next char of c-addr                 inc hl          ;  6    ;     hl++ point to next char of word                 djnz 4$; 13(51/102);until --b=0 ; found a matching word pop de ; discard saved c-addr ex (sp),hl ; save hl with xt as 2OS, restore hl with lfa inc hl ; inc hl ; hl+2->hl with nt (nfa) bit immediate_bit,(hl) ; test immediate bit of [hl] word length exx ; restore bc with ip jp nz,one_next ; set new TOS to 1 if word is immediate jp mone_next ; set new TOS to -1 ; not found 6$:             push de                 ; save de with c-addr as 2OS                 exx                     ; restore bc with ip                 jp zero_next            ; set new TOS to 0

The Z80 is alright for Forth (I've used the Z80 back in the 80s and loved it), but not as good as the 6809 and the ESR-L (PC-E500) that have two stack registers. What a luxury compared to the Z80!

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
11-08-2022, 09:23 PM
Post: #2
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
Currently testing the Forth850 for Forth standard compliance and performance.

The NQUEENS benchmark runs in 0.865 seconds.

This is measured in "fast mode", looping NQUEENS 100 times with a stopwatch.

In "slow mode", the NQUEENS benchmark runs in a little longer in 0.95 seconds.

This is the same Forth NQUEENS program as benchmarked on the PC-E500.

The PC-G850 runs at 8MHz, whereas the PC-E500 runs at 2.3MHz with NQUEENS taking 3.47 seconds. Since 8/2.3 = 3.48, I am happy to see that the performance of these implementations are comparable when adjusting for clock speed. The Z80 runs slightly faster, but that is attributable to the fact the the PC-E500 has a 20 bit address space, which requires 16<->20 bit conversions in Forth when dealing with addresses stored 16 bit Forth cells.

If it matters, the "slow mode" version of Forth850 uses ~300 bytes less of the 8K Forth core, which doesn't seem critical to me to save space. So the "fast" version is probably the best way to go to release once testing is completed.

Both slow and fast versions perform stack over/underflow detection and BREAK key interrupt detection. Without these, NQUEENS would run in 0.84 seconds. But that version won't be as usable to play with since it will crash when the stack over/underflows.

Will release Forth850 soon when testing is completed, so anyone can try it out and rerun the benchmark.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
11-09-2022, 07:45 PM
Post: #3
 Sukiari Member Posts: 180 Joined: Dec 2014
RE: FORTH for the PC-G850(V)(S)
This is incredible, thank you. It would be amazing to see a new generation of these kinds of machines, hopefully projects like yours will drum up more interest and eventually we see something maybe crowd funded.
11-10-2022, 09:36 PM
Post: #4
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
Forth850 is now available: https://github.com/Robert-van-Engelen/Forth850

It has 294 words in an 8K executable, including 2xx and Dxx words for double cells, 256 byte string buffers PAD, TIB and TMP, string operations, exceptions THROW and CATCH, VALUE, 2VALUE, FORGET, VOCABULARY, simple graphics with DRAW and VIEW, port IO with OUT and INP.

It's super fast by design and written from scratch in Z80 assembly (see the README in the repository for details). Not only does the nqueens benchmark run in only 0.865 seconds, compiling Forth source is speedy too without noticeable delays.

A nice feature of Forth850 is that the size of the free dictionary space is user-definable. This is done by setting USERxxxx in the Monitor to the size of the Forth system. At minimum the system requires 9K e.g. USER23FF to accommodate the 8K executable with minimal stacks and dictionary space. The largest is USER75FF with about 21K of free dictionary space.

I've included a forth850.wav file to load the Forth 850 binary via a cassette interface. I will look into serial later. I need to acquire a serial cable first.

Things that will be nice to add (in progress):
• an additional extended "full version" with additional built-in Forth words
• with an improved command line editor, like Forth500
• ability to load Forth source code from the TEXT editor
• serial IO to communicate via RS232 and load source code

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
11-11-2022, 01:55 AM
Post: #5
 xerxes Member Posts: 146 Joined: Jun 2014
RE: FORTH for the PC-G850(V)(S)
(11-10-2022 09:36 PM)robve Wrote:  It's super fast by design and written from scratch in Z80 assembly

Indeed! 5 times faster than C is really impressive. The only more efficient language on a pocket computer is the Pascal compiler
of the PB-2000C with 1.27 seconds at 0.91 MHz or clock speed adjusted 0.144 seconds at 8.0 MHz. Thank you for the test.

Calculator Benchmark
11-12-2022, 12:28 AM (This post was last modified: 11-12-2022 06:55 PM by robve.)
Post: #6
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
Well, nothing can or will always go 100% according to plan! During a last cleanup effort of the source code to post on the repository, I accidentally copied a DROP line. As a result, entering a Forth command produced <ERR-13>.

I was pulling my hair out to figure this one out. I had spent three days to carefully test everything during my free hours before I ran into this today to test an addition of new GETKEY word.

It's fixed now

The PC-G850 Monitor helped me a lot to check my code and set breakpoints. The PC-G850 is a great little machine!

Does anyone have suggestions for a small Z80 single precision (32 bit) floating point library? BCD or IEEE754 or something else?

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
11-12-2022, 06:26 PM (This post was last modified: 11-12-2022 06:55 PM by robve.)
Post: #7
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
With the DRAW word you can create new characters to display with text.

For example, define GREMIT (graphic emit) to display 8x6 pixel characters:

Code:
: GREMIT   CREATE   DOES>     6 DRAW     28 EMIT \ cursor right ;

To display $$\alpha$$ and $$\beta$$ we just need to define their pixels, which is simpler in binary with %:

Code:
GREMIT alpha   %00111000 C,   %01000100 C,   %01000100 C,   %00111000 C,   %01000100 C,   %00000000 C, GREMIT beta   %11111110 C,   %00100001 C,   %01001001 C,   %01001001 C,   %00110110 C,   %00000000 C,

So ." result " alpha ." =" beta displays result α=β

The VIEW word extracts pixel patterns. With this word you can move pixels and OR- or XOR-pixels to DRAW.

The cursor's code uses VIEW and DRAW to invert the cursor on screen. KEY displays the cursor. I've written a version of EMIT to support control characters, which the PC-G850 does not support natively.

Lots of possibilities to enhance Forth programs.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
11-13-2022, 10:15 PM (This post was last modified: 11-13-2022 10:26 PM by robve.)
Post: #8
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
Forth850 can be extended "on the fly" with your own Z80 machine code using the PC-G850(V)(S) built-in Z80 Assembler.

To clarify, here are the steps. A BEEP example follows below.
• first define the word in Forth850 with NFA, <name>
• HERE HEX . displays the ORG xxxx address to specify in the assembly code
• write the assembly in the TEXT editor with this ORG xxxxH address
• run the Assembler, which adds the code to the <name> you defined in Forth850
• write down the number of bytes assembled, this is important
• in Forth850, DECIMAL <number> ALLOT makes the code part of <name> so that HERE HEX . displays the next address after the code (use negative -nnnn ALLOT to undo ALLOT when making code changes, or pick a larger ALLOT value to reserve space for assembly code additions)
• IMPORTANT: rerun the Assembler, because the first assembly was overwritten by Forth850 (the hold area is HERE) before you ALLOTed
A couple of (obvious) points to keep in mind when writing machine code for Forth850:
• register BC holds the IP (instruction pointer) and should not be modified, i.e. save it somewhere to restore it before returning
• register DE holds the TOS (top of stack) and can be used and set
• the stack can be used to push/pop Forth parameters
• register IY holds the address of "next" and should not be modified
• return from the code with JP (IY), never use RET

Example BEEP word.

In Forth850:
Code:
NFA, BEEP HERE HEX . 20FA

In TEXT editor (line numbers not shown):
Code:
      ORG 20FA       di              ;       di              ; disable interrupts       ld hl,0000h     ;       xor a           ; loop: out (18h),a     ; loop, out audio port wait: dec l           ;   loop       jr nz,wait      ;   until --l=0       cpl             ; switch on/off       dec h           ;       jr nz,loop      ; until --h=0       ei              ; enable interrupts       jp (iy)         ; next

Assemble the code, which is 18 bytes long. Return to Forth850 with BASIC and CALL256 and enter:
Code:
18 ALLOT

Important: run the Assembler again to save the code in the ALLOTed space.

Now BEEP works in Forth850.

Alternatively, the BEEP word can also be defined with HEX codes in Forth850 as follows, which takes more effort but with the same result:
Code:
NFA, BEEP       ( -- )   HEX   F3 C,         \       di              ;   F3 C,         \       di              ; disable interrupts   21 C, 0 ,     \       ld hl,0000h     ;   AF C,         \       xor a           ;   D3 C, 18 C,   \ loop: out (18h),a     ; loop, out audio port   D2 C,         \ wait: dec l           ;   loop   20 C, FD C,   \       jr nz,wait      ;   until --l=0   2F C,         \       cpl             ; switch on/off   25 C,         \       dec h           ;   20 C, F7 C,   \       jr nz,loop      ; until --h=0   FB C,         \       ei              ; enable interrupts   FD C, E9 C,   \       jp (iy)         ; next   DECIMAL       \ 18 bytes

A simple example that flips the bytes of the TOS stored in register DE, which takes 5 bytes of machine code:
Code:
NFA, FLIP HERE HEX . xxxx 5 ALLOT
Code:
      ORG xxxxH       ld a,e       ld e,d       ld d,a       jp (iy)

As always when writing assembly, if something is seriously wrong with it then we may crash and have to start over. In the worst case we have to install Forth850 again when the dictionary is damaged by "random POKEs". However, it is often not necessary to reset the machine when asked for MEMORY CLEAR (Y/N) just say N (NO) and give it another try after fixing the problem.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
11-14-2022, 07:35 PM
Post: #9
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
A really nice thing about the PC-G850(V)(S) is that the ROM and RAM addresses are known and included in the excellent PC-G850V(S) User Guide translated by Jack W. Hu.

So let's go forth and put our "Pokecon" to some good use by defining a new Forth850 word TEXT that reads the TEXT editor area to evaluate Forth. In this way we can write Forth code in the built-in TEXT editor and then read it back into Forth:
• write Forth code with line numbers in the TEXT editor
• in Forth enter TEXT to read the source code
The TEXT area start- and end-pointers are located in RAM at &H7973-4 and &H7975-6, respectively. The BASIC and TEXT code has a &HFF marker at TEXT start and end. The internal BASIC/TEXT line format is: hi lo len data... &H0D where
• hi is high order byte of line number, hi is &HFF at end
• lo is low order byte of line number
• len is length of the line up to and including the &H0D byte
Keeping this in mind, the TEXT word is defined as follows, using EVALUATE to evaluate each line in the TEXT area:
Code:
: TEXT   $7973 @ 1+ >R BEGIN R> \ -- addr DUP C@$FF <> WHILE     2+ DUP C@ SWAP 1+   \ -- len addr     2DUP + >R     SWAP 1- EVALUATE   REPEAT   DROP ;

A minor caveat of using EVALUATE for each line: .( and ( cannot span multiple lines, i.e. a ) must be on the same line.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
11-29-2022, 07:56 PM
Post: #10
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)

I've written new IEEE 754 floating point routines from scratch in Z80 for Forth850. I wasn't happy with the Z80 floating point libraries I found on GitHub and elsewhere, which appear to have sizable code bases and are heavy on memory access and use. The new Z80 floating point routines I wrote operate with Z80 registers only, including the shadow registers, except for a one push-pop pair to pass a value from a shadow register pair to a register pair. The shadow registers are available for use with the PC-G850 (i.e. not used by interrupts). So let's put them to work.

The new math.asm Z80 library:
• IEEE 754 single precision floating point addition, subtraction, negation, multiplication, division, integer <-> float conversion, string <-> float conversion
• "memoryless" using registers only (+shadow), at most one push+pop per flop
• optimized for speed and reduced code size (no loop unrolling)
• extensively tested
But lacks (at this time):
• no INF/NAN values; routines return error condition (cf set)
• truncated 24 bit mantissa instead of IEEE 754 unbiased "banker's rounding" (floating point string parsing and output use rounding)

I may extend the library to support INF/NAN and banker's rounding to make it compliant with IEEE 754. Right now, the only observable difference is the rounding (lack thereof). Consistency of the numerical treatment is important.

The Forth code for the Sine plot shown in the picture:
Code:
3.14159265E0 2CONSTANT PI PI 72E0 F/ 2VALUE SCALE \ free to change e.g.: PI 144E0 F/ TO SCALE : SINE   PAGE   144 0 DO     I DUP S>D D>F SCALE F*     FSIN 24E0 F*     F>D D>S 24 + POINT   LOOP   GETKEY DROP ;
Trig functions in Forth:
Code:
: FSIN  ( r1 -- r2 ) \ only valid for -2pi <= r1 <= 2pi   2DUP 2DUP 2DUP F* FNEGATE   2ROT 2ROT   15 2 DO     5 PICK 5 PICK     F* I I 1+ *     S>D D>F F/     2DUP 2ROT F+ 2SWAP   2 +LOOP   2DROP 2SWAP 2DROP ; : FCOS  ( r1 -- r2 ) 1.5707962E0 2SWAP F- FSIN ; : FTAN  ( r1 -- r2 ) 2DUP FSIN 2SWAP FCOS F/ ;

I've updated the forth850-full version with the new features. The binary is about 10K, so there is plenty of space left on the machine. I've tested the implementation extensively and will continue to do so in the near future.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
12-07-2022, 03:07 AM
Post: #11
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
(11-29-2022 07:56 PM)robve Wrote:  I may extend the library to support INF/NAN and banker's rounding to make it compliant with IEEE 754. Right now, the only observable difference is the rounding (lack thereof).

I rewrote part of the Z80 IEEE 754 math core today to add two proper IEEE 754 rounding modes to choose from (at compile time):
• round to nearest, ties to even a.k.a. banker's rounding, typically used by libmath as the default rounding mode (C/C++, Python, Lua, Julia etc.)
• round to nearest, ties to away
The Forth850 update will use round to nearest, ties to even. Correct rounding is applied to addition, subtraction, multiplication, division and all Forth floating point math functions (since they use these core routines.) The string input and output of floats was already properly rounding. The Z80 math library is 67 bytes longer for just a tad over 1K of portable Z80 code and does not suffer from performance degradation or other compromises, i.e. no memory accesses as everything is still kept in Z80 registers.

Adding INF/NAN is should actually be easy with some logic to test these special cases. I could make it an option. But I prefer to have exceptions in Forth to catch floating point overflow, division by zero etc. so signaling NaN and INF exceptions.

With internal proper round to nearest, ties to even the new sine plot is perfectly symmetric on close inspection:

A bit surprised myself that I was able to pull this off, as I was already using all Z80 registers

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-01-2023, 05:45 PM
Post: #12
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
Happy to share that Forth850 v1.0 is now available on GitHub.

The basic 8K version has 295 words. The more complete 11K version has 349 words, including IEEE 754 single precision and a more capable line editor (like Forth500). Both versions were extensively tested and verified *). The performance has not changed since the initial release. The n-queens benchmark is solved in 0.865 seconds.

I wrote three versions of a new IEEE 754 single precision math library with standard floating point operations in Z80 assembly solely using Z80 registers, i.e. "memoryless" **). All trig and other math functions are defined in Forth in MATH.FTH.
- math.asm: a truncating version that is small, 960 bytes of code;
- mathr.asm: a proper rounding version (IEEE 754 round to nearest, ties to even) 1085 bytes;
- mathri.asm: like mathr.asm extended with IEEE INF and NAN (Forth850 does not use INF/NAN) and signed zero, but no subnormals (which are evil), 1286 bytes.

It felt quite satisfying to put this all together, as everything fits in Z80 registers, including shadow, but not IX and IY which remain unused. Some effective use of "Z80 tricks" I learned in the 80s

- Rob

*) while this version was extensively tested and verified, there is no guarantee that there are zero bugs. Please report issues so I can fix them.

**) "memoryless" but a PUSH+POP must be used to copy to Z80 shadow registers, so there is at most one PUSH+POP pair per flop (add, sub, mul, div).

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-03-2023, 03:32 AM (This post was last modified: 01-03-2023 03:40 AM by F-73P.)
Post: #13
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: FORTH for the PC-G850(V)(S)
Great work!

Did you write routines to convert between strings and floats? I've started writing string-to-double and double-to-string routines and it's quite challenging, requiring extended-precision arithmetic and tables.

The C language combines all the power of assembly language with all the ease-of-use of assembly language
01-03-2023, 06:57 PM (This post was last modified: 01-04-2023 02:30 PM by robve.)
Post: #14
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: FORTH for the PC-G850(V)(S)
(01-03-2023 03:32 AM)F-73P Wrote:  Did you write routines to convert between strings and floats? I've started writing string-to-double and double-to-string routines and it's quite challenging, requiring extended-precision arithmetic and tables.

Yes. For single precision floating point, that is.

I evaluated three strategies with tables. Because single precision floating point decimal exponents are limited to the +/-38 range, a full powers of 10 table is logical to use but an overkill. I found that a table with 12 entries saves memory and is sufficiently accurate, almost as accurate as a full table.

I've documented this in the mathr.asm assembly source and also in mathri.asm (same, but with inf/nan and signed zero) see the fpow10 routine:

Code:
;                 method                                   min bits   mean bits ;                 iterative multiplication by 10:           20.2439     23.4988 ;                 iterative multiplication by 10**4:        21.3502     24.4916 ;                 iterative multiplication by 10**5:        21.3243     24.5500 ;                 iterative multiplication by 10**6:        21.3297     24.5937 ;                 iterative multiplication by 10**7:        21.3253     24.5619 ;                 iterative multiplication by 10**8:        21.3004     24.5260 ;                 iterative multiplication by 10**10:       21.0809     24.4483 ;                 iterative multiplication by 10**16:       20.7694     24.0836 ;                 table lookup with  4 powers of 10:        21.4136     24.6670 ;                 table lookup with  8 powers of 10:        21.9131     25.2981 ;                 table lookup with  9 powers of 10:        21.8907     25.4628 ;                 table lookup with 10 powers of 10:        22.0177     25.5388 ;               * table lookup with 12 powers of 10:        22.0482     25.7529 ;                 table lookup with 16 powers of 10:        22.0004     25.4073 ;                 exponentiation by squaring from bottom:   22.0004     25.3867 ;                 exponentiation by squaring from top:      22.0023     25.3830 ;                 table lookup with 38 powers of 10:        exact       exact ; ;               * = selected and enabled for small code size and high accuracy

where "min bits" means the minimal number of bits guaranteed to be accurate -log(d)/log(2.0) with minimum d of relative errors e between x and y = fabs(x - y)/fabs(y), and "mean bits" means the number of bits accurate on average (relative error) in 10 million random samples: -log(a/SAMPLES)/log(2.0) with a the sum of relative errors. The mantissa has 24 bits. Note that summing relative errors can get over 25 "mean bits", which is when more cases are exact than have errors. I ran the tests in C code (DM me if you want this code).

The method selected (table of 12) is enabled with .if-.endif, while the others are disabled, but could be used. I would believe that with double precision floating point you could use a small table, perhaps as small as 12 but probably larger. Exponentiation by squaring is another option to keep the table small. See my assembly source code for the Z80 routines.

As you also point out, the powers of 10 tables are used by atof (string to float) and ftoa (float to string). The mathr.asm ftoa routine computes ((exponent - bias - 1) * 77 + 77 + 82) / 256 to estimate the decimal exponent given the biased binary exponent. I've seen this elsewhere, but I made an improvement to adjust with +82 to prevent underestimation. With double precision this will need work and more tweaking.

In the atof routine I used a 32 bit mantissa to multiply by 10 to maintain accuracy and produce a 24 bit mantissa from a string of digits without loss or precision up to 31 digits, with rounding to nearest ties to even when over 10 digits or so. For double precision, this needs a 64 bit mantissa.

Beware of pitfalls when parsing floats. Floats can be represented in many different ways, such as 1234567890E-45 and 0.0000000000123456789E45 for example. Rather obvious to look at, but the code should handle it.

I wrote the math code in less than a week in the evenings, but testing took another week or two to make sure there are no bugs and to make some tweaks. For example, I found that the round to nearest ties to even with "sticky bits" didn't work correctly, so I came up with a better way to implement this mechanism without requiring much code or using Z80 registers which I can't, since they are already all in use.

There is a lot more I could comment on. If you have questions, let me know in your reply or DM me.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-05-2023, 07:52 AM (This post was last modified: 01-06-2023 06:42 AM by F-73P.)
Post: #15
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: FORTH for the PC-G850(V)(S)
Brilliant!

I'm using a relatively simple but inefficient approach for converting strings to single-precision floating-point: use multiple-precision arithmetic and tables to find integers s,e such that s x 2^e = user input, where s element of [2^23,2^24) and is rounded up. Then use s,e to obtain the IEEE 754 single-precision encoding.

Its working for the few values I've tested (0.5 -> 0x3F000000, 2 -> 0x40000000 and 2.5 -> 0x40200000). Once I've added higher powers of 10 to the table and parsing exponents I'll try smaller and bigger values.

For floating-point to string I'll use the first algorithm in this paper.

The C language combines all the power of assembly language with all the ease-of-use of assembly language
 « Next Oldest | Next Newest »

User(s) browsing this thread: 1 Guest(s)