Post Reply 
FORTH for the PC-G850(V)(S)
11-06-2022, 01:56 PM (This post was last modified: 05-21-2023 12:41 PM by robve.)
Post: #1
FORTH for the PC-G850(V)(S)
Recently started a new Forth project. This time for the Sharp PC-850(V)(S). Perhaps this is interesting to some of you.

[Image: PC-G850-forth.png]

Repository: https://github.com/Robert-van-Engelen/Forth850

It's about 8K in size plus 8K free space or more, written from scratch in Z80 assembly and Forth itself. It's an ANS/standard compliant, directly threaded code (DTC) Forth implementation. Words are case insensitive.

No file IO and no floating point yet (no syscalls are published as far as I know), but may look into floating point later, e.g. to use a Z80 32 bit float library, if it fits. (EDIT: floating point is included with the latest version).

Why write from scratch? Well, firstly well-known Z80 Forth implementations like CamelForth, eForth and Jupiter Ace aren't ANS Forth compliant and lack features. Secondly, it's more fun to write from scratch. The Moving Forth article was helpful, but I've decided to use DE as TOS, BC as IP (instruction pointer) and RP (return stack pointer) in memory. It's faster that way, since DE (TOS) and HL (scratch) are exchangeable in only 4 cycles and HL is a versatile register pair. Even faster would be to implement subroutine threaded code (STC) to execute Z80 code directly without DTC fetch+execute, but that will require 50% more space on average for Forth programs. Nice for speed, but perhaps not so much for usability on a small 32K machine.

With a bit of time tinkering, I came up with efficient Z80 code for Forth words, more efficient in some cases than CamelForth, eForth and Jupiter Ace (the latter isn't particularly fast by the way). If someone can point to more efficient Z80 Forth DTC implementations, then please let me know.

For example, my signed 16x16->16 multiplication takes only 51 cycles per bit versus 62 cycles per bit published by Zilog. I'm using the ASZ80 macro assembler with the CODE macro defining a Forth word "*" in machine code with label "star":

Code:
                CODE *,star
                push de                 ; save TOS
                exx                     ; save bc with ip
                pop de                  ; n2->de
                pop bc                  ; n1->bc
                ld hl,0                 ; 0->hl
                ld a,c                  ; c->a low order byte of n1
                ld c,b                  ; b->c save high order byte of n1
                ld b,8                  ; 8->b loop counter
1$:             rra             ;  4    ; loop, a>>1->a set cf
                jr nc,2$        ;  7    ;   if cf=1 then
                add hl,de       ; 11    ;     hl+de->hl
2$:             sla e           ;  8    ;
                rl d            ;  8    ;   de<<1->de
                djnz 1$         ; 13(51); until --b=0
                ld a,c                  ; c->a high order byte of n1
                ld b,8                  ; 8->b loop counter
3$:             rra             ;  4    ; loop, a>>1->a set cf
                jr nc,4$        ;  7    ;   if cf=1 then
                add hl,de       ; 11    ;     hl+de->hl
4$:             sla e           ;  8    ;
                rl d            ;  8    ;   de<<1->de
                djnz 3$         ; 13(51); until --b=0
                push hl                 ; save hl with product as TOS
                exx                     ; restore bc with ip
                pop de                  ; pop new TOS
                JP_NEXT                 ; continue

EDIT: even faster multiplication is possible. To calculate the high order byte we do not need to iterate over all 8 bits of the high order multiplier stored in register c, but only over the nonzero bits. We also can ignore the lower order result stored in register e. This reduces the max loop iteration cycle time to 32 and 33 per bit. Furthermore, the second loop only runs until the last bit of register c is shifted out. If register c is zero, the second loop does not execute thereby saving hundreds of cycles. We also use jp instead of jr to improve and balance the cycle time per bit:
Code:
                CODE *,star
                push de                 ; save TOS
                exx                     ; save bc with ip
                pop de                  ; n2 -> de
                pop bc                  ; n1 -> bc
                ld hl,0                 ; 0 -> hl
                ld a,c                  ; c -> a low order byte of n1
                ld c,b                  ; b -> c save high order byte of n1
                ld b,8                  ; 8 -> b loop counter
1$:             rra             ;  4    ; loop, a >> 1 -> a set cf
                jr nc,2$        ;  7    ;   if cf = 1 then
                add hl,de       ; 11    ;     hl + de -> hl
2$:             sla e           ;  8    ;
                rl d            ;  8    ;   de << 1 -> de
                djnz 1$         ; 13(51); until --b = 0
                ld a,h                  ; h -> a do high order, low order is done
                jr 5$                   ; jump to shift c and loop
3$:             add d           ;  4    ; loop, a + d -> a
4$:             sla d           ;  8    ;   d << 1 -> d
5$:             srl c           ;  8    ;   c >> 1 -> c set cf and z if no bits left
                jr c,3$         ; 12/7(32); until cf = 0 repeat with addition
                jp nz,4$        ;   10(33); until c = 0 repeat without addition
                ld h,a                  ; a -> h
                push hl                 ; save hl with product as TOS
                exx                     ; restore bc with ip
                pop de                  ; pop new TOS
                JP_NEXT                 ; continue

At the heart of Forth parsing are my new CHOP and TRIM words in assembly that exploit special Z80 ops, which I found to be better suitable and more efficient than CamelForth's approach to parsing. The CHOP and TRIM words:

Code:
; CHOP          c-addr u1 char -- c-addr u2
;               truncate string up to matching char
;               leaves string if char not found
;               char=0x20 (bl) chops 0x00 to 0x20 (white space)

                CODE CHOP,chop
                ld a,e                  ; char->a
                exx                     ; save bc with ip
                ex af,af'               ; save a with char
                pop bc                  ; pop u1->bc
                ld e,c                  ;
                ld d,b                  ; u1->de
                ld a,c                  ;
                or b                    ; test bc=0, 0->cf
                jr z,2$                 ; if bc=0 then not found
                pop hl                  ;
                push hl                 ; c-addr->hl
                ex af,af'               ; restore a with char
                cp 0x20                 ;
                jr z,3$                 ; if a=0x20 then find white space
                or a                    ; 0->cf not found
;               find char in string
                cpir                    ; repeat until a=[hl++] or --bc=0
                jr nz,2$                ; if match then 
1$:             ccf                     ; complement cf when found, to correct cpi bc--
2$:             ex de,hl                ; u1->hl
                sbc hl,bc               ; u1-bc-cf->hl
                push hl                 ; save hl as TOS
                exx                     ; restore bc with ip
                pop de                  ; pop new TOS
                JP_NEXT                 ; continue
;               find white space in string
3$:             cp (hl)         ;  7    ; loop to compare a to [hl]
                cpi             ; 16    ;   hl++,bc--
                jr nc,1$        ;  7    ;   if [hl]<a then found
                jp pe,3$        ; 10    ; until bc=0
                jr 1$                   ; not found

; TRIM          c-addr1 u1 char -- c-addr2 u2
;               trim initial chars
;               char=0x20 (bl) trims 0x00 to 0x20 (white space)

                CODE TRIM,trim
                ld a,e                  ; char->a
                exx                     ; save bc with ip
                pop bc                  ; u1->bc
                pop hl                  ; c-addr1->hl
1$:             ex af,af'               ; save a
                ld a,c                  ;
                or b                    ;
                jr z,3$                 ; if bc<>0 then
                ex af,af'               ;   restore a
2$:             cpi             ; 16    ;   loop
                jr nz,4$        ;  7    ;     while a=[hl++],--bc
                jp pe,2$        ; 10    ;   until b=0
3$:             push hl                 ; save hl as 2OS
                push bc                 ; save bc as TOS
                exx                     ; restore bc with ip
                pop de                  ; pop new TOS
                JP_NEXT                 ; continue
4$:             cp 0x20                 ;
                jr nz,5$                ; if char=0x20 then
                dec hl                  ;
                cp (hl)                 ;
                inc hl                  ;
                jr nc,1$                ;   if [hl-1]<=0x20 then keep trimming
5$:             inc bc                  ; correct bc++ for cpi match
                dec hl                  ; correct hl-- for cpi match
                jr 3$                   ; finalize trimming

My case-insensitive dictionary search FIND-WORD should speed up Forth compilation by quite a bit:

Code:
; FIND-WORD     c-addr u -- c-addr 0 | xt 1 | xt -1
;               search dictionary for matching word
;               leaves execution token and 1=immediate or -1=not immediate
;               leaves c-addr and 0 when not found

                CODE FIND-WORD,findword
                ld a,d                  ;
                or a                    ; test d=0 high order byte of u
                jp nz,zero_next         ; if u is too large then set new TOS to 0
                sla e                   ; shift u to compare w/o immediate bit
                jp c,zero_next          ; if u is too large then set new TOS to 0
                jp z,zero_next          ; if u=0 then set new TOS to 0
                push de                 ; save de with 2*u
                exx                     ; save bc with ip
                pop bc                  ; pop 2*u->bc
                pop de                  ; pop c-addr->de
                ld hl,(context+3)       ; CONTEXT->hl
                jr 3$                   ; start searching
;               loop over dictionary
1$:             pop de                  ; restore de with c-addr
2$:             pop hl          ; 10    ; loop, restore hl with lfa
3$:             ld a,(hl)       ;  7    ;
                inc hl          ;  6    ;
                ld h,(hl)       ;  7    ;
                ld l,a          ;  4    ;   [hl]->hl follow link at hl=lfa
                or h            ;  4    ;
                jr z,6$         ;  7    ;   if hl=0 then not found
                push hl         ; 11    ;   save hl with lfa
                inc hl          ;  6    ;
                inc hl          ;  6    ;   hl+2->hl with nt (nfa)
                ld a,(hl)       ;  7    ;   word length
                add a           ;  4    ;   shift away immediate bit
                cp c            ;  4    ;   test a=c word length match (both shifted)
                jr nz,2$        ; 12(95);   if lengths differ then continue searching
;               compare string to word
                push de                 ;   save de with c-addr
                inc hl                  ;   hl++ point to nfa chars
                ld b,c                  ;   2*u->b
                srl b                   ;   u->b word length (nonzero)
;               loop over word chars
4$:             ld a,(de)       ;  7    ;   loop
                cp (hl)         ;  7    ;     compare [de]=[hl]
                jr z,5$         ; 12/7  ;     if mismatch then
                and 0xdf        ;    7  ;       make upper case
                cp 'A           ;    7  ;
                jr c,1$         ;    7  ;       if a<'A' then continue search
                cp 'Z+1         ;    7  ;
                jr nc,1$        ;    7  ;       if a>'Z' then continue search
                xor (hl)        ;    7  ;
                and 0xdf        ;    7  ;       case insensitive compare [de]=[hl]
                jr nz,1$        ;    7  ;       if mismatch then continue search
5$:             inc de          ;  6    ;     de++ point to next char of c-addr
                inc hl          ;  6    ;     hl++ point to next char of word
                djnz 4$         ; 13(51/102);until --b=0
;               found a matching word
                pop de                  ;   discard saved c-addr
                ex (sp),hl              ;   save hl with xt as 2OS, restore hl with lfa
                inc hl                  ;
                inc hl                  ;   hl+2->hl with nt (nfa)
                bit immediate_bit,(hl)  ;   test immediate bit of [hl] word length
                exx                     ;   restore bc with ip
                jp nz,one_next          ;   set new TOS to 1 if word is immediate
                jp mone_next            ;   set new TOS to -1
;               not found
6$:             push de                 ; save de with c-addr as 2OS
                exx                     ; restore bc with ip
                jp zero_next            ; set new TOS to 0

The Z80 is alright for Forth (I've used the Z80 back in the 80s and loved it), but not as good as the 6809 and the ESR-L (PC-E500) that have two stack registers. What a luxury compared to the Z80!

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
Visit this user's website Find all posts by this user
Quote this message in a reply
Post Reply 


Messages In This Thread
FORTH for the PC-G850(V)(S) - robve - 11-06-2022 01:56 PM
RE: FORTH for the PC-G850(V)(S) - robve - 11-08-2022, 09:23 PM
RE: FORTH for the PC-G850(V)(S) - Sukiari - 11-09-2022, 07:45 PM
RE: FORTH for the PC-G850(V)(S) - robve - 11-10-2022, 09:36 PM
RE: FORTH for the PC-G850(V)(S) - xerxes - 11-11-2022, 01:55 AM
RE: FORTH for the PC-G850(V)(S) - robve - 11-12-2022, 12:28 AM
RE: FORTH for the PC-G850(V)(S) - robve - 11-12-2022, 06:26 PM
RE: FORTH for the PC-G850(V)(S) - robve - 11-13-2022, 10:15 PM
RE: FORTH for the PC-G850(V)(S) - robve - 11-14-2022, 07:35 PM
RE: FORTH for the PC-G850(V)(S) - robve - 11-29-2022, 07:56 PM
RE: FORTH for the PC-G850(V)(S) - robve - 12-07-2022, 03:07 AM
RE: FORTH for the PC-G850(V)(S) - robve - 01-01-2023, 05:45 PM
RE: FORTH for the PC-G850(V)(S) - F-73P - 01-03-2023, 03:32 AM
RE: FORTH for the PC-G850(V)(S) - robve - 01-03-2023, 06:57 PM
RE: FORTH for the PC-G850(V)(S) - F-73P - 01-05-2023, 07:52 AM
RE: FORTH for the PC-G850(V)(S) - robve - 05-20-2023, 11:11 PM
RE: FORTH for the PC-G850(V)(S) - robve - 07-31-2023, 01:53 AM



User(s) browsing this thread: 1 Guest(s)