
robve · 10-06-2021, 02:41 PM

(10-05-2021 01:02 PM)Klaus Overhage Wrote: In RUN mode: COPY "COM:" TO "E:debugger.fth",A
In Forth500: INCLUDE debugger.fth
Loading debugger...
-- after 90 seconds --
Loading debugger... Exception #-13

Exception #-13 stands for "undefined word"

The large debugger.fth is the original code example that Sébastien wrote for pceForth. It meta-interprets Forth to debug Forth. I made a minor change to it, but have not yet tested it. The debugger.fth file has a disclaimer:

\ Updated to Forth500 but may need some more testing!

When an error occurs, it occurs in the definition of the first word displayed by WORDS. That definition is incomplete, so it cannot be executed, but WORDS displays all words, including incomplete ones.

A better approach is to add code that displays the -13 error message together with the word that was not found. I will add that to my TODO list. The input parsed from a file is stored in the FIB buffer, and >IN holds the current position in it.

Also, INCLUDE does not catch exceptions, so the file may remain open after an error. The file must be closed before it can be read again, e.g. try 4 CLOSE-FILE to close the fileid (typically 4). I will look into this too, so that the file is closed automatically.

It would be nice to add more examples and additions to Forth500. For example, it would be nice to implement SEE, e.g. loaded from see.fth. These words should be optional, because adding them as built-ins eats away free space. On an unexpanded machine the Forth500 free space will be about 7K after adding the floating point words. I don't want to push that down much lower. It was possible to reduce Forth500 below the 20628 bytes that pceForth required. I am pleased with that, because Forth500 adds words and new features (see the changelog in the repo).

- Rob

Helix · 10-06-2021, 11:35 PM

(10-06-2021 02:41 PM)robve Wrote: When an error occurs, the error occurs in the definition of the first word displayed by WORDS. The definition is incomplete, so it cannot be executed. But WORDS displays all words, including incomplete ones.

I've noticed that if there is an error in a definition, the corresponding word cannot be deleted from the dictionary with FORGET. I have to delete the previous valid definition if I want to get rid of this false entry. Is this normal?

(10-06-2021 02:41 PM)robve Wrote: It would be nice to add more examples and additions to Forth500. For example, implementing SEE would be nice, e.g. to load from see.fth. These words should be optional, otherwise adding these as built-ins eats away free space. On an unexpanded machine the Forth500 free space will be about 7K after adding the floating point words. I don't want to push that down much lower. It was possible to reduce Forth500 below the 20628 bytes that pceForth required. I am pleased with that, because Forth500 adds words and new features (see the changelog in the repo).

I too am in favor of simple systems. When I tried different Forth packages some years ago, I found that F-PC Forth has 1523 words! I think this defeats the purpose of Forth.
Adding optional definitions for those who are interested is a better solution.

robve · (This post was last modified: 10-25-2021 08:52 PM by robve.)

(10-06-2021 11:35 PM)Helix Wrote: I've noticed that if there is an error in a definition, the corresponding word cannot be deleted from the dictionary with FORGET. I have to delete the previous valid definition if I want to get rid of this false entry. Is this normal?

Good point! Yep, this is normal. A word cannot be found (with FORGET, ' (tick), etc.) if it is hidden. To unhide and delete the last definition, use REVEAL and then FORGET. Incomplete definitions are hidden to prevent accidentally running them, which would obviously lead to a crash. WORDS still shows all hidden words, but FORGET can't find them.

FORGET implementations in Forth may differ slightly in this respect, but the standard considers FORGET obsolescent anyway. However, FORGET is still very useful as an easy way to redo a definition interactively. In general, placing a MARKER is the preferred way to delete code (see MARKER and ANEW). Sébastien removed FORGET from his pceForth (it was commented out). I rewrote parts of it to correct a bug in FORGET that caused a dictionary memory leak.

To return to the question of improving error reporting: the word that caused the exception should be shown to the user with some context. This should be easy to implement by changing the last part of (ERROR) to report the location of the error, showing the line up to and including the word that caused the error:

SOURCE >IN @ UMIN TYPE ." << exception #" S>D (D.) TYPE

where SOURCE returns the address and length of the input buffer (TIB or FIB), and >IN is the offset in this buffer of the next word after the last word executed. This reports errors both in user input and in source files (alas, without a line number).
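What that one line prints can be modelled in a few lines of Python. This is a hypothetical sketch, not Forth500 code, and it assumes that after the offending word has been parsed >IN points just past the delimiter that follows it, as PARSE-NAME normally leaves it. Note that the blank after `."` is the string delimiter, so the printed text starts with `<<`.

```python
def error_report(source: str, to_in: int, code: int) -> str:
    """Model of: SOURCE >IN @ UMIN TYPE ." << exception #" S>D (D.) TYPE"""
    shown = source[:min(len(source), to_in)]   # SOURCE >IN @ UMIN TYPE
    return shown + "<< exception #" + str(code)  # ." ..." then the code via (D.)


# "FOO" is undefined; after parsing it >IN (assumed) is 8, just past the blank
print(error_report("1 2 FOO 3", 8, -13))   # 1 2 FOO << exception #-13
```

The UMIN guards against >IN pointing beyond the end of the buffer, so at worst the whole line is shown.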

To close the file when an exception occurs, INCLUDE-FILE should be redefined to catch exceptions thrown by INTERPRET, as follows:

Code:

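: INCLUDE-FILE  ( fileid -- )
    save-input n>r              \ save the current input source specification
    to source-id                \ read from the file from now on
    begin
      refill                    \ read the next line of the file
    while
      ['] interpret catch ?dup if
        source-id close-file drop   \ on an exception, close the file,
        nr> restore-input drop      \ restore the previous input source,
        throw                       \ and rethrow the exception
      then
    repeat
    source-id close-file drop
    nr> restore-input drop ;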

This new definition uses updated SAVE-INPUT and RESTORE-INPUT combined with N>R and NR>, making the code of INCLUDE-FILE and EVALUATE more compact, thus saving some memory.
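The control flow of this INCLUDE-FILE (clean up on both the normal and the exceptional path, then rethrow) maps directly onto `try`/`finally`. The Python sketch below is only a model: `ForthError`, `interpret` and the `state` dictionary with its `open` set and `source` id are hypothetical stand-ins, not Forth500 data structures.

```python
class ForthError(Exception):
    """Stands in for a THROW code such as -13 (undefined word)."""

    def __init__(self, code: int):
        super().__init__(code)
        self.code = code


def include_file(fileid, lines, interpret, state):
    saved = state["source"]               # SAVE-INPUT N>R
    state["source"] = fileid              # TO SOURCE-ID
    try:
        for line in lines:                # BEGIN REFILL WHILE ... REPEAT
            interpret(line)               # ['] INTERPRET CATCH
    finally:                              # runs on normal end and on an exception
        state["open"].discard(fileid)     # SOURCE-ID CLOSE-FILE DROP
        state["source"] = saved           # NR> RESTORE-INPUT DROP
    # an exception propagates after the cleanup, like THROW rethrowing the code


def interpret(line):
    """Toy interpreter: only the word FOO is 'undefined'."""
    if "FOO" in line.split():
        raise ForthError(-13)


state = {"open": {4}, "source": 0}
try:
    include_file(4, [": SQ DUP * ;", "FOO"], interpret, state)
except ForthError as e:
    print(e.code)      # -13
print(state)           # {'open': set(), 'source': 0}
```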

(10-06-2021 11:35 PM)Helix Wrote: I too am in favor of simple systems. When I tried different Forth packages some years ago, I found that F-PC Forth has 1523 words! I think this defeats the purpose of Forth.
Adding optional definitions for those who are interested is a better solution.

Wow. 1523 words is way too many, and unnecessary for most applications. However, we should at least incorporate the most useful standard word sets in Forth500 and include E500-specific words for graphics, sound and the file system. It is always possible to load extra words from source files.

I'd like to add that at this point anyone interested in this project can suggest improvements and additions to the Forth500 core, as long as there is sufficient space for user programs. Placing the additions in source files to load on demand is probably best.

As a note about standard Forth compliance, I noticed that REQUIRE and REQUIRED are not yet implemented in Forth500. So I came up with the following quick-and-dirty implementation, which simply stores the filename in the dictionary as a word name with a trailing blank:

Code:

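: REQUIRED      ( c-addr u -- )
    which-pocket dup>r swap dup>r cmove     \ copy the filename to a pocket; R: pocket u
    bl 2r@ + c!                             \ append a trailing blank
    2r@ 1+ find-word nip if r>drop r>drop exit then  \ already loaded? then done
    2r@ included                            \ load the file
    2r> 1+ (created) ;                      \ record "filename " in the dictionary
: REQUIRE       ( "filename" -- )
    parse-name required ;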

Adding a trailing blank means that the filename word in the dictionary cannot clash with other words and cannot even be executed by accident. This word is added after the file has been successfully INCLUDED. ANEW and MARKER in the loaded file will also delete the filename word, so the next REQUIRE will load the file again if it was deleted from memory. It's a bit of a hack, but I believe it should work. Note that the code above uses a new built-in system word (CREATED), which I added as a replacement for (LINK) and (NAME).
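The bookkeeping behind this REQUIRED can be simulated with a toy dictionary. The Python below is a hypothetical model (a plain list of names, newest last), not Forth500's dictionary layout; `math.fth`, `_MATH_` and `SQUARE` are made-up names, and the file's `ANEW _MATH_` is modelled by `marker()`.

```python
class Dictionary:
    """Toy dictionary: a list of word names, newest last."""

    def __init__(self):
        self.names = []

    def marker(self, name):
        """Like MARKER name: returns a function deleting name and all later words."""
        mark = len(self.names)
        self.names.append(name)

        def restore():
            del self.names[mark:]
        return restore


def required(d, filename, included):
    tag = filename + " "           # trailing blank: no parsed word can contain it
    if tag in d.names:             # 2R@ 1+ FIND-WORD NIP IF ... EXIT THEN
        return False               # already loaded
    included(filename)             # 2R@ INCLUDED (an exception skips the next line)
    d.names.append(tag)            # 2R> 1+ (CREATED)
    return True


d = Dictionary()
loads, markers = [], []

def included(filename):
    loads.append(filename)
    markers.append(d.marker("_MATH_"))   # the file starts with ANEW _MATH_
    d.names.append("SQUARE")             # ... and defines one word

required(d, "math.fth", included)   # loads the file
required(d, "math.fth", included)   # skipped: "math.fth " is already defined
markers[0]()                        # running the marker also removes "math.fth "
required(d, "math.fth", included)   # loads the file again
print(loads)                        # ['math.fth', 'math.fth']
```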

Updating the code and testing all of the additions and improvements will take a bit of time, but hopefully not too long.

I would also like to run the NQUEENS benchmark as suggested by xerxes. To this end, and based on my understanding of the NQUEENS benchmark, I changed the NQUEENS Forth code slightly to use "nicer" standard Forth constructs, such as VALUE, and POSTPONE to compile inlined versions of RCLAA and STOAA instead of calling them (disclaimer: this is yet untested on my end):

Code:

ANEW _NQUEENS_
8 CONSTANT RR
0 VALUE SS
0 VALUE XX
0 VALUE YY
CREATE AA RR 1+ ALLOT
: RCLAA POSTPONE AA POSTPONE + POSTPONE C@ ; IMMEDIATE   \ compiles AA + C@ inline
: STOAA POSTPONE AA POSTPONE + POSTPONE C! ; IMMEDIATE   \ compiles AA + C! inline
: NQCORE
  0 TO SS
  0 TO XX
  BEGIN
    1 +TO XX RR XX STOAA
    BEGIN
      1 +TO SS
      XX TO YY
      BEGIN YY 1 > WHILE
        -1 +TO YY
        XX RCLAA YY RCLAA - DUP
        0= SWAP ABS XX YY - = OR IF
          0 TO YY
          BEGIN XX RCLAA 1- DUP XX STOAA 0= WHILE
            -1 +TO XX
          REPEAT
        THEN
      REPEAT
    YY 1 = UNTIL
  RR XX = UNTIL
;
: NQUEENS
  ( STARTTIMER )
  NQCORE
  ." S=" SS .          \ print the loop counter
  ( DISPLAYTIMER ) CR
;

Since the E500 has no RTC or system clock, NQCORE should be run many times in a loop so that the total can be timed manually with a stopwatch.
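To cross-check the loop counter S printed by NQUEENS, here is a line-by-line Python transcription of NQCORE. It is only a sketch for comparison on a PC: the VALUEs SS, XX, YY become local variables and the byte array AA becomes a list whose element 0 is unused, as in `CREATE AA RR 1+ ALLOT`.

```python
def nqcore(r: int = 8):
    """Python port of NQCORE; returns (S, [A(1), ..., A(R)])."""
    a = [0] * (r + 1)               # CREATE AA RR 1+ ALLOT (A(0) unused)
    s = 0                           # 0 TO SS
    x = 0                           # 0 TO XX
    while True:                     # BEGIN ... RR XX = UNTIL
        x += 1
        a[x] = r                    # 1 +TO XX RR XX STOAA
        while True:                 # BEGIN ... YY 1 = UNTIL
            s += 1                  # 1 +TO SS
            y = x                   # XX TO YY
            while y > 1:            # BEGIN YY 1 > WHILE ... REPEAT
                y -= 1
                t = a[x] - a[y]
                if t == 0 or abs(t) == x - y:   # same row or same diagonal
                    y = 0
                    a[x] -= 1
                    while a[x] == 0:            # backtrack while a column is exhausted
                        x -= 1
                        a[x] -= 1
            if y == 1:
                break
        if x == r:
            return s, a[1:]


s, queens = nqcore()
print("S=", s, queens)
```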

There are still a few speed and code size optimizations to make in Forth500, which can affect this benchmark. Safety versus performance is an important consideration. For example, I opted to test for the BREAK key (+15 CPU cycles) and to check for stack overflows (+19 CPU cycles), but only when necessary and not too frequently. BREAK key tests are only done when a colon definition is called and in loops. To keep the overhead low, stack overflow checks are only done in loops and when interpreting Forth code from an input source. I also reduced the overhead of stack checking to just 19 cycles with some coding tricks. Removing these tests speeds things up, but at the cost of possible runaway programs when a coding mistake is made.

The inspiration for picking up pceForth and turning it into the updated Forth500 came from SuperForth and from Forth for the HP-71B. Back in the 80s, my first encounter with Forth was SuperForth for the QL by Garry Jackson. I learned the language, studied the Forth implementation in detail and wrote some code, but found the lack of file access for loading source files a drawback. It supported blocks, which even at that time felt like a huge step back to ancient times. So there are no blocks in Forth500, but blocks can be added from a source file if necessary.

There will be more to come soon, hopefully in a couple of days when I'm back to work on this project.

- Rob

Helix · 10-07-2021, 11:48 PM

(10-07-2021 05:55 PM)robve Wrote: Good point! Yep, this is normal. A word cannot be found (with FORGET, ' tick etc.) if it is hidden. To unhide and delete the last definition, use REVEAL then FORGET. Incomplete definitions are hidden to prevent accidentally running them, which would lead to a crash obviously. WORDS still shows all hidden words but FORGET can't find them.

Thank you! I still have a lot to learn about Forth.

robve · 10-08-2021, 01:04 PM

(10-05-2021 01:02 PM)Klaus Overhage Wrote: Thank you Helix for BINTOTXT.EXE. From Forth500.bin it directly generates the 71k byte text file required for the MBSharpNotepad. And with your tip in the OPEN command to replace the parameter C with L, I can now use your original BASIC program except for this small change. The runtime has surprisingly remained at 18 minutes, it is probably given by MBSharpNotepad.

A 71KB file takes some time to load, but I'm surprised it takes 18 minutes. Have you set SIO to 9600 baud? If this can't be done faster, then I'm sticking with the cassette transfer method that takes 90 seconds.
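A quick sanity check of the expected transfer time: at 9600 baud, and assuming 8N1 framing (start bit, 8 data bits, stop bit, i.e. 10 bits per character) with no flow-control pauses, the raw line time for a 71KB file is only a little over a minute, far below the 18 minutes observed.

```python
def transfer_seconds(n_bytes: int, baud: int, bits_per_char: int = 10) -> float:
    """Raw serial line time, assuming 8N1 framing and no pauses between characters."""
    return n_bytes * bits_per_char / baud


t = transfer_seconds(71 * 1024, 9600)
print(f"{t:.0f} s")   # 76 s
```

The gap between this figure and 18 minutes suggests that the time is spent outside the serial line itself, e.g. in the sending program.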

(10-05-2021 01:02 PM)Klaus Overhage Wrote: Next I tried to load the file debugger.fth from the folder "additions".

In RUN mode: COPY "COM:" TO "E:debugger.fth",A
In Forth500: INCLUDE debugger.fth
Loading debugger...
-- after 90 seconds --
Loading debugger... Exception #-13

The dictionary search is not optimized in the original code. As a consequence, loading and compiling Forth source takes some time, and the program you are loading is not small. The original pceForth code compares the word length and, if equal, compares the word names. I've made the comparison case-insensitive. Thanks to some clever bit bashing in assembly this adds only a few cycles, and the overhead is not noticeable, because characters that differ typically differ in their lower 5 bits, which are checked first:

Code:

                mv      (!el),il                ; Set the counter
lbl4:           mv      il,[x++]                ; Read next character of the current word string
                mv      a,[y++]                 ; Read next character of the searched string
                sub     a,il                    ; Compare the characters
                jrz     lbl5
; CASE-INSENSITIVE FIND-WORD (COMMENT OUT FOR CASE-SENSITIVE FIND-WORD)
                test    a,$1f                   ; If not the same 32-byte block ASCII offset, no match
                jrnz    lbl4a
                add     a,il                    ; Restore character
                or      a,$20                   ; Make it lower case
                cmp     a,'a'                   ; If less than 'a', no match
                jrc     lbl4a
                cmp     a,'{'                   ; If greater than 'z', no match
                jrnc    lbl4a
                sub     a,il                    ; Compare the characters again,
                test    a,$c0                   ; but this time with a case-insensitive match
                jrz     lbl5
; END CASE-INSENSITIVE FIND-WORD
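The bit tricks in this character test can be checked on a PC. The Python function below is a hypothetical model (not part of Forth500) that follows the instruction sequence step by step on 8-bit values; comparing it exhaustively against a plain ASCII case-folding rule shows that it accepts exactly the identical characters and the two cases of the same letter.

```python
def chars_match_ci(searched: int, word: int) -> bool:
    """Model of the case-insensitive character test on 8-bit values.

    searched: character of the string being looked up (A, read via [y++])
    word:     character of the dictionary entry (IL, read via [x++])
    """
    a = (searched - word) & 0xFF       # sub a,il
    if a == 0:
        return True                    # jrz lbl5: identical characters
    if a & 0x1F:
        return False                   # test a,$1f: differ in the low 5 bits
    a = searched | 0x20                # add a,il (restore), then or a,$20 (fold to lower case)
    if a < ord("a") or a >= ord("{"):
        return False                   # cmp a,'a' / cmp a,'{': not a letter
    return ((a - word) & 0xC0) == 0    # sub a,il / test a,$c0: same letter, either case


print(chars_match_ci(ord("D"), ord("d")), chars_match_ci(ord("@"), ord("`")))   # True False
```

The `test a,$c0` step works because the earlier `test a,$1f` already guarantees that the two characters agree in their low 5 bits, so the remaining difference can only be 0 or $20 for a genuine case variant.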

However, the dictionary search can be optimized, as in most Forth implementations. For example, the HP-71B limits the search by word length, checking only dictionary entries for words of the same length. Other implementations use trees or hashing. There are also simple and practical ways to speed up dictionary search, which I will try. For starters, comparing the length and the first character simultaneously when checking a dictionary entry will speed things up.

- Rob

Klaus Overhage · 10-08-2021, 05:36 PM

Thank you for all the detailed information. Large files are slower with MBSharpNotepad, but editing and loading normal source texts, whether BASIC or Forth, is really easy with it. I tried the example on strings at the end of manual.md on two PC-E500s. Both times, most of it worked, except strlower and strupper:

name type
John Doe
name strlower
Exception #-4
name type
John Doe
name strupper
Exception #-4
name type
éohn Doe

Exception #-4: "stack underflow"

Is Forth500 only generated from the assembler source code Forth500.s or are there other Forth source codes that are part of Forth500?

robve · 10-08-2021, 07:16 PM

(10-08-2021 05:36 PM)Klaus Overhage Wrote: I tried the example on strings at the end of manual.md on two PC-E500s. Both times most of it worked only strlower and strupper not:

There is a typo in the manual. I had tested these examples on my E500, but I added this example to the manual later and then changed it (bad idea), which unfortunately caused this issue. There is a missing DUP and a missing SWAP in strlower/strupper. Unfortunately, a missing SWAP can cause a dictionary overwrite. Here is the corrected version:

Code:

Additional words to convert characters and string buffers to upper and lower case:

    : toupper   ( char -- char ) DUP 'a '{ WITHIN IF $20 - THEN ;
    : tolower   ( char -- char ) DUP 'A '[ WITHIN IF $20 + THEN ;
    : strupper  ( string len -- ) 0 ?DO DUP I + DUP C@ toupper SWAP C! LOOP DROP ;
    : strlower  ( string len -- ) 0 ?DO DUP I + DUP C@ tolower SWAP C! LOOP DROP ;

For example:

    name strupper name TYPE ↲
    JOHN DOE OK[0]
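The role of WITHIN and of the DUP/SWAP pair is easier to see in a direct Python transcription. This is an illustrative sketch only: the byte buffer and address arguments stand in for the Forth string address and length, and `within` covers only the ordinary case lo <= hi (Forth's WITHIN is also defined for wrapped ranges).

```python
def within(n: int, lo: int, hi: int) -> bool:
    """Forth WITHIN for lo <= hi: true when lo <= n < hi (upper bound excluded)."""
    return lo <= n < hi


def toupper(c: int) -> int:
    return c - 0x20 if within(c, ord("a"), ord("{")) else c   # 'a '{ WITHIN


def tolower(c: int) -> int:
    return c + 0x20 if within(c, ord("A"), ord("[")) else c   # 'A '[ WITHIN


def strupper(buf: bytearray, addr: int, length: int) -> None:
    for i in range(length):                       # 0 ?DO ... LOOP
        buf[addr + i] = toupper(buf[addr + i])    # DUP I + DUP C@ toupper SWAP C!


def strlower(buf: bytearray, addr: int, length: int) -> None:
    for i in range(length):
        buf[addr + i] = tolower(buf[addr + i])


name = bytearray(b"John Doe")
strupper(name, 0, len(name))
print(name.decode())    # JOHN DOE
```

The upper bounds '{ and '[ are the characters right after 'z and 'Z, which is why they appear in the Forth code.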

(10-08-2021 05:36 PM)Klaus Overhage Wrote: Is Forth500 only generated from the assembler source code Forth500.s or are there other Forth source codes that are part of Forth500?

Everything is generated from the single Forth500.s file, which implements the entire dictionary. Words are defined in machine code or in "compiled" Forth. I don't like large monolithic files like this, but the dictionary linkage across all word definitions is essential.

Eventually it would be nice to add Forth source files for extra words, such as for the SEE word to view compiled Forth definitions.

Right now, Forth500 has over 456 built-in words that cover a large portion of the optional standard Forth word sets, not yet counting the 63 words in the float and float-ext word sets to be added soon. All this still fits in about 20KB.

- Rob

robve · 10-10-2021, 01:38 AM

(10-08-2021 01:04 PM)robve Wrote: However, the dictionary search can be optimized, like most Forth implementations. For example, the HP-71b limits searching based on the word length, thus checks dictionary entries for words of the same length only. Other implementations use trees or hashing. There are also simple and practical ways to speed up dictionary search, which I will try. For starters, comparing the length and the first character simultaneously to check a dictionary entry will speed things up.

A quick update for those interested in this project, or in Forth, or in the E500's CPU.

The new FIND-WORD assembly code listed further below runs about twice as fast as the old FIND-WORD code (the version shown in the previous post). This means that case-insensitive dictionary searches in Forth500 should speed up quite a bit. Loading and compiling a Forth source file is largely determined by dictionary search speed.

The new CPU cycle stats compared to the old FIND-WORD, expressed in CPU cycles per dictionary word compared:

- mismatching length: old = 54 cycles, new = 34 cycles
- matching length but first characters differ: old = 108 cycles, new = 48 cycles
- matching words, character-by-character comparison: old = 53 cycles, new = 43 cycles per character

The cost of a word length mismatch is 34 cycles. If the length matches, the cost of a first character mismatch is 48 cycles total (i.e. including the length match).

Assuming a dictionary size of 519 words (expected with Forth500) and a 768kHz CPU clock, a full dictionary search takes 23ms to 32ms or slightly longer, depending on the word being searched:
34 x 519 / 768kHz = 23ms
48 x 519 / 768kHz = 32ms

For example, the integer 123 in the Forth source input matches the length of every 3-character word, but its first character matches none of them. Each 3-character word therefore costs 48 cycles and every other word costs 34 cycles, so the full search takes between 23ms and 32ms, depending on how many 3-character words the dictionary holds. Explanation: all words, including integers, are first searched in the dictionary before being pushed on the stack or compiled as an integer.
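The estimate above is simple arithmetic and can be written as a small cost model. This Python sketch only uses the figures quoted in this post (34 cycles per length mismatch, 48 cycles per length match with a differing first character, a 768kHz clock); the number of same-length words is a parameter you would have to count in the actual dictionary.

```python
CLOCK_HZ = 768_000   # CPU clock used in the estimates above


def search_ms(n_entries: int, n_same_length: int) -> float:
    """Time of a search that finds nothing and never matches a first character."""
    cycles = 34 * (n_entries - n_same_length) + 48 * n_same_length
    return 1000 * cycles / CLOCK_HZ


print(round(search_ms(519, 0)), round(search_ms(519, 519)))   # 23 32
```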

The new FIND-WORD assembly, annotated with CPU cycles (disclaimer: this may not be the final version):

Code:

find_word:      dw      to_body
                db      $09
                db      'FIND-WORD'             ; ( c-addr u -- 0 0 | xt 1 | xt -1 )
find_word_xt:   local
                mv      (!gl),a                 ; (gl) holds the string length (length < 64 checked next)
                mv      il,64                   ; Compare the string length
                sub     ba,i                    ; to the max of 63 characters
                popu    ba                      ; BA holds the string address
                pushu   x                       ; Save IP
                jrnc    lbl6                    ; String too long?
                mv      y,!base_address
                add     y,ba                    ; Y holds the string address
                mv      (!fl),[y++]             ; (fl) holds the first character of the string to search
                mv      (!yi),y                 ; (yi) holds the string address + 1
                mv      (!zi),y                 ; Set 2nd byte of (zi) to base address segment $b
                mvw     (!zi),[!last_xt+3]      ; (zi) holds the 20 bit LAST address
;               LOOP OVER DICTIONARY
lbl1:           mv      y,(!yi)         ; 5     ; Y holds the string address + 1
                mv      il,(!gl)        ; 4     ; IL holds the string length
                                        ; =9 cycles
;               NEXT WORD IN THE DICTIONARY
lbl2:           mv      x,(!zi)         ; 5     ; X holds the address of the dictionary entry
                or      (!zi),(!zi+1)   ; 6     ; Check if the address of the dictionary entry is zero
                jrz     lbl6            ; 2/3   ; Dictionary entry address is zero?
                mvw     (!zi),[x++]     ; 7     ; (zi) holds the previous dictionary link address
                mv      ba,[x++]        ; 5     ; A holds the word length and B holds the first character
;               COMPARE STRING LENGTHS
                sub     a,il            ; 3     ; Compare string lengths
                test    a,$7f           ; 3     ; Check string lengths, ignore immediate bit, keep smudge bit to force mismatch
                jrnz    lbl2            ; 2/3   ; String lengths are not the same?
                                        ; =33 cycles +1 for jump if the length does not match
;               COMPARE FIRST CHARACTERS
                ex      a,b             ; 3     ; B holds immediate bit to save for later, A holds first character
                xor     a,(!fl)         ; 4     ; Compare first characters
                jrz     lbl4            ; 2/3   ; First characters match?
                test    a,$df           ; 3     ; Check if case insensitive bits match
                jrnz    lbl2            ; 2/3   ; Case insensitive characters differ?
                                        ; =33+14=47 cycles +1 for jump if the length matches but the first character does not
                mv      a,(!fl)         ; 3     ; A holds the first character of the string to search
                or      a,$20           ; 3     ; Make it lower case (if A is a letter, checked next)
                cmp     a,'a'           ; 3
                jrc     lbl2            ; 2/3   ; A is not a letter?
                cmp     a,'{'           ; 3
                jrnc    lbl2            ; 2/3   ; A is not a letter?
                dec     il              ; 3     ; Decrement string length
                jrz     lbl5            ; 2/3   ; String length is zero?
                                        ; =47+22=69 cycles if the length matched and the first character matched
;               LOOP OVER STRINGS TO COMPARE
lbl3:           mv      a,[x++]         ; 4     ; A holds the next character of the word
                mv      (!el),[y++]     ; 6     ; (el) holds the next character of the string to match
                xor     a,(!el)         ; 4     ; Compare characters
                jrz     lbl4            ; 2/3   ; Characters match?
                test    a,$df           ; 3     ; Check if case insensitive bits match
                jrnz    lbl1            ; 2/3   ; Case insensitive characters differ?
                mv      a,(!el)         ; 3     ; A holds the next character of the string to match
                or      a,$20           ; 3     ; Make it lower case (if A is a letter, checked next)
                cmp     a,'a'           ; 3     ; A is not a letter?
                jrc     lbl1            ; 2/3
                cmp     a,'{'           ; 3     ; A is not a letter?
                jrnc    lbl1            ; 2/3
lbl4:           dec     il              ; 3     ; Decrement string length
                jrnz    lbl3            ; 2/3   ; String length is not zero?
                                        ; =43 cycles for each subsequent character matched
;               FOUND A MATCHING WORD IN THE DICTIONARY
lbl5:           add     ba,ba                   ; Check immediate bit stored in B
                mv      ba,x                    ; BA holds the execution token
                popu    x                       ; Restore IP
                pushu   ba                      ; Save new 2OS execution token
                mv      ba,-1                   ; Set new TOS to -1, word is not immediate
                jrnc    lbl7                    ; Immediate bit is unset?
                mv      ba,1                    ; Set new TOS to 1, word is immediate
                jr      lbl7
;               NOT FOUND
lbl6:           popu    x                       ; Restore IP
                sub     ba,ba                   ; Set TOS to zero
                pushu   ba                      ; Set 2OS to zero
lbl7:           jp      !cont__

The new code is only one byte longer when assembled to binary than the old code!
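The overall logic of the new FIND-WORD (walk the links from LAST, reject on the masked length byte, compare characters case-insensitively, report immediacy) can be expressed as a short Python model. It is a sketch for readers who do not read SC62015 assembly, not Forth500's memory layout: entries are (length byte, name, xt) tuples listed newest first. Bit 7 as the immediate flag follows from the `add ba,ba` carry test; bit 6 as the smudge (hidden) flag is an assumption consistent with the `$7f` mask and the 63-character limit.

```python
IMMEDIATE = 0x80   # bit 7 of the length byte, tested via the carry of ADD BA,BA
SMUDGE = 0x40      # assumption: bit 6 marks a hidden (incomplete) definition


def chars_equal(a: int, b: int) -> bool:
    """Equal, or the same ASCII letter in another case (XOR / TEST $DF checks)."""
    if a == b:
        return True
    return (a ^ b) == 0x20 and ord("a") <= (a | 0x20) <= ord("z")


def find_word(name: bytes, entries):
    """Returns (0, 0), (xt, 1) if immediate, or (xt, -1) otherwise."""
    if len(name) >= 64:                          # longer than 63 characters
        return 0, 0
    for length_byte, word, xt in entries:        # follow the links from LAST
        if (length_byte - len(name)) & 0x7F:     # SUB A,IL / TEST A,$7F
            continue                             # length differs or word is hidden
        if all(chars_equal(w, c) for w, c in zip(word, name)):
            return xt, (1 if length_byte & IMMEDIATE else -1)
    return 0, 0


entries = [(SMUDGE | 3, b"FOO", 300),     # hidden, incomplete definition
           (3, b"foo", 200),
           (IMMEDIATE | 2, b"IF", 100),
           (4, b"DROP", 50)]
print(find_word(b"FOO", entries))   # (200, -1): the hidden FOO is skipped
print(find_word(b"if", entries))    # (100, 1)
```

The skipped hidden entry also illustrates why FORGET cannot find an incomplete definition until it is revealed.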

- Rob

Klaus Overhage · 10-10-2021, 09:11 AM

I am able to read and understand assembly language. I have already written smaller assembly routines and can certainly learn a lot from you. At the moment, however, I would first like to report on my experiences from the point of view of a Forth500 user.

Your implementation of toupper and tolower shown above does not work for me. It throws exception #-13 (undefined word), with toupper as the first entry in the dictionary. The old implementation with DUP and SWAP added works:

Code:

: toupper   ( char -- char ) DUP [CHAR] a [CHAR] { WITHIN IF $20 - THEN ;
: tolower   ( char -- char ) DUP [CHAR] A [CHAR] [ WITHIN IF $20 + THEN ;
: strupper  ( string len -- ) 0 ?DO DUP I + DUP C@ toupper SWAP C! LOOP DROP ;
: strlower  ( string len -- ) 0 ?DO DUP I + DUP C@ tolower SWAP C! LOOP DROP ;

I was also able to run the RC4 cipher program from the Wikipedia entry for Forth without any problems.
(see https://en.wikipedia.org/wiki/Forth_(pro..._language)

When using WORDS, it happened to me that a BREAK via the ON key crashed the computer. I think that's what happens when the ON button bounces. And there is always an exception #-28 for "user interrupt", which is not so nice. Is it possible to use another key especially for WORDS, for example C-CE, for normal exit without exception? That would help a lot if you just want to look at the new words.

robve · (This post was last modified: 10-10-2021 04:48 PM by robve.)

(10-10-2021 09:11 AM)Klaus Overhage Wrote: And there is always an exception #-28 for "user interrupt", which is not so nice. Is it possible to use another key especially for WORDS, for example C-CE, for normal exit without exception?

Good idea! This can also improve a break from FILES.

With respect to your issue with BRK from WORDS, a debounce loop is used. I'm curious what the problem could be. I have not had this problem. Perhaps the timing of the second BRK bounce exceeded the debounce timing, implemented as follows:

Code:

break__:        local
                pre_on
lbl1:           mv      il,$ff                  ; Test if the break
lbl2:           test    ($ff),$08               ; key was intentionally
                jrnz    lbl1                    ; released
                dec     i                       ; (break action is triggered
                jrnz    lbl2                    ; when the break key is released)
                pre_off
                endl
                mv      il,-28                  ; User interrupt

test ($ff),$08 sets the z flag if BRK is not pressed. The debounce time is 4.3ms (13x255 cycles), which is rather short. A typical debounce time is 20ms or longer. Increasing the timing to 20ms should help. Also the inner jrnz lbl1 was changed to reset the debounce counter when a key bounce/hit reoccurs.
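The debounce timing and the effect of resetting the counter can be checked with a small model. The Python below uses only the figures from this post (13 cycles per loop iteration, a 768kHz clock, a count of 255); `release_index` is a hypothetical helper that replays a sequence of sampled key states through the same counter logic as the loop above.

```python
import math

CLOCK_HZ = 768_000
CYCLES_PER_ITERATION = 13   # one pass through the lbl2 loop, per the figure above


def debounce_ms(iterations: int) -> float:
    return 1000 * iterations * CYCLES_PER_ITERATION / CLOCK_HZ


def iterations_for(ms: int) -> int:
    """Loop count needed for a debounce time of at least ms milliseconds."""
    return math.ceil(ms * CLOCK_HZ / (1000 * CYCLES_PER_ITERATION))


def release_index(samples, n):
    """Index of the sample at which the release is accepted, or None.

    samples[i] is True while BRK reads as pressed. A pressed sample reloads
    the counter (the jrnz lbl1 path); n released samples in a row finish."""
    counter = n
    for i, pressed in enumerate(samples):
        if pressed:
            counter = n
        else:
            counter -= 1
            if counter == 0:
                return i
    return None


print(round(debounce_ms(255), 1), iterations_for(20))   # 4.3 1182
```

Note that 1182 no longer fits in IL, so a 20ms debounce with this loop would need a 16-bit count in I.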

With respect to Forth source file loading time, a relatively large file such as debugger.fth should take no more than about 30 seconds to compile with the new FIND-WORD. This can be reduced further to a couple of seconds, but that requires a redesign of the dictionary. A simple approach is the HP-71B implementation, which does not offer WORDS (or similar). This simplifies the search, because the order of dictionary words does not need to be preserved across the entire dictionary, only the relative order of words with the same name length. A hybrid approach could work well: limit WORDS to list only the user-defined words (and words loaded with INCLUDE), and search built-in words by name length to speed up compilation. This hybrid approach works with FORGET and MARKER and does not require memory to store trees; a small index table to search the built-in words suffices. However, WORDS will not show the built-in words.
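A minimal sketch of the hybrid idea, under the assumptions stated in the paragraph above: user words are kept in definition order (so FORGET and MARKER can truncate them), while built-in words are grouped by name length so a lookup only scans built-ins of the same length. The `HybridDictionary` class and its methods are hypothetical and only illustrate the search order, not a Forth500 data layout.

```python
class HybridDictionary:
    def __init__(self, builtins):
        self.user = []                      # (name, xt), newest last
        self.by_length = {}                 # name length -> list of (name, xt)
        for name, xt in builtins:
            self.by_length.setdefault(len(name), []).append((name, xt))

    def define(self, name, xt):
        self.user.append((name, xt))

    def find(self, name):
        key = name.lower()                  # case-insensitive search
        for n, xt in reversed(self.user):   # newest user definition wins
            if n.lower() == key:
                return xt
        for n, xt in reversed(self.by_length.get(len(name), [])):
            if n.lower() == key:            # only same-length built-ins are scanned
                return xt
        return None

    def forget(self, name):
        """FORGET name: delete name and every later user definition."""
        key = name.lower()
        for i in range(len(self.user) - 1, -1, -1):
            if self.user[i][0].lower() == key:
                del self.user[i:]
                return
        raise KeyError(name)

    def words(self):
        """WORDS in the hybrid scheme: user words only, newest first."""
        return [n for n, _ in reversed(self.user)]


d = HybridDictionary([("DUP", 1), ("DROP", 2), ("SWAP", 3)])
d.define("square", 10)
d.define("DUP", 11)                 # a user word may shadow a built-in
print(d.find("dup"), d.words())     # 11 ['DUP', 'square']
d.forget("square")
print(d.find("dup"), d.words())     # 1 []
```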

- Rob

Helix · 10-10-2021, 01:36 PM

I'm not at all an expert in assembly language, but I find the explanations on how the system works always interesting.

(10-10-2021 09:11 AM)Klaus Overhage Wrote: When using WORDS, it happened to me that a BREAK via the ON key crashed the computer.

I have no crash with my Sharp. A Break just causes an exception error.

robve · (This post was last modified: 10-16-2021 01:04 AM by robve.)

(10-10-2021 01:36 PM)Helix Wrote:
(10-10-2021 09:11 AM)Klaus Overhage Wrote: When using WORDS, it happened to me that a BREAK via the ON key crashed the computer.

I have no crash with my Sharp. A Break just causes an exception error.

I believe what may have happened is that the missing SWAP in the strupper example overwrote the start of the dictionary, which contains the break logic, and this caused instability. My bad for leaving out the SWAP in the example.

I'll take this opportunity for a quick update.

I spent a bit of time redesigning the core Forth interpreter assembly to improve execution speed. It looks feasible to accelerate Forth500 as follows:

- colon call and return sequence (docol__xt + doret__xt): 22% faster
- fetch-execute (cont__): 13% faster
- deferred word vectoring (dodefer__xt): 23% faster
- constant fetch (docon__xt): 16% faster
- does> execution (does__xt): 17% faster

The redesign uses an internal RAM register to extend 16 bit addresses to 20 bit addresses by presetting its 3rd (high order) byte to the 11th segment $b of the memory address space (the CPU is little endian). This is cheaper than the current method of converting a 16 bit register to a 20 bit register. These 16 to 20 bit conversions happen a lot, because Forth500 cells are 16 bit while the machine's addresses are 20 bit.

The register assignments remain the same as before:
- 20 bit register X holds the IP (instruction pointer)
- 20 bit register U holds the SP (stack pointer)
- 20 bit register S holds the RP (return stack pointer)
- 16 bit register BA (A low and B high) holds the TOS (top of stack)

Other registers available:
- 20 bit register Y
- 16 bit register I; assigning IL (I low) also sets IH (I high) to zero

Internal RAM is addressed as (N) with 8 bit N. Internal RAM can hold 8, 16 and 20 (24) bit values to load/store to/from registers and to/from external RAM.

To convert 16 bit addresses to 20 bit addresses, we load a 16 bit address into a RAM "register", say (yi) and (yi+1) (two bytes of internal RAM), and keep (yi+2) set to the segment $b. To get the 20 bit address we simply load X from (yi).
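The byte layout behind this trick can be shown with a small simulation of internal RAM. In this Python sketch, `store16` and `load20` are hypothetical helpers: storing a 16-bit cell only touches the two low bytes of the little-endian register, so the segment byte written once at boot turns every 16-bit address into `$b0000 + address` when the three bytes are read back.

```python
SEGMENT = 0x0B   # 11th segment, as in the text


def store16(ram: bytearray, reg: int, addr16: int) -> None:
    """Store a 16-bit cell little-endian into (reg), (reg+1); (reg+2) is untouched."""
    ram[reg] = addr16 & 0xFF
    ram[reg + 1] = addr16 >> 8


def load20(ram: bytearray, reg: int) -> int:
    """Read the 3-byte little-endian value at (reg) and keep the low 20 bits."""
    value = ram[reg] | ram[reg + 1] << 8 | ram[reg + 2] << 16
    return value & 0xFFFFF


ram = bytearray(256)            # internal RAM, addressed as (N) with 8-bit N
YI = 0x36                       # yi: equ $36
ram[YI + 2] = SEGMENT           # done once at boot: mv (!yi+2),!ps
store16(ram, YI, 0x9000)        # e.g. mvw (!yi),[x++] loading an execution token
print(hex(load20(ram, YI)))     # 0xb9000
```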

The changes to the core Forth500 execution words are summarized in this outline:

Code:

yi:             equ     $36
zi:             equ     $39
ps:             equ     $b                      ; 11th segment
base_address:   equ     $b0000                  ; 11th segment address
;-------------------------------------------------------------------------------
                org     $b9000                  ; $b0000 or $b1000 or $b9000 ...
;-------------------------------------------------------------------------------
                pre_off
boot:           ;...
                mv      (!yi+2),!ps             ; Store segment in 3rd byte
                mv      (!zi+2),!ps             ; Store segment in 3rd byte
                ;...
;-------------------------------------------------------------------------------
docol__xt:      mv      i,x             ; 2     ; I holds the IP
                pushs   i               ; 6     ; Push IP (return address)
                pmdf    (!yi),3         ; 4     ; Set new IP
                mv      x,(!yi)         ; 5     = 17 cycles
;---------------
interp__:       pre_on                  ; cycles = 7 + 13 = 20
                test    ($ff),$08       ; 5     ; Is break pushed?
                pre_off
                jrnz    break__         ; 2/3   ; Break was pushed
;---------------                        ; cycles = 13
cont__:         mvw     (!yi),[x++]     ; 7     ; Set (yi) to new execution token
                jp      (!yi)           ; 6     ; Execute new token
;-------------------------------------------------------------------------------
break__:        ;...
;-------------------------------------------------------------------------------
doret__xt:      mvw     (!yi),[s++]     ; 7     ; Pop IP (return address)
                mv      x,(!yi)         ; 5     ; X holds the IP
                mvw     (!yi),[x++]     ; 7     ; Fetch new execution token
                jp      (!yi)           ; 6     = 25 cycles versus 30
;-------------------------------------------------------------------------------
dovar__xt:      pushu   ba              ; 4     ; Save old TOS
                pmdf    (!yi),3         ; 4     ; Set new TOS
                mv      ba,(!yi)        ; 4     ; to the address of the data
                jr      !cont__         ; 3     = 15+13 cycles versus 15+15
;-------------------------------------------------------------------------------
docon__xt:      pushu   ba              ; 4     ; Save TOS
                mv      ba,[(!yi)+3]    ; 12    ; Set new TOS
                jr      !cont__         ; 3     = 19+13 cycles versus 23+15
;-------------------------------------------------------------------------------
dodefer__xt:    mvw     (!yi),[(yi)+3]  ; 14
                jp      (!yi)           ; 6     = 20 cycles versus 26
;-------------------------------------------------------------------------------
does__xt:       pushu   ba              ; 4     ; Save TOS
                pmdf    (!yi),3         ; 4     ; Set new TOS
                mv      ba,(!yi)        ; 4     ; to the address of the data
                mvw     (!yi),[s]       ; 7     ; The CALL does__xt return short address is the execution token
                mv      i,x             ; 2     ; I holds the IP
                mv      [s],i           ; 5     ; Push old IP
                mv      x,(!yi)         ; 5     ; Set new IP
                jp      !cont__         ; 4     = 35+13 versus 43+15

The pieces of this puzzle fall nicely into place, which is satisfying. I've used some of the more exotic instructions, such as PMDF (pointer modify), which operates on 20 bit addresses in internal RAM, and JP (yi), which jumps to the 20 bit address in (yi).

A colon-return sequence is reduced to 66 cycles from 79: 4 JP docol__xt + 17 (docol__xt) + 20 (interp__) + 25 (doret__xt). This is the execution overhead of a word defined as a colon definition and includes a check for a BREAK key press to interrupt execution. A colon definition internally in the dictionary starts with a JP docol__xt. A constant starts with JP docon__xt, a variable starts with JP dovar__xt.
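The 66-cycle total can be checked against the per-instruction counts in the listing's comments. This short Python tally copies those numbers as given. It assumes the not-taken JRNZ costs 2 cycles (the "2/3" comment) and that the `pre_on`/`pre_off` lines are assembler directives costing nothing.

```python
# Cycle counts copied from the comments in the listing above.
JP_DOCOL = 4                 # JP docol__xt at the start of a colon definition
DOCOL = [2, 6, 4, 5]         # mv i,x / pushs i / pmdf (!yi),3 / mv x,(!yi)
INTERP = [5, 2]              # test ($ff),$08 / jrnz not taken (no BREAK)
CONT = [7, 6]                # mvw (!yi),[x++] / jp (!yi)
DORET = [7, 5, 7, 6]         # mvw (!yi),[s++] / mv x,(!yi) / mvw (!yi),[x++] / jp (!yi)

def colon_return_overhead():
    # interp__ falls through into cont__, so its 20 cycles are INTERP + CONT
    return JP_DOCOL + sum(DOCOL) + sum(INTERP) + sum(CONT) + sum(DORET)

print(colon_return_overhead())  # 66
```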

The fetch-execute overhead per word is reduced to 13 cycles from 15. This is the overhead of executing a word defined in assembly: its 16 bit address is fetched, and execution jumps to its machine code at the corresponding 20 bit address.
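For readers new to threaded code, the dispatch scheme can be modeled at a high level in Python. This is a toy model and not Forth500's memory layout. The kinds "docol", "docon", "dovar" and "exit" stand in for the JP docol__xt / docon__xt / dovar__xt headers and doret__xt. The `ThreadedVM` class and its word names are hypothetical. The BREAK check in interp__ is left out.

```python
class ThreadedVM:
    """Toy model of threaded-code dispatch; not Forth500's real memory layout."""

    def __init__(self):
        self.words = {}   # name -> (code field kind, parameter)
        self.mem = {}     # data memory for variables
        self.data = []    # data stack (Forth500 keeps the TOS in BA)
        self.ret = []     # return stack of saved IPs (S in the listing)
        self.ip = None    # (body, index) of the next token, like X in the listing

    def define(self, name, kind, param=None):
        self.words[name] = (kind, param)

    def execute(self, name):
        kind, param = self.words[name]
        if kind == "docol":        # like docol__xt: save IP, enter the body
            self.ret.append(self.ip)
            self.ip = (param, 0)
        elif kind == "docon":      # like docon__xt: push the stored value
            self.data.append(param)
        elif kind == "dovar":      # like dovar__xt: push the data address
            self.data.append(param)
        elif kind == "exit":       # like doret__xt: restore the caller's IP
            self.ip = self.ret.pop()
        else:                      # "prim": a word implemented in machine code
            param(self)

    def run(self, name):
        self.ip = None
        self.execute(name)
        while self.ip is not None:  # like cont__: fetch token, advance IP, execute
            body, i = self.ip
            self.ip = (body, i + 1)
            self.execute(body[i])


vm = ThreadedVM()
vm.define("+", "prim", lambda v: v.data.append(v.data.pop() + v.data.pop()))
vm.define("@", "prim", lambda v: v.data.append(v.mem[v.data.pop()]))
vm.define("EXIT", "exit")
vm.define("TEN", "docon", 10)
vm.define("V", "dovar", 100)
vm.mem[100] = 5
vm.define("ADD-V", "docol", ["TEN", "V", "@", "+", "EXIT"])
vm.define("TWICE", "docol", ["ADD-V", "ADD-V", "+", "EXIT"])
vm.run("TWICE")
print(vm.data)  # [30]
```

Every colon word pays for one docol entry and one exit. Every token pays for one fetch-execute step in the `while` loop. Those are exactly the two overheads the cycle counts above measure.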

I want to first roll out the floating-point addition, fully working and tested, for the next Forth500 update to the repo in two weeks (or so, because I need to make time for this). I will focus later on implementing further optimizations to speed up Forth500, e.g. using the outline above.

PS (edit): according to the details in the CPU technical manual, PMDF may not operate on a 20 bit pointer stored in internal RAM, but rather on a single-byte internal RAM pointer to internal RAM. Oops. Using INC X three times instead raises the colon-return count from 66 to 71 cycles, which is still less than the original 79. Still a worthwhile speed improvement to consider.

- Rob

Klaus Overhage · (This post was last modified: 10-26-2021 07:39 PM by Klaus Overhage.)

How to use LOADM, UUENCODE, UUDECODE and the rediscovered SAVEM to load FORTH500.

If you have a PC-E500 with 256k RAM and somehow managed to load the expanded version of FORTH500, you can use the following command to save a copy as a file on the RAM disk:
SAVE M "E:FORTH500", &B0000, &B480B
SAVE M "Filename", start address, end address
The end address results from:
Start address + program length - 1, i.e. &B0000 + 18444 - 1 = &B480B
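The end-address arithmetic is easy to get wrong by one, so here is a small Python helper that reproduces it. `savem_end_address` and `basic_hex` are illustrative names. The output only formats the values in the &-prefixed hex notation used by PC-E500 BASIC.

```python
def savem_end_address(start, length):
    """Last byte occupied by a program of `length` bytes loaded at `start`."""
    return start + length - 1

def basic_hex(value):
    """Format an address in PC-E500 BASIC &-prefixed hex notation."""
    return f"&{value:X}"

end = savem_end_address(0xB0000, 18444)   # 18444 = &480C
print(f'SAVE M "E:FORTH500", {basic_hex(0xB0000)}, {basic_hex(end)}')
# SAVE M "E:FORTH500", &B0000, &B480B
```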

The file is reloaded later with LOADM "FORTH500". The spellings LOADM and LOAD M are equivalent, as are SAVEM and SAVE M.

The file created with SAVEM is 16 bytes longer than the program length: The "FORTH500" file has 18460 bytes. These 16 bytes are at the beginning of the file and contain, among other things, the program length and the start address. I tried to load FORTH500.BIN and FORTH500.OBJ with LOADM and got (luckily) "I/O error" as an answer. Only SAVEM generates the header required for LOADM. It would be nice if there was also a PC program for this ...

With the help of the UUCODE program mentioned by Helix from http://www.it-pulse.eu/sharp-pc-e500s, the file FORTH500.UUE attached below was created from the file saved with SAVEM. It went like this:

Load UUENCODE.BAS into the PC-E500 and change a line:
Before: 130 FNAME$ = "UUENCODE"
After: 130 FNAME$ = "UUENCODE.MMM"
Without this change, a file is saved on the ramdisk, which can't be loaded or deleted. Then run the program that has just been changed with RUN:
UUENCODE SELF-DECODER
DATA_FILE = 'UUENCODE.MMM' OK?
(Y / N) = Y <Return>
success
There is now a new file UUENCODE.MMM with 1760 bytes. The BASIC program currently in use can be deleted with NEW.
LOAD M "UUENCODE.MMM"
CALL &BE000 "FORTH500
uuencode V1.0 by E.Kako
after 15 seconds.
encoded.
There is now a new file FORTH500.UUE with 26303 bytes, which is attached below.

FORTH500 can now also be installed in a PC-E500 as follows:
COPY "COM:" TO "E:FORTH500.UUE", A
duration: 6 minutes 48 seconds
Load UUDECODE.BAS into the PC-E500 and change a line:
Before: 130 FNAME$ = "UUDECODE."
After: 130 FNAME$ = "UUDECODE.MMM"
Then run the program that has just been changed with RUN:
UUENCODE SELF-DECODER
DATA_FILE = 'UUDECODE.MMM' OK?
(Y / N) = Y <Return>
success
There is now a new file UUDECODE.MMM with 1446 bytes. The BASIC program currently in use can be deleted with NEW.
LOAD M "UUDECODE.MMM"
CALL &BE000"FORTH500.UUE
uudecode V1.1 by E.Kako
filename = 'E: FORTH500.'
after approx. 3 minutes
decoded.
There is now a new file "FORTH500" with 18460 bytes, which can be loaded with LOADM "FORTH500" and run with CALL &B0000.

Klaus Overhage · 10-28-2021, 03:11 PM

It is not necessary at all to transfer the 26k byte file FORTH500.UUE to the PC-E500 in order to convert it into the program file with UUDECODE. Both steps can be done together:

On the PC-E500, load UUDECODE into the area for machine language programs as shown above.
LOAD M "UUDECODE.MMM"

On the PC, open FORTH500.UUE in the terminal program (MBSharpNotepad for me) and prepare for sending to the Sharp.

Start UUDECODE on the PC-E500 with the parameter COM: and start sending on the PC.
CALL &BE000"COM:
uudecode V1.1 by E.Kako
filename = 'E: FORTH500.'
after approx. 6 minutes 52 seconds
decoded.

This way you get to the file "FORTH500" 3 minutes faster and you need 26K less space in the RAM drive!

robve · 10-29-2021, 04:03 AM

(10-28-2021 03:11 PM)Klaus Overhage Wrote: Start UUDECODE on the PC-E500 with the parameter COM: and start sending on the PC.
CALL &BE000"COM:
uudecode V1.1 by E.Kako
filename = 'E: FORTH500.'
after approx. 6 minutes 52 seconds
decoded.

This way you get to the file "FORTH500" 3 minutes faster and you need 26K less space in the RAM drive!

Wow! That's almost 3x faster than the 18 minutes with the conversion program in Basic.

I'm almost finished with the next Forth500 update (I had to take a break for travel.) The floating point implementation works, except that I have some trouble with one system call to convert decimal values stored in strings to the internal BCD 20 digit float format. These "function driver" system calls are very lightly documented. I spent more time debugging the function driver calls by comparing memory dumps than actually writing assembly code.

- Rob

Helix · 10-29-2021, 11:46 PM

(10-26-2021 05:45 PM)Klaus Overhage Wrote: The file created with SAVEM is 16 bytes longer than the program length: The "FORTH500" file has 18460 bytes. These 16 bytes are at the beginning of the file and contain, among other things, the program length and the start address. I tried to load FORTH500.BIN and FORTH500.OBJ with LOADM and got (luckily) "I/O error" as an answer. Only SAVEM generates the header required for LOADM. It would be nice if there was also a PC program for this ...

What would be the goal of such a PC program? Even with an adequate header, a binary file cannot be transferred, because only ASCII files are accepted over the serial cable.
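Because only ASCII text survives the serial transfer, the binary is uuencoded first. That is what UUENCODE.MMM and the DOS UUENCODE.EXE do. As an illustration, here is a minimal sketch of standard uuencoding with Python's `binascii` module: 45 data bytes per line, with backtick for zero. It is an assumption that this variant matches what UUDECODE.MMM expects. The sketch writes no per-line checksums and uses LF line endings, and neither choice has been tested on the PC-E500.

```python
import binascii

def uuencode(data: bytes, name: str, mode: str = "644") -> str:
    """Standard uuencoding: 'begin' line, 45-byte data lines, '`' line, 'end'."""
    lines = [f"begin {mode} {name}\n"]
    for i in range(0, len(data), 45):
        lines.append(binascii.b2a_uu(data[i:i + 45], backtick=True).decode("ascii"))
    lines.append(binascii.b2a_uu(b"", backtick=True).decode("ascii"))  # "`\n"
    lines.append("end\n")
    return "".join(lines)

def uudecode(text: str) -> bytes:
    """Inverse of uuencode(): skip the 'begin' line and stop at 'end'."""
    out = bytearray()
    for line in text.splitlines()[1:]:
        if line == "end":
            break
        out += binascii.a2b_uu(line)
    return bytes(out)
```

Every 3 binary bytes become 4 printable characters, plus a length character and a line break per 45-byte line. That overhead explains why the .UUE file is considerably larger than the binary.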

Quote: With the help of the UUCODE program mentioned by Helix from http://www.it-pulse.eu/sharp-pc-e500s, the file FORTH500.UUE attached below was created from the file saved with SAVEM. It went like this:

Thank you for experimenting with the uuencode and uudecode programs. Currently I am satisfied with the method I use for loading Forth500, but it can be useful later.

Klaus Overhage · 10-30-2021, 07:09 AM

Helix, could you please write a DOS program that reads the Forth500.bin file, determines its length, writes the following 16 bytes, then appends a copy of Forth500.bin and names the result "Forth500" without an extension? Whenever robve provides an update, I and anyone else who wants to could then use the DOS program UUENCODE.EXE from the uucode.zip archive on http://www.it-pulse.eu/sharp-pc-e500s to convert "Forth500" into the file "FORTH500.UUE".

The 16 bytes are:
255 0 6 1 16 Low High 0 0 0 11 255 255 255 0 15

At the moment, Forth500.bin is 18444 bytes long. The low byte is 12 and the high byte is 72: 72 * 256 + 12 = 18444

I loaded some of the other programs from it-pulse with the help of UUDECODE and looked at their first 16 bytes. It seems that most bytes always have the same value. The 9th to 11th bytes contain the low, middle and high bytes of the start address, which is why they have the values 0 0 11 for &B0000 in the expanded version of Forth500.
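The header described above can be sketched in Python to show how the length and start-address bytes are derived. This is not Helix's BINTOPCE.EXE. It is a hedged reconstruction from the 16 bytes listed in this post. It assumes the length fits in bytes 6 and 7 (byte 8 is kept at 0, perhaps a third length byte) and that all other bytes are constant. The names `savem_header` and `bin_to_pce` are hypothetical.

```python
from pathlib import Path

def savem_header(length, start=0xB0000):
    """16-byte LOADM header as listed in the post; only length and start vary."""
    if not 0 < length <= 0xFFFF:
        raise ValueError("assumed: length must fit in bytes 6-7")
    if not 0 <= start <= 0xFFFFF:
        raise ValueError("start must be a 20-bit address")
    lo, hi = length & 0xFF, length >> 8              # 18444 -> 12, 72
    s_lo, s_mid, s_hi = start & 0xFF, (start >> 8) & 0xFF, start >> 16
    return bytes([255, 0, 6, 1, 16, lo, hi, 0,
                  s_lo, s_mid, s_hi, 255, 255, 255, 0, 15])

def bin_to_pce(src, dst, start=0xB0000):
    """Write header + binary to dst, e.g. bin_to_pce("Forth500.bin", "forth500")."""
    binary = Path(src).read_bytes()
    Path(dst).write_bytes(savem_header(len(binary), start) + binary)
```

The result is 16 bytes longer than the binary, matching the 18444 vs. 18460 bytes observed with SAVEM. It can then be uuencoded as described in this thread.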

Helix · 10-30-2021, 09:12 PM

You can try this DOS program.

Klaus Overhage · (This post was last modified: 11-01-2021 08:10 AM by Klaus Overhage.)

Thank you very much Helix, your program BINTOPCE.EXE works wonderfully in the DOSBox.

bintopce forth500.bin
File forth500 has been created.

uuencode forth500
UUencoding file forth500
....

The file FORTH500.UUE created in this way has a size of 26038 bytes. It is therefore slightly smaller than the file created on the PC-E500 with UUENCODE.MMM, which is because UUENCODE.MMM adds a character as a checksum at the end of each line. This is what the DOS program does when you call it with the -l parameter:
uuencode -l forth500
uuencode -? shows the list of all parameters. UUDECODE.MMM works either way, with or without checksums:
LOADM "UUDECODE.MMM"
CALL &BE000"COM:
after 6 minutes 48 seconds for the version without checksums per line
decoded.
LOADM "FORTH500"
CALL &B0000
** Welcome to Forth! 45556 bytes free **

Now I'm really looking forward to the new version with its many extensions that robve is currently working on. Hopefully he cracks the secret of the last system call. I love Forth500, it is great for learning Forth and trying out examples from books or the internet. To do this, I put the PC-E500 on a stand bought for the HP-71B and glued HP-41/71 rubber feet under the Sharp computer at the appropriate places.

Helix · (This post was last modified: 11-01-2021 01:09 AM by Helix.)

I've tried to load forth500 with this method, and I confirm it works! Furthermore, the transfer is very fast: with my cable and Hterm, forth500.uue was loaded in only 1 minute and 4 seconds! I think I've set a new record.

On the other hand, the entire sequence of operations took me more than 7 minutes.

It's interesting, anyway.

I agree that learning Forth on this small machine is very enjoyable.
Very clever stand!