HP-71 BASIC funny quirks
01-27-2023, 08:55 AM
Post: #1
 J-F Garnier Senior Member Posts: 751 Joined: Dec 2013
HP-71 BASIC funny quirks
In a recent thread, I pointed out a little annoyance of the HP-71B BASIC: small constant values in programs are decompiled in STD format, e.g. 1E-10 is decompiled as 0.0000000001

Spending some more time on it (including a study of the source code), I found these other rather strange effects:

1. In STD mode, results are displayed without scientific notation if they completely fit within 12 digits (plus a possible sign and decimal point):
e.g. 0.123456789012 is displayed as ".123456789012"
but 0.0123456789012 causes a switch to SCI mode and is displayed as "1.23456789012E-2"
This is different from constant values in a program, which can have more than 12 digits, for instance:
10 C=.123456789012
11 C=.0123456789012
12 C=.00123456789012
or, even weirder, enter "15 C=1.23456E-10" and see "15 C=.000000000123456"
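The display rule in point 1 can be sketched in a few lines of Python (my reading of the behaviour described above, not the actual ROM code):

```python
from decimal import Decimal

def std_display(num: str) -> str:
    """Sketch of the STD-mode choice: show the value without an exponent
    only when its plain decimal form needs at most 12 digits (the sign
    and the decimal point don't count)."""
    d = Decimal(num)
    plain = format(d, 'f').lstrip('-')   # plain notation, e.g. '0.0123...'
    if plain.startswith('0.'):
        plain = plain[1:]                # the 71B drops the integer-part 0
    if sum(c.isdigit() for c in plain) <= 12:
        return ('-' if d < 0 else '') + plain
    return format(d, '.11E')             # fall back to SCI, 12 sig. digits
```

With this sketch, "0.123456789012" comes out in plain form while "0.0123456789012" needs a 13th digit position and falls back to SCI, matching the switch described above.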

2. Related to the point above, there is this real bug of the HP-71B 1BBBB:
enter "50 C=1.234567E-10" and see "50 C=.000000"
The bug is referenced as 1148-6 in the HP-71B bug list. It has been "fixed" on the 71B 2CDCC, but the fix is more of a workaround that corrects the visible effect of the bug without implementing what was probably the initial intent, and so produces this unexpected effect:

3. Enter the program line "21 C=1E-10", and get:
21 C=.0000000001
now enter "22 C=1E-11", and get:
22 C=.00000000001 on the version 1BBBB as expected according to the 71B logic,
but surprisingly we get:
22 C=1.E-11 on the HP-71B version 2CDCC.

A few more harmless quirks:

4. A value such as 1E12 is displayed as "1.E12" if it's a numeric result, but as "1.E+12" with an explicit exponent sign if it's a constant in a program.

5. Leading zeros are ignored but not trailing zeros:
enter "25 C=000123.123000"
and see:
25 C=123.123000

6. Key in line "30 C=1E10"
and take note of the program size with CAT,
then fetch line 30, see it displayed as "30 C=10000000000", and validate it again with ENDLINE:
the program size reported by CAT has now increased by four bytes.
You can revert to the previous size by entering the line again as "30 C=1E10"

J-F
01-27-2023, 02:56 PM
Post: #2
 Albert Chan Senior Member Posts: 2,101 Joined: Jul 2018
RE: HP-71 BASIC funny quirks
(01-27-2023 08:55 AM)J-F Garnier Wrote:  2. Related to the point above, there is this real bug of the HP-71B 1BBBB:
enter "50 C=1.234567E-10" and see "50 C=.000000"
The bug is referenced as 1148-6 in the HP-71B bug list. It has been "fixed" on the 71B 2CDCC, but the fix is more of a workaround that corrects the visible effect of the bug without implementing what was probably the initial intent ...

Hi, J-F Garnier

What do you mean by "fix" (in quotation marks), rather than a fix?
What was the decompiler's initial intent?

Perhaps a stupid question ...

Why do we even need a decompiler, if the program is already human-readable?
Why not just save the BASIC program the way we type it in?
01-27-2023, 04:21 PM (This post was last modified: 01-28-2023 08:57 AM by J-F Garnier.)
Post: #3
 J-F Garnier Senior Member Posts: 751 Joined: Dec 2013
RE: HP-71 BASIC funny quirks
(01-27-2023 02:56 PM)Albert Chan Wrote:  What do you mean by "fix" (in quotation marks), rather than a fix?
What was the decompiler's initial intent?

I put fix in quotes because it seems to me (looking at the source code of 1BBBB and the actual 2CDCC code) that it doesn't fix the problem but just sweeps it under the carpet.

The initial intent was very likely to have a constant value embedded in a program line look exactly like the STD output for the same number. This is how previous machines, such as the HP-75, did it.
But there was a bug in rev. 1BBBB that I can explain in these words: the code combines the exponent of the number and the number of significant digits to check whether the number fits in the 12-digit FIX format. The bug is that it wrongly combines a BCD number (the number's exponent) with a binary number (the number of digits), a classic mistake when programming on HP processors (Classic, Nut, Saturn). So the test is ineffective, and this explains how we can get constants in a FIX-like format with more than 12 digits.
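The kind of slip described here can be illustrated with a toy model in Python (mine, not the actual Saturn code; the real ROM test combines these quantities differently, the point is only how a BCD/binary mix-up defeats a comparison):

```python
def to_bcd(n: int) -> int:
    """Pack a decimal number as BCD, one nibble per digit (12 -> 0x12)."""
    bcd, shift = 0, 0
    while True:
        bcd |= (n % 10) << shift
        n //= 10
        shift += 4
        if n == 0:
            return bcd

LIMIT = 12  # digit positions available in the FIX-like format (schematic)

def fits_fix(exponent: int, ndigits: int) -> bool:
    """Intended test: both operands as plain binary integers."""
    return exponent + ndigits <= LIMIT

def fits_fix_buggy(exponent: int, ndigits: int) -> bool:
    """1BBBB-style slip: the exponent is still BCD-encoded, so any
    exponent >= 10 enters the sum with the wrong value."""
    return to_bcd(exponent) + ndigits <= LIMIT

# exponent magnitude 10 with 2 significant digits: 10 + 2 <= 12 fits,
# but the BCD encoding of 10 is 0x10 == 16, so the buggy test says no.
print(fits_fix(10, 2))        # True
print(fits_fix_buggy(10, 2))  # False
```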

For the curious people out there, the code is in the Expression Decompile module (SD&EXD):

The 2CDCC "fix" just replaces the opcode LCHEX 012 by LCHEX 00F at 05EEE.

Quote:Perhaps a stupid question ...

Why do we even need a decompiler, if program already human readable?
Why not just saved the BASIC program, the way we type it in?

Not a stupid question at all.
On the contrary, it is the opportunity for me to elaborate.

The HP BASIC program lines are not directly readable: they are in token form, and moreover the numeric expressions are stored in an internal RPN form. This explains why extra parentheses you may type in an expression are not visible when reading back the line: internally, no parentheses are stored; they are reconstructed when the line is displayed. HP called this process decompilation, even though it's a combination of detokenisation (for keywords) and RPN-to-algebraic conversion (for the internal RPN-coded expressions).

The benefit of the token and RPN forms is more compact and faster-executing code.
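The parenthesis-vanishing effect can be mimicked with a minimal round-trip in Python (an illustrative sketch; the HP-71B's token set and decompiler are of course far more elaborate):

```python
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_rpn(tokens):
    """Shunting-yard: infix tokens -> RPN; parentheses are consumed."""
    out, ops = [], []
    for t in tokens:
        if t == '(':
            ops.append(t)
        elif t == ')':
            while ops[-1] != '(':
                out.append(ops.pop())
            ops.pop()                    # discard the '('
        elif t in PREC:
            while ops and ops[-1] != '(' and PREC[ops[-1]] >= PREC[t]:
                out.append(ops.pop())
            ops.append(t)
        else:
            out.append(t)                # operand
    return out + ops[::-1]

def decompile(rpn):
    """RPN -> infix, re-inserting parentheses only where needed."""
    stack = []                           # (text, precedence) pairs
    for t in rpn:
        if t in PREC:
            (b, bp), (a, ap) = stack.pop(), stack.pop()
            if ap < PREC[t]:
                a = '(' + a + ')'
            if bp < PREC[t] or (bp == PREC[t] and t in '-/'):
                b = '(' + b + ')'
            stack.append((a + t + b, PREC[t]))
        else:
            stack.append((t, 9))
    return stack[0][0]

print(decompile(to_rpn(list("A+(B*C)"))))   # A+B*C   (redundant parens gone)
print(decompile(to_rpn(list("(A+B)*C"))))   # (A+B)*C (needed parens rebuilt)
```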

J-F

Attached File(s) Thumbnail(s)

01-27-2023, 05:31 PM
Post: #4
 Valentin Albillo Senior Member Posts: 912 Joined: Feb 2015
RE: HP-71 BASIC funny quirks
J-F Garnier Wrote:The HP BASIC program lines are not directly readable: they are in token form, and moreover the numeric expressions are stored in an internal RPN form.

The earlier HP-85 computer's BASIC dialect did exactly that: keywords were tokenized and algebraic expressions internally converted to an RPN-like form upon entering the line, stored that way, and converted back to algebraic form upon listing or otherwise outputting the line as text.

This gave problems at times, because the reconstructed line (de-tokenized and with expressions converted back to algebraic form) did not exactly match what you entered, and also because the restored lines typically used more bytes, which could cause Insufficient Memory-like errors while listing lines in situations of very low available memory.

Quote:The benefit of the token and RPN forms is more compact and faster-executing code.

Yes, and I was delighted when I got my SHARP PC-1211 back in the early 80's and saw that it tokenized the keywords so that INPUT or DEGREES, say, would only use 1 byte of RAM instead of the 5 or 7 required by the usual BASIC dialects of the time.

The tokenization also meant faster program execution, and SHARP pointed out the memory advantages in their brochure "Con memoria equivalente a 4 Kbytes" (i.e. you could do with the built-in 1.4 Kbytes what would require 4 Kbytes using non-tokenizing BASIC dialects.)

Regards.
V.

All My Articles & other Materials here:  Valentin Albillo's HP Collection

01-27-2023, 05:50 PM
Post: #5
 Massimo Gnerucci Senior Member Posts: 2,519 Joined: Dec 2013
RE: HP-71 BASIC funny quirks
(01-27-2023 05:31 PM)Valentin Albillo Wrote:  Con memoria equivalente a 4 Kbytes

This translates verbatim into Italian.

:)

Greetings,
Massimo

-+×÷ ↔ left is right and right is wrong
01-27-2023, 10:35 PM
Post: #6
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: HP-71 BASIC funny quirks
(01-27-2023 05:31 PM)Valentin Albillo Wrote:  Yes, and I was delighted when I got my SHARP PC-1211 back in the early 80's and saw that it tokenized the keywords so that INPUT or DEGREES, say, would only use 1 byte of RAM instead of the 5 or 7 required by the usual BASIC dialects of the time.

The tokenization also meant faster program execution, and SHARP pointed out the memory advantages in their brochure "Con memoria equivalente a 4 Kbytes" (i.e. you could do with the built-in 1.4 Kbytes what would require 4 Kbytes using non-tokenizing BASIC dialects.)

Regards.
V.

One of the interesting design choices by SHARP for all of their early Pocket Computers in the 80s was to preallocate variables A-Z and A$-Z$ in memory (before Casio copied the same approach). Hence, running a BASIC program will never produce an out-of-memory runtime error for these variables. The downside is that A$-Z$ share storage with A-Z, and strings can only contain up to 7 characters. The A(n) (or @(n)) array provides extra space, which can produce an out-of-memory error, as can the use of variables AA-ZZ, A0-Z9 and AA$-ZZ$, A0$-Z9$. Casio requires DEFM, so it won't run out of memory at runtime, but this approach lacks SHARP's flexibility to dynamically extend array A(n) as needed by the program. There is also DIM on most SHARPs.

I'm curious which BASIC dialects/machines you're referring to that stored executable programs in non-tokenized form in memory. The inventors of BASIC used a semi-compiled tokenized format for storing programs, for example. Tokenization with one byte per token (e.g. the C64) or two bytes per token is quite common. Some BASIC dialects retain program spacing by storing space "tokens"; others remove extra spacing altogether to save space.

Fully compiled BASIC instead of tokenized for interpretation is a different matter altogether, of course.

Some SHARP models use two bytes per token to extend the range of tokens. The PC-1360 uses two bytes, whereas the PC-1350 always uses one byte. This can be a problem when CLOADing PC-1350 programs into a PC-1360, as one may get an error when a tokenized line no longer fits within the BASIC line length limit.

Fortunately, we have better programming languages these days.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-28-2023, 02:55 AM
Post: #7
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: HP-71 BASIC funny quirks
(01-27-2023 04:21 PM)J-F Garnier Wrote:  The HP BASIC program lines are not directly readable: they are in token form, and moreover the numeric expressions are stored in an internal RPN form.

The benefit of the token and RPN forms is more compact and faster-executing code.

I'm not sure I buy that RPN tokenization of BASIC algebraic expressions results in noticeably faster execution speed.

Let's take a closer look. By contrast, a "standard" BASIC interpreter uses operator precedence parsing (an improved form of the simplistic shunting-yard algorithm) to evaluate expressions, which is known to be a simple and efficient algorithm. It scans each token only once, from left to right in the input. Similar to LR(1) parsing, an operator precedence parser uses a stack to decide whether to reduce or to shift a token onto the operator stack. A reduction applies an operator to its operands (operands are on the parameter stack, like RPN). The only extra overhead is pushing and popping a byte (the operator token) on an operator stack and consulting a small table of operator precedences to decide whether an operator should be applied now or later. For example, a+b*c puts a and b on the parameter stack and + on the operator stack. Then * is compared to +. Since * has higher precedence, + is not yet applied and c is pushed on the stack. Then the operator stack is unwound by applying * and then +. The same happens with parentheses. What is this extra cost of a push/pop per operator? Perhaps tens of extra CPU cycles: much less than applying the operator to a BCD floating-point number, which takes hundreds or even thousands of CPU cycles on a 4- or 8-bit machine with floating point in software rather than floating-point hardware.
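The loop described above looks roughly like this in Python (schematic, of course: a real BASIC also handles unary minus, ^, functions, and so on):

```python
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def evaluate(tokens):
    """One left-to-right scan with an operand stack and an operator stack."""
    vals, ops = [], []

    def reduce_top():
        # pop one operator and apply it to the top two operands
        op, b, a = ops.pop(), vals.pop(), vals.pop()
        vals.append(a + b if op == '+' else a - b if op == '-' else
                    a * b if op == '*' else a / b)

    for t in tokens:
        if t == '(':
            ops.append(t)
        elif t == ')':
            while ops[-1] != '(':
                reduce_top()
            ops.pop()
        elif t in PREC:
            # shift/reduce decision: reduce while the stacked operator
            # binds at least as tightly as the incoming one
            while ops and ops[-1] != '(' and PREC[ops[-1]] >= PREC[t]:
                reduce_top()
            ops.append(t)
        else:
            vals.append(float(t))        # operand goes straight on the stack
    while ops:
        reduce_top()
    return vals[0]

print(evaluate(['2', '+', '3', '*', '4']))            # 14.0
print(evaluate(['(', '2', '+', '3', ')', '*', '4']))  # 20.0
```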

Secondly, I'm not sure RPN tokenization is worth it with respect to interpreter complexity. It adds a lot more complexity to the tokenization and decompilation steps, which can slow down program editing, i.e. storing and recalling a program line to edit it. Nor does it make a program much smaller (operations and operands are the same, just reordered in memory), unless a lot of parentheses are present and needed in the algebraic formulation of an expression, since RPN doesn't use them, of course. And most expressions in BASIC programs are rather short, like one or two operators applied to two or three operands, such as X = X+1 for example.

So why does the HP-71B use RPN-tokenized BASIC? It's a wonderfully over-engineered machine, that's why. And that's why I like it.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-28-2023, 04:03 AM
Post: #8
 brouhaha Senior Member Posts: 318 Joined: Dec 2013
RE: HP-71 BASIC funny quirks
Either shunting yard or operator precedence parsing of expressions is fine, or even recursive descent, but it's far from the whole story. Even ignoring overall statement syntax, dealing with variable names, which have to be looked up, is slower than dealing with variable addresses assigned at statement entry time (even if a hash table is used at runtime), and dealing with numeric values that weren't parsed at statement entry is also very time consuming. And even though the rest of the parsing is fairly efficient, it's still usually a win to do it at entry time rather than every time the statement is executed.
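The entry-time resolution described above can be sketched schematically in Python (illustrative only, not the HP-71B's actual structures): the name search happens once when the line is stored, and the stored token carries a direct slot index.

```python
class Program:
    """Toy model: names resolved to slots at entry time, indexed at runtime."""

    def __init__(self):
        self.symtab = {}      # name -> slot, built when lines are entered
        self.values = []      # runtime storage, indexed by slot number

    def slot_of(self, name):
        # entry-time lookup: allocate a slot the first time a name appears
        if name not in self.symtab:
            self.symtab[name] = len(self.values)
            self.values.append(0.0)
        return self.symtab[name]

    def tokenize(self, name):
        """At entry time: replace the textual name by a (kind, slot) token."""
        return ('VAR', self.slot_of(name))

    def fetch(self, token):
        """At run time: a direct array index, no name search at all."""
        kind, slot = token
        return self.values[slot]

p = Program()
t = p.tokenize('X')       # 'X' resolved to a slot once, at entry
p.values[t[1]] = 42.0
print(p.fetch(t))         # runtime fetch is just values[slot]
```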

Many years ago I had empirical data on 8-bit machines (6502 and Z80) showing that even crude keyword tokenization like the Microsoft BASIC interpreters used significantly sped up BASIC interpreters. I don't think I have any of my old notes on that.

What HP-71B BASIC does, and Apple II Integer BASIC (by Woz, not by Microsoft) does, goes far beyond the simple string tokenization of Microsoft BASIC, and yields an even bigger execution speedup: the symbol table is constructed at entry time, no variables have to be searched for at runtime, and numeric constants are already in native binary form. This is a huge win over Microsoft-style keyword tokenization, not just "over-engineering".

It's not just the little microprocessors that benefit from this, either. These techniques were developed for speeding up interpreters on mainframes in the 1960s and minicomputers in the 1970s. HP 2000 Time-Shared BASIC, implemented on HP's 2116, 2100, and 21MX 16-bit minicomputers, used similar techniques, though not quite as advanced as the HP-71B or Apple Integer BASIC, and it was an enormous win.
01-28-2023, 12:39 PM
Post: #9
 Albert Chan Senior Member Posts: 2,101 Joined: Jul 2018
RE: HP-71 BASIC funny quirks
Thank you all for your informed explanations. I am learning a lot!

If I understand correctly, in the process of tokenizing an HP-71B program, we lose the source.
That's why a decompiler is needed: to reconstruct the source as closely as possible.
This saves memory, at the cost of slower program listing (the reconstruction takes time).

(01-27-2023 04:21 PM)J-F Garnier Wrote:  The initial intent was very likely to have a constant value embedded in a program line look exactly like the STD output for the same number ... The bug is that it wrongly combines a BCD number (the number's exponent) with a binary number (the number of digits), a classic mistake when programming on HP processors (Classic, Nut, Saturn). So the test is ineffective, and this explains how we can get constants in a FIX-like format with more than 12 digits.

Still, I'm a bit unclear on one point.
If the goal is STD mode for all numbers, why is the test necessary?

No test, no bug?
01-28-2023, 02:51 PM
Post: #10
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: HP-71 BASIC funny quirks
(01-28-2023 04:03 AM)brouhaha Wrote:  Either shunting yard or operator precedence parsing of expressions is fine, or even recursive descent, but it's far from the whole story. Even ignoring overall statement syntax, dealing with variable names, which have to be looked up, is slower than dealing with variable addresses assigned at statement entry time (even if a hash table is used at runtime), and dealing with numeric values that weren't parsed at statement entry is also very time consuming. And even though the rest of the parsing is fairly efficient, it's still usually a win to do it at entry time rather than every time the statement is executed.

I think I need to clarify a few points to complement my reply.

Operator precedence parsing overhead is indeed peanuts compared to everything else that can be, and should be, optimized in a BASIC interpreter, including variable lookup.

There is more nuance with respect to variable allocation and lookup.

BASIC interpreters that support subroutines with local variables and arguments passed by value or by reference typically have reduced performance. Locals require an extra stack or stack space. Local and non-local variable lookup in subroutines require lookups similar to dynamic scoping (which is largely abandoned in favor of lexical scoping in modern programming languages.) Alternatively, the extra lookup cost of locals can be eliminated with a stack to save non-locals at subroutine entry and restore them at subroutine exit, but this moves the extra cost to the allocation and deallocation side of locals at the entry and exit points. Again, there's a lot more to it than I want to discuss here. Variable addresses are not efficiently determinable at runtime by interpreters. By contrast, compilers determine local and non-local variable addresses for efficient execution, an important factor in the speed of compiled BASIC.

Back in the early 80s I did a fair amount of work with the Philips P2000 home computer, the Apple IIe (which actually has two BASIC interpreters: a simple Integer BASIC by Wozniak and AppleSoft BASIC) and the occasional C64. Both AppleSoft and MS BASIC on the P2000 were much more developed, while still being compact. Back then I was curious what the differences were, so I looked at the memory management of these interpreters and discovered various forms of operator precedence parsing, before reading about it in college. (Later I taught programming languages and compilers for two decades to undergraduate and graduate students. It's not difficult to write a BASIC interpreter or a C compiler as a project in a few weeks these days.)

Putting BASIC aside for a moment: for speed, Pascal on the Apple II was nice, but it required swapping floppy discs to edit the program, compile it, and finally assemble it. Sweet16 on the Apple, also by Wozniak, was great too. I still have all his assembly listings, which I studied back then to learn 6502, as well as "Programming the 6502" and "Programming the Z80" by the unforgettable Rodnay Zaks.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-28-2023, 05:03 PM
Post: #11
 J-F Garnier Senior Member Posts: 751 Joined: Dec 2013
RE: HP-71 BASIC funny quirks
(01-28-2023 12:39 PM)Albert Chan Wrote:  If I understand correctly, in the process of tokenizing an HP-71B program, we lose the source.
That's why a decompiler is needed: to reconstruct the source as closely as possible.
This saves memory, at the cost of slower program listing (the reconstruction takes time).

The delays in line editing are the main drawback of this approach.
Actually, the 71B is quite efficient; the delay is noticeable but not too long.
The Series 80 machines and the HP-75C, despite their faster 8-bit CPUs, were less efficient, and delays of a second or so were not unusual for a moderately complex line.

Quote:
(01-27-2023 04:21 PM)J-F Garnier Wrote:  The initial intent was very likely to have a constant value embedded in a program line look exactly like the STD output for the same number ...
If the goal is STD mode for all numbers, why is the test necessary?
No test, no bug?

For STD mode, a test has to be done somewhere to choose between the FIX format (without exponent) and SCI.
What may be surprising is that the decompile code duplicates the STD formatting process that already exists for the DISP and PRINT statements.
I guess the reason is that the input number is in a different format, so a conversion was needed anyway; the choice was made not to convert the embedded constant to the internal standard form used by all the math code and then apply the STD string-formatting code, but instead to convert directly from the embedded-constant format to the STD string format.

J-F
01-28-2023, 07:19 PM
Post: #12
 Valentin Albillo Senior Member Posts: 912 Joined: Feb 2015
RE: HP-71 BASIC funny quirks
(01-28-2023 12:39 PM)Albert Chan Wrote:  If I understand correctly, in the process of tokenizing an HP-71B program, we lose the source.
That's why a decompiler is needed: to reconstruct the source as closely as possible.
This saves memory, at the cost of slower program listing (the reconstruction takes time).

An interesting approach was the one used in the Sinclair ZX Spectrum.

When the user entered a BASIC line from the command prompt (possibly including numeric constants), the line was immediately checked for correct syntax (thus avoiding storing faulty lines in program memory, as many contemporary BASIC dialects did). This was a one-time process that made checking syntax at runtime utterly unnecessary, thus significantly speeding up program execution.

Once the line passed the syntax check, it was stored in tokenized form (forced by the user interface), and if a numeric literal was detected, it was converted on the fly to its 5-byte internal binary form, which was then inserted in the line just after the source literal. Both the text literal and the internal binary form were thus included in the program line: the source wasn't lost and was available verbatim for listings without wasting time on decompilation, while the binary form was immediately available for computations without further interpretation or conversion from the literal at runtime.

In other words, the best of both worlds: keyword tokenization, no decompilation, no run-time syntax checks and no run-time numeric conversions from source code to binary form.
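That scheme can be sketched like this (illustrative Python: the 0x0E marker byte matches my understanding of the Spectrum's line format, but I use IEEE single precision padded to 5 bytes as a stand-in for the Spectrum's own 5-byte floating-point format, which differs in detail):

```python
import struct

NUM_MARKER = 0x0E   # byte preceding the embedded binary form (assumption)

def encode_literal(text: str) -> bytes:
    """Store the typed digits, then the ready-to-use binary value."""
    value = float(text)
    return (text.encode('ascii') + bytes([NUM_MARKER])
            + struct.pack('<f', value) + b'\x00')   # pad to 5 bytes

def list_literal(chunk: bytes) -> str:
    """LISTing prints the stored digits verbatim -- no decompilation."""
    return chunk[:chunk.index(NUM_MARKER)].decode('ascii')

def run_literal(chunk: bytes) -> float:
    """Execution reads the pre-converted binary -- no re-parsing."""
    i = chunk.index(NUM_MARKER) + 1
    return struct.unpack('<f', chunk[i:i + 4])[0]

chunk = encode_literal("1.5")
print(list_literal(chunk))   # 1.5  (verbatim source for listings)
print(run_literal(chunk))    # 1.5  (binary value for execution)
```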

V.

All My Articles & other Materials here:  Valentin Albillo's HP Collection
