HP-42S Compiler for Niklaus Wirth's PL/0 Language
02-21-2024, 12:35 PM
Post: #8
I've used Flex and Bison for many years, including use in the microassembler packaged with Nonpareil. Most of this was in C, and I haven't yet converted any of those to use the native C++ support in Flex or Bison. I wasn't previously aware of RE/flex, but it looks quite appealing since much of my development is now in C++. Thanks for pointing it out!

In the last few years I've also used some PEG parsers, including pyparsing for Python and PEGTL for C++, for some of my more recent, non-calculator-related programs. pyparsing has good built-in support for skipping whitespace between tokens (which can, of course, be disabled), but unfortunately PEGTL does not.
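To make the whitespace point concrete, here's a minimal sketch in plain Python (no libraries) of PEG-style rules in which whitespace between tokens has to be skipped explicitly, which is roughly the situation described above for PEGTL, as opposed to the automatic skipping pyparsing does. The combinator names (lit, rx, seq, star, tok) are hypothetical and are not pyparsing or PEGTL APIs.

```python
import re

# Hypothetical mini PEG combinators (not pyparsing or PEGTL APIs).
# Each rule takes (text, pos) and returns the new position, or None on failure.

def lit(s):
    """Match the exact string s at pos; no whitespace is skipped."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def rx(pattern):
    """Match a regular expression anchored at pos."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*rules):
    """Match each rule in turn; fail if any of them fails."""
    def parse(text, pos):
        for rule in rules:
            pos = rule(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def star(rule):
    """Zero or more repetitions, greedy as in PEG."""
    def parse(text, pos):
        while True:
            nxt = rule(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or no progress
                return pos
            pos = nxt
    return parse

ws = rx(r"[ \t]*")  # always succeeds, possibly consuming nothing

def tok(rule):
    """Explicitly skip leading whitespace before a token."""
    return seq(ws, rule)

number = rx(r"[0-9]+")

# Without tok(): whitespace is just another character the grammar must match.
strict_list = seq(lit("["), number, star(seq(lit(","), number)), lit("]"))

# With tok() around every token, whitespace between tokens is ignored.
loose_list = seq(tok(lit("[")), tok(number),
                 star(seq(tok(lit(",")), tok(number))),
                 tok(lit("]")), ws)

def full_match(rule, text):
    """True only if the rule consumes the entire input."""
    return rule(text, 0) == len(text)

print(full_match(strict_list, "[1,2,3]"))     # True
print(full_match(strict_list, "[1, 2, 3]"))   # False
print(full_match(loose_list, "[ 1, 2 ,3 ]"))  # True
```

Note that the same character-level rules define both the tokens (number, the punctuation) and the list structure; there is no separate scanner.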

For those not familiar with PEG (Parsing Expression Grammars):

PEG is interesting in that it's generally used with a single rule set that does both tokenization (as might be done by flex) and parsing (as might be done by bison). While one could use two layers of PEG, one for scanning and one for parsing, I've never seen that done.

Debugging a PEG parser can be challenging because PEG doesn't detect shift/reduce or reduce/reduce conflicts. PEG uses the first matching alternative of a production (an ordered, or prioritized, choice). Once it finds that first match, it neither cares about nor warns about ambiguity, because the ordering always disambiguates cases where multiple alternatives could match. This is simultaneously a blessing and a curse. Trying to adapt LR or LALR grammars for any non-trivial language to PEG, or writing a PEG grammar from an LR or LALR mindset, yields much frustration, because the syntax of productions is basically the same, but there's a huge difference in semantics. If you're accustomed to Yacc or Bison, it takes a lot of getting used to. For instance, a grammar to parse C-style unsigned integers in decimal, octal with a leading zero, or hexadecimal with a leading "0x" doesn't work if you write it as:

dec_lit: [0-9]+
oct_lit: 0[0-7]+
hex_lit: 0x[0-9a-fA-F]+
lit: dec_lit | oct_lit | hex_lit

That will interpret intended octal literals as decimal, and intended hex literals as a decimal 0, with the x and subsequent digits not consumed as part of lit.

If you reverse the order of the alternatives in the lit production (lit: hex_lit | oct_lit | dec_lit), then it will work as desired.
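The effect of alternative order is easy to reproduce. Below is a small sketch in plain Python: a hypothetical hand-rolled ordered-choice function rather than any particular PEG library, using the three literal rules from the grammar above (with octal digits restricted to 0-7 as in C). first_match commits to the first alternative that succeeds, as a PEG does, and never reports that a later alternative would also have matched.

```python
import re

def rx(pattern):
    """Rule matching a regex anchored at pos; returns the new pos or None."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def first_match(alternatives, text, pos=0):
    """PEG-style ordered choice: try (name, rule) pairs in order and commit
    to the first one that succeeds, even if a later one would match more.
    Returns (name, matched_text), or None if every alternative fails."""
    for name, rule in alternatives:
        end = rule(text, pos)
        if end is not None:
            return name, text[pos:end]
    return None

DEC = ("dec_lit", rx(r"[0-9]+"))
OCT = ("oct_lit", rx(r"0[0-7]+"))
HEX = ("hex_lit", rx(r"0x[0-9a-fA-F]+"))

lit_as_posted = [DEC, OCT, HEX]  # lit: dec_lit | oct_lit | hex_lit
lit_reversed = [HEX, OCT, DEC]   # lit: hex_lit | oct_lit | dec_lit

for s in ["42", "017", "0x1F"]:
    print(s, first_match(lit_as_posted, s), first_match(lit_reversed, s))
# 42 ('dec_lit', '42') ('dec_lit', '42')
# 017 ('dec_lit', '017') ('oct_lit', '017')
# 0x1F ('dec_lit', '0') ('hex_lit', '0x1F')
```

With the original order, dec_lit swallows "017" whole and matches only the "0" of "0x1F", leaving "x1F" for whatever follows lit; with the order reversed, each literal goes to the intended alternative, and plain decimal still falls through to dec_lit.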

You may be able to guess how I learned this.
:-)


Messages In This Thread
RE: HP-42S Compiler for Niklaus Wirth's PL/0 Language - brouhaha - 02-21-2024 12:35 PM


