monic part 5: writing C programs on the calculator
01-01-2021, 01:35 AM (This post was last modified: 12-26-2022 11:58 PM by F-73P.)
Post: #1
 F-73P Junior Member Posts: 43 Joined: May 2020
monic part 5: writing C programs on the calculator
Previous: monic part 4: arbitrary-precision rational arithmetic

Next: monic part 6: the mC programming language and compiler

I've started writing an editor and translator for a subset of the C programming language. I'm not sure what the proper computer science term for the translator is. Given a program, e.g. find the factorial of the positive integer on stack level 1:

Code:
 double product=1.0; double n; int main(void) {  n = level1; //pop value on stack level 1 into n  do {   product = product*n;   n--;   } while (n>1.0); }

the translator produces a sequence of codes. When the program is run the codes are fed to a switch statement, which executes the corresponding instruction. Does this make the translator a compiler, or is it an interpreter?

The C language combines all the power of assembly language with all the ease-of-use of assembly language
01-01-2021, 11:20 AM (This post was last modified: 01-01-2021 11:23 AM by grsbanks.)
Post: #2
 grsbanks Senior Member Posts: 1,219 Joined: Jan 2017
RE: monic part 5: writing C programs on the calculator
(01-01-2021 01:35 AM)F-73P Wrote:  the translator produces a sequence of codes. When the program is run the codes are fed to a switch statement, which executes the corresponding instruction. Does this make the translator a compiler, or is it an interpreter?

More like an interpreter. An interpreter executes the "translated" program as it is converting it (as does your system). A compiler transforms the original into something else that you can store and then run at a later date once the "translation" is complete.

A couple of optimisations of your code, BTW. You can't do "n--" if n is a floating point number. Also, this assumes that the value of n has been checked for validity and that it is in fact a positive integer value (expressed as a floating point).

Code:
int main(void) {  double product = 1.0;  unsigned int n = (unsigned int)level1;  for(; n > 1;) product *= n--; }

There are only 10 types of people in this world. Those who understand binary and those who don't.
01-05-2021, 03:50 AM
Post: #3
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: monic part 5: writing C programs on the calculator
(01-01-2021 01:35 AM)F-73P Wrote:  the translator produces a sequence of codes. When the program is run the codes are fed to a switch statement, which executes the corresponding instruction. Does this make the translator a compiler, or is it an interpreter?

Very nice work, judging from your description!

This is not a trivial accomplishment.

I assume you are lexing/parsing the source code just once, so this is a compiler that produces (byte) codes. Your virtual machine executes them in a loop with a switch statement. This approach is in principle similar to the JVM. Just to compare, I wrote a small C compiler that produces JVM class files from "mini C" source code.

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-06-2021, 04:28 AM (This post was last modified: 02-01-2021 06:08 AM by F-73P.)
Post: #4
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: monic part 5: writing C programs on the calculator
(01-01-2021 11:20 AM)grsbanks Wrote:  You can't do "n--" if n is a floating point number.

"n--" compiles without errors for n of type double on the compilers I tried (and gives n-1.0), so I assume it's supported by the standard they are using. However it's not good programming style, as "n--" could be mistaken for the predecessor of the floating-point number, which in general is not equal to n-1.0, so your suggestion is better.

(01-05-2021 03:50 AM)robve Wrote:  I wrote a small C compiler that produces JVM class files from "mini C" source code.

Great work! I also like the manual, comprehensive and very professional.

I wrote the code before I read anything about compilers. The front end consists of (what I now know is called) a lexical analyser and the parser is just Dijkstra's shunting yard algorithm.

I'm now wondering whether I should:

a) Continue with my ad hoc approach. This is the easiest option but there is no syntax or semantic checking, so it will never be a complete programming environment.

b) Learn about grammars, abstract syntax trees and top-down parsing and write the parser by hand. This sounds very difficult.

c) Find an open source compiler and try to port it across.

My preference is for option b), despite the difficulties. In that case I think I should choose a programming language with a simpler grammar than C as the source language.

The C language combines all the power of assembly language with all the ease-of-use of assembly language
01-06-2021, 06:34 AM
Post: #5
 mpark Junior Member Posts: 25 Joined: Sep 2018
RE: monic part 5: writing C programs on the calculator
Option b, definitely. You wrote a compiler without knowing anything about compilers-- imagine what you'll accomplish once you know what you're doing!
01-06-2021, 09:59 AM
Post: #6
 vaklaff Member Posts: 104 Joined: Dec 2019
RE: monic part 5: writing C programs on the calculator
(01-06-2021 06:34 AM)mpark Wrote:  Option b, definitely. You wrote a compiler without knowing anything about compilers-- imagine what you'll accomplish once you know what you're doing!

I too would recommend b. Tools like flex and bison can help tremendously. It's been a while since I used them last time but IIRC you get by with just cursory knowledge of the grammars theory.
01-08-2021, 02:59 AM
Post: #7
 robve Senior Member Posts: 360 Joined: Sep 2020
RE: monic part 5: writing C programs on the calculator

Definitely option b). By contrast, it will quickly become much more complicated to complete option a) when the language gets more complex.

Option b) is not as difficult to comprehend as you may think.

You could implement this with a hand-written lexical analyzer and write a recursive descent parser to do the translation. Or use flex and bison. The first chapters of the https://en.wikipedia.org/wiki/Compilers:..._and_Tools aka the "(Red) Dragon Book" shows how. Here are some lecture notes to get started.

To limit the syntax to arithmetic syntax, operator precedence parsers are very handy and relatively easy to implement, see https://en.wikipedia.org/wiki/Operator-p...nce_parser

The syntax of Prolog is parsed with operator precedence parsing. The interesting part is that the language syntax itself can be extended on the fly by the user defining his/her prefix, index, and postfix operators.

Also Pakrat parsing has gained some popularity, see https://en.wikipedia.org/wiki/Parsing_ex...on_grammar

Hope this helps!

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-12-2021, 08:01 PM
Post: #8
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator
(01-06-2021 09:59 AM)vaklaff Wrote:  Tools like flex and bison can help tremendously. It's been a while since I used them last time but IIRC you get by with just cursory knowledge of the grammars theory.

I second using Flex and Bison instead of rolling your own parser as it's *much* easier. You should get the book "flex & bison: Text Processing Tools" by by John Levine and published by O'Reilly. The book has a well written introduction to BNF grammars. I skipped that part though as I've read the "Dragon book" and due to my knowledge of theoretical computer science from uni. A fellow HP48 hacker and I were coding a "high-level assembler" for the Saturn processor with the "assembler" being a kind of C and Saturn assembly hybrid. I had bought both the paperback and Kindle editions of the book and it was a really great read -- it made coding the assembler / compiler or whatever-you-want-to-call-it *much* easier than rolling our own as the C99 grammar is very complicated. The project is on hold now as I've had numerous other things to deal with in my Real Life [TM], but I'll eventually get back to it and finish it, hopefully

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
02-01-2021, 06:36 AM (This post was last modified: 02-03-2021 01:08 AM by F-73P.)
Post: #9
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: monic part 5: writing C programs on the calculator
Thank you to everyone for the links and great advice - option b) was definitely the best choice.

I've ported the source code for the "TINY" compiler presented in Compiler Construction, Principles and Practice to the calculator:

Execution time is not too bad, with 1000! (2568 digits, on stack level 6) calculated in about 6 seconds:

I haven't added scrolling yet so I reduced the font size to see more digits:

The "TINY" language has the following BNF grammar:

and the following extended BNF grammar:

I added bytecode generation to the parser and this was pretty straightforward, except for if-then-else and repeat-until statements; for these I used a stack to keep track of the destination labels and jump-if-false bytecodes for nested statements.

I would now like to implement the "C-" language from the same book, which is more powerful than "TINY" ("C-" adds support for functions and arrays). Here is a factorial program written in "C-":

"C-" has the following BNF grammar:

My impression is that a recursive descent parser is not too difficult to write provided the programming language is small and the grammar is written in extended BNF.

Flex and Bison are of course very powerful and useful tools but my philosophy with the "monic" calculator project has been to minimise the number of tools and libraries used.

So if I can rewrite the grammar for "C-" in extended BNF I can use the "TINY" parser as a guide to write a parser for "C-". If anyone is interested in helping me with this please let me know.

EDIT: Thank you to Jonathan Busby for translating the "C-" BNF grammar above into an ISO/IEC 14977 "standard" (see the Wikipedia article) EBNF grammar, which I will use to write the "C-" parser:

Code:
program = declaration-list ; declaration-list = { declaration } , declaration ; declaration = var-declaration , fun-declaration ; var-declaration = type-specifier , ID , [ "[" NUM "]" ] , ";" ; type-specifier = "int" | "void" ; fun-declaration = type-specifier , ID , "(" , params , ")" , compound-stmt ; params = ( { param , "," } , param ) | "void" ; param = type-specifier , ID [ "[" , "]" ] ; compound-stmt = "{" , local-declarations , statement-list , "}" ; local-declarations = { var-declaration } ; statement-list = { statement } ; statement = expression-stmt | compound-stmt | selection-stmt | iteration-stmt | return-stmt ; expression-stmt = [ expression ] , ";" ; selection-stmt = if , "(" , expression , ")" , statement , [ "else" , statement ] ; iteration-stmt =  while , "(" , expression , ")" , statement ; return-stmt = "return" , [ expression ] , ";" ; expression = { var , "=" } , simple-expression ; simple-expression = { additive-expression , relop } , additive-expression ; relop = "<=" | "<" | ">" | ">=" | "==" | "!=" ; additive-expression = { term , addop } , term ; addop = "+" | "-" ; term = { factor , mulop } , factor ; mulop = "*" | "/" ; factor = "(" , expression , ")" | var | call | NUM ; call = ID , "(" , args , ")" ; args = [ { expression , "," } , expression ] ;

The C language combines all the power of assembly language with all the ease-of-use of assembly language
02-03-2021, 09:32 PM
Post: #10
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator
(02-01-2021 06:36 AM)F-73P Wrote:  [snip]

EDIT: Thank you to Jonathan Busby for translating the "C-" BNF grammar above into an ISO/IEC 14977 "standard" (see the Wikipedia article) EBNF grammar, which I will use to write the "C-" parser:

Code:
program = declaration-list ; declaration-list = { declaration } , declaration ; declaration = var-declaration , fun-declaration ; var-declaration = type-specifier , ID , [ "[" NUM "]" ] , ";" ; type-specifier = "int" | "void" ; fun-declaration = type-specifier , ID , "(" , params , ")" , compound-stmt ; params = ( { param , "," } , param ) | "void" ; param = type-specifier , ID [ "[" , "]" ] ; compound-stmt = "{" , local-declarations , statement-list , "}" ; local-declarations = { var-declaration } ; statement-list = { statement } ; statement = expression-stmt | compound-stmt | selection-stmt | iteration-stmt | return-stmt ; expression-stmt = [ expression ] , ";" ; selection-stmt = if , "(" , expression , ")" , statement , [ "else" , statement ] ; iteration-stmt =  while , "(" , expression , ")" , statement ; return-stmt = "return" , [ expression ] , ";" ; expression = { var , "=" } , simple-expression ; simple-expression = { additive-expression , relop } , additive-expression ; relop = "<=" | "<" | ">" | ">=" | "==" | "!=" ; additive-expression = { term , addop } , term ; addop = "+" | "-" ; term = { factor , mulop } , factor ; mulop = "*" | "/" ; factor = "(" , expression , ")" | var | call | NUM ; call = ID , "(" , args , ")" ; args = [ { expression , "," } , expression ] ;

Note that the above EBNF grammar has a slight glitch, which is my fault. I accidentally incorrectly translated the "C-" BNF grammar arithmetic expressions into an EBNF grammar which made all the arithmetic expressions change from left-associative to right-associative. Here is the fix :

Code:
additive-expression = term , { addop , term } ; term = factor , { mulop , factor } ;

Hope this doesn't cause any inconvenience or confusion.

N.B. Also, note that I assumed when writing the EBNF grammar that the Monic author would be implementing an LL(1) recursive descent parser, which reads its input from left to right and recurses in a depth-first manner. If this is not the case, then the EBNF grammar will need to be modified.

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
02-12-2021, 11:01 PM (This post was last modified: 02-17-2021 10:52 PM by Jonathan Busby.)
Post: #11
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator
Well, I found more absent-minded stupid associativity errors in the EBNF grammar for C- . Here is the corrected grammar :

Code:
program = declaration-list ; declaration-list = declaration , { declaration } ; declaration = var-declaration , fun-declaration ; var-declaration = type-specifier , ID , [ "[" NUM "]" ] , ";" ; type-specifier = "int" | "void" ; fun-declaration = type-specifier , ID , "(" , params , ")" , compound-stmt ; params = ( param , { "," , param } ) | void ; param = type-specifier , ID [ "[" , "]" ] ; compound-stmt = "{" , local-declarations , statement-list , "}" ; local-declarations = { var-declaration } ; statement-list = [ statement , { statement } ] ; statement = expression-stmt | compound-stmt | selection-stmt | iteration-stmt | return-stmt ; expression-stmt = [ expression ] , ";" ; selection-stmt = if , "(" , expression , ")" , statement , [ "else" , statement ] ; iteration-stmt =  while , "(" , expression , ")" , statement ; return-stmt = "return" , [ expression ] , ";" ; expression =  { var , "=" } , simple-expression ; var = ID , [ "[" , expression , "]" ] ; simple-expression = additive-expression , { relop , additive-expression } ; relop = "<=" | "<" | ">" | ">=" | "==" | "!=" ; additive-expression = term , { addop , term } ; addop = "+" | "-" ; term = factor , { mulop , factor } ; mulop = "*" | "/" ; factor = "(" , expression , ")" | var | call | NUM ; call = ID , "(" , args , ")" ; args = [ expression , { "," , expression } ] ;

Sorry for any inconvenience.

Regards,

Jonathan

NOTE #1 : I had also accidentally left out some non-superfluous non-terminals. Also, the "C-" assignment operator ( "=" ) is *right associative*, just like in full blown C. So, if that part of the grammar looks like it needs correcting, then just know that it doesn't because I intentionally made the EBNF version right associative to be fully compatible with the "C-" BNF grammar.

EDIT #1 : Incorporated changes mentioned in later posts into above EBNF grammar.

Aeternitas modo est. Longa non est, paene nil.
02-13-2021, 06:50 PM (This post was last modified: 02-17-2021 10:22 PM by Jonathan Busby.)
Post: #12
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator
(02-12-2021 11:01 PM)Jonathan Busby Wrote:  Well, I found more absent-minded stupid associativity errors in the EBNF grammar for C- . Here is the corrected grammar :

Code:
program = declaration-list ; declaration-list = declaration , { declaration } ; declaration = var-declaration , fun-declaration ; var-declaration = type-specifier , ID , [ "[" NUM "]" ] , ";" ; type-specifier = "int" | "void" ; fun-declaration = type-specifier , ID , "(" , params , ")" , compound-stmt ; params = ( param , { "," , param } ) | void ; param = type-specifier , ID [ "[" , "]" ] ; compound-stmt = "{" , local-declarations , statement-list , "}" ; local-declarations = { var-declaration } ; statement-list = [ statement , { statement } ] ; statement = expression-stmt | compound-stmt | selection-stmt | iteration-stmt | return-stmt ; expression-stmt = [ expression ] , ";" ; selection-stmt = if , "(" , expression , ")" , statement , [ "else" , statement ] ; iteration-stmt =  while , "(" , expression , ")" , statement ; return-stmt = "return" , [ expression ] , ";" ; expression =  { var , "=" } , simple-expression ; var = ID , [ "[" , expression , "]" ] ; simple-expression = additive-expression , { relop , additive-expression } ; relop = "<=" | "<" | ">" | ">=" | "==" | "!=" ; additive-expression = term , { addop , term } ; addop = "+" | "-" ; term = factor , { mulop , factor } ; mulop = "*" | "/" ; factor = "(" , expression , ")" | var | call | NUM ; call = ID , "(" , args , ")" ; args = [ expression , { "," , expression } ] ;

Sorry for any inconvenience.

Regards,

Jonathan

NOTE #1 : I had also accidentally left out some non-superfluous non-terminals. Also, the "C-" assignment operator ( "=" ) is *right associative*, just like in full blown C. So, if that part of the grammar looks like it needs correcting, then just know that it doesn't because I intentionally made the EBNF version right associative to be fully compatible with the "C-" BNF grammar.

The following can be simplified :

Code:
statement-list = statement , { statement } | empty ;

The simplified version is :

Code:
statement-list = [ statement , { statement } ] ;

Regards,

Jonathan

EDIT #1 : Incorporated above changes into EBNF grammar

Aeternitas modo est. Longa non est, paene nil.
02-14-2021, 11:40 AM
Post: #13
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: monic part 5: writing C programs on the calculator
Great work Jonathan!

I've added "++" and "--" to the grammar, as well as "for" loops. I've modified the scanner accordingly and completed a parser recogniser.

Now for the fun part: adding code generation.

The C language combines all the power of assembly language with all the ease-of-use of assembly language
02-15-2021, 11:00 PM
Post: #14
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator
(02-14-2021 11:40 AM)F-73P Wrote:  I've added "++" and "--" to the grammar

Just make sure you used the correct associativity and precedence for the post and pre increment and decrement operators. The post increment and decrement operators ( eg. "x++" and "x--" ) are one level of operator precedence up from the pre increment and decrement operators and they're also left associative. For the lower precedence pre increment and decrement operators ( eg. "++x" and "--x" ) note they're right associative. Furthermore, you have to take into account the differences between the two classes of post and pre increment and decrement operators with respect to how they alter an expression. What I mean is that something like "y = x++" will assign x to y and then increment x whereas "y = ++x" will increment x first and then assign it to y. You should consult the C99 grammar which can be found here ( Look in Annex A ). If you already know all of this, then please don't interpret my comments as patronizing you in any way, I just like to be sure

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
02-17-2021, 09:14 AM (This post was last modified: 02-17-2021 09:15 AM by F-73P.)
Post: #15
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: monic part 5: writing C programs on the calculator

I removed "int" from the type_specifier grammar production and replaced it with "apq" (arbitrary precision rational), "apf" (arbitrary precision float - not supported on the calculator yet) and "int32_t" (maybe?).

I also added "read" and "write" from the TINY programming language, and post increment and decrement operators (just to increment/decrement variables for now, not used in assignments).

There were a few aspects of the original BNF grammar I found confusing:

1) expression_stmt -> expression ; | ;

I couldn't figure out what the ; on its own is for, so I changed the production to

expression_stmt -> expression ;

2) expression -> var = expression | simple expression

At first I thought this was a typo as I couldn't see how you can have e.g.

expression -> var = expression
Replace non-terminal expression with var = expression to obtain:
expression -> var = var = expression

e.g. a=b=100

and so I changed the production to

expression -> var = simple expression | simple expression

But then I saw your comment explaining that = is right-associative and that a=b=100 is indeed legal. So I'll go back and fix that

I added some code generation instructions to the parser and now simple C programs like the factorial program below can be executed:

The C language combines all the power of assembly language with all the ease-of-use of assembly language
02-17-2021, 10:48 PM
Post: #16
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator

You're welcome

Quote:I removed "int" from the type_specifier grammar production and replaced it with "apq" (arbitrary precision rational), "apf" (arbitrary precision float - not supported on the calculator yet) and "int32_t" (maybe?).

I also added "read" and "write" from the TINY programming language, and post increment and decrement operators (just to increment/decrement variables for now, not used in assignments).

Nice Keep up the good work

Quote:There were a few aspects of the original BNF grammar I found confusing:

1) expression_stmt -> expression ; | ;

I couldn't figure out what the ; on its own is for, so I changed the production to

expression_stmt -> expression ;

This is to allow empty expressions. You need this in cases such as :

Code:
while(x != somevalue) ;

Which could be used for busy waiting ( eg. a "spinlock" )

Quote:2) expression -> var = expression | simple expression

At first I thought this was a typo as I couldn't see how you can have e.g.

expression -> var = expression
Replace non-terminal expression with var = expression to obtain:
expression -> var = var = expression

e.g. a=b=100

and so I changed the production to

expression -> var = simple expression | simple expression

But then I saw your comment explaining that = is right-associative and that a=b=100 is indeed legal. So I'll go back and fix that

Yep In the EBNF grammar it is :

Code:
expression =  { var , "=" } , simple-expression ;

Quote:I added some code generation instructions to the parser and now simple C programs like the factorial program below can be executed:

Nice work! I look forward to seeing how your project evolves

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
02-18-2021, 06:53 PM
Post: #17
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator
(02-17-2021 10:48 PM)Jonathan Busby Wrote:
Quote:There were a few aspects of the original BNF grammar I found confusing:

1) expression_stmt -> expression ; | ;

I couldn't figure out what the ; on its own is for, so I changed the production to

expression_stmt -> expression ;

This is to allow empty expressions. You need this in cases such as :

Code:
while(x != somevalue) ;

Which could be used for busy waiting ( eg. a "spinlock" )

The above was just a random example. If you google for eg. "C empty expression uses" then you'll get many examples of different uses of "null" or empty expressions other than my simple ( and somewhat contrived ) example.

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
02-19-2021, 10:39 PM
Post: #18
 Jonathan Busby Member Posts: 250 Joined: Nov 2014
RE: monic part 5: writing C programs on the calculator
(02-18-2021 06:53 PM)Jonathan Busby Wrote:
(02-17-2021 10:48 PM)Jonathan Busby Wrote:  This is to allow empty expressions. You need this in cases such as :

Code:
while(x != somevalue) ;

Which could be used for busy waiting ( eg. a "spinlock" )

The above was just a random example. If you google for eg. "C empty expression uses" then you'll get many examples of different uses of "null" or empty expressions other than my simple ( and somewhat contrived ) example.

Regards,

Jonathan

Also, googling for "C while loop empty loop statement" turns up a number of excellent examples of the usage of while loops with empty loop statements ( ie. "while(something) ;" ), especially the StackOverflow answers.

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
11-16-2021, 07:58 AM (This post was last modified: 12-11-2021 02:27 AM by F-73P.)
Post: #19
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: monic part 5: writing C programs on the calculator
I've created a new type specifier ("local") for local variables (which can be declared at the beginning of a function and are valid only in that function) and another type specifier for arrays ("array"). I've removed the other type specifiers from the grammar, as the first word of each object pushed onto the stack determines the object type. I've also added Boolean data types. Below is the EBNF grammar for the mC programming language (tokens are in uppercase/quotation marks):

Code:
 program -> code ENDFILE code -> { array-declaration } { function-declaration } main-function-declaration array-declaration -> ARRAY ID dimensions SEMICOLON dimensions -> LEFTBRACKET NUM RIGHTBRACKET { LEFTBRACKET NUM RIGHTBRACKET } function-declaration -> ID LEFTPARENTHESIS params RIGHTPARENTHESIS LEFTBRACE local-declarations statement-list RIGHTBRACE params -> [ param { COMMA param } ] param -> ID { LEFTBRACKET NUM RIGHTBRACKET } main-function-declaration -> MAIN LEFTBRACE local-declarations statement-list RIGHTBRACE local-declarations -> { LOCAL var-declaration } var-declaration -> ID SEMICOLON | array-declaration statement-list -> { statement } statement -> compound-stmt | if-stmt | while-stmt | do-stmt | for-stmt | switch-stmt | return-stmt | read-stmt | write-stmt | assignment-stmt | expression-stmt | goto-stmt | label-stmt | nop-stmt  compound-stmt -> LEFTBRACE statement-list RIGHTBRACE if-stmt -> IF LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS statement [ ELSE statement ] while-stmt -> WHILE LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS statement do-stmt -> DO statement WHILE LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS SEMICOLON for-stmt -> FOR LEFTPARENTHESIS [ assignment ] SEMICOLON Boolean-expression SEMICOLON [ addditive-expression ] RIGHTPARENTHESIS statement  switch-stmt -> SWITCH LEFTPARENTHESIS ID { LEFTBRACKET additive-expression RIGHTBRACKET } RIGHTPARENTHESIS LEFTBRACE { CASE NUM COLON statement-list BREAK SEMICOLON } [ DEFAULT COLON statement-list ] RIGHTBRACE   return-stmt -> RETURN [ expression ] SEMICOLON     read-stmt -> READ ID { LEFTBRACKET additive-expression RIGHTBRACKET } SEMICOLON  write-stmt -> WRITE expression SEMICOLON  assignment-stmt -> assignment SEMICOLON expression-stmt -> [ expression ] SEMICOLON goto-stmt -> GOTO ID SEMICOLON label-stmt -> ID COLON statement nop-stmt -> NOPR SEMICOLON assignment -> ID { LEFTBRACKET additive-expression RIGHTBRACKET } "=" expression  expression -> Boolean-expression | additive-expression   Boolean-expression -> logical-expression { boolop logical-expression }  logical-expression -> additive-expression relop additive-expression | ID { LEFTBRACKET additive-expression RIGHTBRACKET } | TRUE | FALSE  additive-expression -> term { addop term } term -> [ NEGATE ] factor { mulop [ NEGATE ] factor } factor -> LEFTPARENTHESIS additive-expression RIGHTPARENTHESIS | NUM | ID { LEFTBRACKET additive-expression RIGHTBRACKET } | ID LEFTPARENTHESIS args RIGHTPARENTHESIS | ID "++" | ID "--"  args -> [ expression { COMMA expression } ] boolop -> "and" | "or" | "not" relop -> "<=" | "<" | ">" | ">=" | "==" | "!="  addop -> "+" | "-" mulop -> "*" | "/"

Some test programs:

1) factorial using a while loop

Code:
 /*factorial program 1*/ main{    read x;                              factorial=1;    while (x>0) {      factorial = factorial*x;      x--;                              }    write factorial; }

2) factorial using recursion

Code:
 /*factorial program 2*/ fact(x){    if (x==1) return 1;    else return x*fact(x-1); }   main{                               read y;     if (y>1) fact(y);     else write 1;                         }

3) squaring a number using nested for loops

Code:
 /*square program*/ main{    read x;    product=0;        for(i=x;i>=1;i--) {       for(j=1;j<=x;j++) {         product++;       }    }                              write product; }

The C language combines all the power of assembly language with all the ease-of-use of assembly language
12-05-2021, 09:44 AM (This post was last modified: 12-11-2021 08:59 AM by F-73P.)
Post: #20
 F-73P Junior Member Posts: 43 Joined: May 2020
RE: monic part 5: writing C programs on the calculator
As the compiler for the programming language (mC) introduced in the previous post nears completion, I would like to provide a brief overview of how the compiler works and share some source code.

The compiler is comprised of 3 main components - the scanner, parser and virtual machine. In this post I'll describe the scanner. Below is the C code for the scanner (which is based on the scanner in Kenneth C. Louden's excellent "Compiler Construction:Principles and Practice"):

Code:
 //scanner.c v1.00 //DFA scanner - read the source code and return the next token to the parser //includes #include <stdbool.h> #include <ctype.h> #include <stdint.h> #include <string.h> #include "definitions.h" //structs struct { //lookup table of reserved words char* str; TokenType tok; } reservedWords[RESERVED] = {{"main",MAIN},{"if",IF},{"else",ELSE},{"do",DO},{"while",WHILE},{"for",FOR}, {"return",RETURN},                              {"read",READ},{"write",WRITE},{"and",AND},{"or",OR},{"not",NOT},{"true",TRUE},                              {"false",FALSE},{"local",LOCAL},{"array",ARRAY},{"switch",SWITCH},                              {"case",CASE},{"break",BREAK},{"default",DEFAULT},{"goto",GOTO},{"nop",NOPR}}; //global variables                                                               char source[SOURCE+1]={"/*factorial program*/\n" //example program                       "main {\n"                           "read x;\n"                                                       "factorial=1;\n"                           "while (x>0) {\n"                             "factorial = factorial*x;\n"                             "x--;\n"                         "}\n"                         "write factorial;\n"                       "}\n"                                                    }; uint32_t sourceIndex=0; //current position in source uint16_t lineno=0; //current line number in source     char lexemeString[MAXLEXEMELEN+1]; //prototypes                                           TokenType reservedLookup(char *lexemeString); //functions TokenType getToken(void) { //returns the next token in source         char c;    //current character in source          uint16_t lexemeStringIndex = 0; //index for storing lexeme into lexemeString     _Bool save; //flag to indicate save to lexemeString          TokenType currentToken; //holds current token to be returned to the parser            StateType state = START; //current state - always begins at START   while (state!=DONE) {         c=source[sourceIndex++]; //next character in program         save = true;         switch (state) { //begin switch (state)             case START:                 if (isdigit(c))                     state = INNUM;          else if (isalpha(c))            state = INID;                  else if (c == '<')                      state = INLESSTHAN;                  else if (c == '>')                      state = INGREATERTHAN;                  else if (c == '=')                      state = INEQUAL;                  else if (c == '!')                      state = INNOTEQUAL;                  else if (c == '/') {                      save = false;                      state = INDIV;                  }                  else if (c =='+') {                      state = INPLUS;                  }                  else if (c =='-') {                      state = INMINUS;                  }          else if ((c == ' ') || (c == '\t') || (c == '\n')) {                      save = false;                      if (c == '\n') lineno++;                  }          else          { state = DONE;            switch (c) {                          case ENDFILE:                save = false;                currentToken = ENDFILE;                break;              case '*':                currentToken = TIMES;                break;                          case ':':                currentToken = COLON;                break;              case ';':                currentToken = SEMI;                break;              case ',':                              currentToken = COMMA;                break;              case '(':                currentToken = LPAREN;                break;              case ')':                currentToken = RPAREN;                break;                          case '[':                currentToken = LBRACKET;                break;              case ']':                currentToken = RBRACKET;                break;                          case '{':                currentToken = LBRACE;                break;              case '}':                currentToken = RBRACE;                break;                          case '&':                currentToken = ADDR;                break;              default:                  //insert error code                  state = DONE;            }          }          break;             case INNUM:          if (!isdigit(c)) { //backup in the input                      if(c == 'e') {                          state=INEXP;                      }            else if(c != '.') {                          sourceIndex--;                          save = false;                          state = DONE;                          currentToken = NUM;                      }          }          break;             case INEXP:                 state=INNUM;                 break;       case INID: //ID is letters/digits (cannot start with a digit)          if (!isalpha(c)) { //backup in the input                      if (!isdigit(c)) {                          sourceIndex--;                          save = false;                          state = DONE;                          currentToken = ID;                      }          }          break;             case INLESSTHAN:               state = DONE;               if (c == '=') currentToken = LTE;               else {           save = false;                     sourceIndex--;                     currentToken = LT;         }               break;             case INGREATERTHAN:               state = DONE;               if (c == '=') currentToken = GTE;               else {           save = false;                     sourceIndex--;                     currentToken = GT;         }               break;             case INEQUAL:                 state = DONE;               if (c == '=') currentToken = EQ;               else {                     sourceIndex--;                     save=false;                     currentToken = ASSIGN;                 }                 break;             case INNOTEQUAL:                 state = DONE;               if (c == '=') currentToken = NEQ;               else {                     //insert error code                     state = DONE;                 }               break;             case INDIV:         save = false;               if (c == '*') state = INCOMMENT;               else {                     state = DONE;                     sourceIndex--;                     currentToken = DIV;                                             if (lexemeStringIndex < MAXLEXEMELEN) {                         lexemeString[lexemeStringIndex++] = '/';                         lexemeString[lexemeStringIndex] = '\0';                     }                     else {                         //insert error code                     }             }               break;             case INPLUS:         state = DONE;               if (c == '+') currentToken = INC;               else {           save = false;                     sourceIndex--;                     currentToken = PLUS;         }               break;             case INMINUS:         state = DONE;               if (c == '-') currentToken = DEC;               else {           save = false;                     sourceIndex--;                     currentToken = MINUS;         }               break;       case INCOMMENT:               save = false;               if (c == ENDFILE) {                      state = DONE;                   currentToken = ENDFILE;               }               else if (c == '*')                   state = OUTCOMMENT;                 else if (c == '\n') lineno++;               break;             case OUTCOMMENT:                 save = false;                 if (c == '/') state = START;               else { //backup in the input            sourceIndex--;            state = INCOMMENT;         }                 break;       case DONE:       default: //should never happen                //insert error code                  state = DONE;          break;     } //end switch (state)         if (save) {             if (lexemeStringIndex < MAXLEXEMELEN) lexemeString[lexemeStringIndex++] = c;       else {                 //insert error code             }                         }     if (state == DONE) {             lexemeString[lexemeStringIndex] = '\0';        if (currentToken == ID) currentToken = reservedLookup(lexemeString);         }   }     return currentToken; } TokenType reservedLookup (char *lexemeString) { //lookup an identifier to see if it is a reserved word (linear search)     uint16_t i;     for (i=0;i<RESERVED;i++) {         if (!strcmp(lexemeString,reservedWords[i].str)) return reservedWords[i].tok;     }     return ID; }

The code for the "definitions.h" header file is:

Code:
 //definitions.h v1.00 //tokens typedef enum { ENDFILE, //book-keeping tokens     MAIN,IF,ELSE,DO,WHILE,FOR,RETURN,READ,WRITE,AND,OR,NOT,TRUE,FALSE,LOCAL,ARR​AY,SWITCH, //reserved words       CASE,BREAK,DEFAULT,GOTO,NOPR, //reserved words     ID,NUM, //multicharacter tokens     //special symbols     PLUS,MINUS,TIMES,DIV,LT,LTE,GT,GTE,EQ,NEQ,ASSIGN,COLON,SEMI,COMMA,LPAREN,RP​AREN,LBRACKET,RBRACKET,     LBRACE,RBRACE,INC,DEC,ADDR } TokenType; //states in scanner DFA typedef enum {START,INNUM,INEXP,INID,INLESSTHAN,INGREATERTHAN,INEQUAL,INNOTEQUAL,INDIV,I​NPLUS,INMINUS,               INCOMMENT,OUTCOMMENT,DONE} StateType; //definitions #define RESERVED 22 //the number of reserved words #define SOURCE 1000 //maximum character length of source code #define MAXLEXEMELEN 40 //maximum length of lexeme

The program source code is an array of characters terminated by the null character (i.e. a string). To improve readability, the string is padded with white space (blanks,tabs and newlines). Comments (text between /* and */) are ignored. The source code for the example program in "scanner.c v1.00" is:

Code:
 /*factorial program 1*/ main{    read x;                              factorial=1;    while (x>0) {      factorial = factorial*x;      x--;                              }    write factorial; }

The numbers,letters,words and special symbols that make up the source code are called "lexemes". Each lexeme belongs to a special category called a "token". The scanner extracts each lexeme from the source code and determine its token type.

To test the scanner, add the following two files to the project:

Code:
 //compiler.c v1.00 //prototypes void parse(void); //functions int main (void) {     parse();     }

Code:
 //parser.c v1.00 #include "definitions.h" //global variables extern char lexemeString[MAXLEXEMELEN+1]; //prototypes TokenType getToken(void); //functions void parse(void) {     TokenType token;          while(token!=ENDFILE) {         token=getToken();     }         }

Build the project, start a Debug Session (I use the Keil uVision5 simulator) and repeatedly step over the line "token=getToken();" in "parser.c" to extract the lexemes and determine the corresponding token (the lexemes are stored in the variable "lexemeString"):

Code:
 lexeme                                                token main                                                  MAIN {                                                     LBRACE read                                                  READ x                                                     ID ;                                                     SEMI factorial                                             ID =                                                     ASSIGN 1                                                     NUM ;                                                     SEMI while                                                 WHILE (                                                     LPAREN x                                                     ID >                                                     GT 0                                                     NUM )                                                     RPAREN {                                                     LBRACE factorial                                             ID =                                                     ASSIGN factorial                                             ID *                                                     TIMES x                                                     ID ;                                                     SEMI x                                                     ID --                                                    DEC ;                                                     SEMI }                                                     RBRACE write                                                 WRITE factorial                                             ID ;                                                     SEMI }                                                     RBRACE 0                                                     ENDFILE

This information is passed to the parser, which will be the subject of the next post.

The C language combines all the power of assembly language with all the ease-of-use of assembly language
 « Next Oldest | Next Newest »

User(s) browsing this thread: 1 Guest(s)