monic part 6: the mC programming language and compiler
12-12-2021, 06:57 PM
Post: #2
RE: monic part 6: the mC programming language and compiler
Nice work! I really like your project and I'm very much looking forward to learning more about it. Looks like you put a lot of effort into this to cover most of the C language.
I am curious whether you plan to compile logical expressions as "short circuit" code or not. I noticed that RETURN, WRITE and a few other constructs allow 'expression' as an argument, whereas 'factor', for example, uses 'additive_expression' instead of 'expression'. Implementing RETURN and WRITE this way requires short-circuit logic that returns Boolean values without evaluating unnecessary AND and OR operands. This can be tricky to implement, but it is not too hard to do with backpatch lists that resolve the jumps on "true" and "false" conditions to the right VM opcode targets.
I think I spotted a problem: 'function_declaration' starts with an ID just like 'var_declaration', so there is no way of telling which is which with one token of lookahead. Making the '(' and ')' optional in 'function_declaration' makes it even harder to distinguish from 'var_declaration', because you have to look ahead all the way to the first '(' to see whether this is a function rather than a variable declaration. Perhaps insist that functions be declared with parentheses?
Also, 'expression' is defined to be either a 'Boolean_expression' or an 'additive_expression', but you have no way of telling which one applies by just looking at the lookahead token.
Why is SWITCH restricted to a 'var' argument and not an 'additive_expression'?
I've converted the grammar to Yacc to give it a go with Bison. I fixed 'function_declaration' to require parentheses and gave MAIN optional parameters, for consistency and in case you'd like to pass arguments to the main program. However, this grammar has 47 shift/reduce conflicts, which suggests that several (cascading) ambiguities exist:
Code:
%token ARRAY
I didn't have time to verify the conflicts to see which constructs might actually become problematic to implement in a parser based on these two conceptually identical grammar versions, but the conflicts mean the grammar is not LALR(1), and if they stem from genuine ambiguities it is not LR(1) either. Note that there should be only one shift/reduce conflict, for the IF-ELSE ambiguity that always arises in C.
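To make the backpatching idea for short-circuit logic mentioned above a bit more concrete, here is a minimal sketch in Python. It is not taken from the mC compiler: the opcode names (LOAD, JT, JMP, PUSH, RET), the tuple-based expression tree, and the helper names are all made up for illustration, and the real monic VM will certainly differ. Each Boolean node returns two lists of jump instructions whose targets are not known yet, and the enclosing construct (here a hypothetical RETURN) fills them in later.

```python
# Illustrative only: opcodes and tree shape are assumptions, not mC's real VM.

def emit(code, op, arg=None):
    """Append one instruction and return its index so it can be patched later."""
    code.append([op, arg])
    return len(code) - 1

def backpatch(code, jumps, target):
    """Fill in the target of every pending jump in the list."""
    for index in jumps:
        code[index][1] = target

def gen_bool(code, node):
    """Emit jumping code for a Boolean node.

    Returns (true_list, false_list): indices of jumps whose targets are
    still unknown and must be backpatched by the caller.
    """
    kind = node[0]
    if kind == "var":
        emit(code, "LOAD", node[1])
        on_true = emit(code, "JT")    # pops the value, jumps if it is true
        on_false = emit(code, "JMP")  # reached only when the value was false
        return [on_true], [on_false]
    if kind == "not":
        true_list, false_list = gen_bool(code, node[1])
        return false_list, true_list  # swap the exits, no extra code needed
    if kind in ("and", "or"):
        left_true, left_false = gen_bool(code, node[1])
        if kind == "and":
            # Left operand true: continue with the right operand.
            backpatch(code, left_true, len(code))
            right_true, right_false = gen_bool(code, node[2])
            return right_true, left_false + right_false
        # "or": left operand false: continue with the right operand.
        backpatch(code, left_false, len(code))
        right_true, right_false = gen_bool(code, node[2])
        return left_true + right_true, right_false
    raise ValueError("unknown node kind: " + str(kind))

def compile_return(expr):
    """Compile a hypothetical 'RETURN expr' that yields 1 or 0."""
    code = []
    true_list, false_list = gen_bool(code, expr)
    backpatch(code, true_list, len(code))
    emit(code, "PUSH", 1)
    emit(code, "RET")
    backpatch(code, false_list, len(code))
    emit(code, "PUSH", 0)
    emit(code, "RET")
    return code

def run(code, env):
    """Tiny interpreter; also records which variables were actually loaded."""
    stack, loaded, pc = [], [], 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == "LOAD":
            loaded.append(arg)
            stack.append(env[arg])
        elif op == "PUSH":
            stack.append(arg)
        elif op == "JT":
            if stack.pop():
                pc = arg
        elif op == "JMP":
            pc = arg
        elif op == "RET":
            return stack.pop(), loaded
        else:
            raise ValueError("unknown opcode: " + str(op))

expr = ("or", ("var", "a"), ("and", ("var", "b"), ("var", "c")))
prog = compile_return(expr)
print(run(prog, {"a": 1, "b": 0, "c": 0}))  # (1, ['a']): b and c are never loaded
```

The list of loaded variables returned by `run` is only there to show that operands are skipped. A real code generator would probably fuse the JT/JMP pair into a single conditional jump with fall-through, but the bookkeeping with the two lists stays the same.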
I haven't checked the grammar with ANTLR, which may reveal more useful information about potential grammar ambiguity issues for implementing a recursive-descent parser, since ANTLR generates LL parsers (it also accepts grammars beyond LL(1)).
Hope this is helpful. Looks like a fun and useful project to work on.
- Rob
"I count on old friends to remain rational"