1. Homepage
  2. Programming
  3. INFR10065 Compiling Techniques 2022/23 | Coursework 1 - Implement a ChocoPy parser

INFR10065 Compiling Techniques 2022/23 | Coursework 1 - Implement a ChocoPy parser

Engage in a Conversation
UKUniversity of EdinburghINFR10065Compiling TechniquesImplement a ChocoPy parserPythonAST

CT 2022/23 | Coursework 1 - Parsing

Tasks: CourseNana.COM

  1. Core: Implement a ChocoPy parser
  2. Expert: Add diagnostic messages for syntax errors

Task 1 - Implement a ChocoPy parser

The goal of task 1 is to write a parser for the ChocoPy Language, in order to obtain its Abstract Syntax Tree (AST). To this end, you will work on two basic passes of the compiler frontend, that are given as classes: CourseNana.COM

  1. Lexer: the lexer reads the input file as a stream of characters and transforms it into a stream of tokens. Each token represents a lexeme (i.e. a word in natural languages).
  2. Parser: the parser consumes the tokens and determines if the input conforms to the rules of the grammar. As it examines each rule, it constructs the AST of the input program.

We provide some partial implementations of the lexer and parser. You will have to implement the rest. We strongly encourage you to write a recursive descent parser. For this, we have provided utility functions in the lexer class to allow tokenization and in the parser class to allow lookahead of tokens. The parser also contains the minimum functionality to parse the following ChocoPy programs: CourseNana.COM

# An empty program
def foo():
    pass

0. ChocoPy language

We strongly encourage you to familiarise yourself with ChocoPy before starting implementing the compiler. The lexical structure, syntax, type rules, and semantics of the language are explained in the reference manual. However, the exact grammar you need to implement is described in Section 2 . ChocoPy is designed to be a subset of Python. It is advised that you write several programs in ChocoPy to get familiar with it, especially if you are not already familiar with Python. More importantly, this will help you create your own test suite, which will later be necessary to evaluate your compiler's progress. CourseNana.COM

Example: ChocoPy program illustrating functions, variables, and static typing (source: ChocoPy Language reference). CourseNana.COM

def is_zero(items: [int] , idx: int) -> bool:
    val: int = 0        # Type is explicitly declared
    val = items[idx]
    return val == 0

mylist: [int] = None
mylist = [1, 0, 1]
print(is_zero(mylist,1))    # Prints ’True ’

1. Lexing

The file choco/lexer.py contains an implementation of: CourseNana.COM

  • a scanner
  • a tokenizer that leverages the scanner
  • a lexer that uses the tokenizer

Do not remove the existing methods, e.g. check and match. The tokens that your lexer recognizes are given in the TokenKind class. CourseNana.COM

For Linux environments, you can try the lexer functionality by running the choco-lexer tool (location tools/choco_lexer.py, but invokable also as choco-lexer): CourseNana.COM

cd /path/to/coursework
choco-lexer tests/parser/simple-statements/ops_coverage/single_stmt_pass.choc
./tools/choco_lexer.py tests/parser/simple-statements/ops_coverage/single_stmt_pass.choc # equivalent with the above

2. Grammar

Your next job will be to take the ChocoPy grammar expressed in EBNF form and transform it into an equivalent context-free LL(2) grammar . The target grammar on which you will be working follows. CourseNana.COM

Keywords and symbols are denoted with backquotes ` `, the rest of the terminals with capital letters, and non_terminals with lowercase letters. The brackets are used to group terms and are always followed either by: a * to represent a Kleene closure, a + for a positive closure, or a ? for an optional group. CourseNana.COM

      program := [var_def | func_def]* stmt* EOF
     func_def := `def` ID `(` [typed_var [`,` typed_var]*]? `)` [`->` type]? `:` NEWLINE INDENT func_body DEDENT
    func_body :=  [global_decl | nonlocal_decl | var_def]∗ stmt+
    typed_var := ID `:` type
         type := type_name | `[` type `]`
    type_name := `object`
               | `int`
               | `bool`
               | `str`
  global_decl := `global` ID NEWLINE
nonlocal_decl := `nonlocal` ID NEWLINE
      var_def := typed_var `=` literal NEWLINE
         stmt := simple_stmt NEWLINE
               | `if` expr `:` block [`elif` expr `:` block]* [`else` `:` block]?
               | `while` expr `:` block
               | `for` ID `in` expr `:` block
  simple_stmt := `pass`
               | expr
               | `return` [expr]?
               | [expr `=`]+ expr
        block := NEWLINE INDENT stmt+ DEDENT
      literal := `None`
               | `True`
               | `False`
               | INTEGER
               | STRING
         expr := cexpr
               | `not` expr
               | expr [`and` | `or`] expr
               | expr `if` expr `else` expr
        cexpr := ID
               | literal
               | `[` [expr [`,` expr]*]? `]`
               | `(` expr `)`
               | index_expr
               | ID `(` [expr [`,` expr]*]? `)`
               | cexpr bin_op cexpr
               | `-` cexpr
       bin_op := `+` | `-` | `*` | `//` | `%` | `==` | `!=` | `<=` | `>=` | `<` | `>` | `is`
   index_expr := cexpr `[` expr `]`

Please note that the simple_stmt operator allows assignments like -1 = 2 to be valid. Even though our parser accepts that, it is something that will be caught later as an error in subsequent passes. In general, at this stage, our test cases examine only the syntax of the program, not the semantics (e.g., assigning to a variable without defining it). CourseNana.COM

Also, please note that operators in ChocoPy have specific precedence and associativity rules. These are described in the following table: CourseNana.COM

Precedence Operators Associativity
1 · if · else · Right
2 or Left
3 and Left
4 not Not applicable
5 ==, !=, <, >, <=, >=, is None
6 +, - (binary) Left
7 *, //, % Left
8 - (unary) Not applicable
9 [] Left

Operators that lie on the same line have the same precedence. CourseNana.COM

Also, note that the comparison operators are nonassociative. So ChocoPy, unlike Python 3, does not allow expressions such as: CourseNana.COM

x < y < z
x == y > z

On the other hand, the conditional expression · if · else · is associative. To better understand the associativity of this ternary operator, you can treat it as a binary operator with an extra expression in the middle, that acts as a condition: CourseNana.COM

left-expr `if` condition `else` right-expr

Now, it is a binary operator that acts upon left-expr and right-expr. And, since it is right-associative, the following: CourseNana.COM

expr1 `if` condition1 `else` expr2 `if` condition2 `else` expr3

is equivalent to: CourseNana.COM

expr1 `if` condition1 `else` (expr2 `if` condition2 `else` expr3)

Finally, "Not applicable" means that it is meaningless to talk about associativity for not or the unary -, since they are unary operators, not binary/ternary/etc. Therefore, they don’t have both a left- and right-hand side, so we can’t talk about associativity. CourseNana.COM

3. Parser

After having transformed the grammar into a LL(2)-grammar you will have to complete the implementation of the parser. A partial implementation of a recursive-decent parser has already been provided inside the Parser class of choco/parser.py. CourseNana.COM

In addition, the Parser class contains some utility methods, e.g.: CourseNana.COM

  • check(expected) takes a variable number of expected tokens and returns whether the next tokens in the input are the same as the expected.
  • match(expected) attempts to match the given expected tokens and consumes them, failing otherwise.

These methods are used by the parsing methods of each non-terminal to facilitate parsing. CourseNana.COM

An effective way to develop your parser would be to implement incrementally parsing subroutines for some of your grammar's non-terminals. CourseNana.COM

More specifically, the ChocoPy features for which you are required to implement parsing rules are the following: CourseNana.COM

Feature Notes
Literals and identifiers identifiers, integers, strings, True, False, None
Variable definitions and declarations types, variable definitions, global/nonlocal declarations
Function prototypes and definitions typed argument list, return types, (uses parsing function for func_def)
Complex expressions - I index expressions, lists, function calls
Control flow if, while, for
Simple statements return, assignments, (uses parsing function for expr)
Arithmetic and comparison operators unary -, ==, !=, <=, >=,<, >, is, +, -, *, //, % + right precedence and associativity
Logical and conditional operators · if · else ·, and, or, not + right precedence and associativity
Complex expressions - II parenthesized expressions, mixed features

We recommend that you implement your parser by following the order of the table. This will help you build your parser incrementally, by parsing more and more complex programs. CourseNana.COM

For Linux environments, you can try the current minimal parser functionality by running the choco-opt tool (location tools/choco_opt.py, but invokable also as choco-opt): CourseNana.COM

cd /path/to/coursework
choco-opt tests/parser/simple-statements/ops_coverage/single_stmt_pass.choc
./tools/choco_opt.py tests/parser/simple-statements/ops_coverage/single_stmt_pass.choc # equivalent with the above

4. AST

During the parsing phase, you are also expected to generate the AST of the input program. This will be done using the API of a suitable AST dialect for ChocoPy. You can find it in choco/dialects/choco_ast.py. There, you can find python classes for every ChocoPy construct you need, along with their respective constructors. CourseNana.COM

Each class represents a node in the ChocoAST. Each node has a distinct name, along with some optional further fields, that are either attributes or regions. You can think of an attribute as a property of the node itself, and the regions as groups of one or more other nodes, that serve as the children of the node in the AST. CourseNana.COM

For example, the BinaryExpr class has the following fields: CourseNana.COM

class BinaryExpr(Operation):
    name = "choco.ast.binary_expr"

    op: OpAttr[StringAttr]
    lhs: SingleBlockRegion
    rhs: SingleBlockRegion

Moreover, the Literal class has the following fields: CourseNana.COM

class Literal(Operation):
    name = "choco.ast.literal"

    value: OpAttr[StringAttr | IntegerAttr | BoolAttr | NoneAttr]

An instance of the BinaryExpr is the following: CourseNana.COM

  choco.ast.binary_expr() ["op" = "+"] {
    choco.ast.literal() ["value" = 0 : !i32]
  } {
    choco.ast.literal() ["value" = 1 : !i32]
  }

We can see that the identifier of the node is choco.ast.binary_expr(). Also, we can see this node represents the operation +, which was the op attribute in the class fields. Finally, this node has two regions as children, each containing one node: lhs and rhs are nodes representing literals. Both nodes have the same name (choco.ast.literal), but different attributes, which represent different values ("value" = 0 : !i32 and "value" = 1 : !i32 respectively). CourseNana.COM

In every class, you can find a get method which instantiates the class. You will need to call these methods in your code. The code already contains some basic calls to this library. CourseNana.COM

5. Hints

Some hints to guide your implementation are the following: CourseNana.COM

Precedence

A simple way to achieve the desired precedence for the operators is to unfold the parsing rules in a hierarchical way. CourseNana.COM

An example of how to give the multiplication operator higher precedence than of addition is this: CourseNana.COM

# Same precedence
E := INTEGER `+` E
   | INTEGER `*` E
   | INTEGER

# `*` has higher precedence
E := T `+` E
   | T
T := INTEGER `*` T
   | INTEGER

The second block of rules would parse the expression 1 + 2 * 3 as 1 + (2 * 3). CourseNana.COM

Associativity

Getting the correct associativity can be implemented with the correct rule definition: CourseNana.COM

# `+` is right associative
E := T `+` E
   | T

# `+` is left associative
E := E `+` T  # Be aware of left recursion!
   | T

Left recursion elimination

The aforementioned example has left recursion: CourseNana.COM

E := E `+` T
   | T

As you have learned on the lectures, this can be eliminated by introducing a new auxiliary non-terminal E_aux: CourseNana.COM

    E := T E_aux
E_aux := `+` T E_aux
       | epsilon

As mentioned in the lectures, this is equivalent with: CourseNana.COM

E := T (`+` T)*

Therefore, in order to remove left recursion, you have two implementation options: CourseNana.COM

  1. Introduce a new non-terminal and create an additional parsing function:
def parse_E():
    parse_T()
    parse_E_aux()

def parse_E_aux():
    if check(PLUS):
        match(PLUS)
        parse_T()
        parse_E_aux()
  1. Parse the Kleene closure, without introducing a new non-terminal:
def parse_E():
    parse_T()
    while (check(PLUS)):
        match(PLUS)
        parse_T()

Both implementations are equivalent. CourseNana.COM

Implementing elif

The elif logic is essentially syntactic sugar for nested if statements, so you need to remove the elif when you create the AST. CourseNana.COM

The following code: CourseNana.COM

if {expr1}:
    {body1}
elif {expr2}:
    {body2}
elif {expr3}:
    {body3}

is equivalent to: CourseNana.COM

if {expr1}:
    {body1}
else:
    if {expr2}:
        {body2}
    else:
        if {expr3}:
            {body3}

First Sets auxiliary functions

You can make use of the is_expr_first_set and is_stmt_first_set functions. These functions have only one token as lookahead. Their purpose is to return true if the next token belongs to the First set of expressions and statements respectively. Let's focus on is_expr_first_set. CourseNana.COM

In the lectures, it has been discussed that the First set of α ∈ N | T, is the set of terminals that appear first in some string that derives from α. So, the First set of the expr nonterminal is the set of all the possible terminals that can be "first" in a string produced by expr. CourseNana.COM

For example, there is a rule expr -> `not` expr. This means that this rule can produce something like not (x+y), or not 3, or many other strings that start with not and are followed by an expression. The common characteristic of all these strings is that they begin with not. That's why we say that `not` belongs to the First set of expr. CourseNana.COM

Also, there is a rule expr -> cexpr, which can be expanded to expr -> `(` expr `). This rule can produce things like (x + y), (3), ((not True)). The common characteristic of all these strings is that they begin with (. That's why we say that the left round bracket `(` belongs to the First set of expr. CourseNana.COM

So far, we have found two terminals of the expression First set, and there are more. CourseNana.COM

The method is_expr_first_set just looks at the next token and tells you whether it belongs to the First set of expr. For example, it would return true if the next token is not or ( (and for some others also), but would return false for ). CourseNana.COM

The purpose of this function is to just help with your code, in case you need to check if the next token belongs to the First set of expr. Instead of having to check: CourseNana.COM

if self.check(TokenKind.NOT) or self.check(TokenKind.LROUNDBRACKET) or ... :
    ....

you can call the is_expr_first_set. CourseNana.COM

It is not necessary to use it, but mostly for convenience and as a "hint" that at some point you may need it. The same applies to is_stmt_first_set. CourseNana.COM

Task 2 - Add diagnostic messages for syntax errors

The goal of task 2 is to modify the given lexer and extend the parser you implemented in task 1, in order to add syntax error handling. CourseNana.COM

To this end, you need to understand the internals of the Scanner, Tokenizer and Lexer classes. You also need to extend your parser with the necessary syntax error recovery mechanisms. CourseNana.COM

The main idea is that for every syntax error your parser encounters, it should catch it and print helpful information on the error in the form of a message: CourseNana.COM

SyntaxError (line <row number>, column <column number>): <message>
>>>erroneous source line
>>>visual pointer to the line

For example, consider the following program which should give a syntax error: CourseNana.COM

  for i [1,2]
    pass

The in token is missing, so the expected message would need to be of the form: CourseNana.COM

# CHECK-NEXT: SyntaxError (line 1, column 9): token of kind TokenKind.IN not found.
# CHECK-NEXT: >>>  for i [1,2]
# CHECK-NEXT: >>>--------^

Things to notice here: CourseNana.COM

  • In the 1st line we see that the error occurs in line number 1.
  • In the 1st line we see that the error occurs in column number 9, i.e. where we found [ while expecting in.
  • In the 1st line we see the message is informing that the token TokenKind.IN was not found.
  • The second line starts with >>>.
  • The second line continues with exactly the full line of the source code where the error was, i.e. for i [1,2].
  • The source code line is printed until the end of line, i.e. the parser doesn't stop, until it finds the NEWLINE token.
  • The source code line is printed with all the trailing spaces in the beginning (2 spaces in the example).
  • The third line starts with >>>.
  • The 3rd line uses hyphens (-) followed by a caret (^) to point to the error that happened.
  • The hyphens start at column 1, relative to the source code line, and end right before the caret.
  • The caret points right at column 9, relative to the source code line, where the first character of the wrong token occurred.

You need to extend your lexer to provide location information, which will be used to print the location of the syntax error. Also, you need to extend your parser to catch the following syntax errors: CourseNana.COM

Error Message
Unmatched token token of kind \<expected> not found.
Wrong indentation Unexpected indentation.
Missing comma in a parameter/expression list expression found, but comma expected.
Function without at least one properly indented statement expected at least one indented statement in function.
Wrong type Unknown type.
Block without at least one properly indented statement expected at least one indented statement in block.
Empty left-hand side in assignment No left-hand side in assign statement.
Expression missing in statement Expected expression.
Declaring a variable among a statement sequence Variable declaration after non-declaration statement.
Right round bracket with no left counterpart unmatched ')'.
Chain of comparison operators Comparison operators are not associative.

When your parser catches the above errors, it should print the respective messages, with the format described before. Hint: to make testing easier choco-opt should return with a zero exit code even if an error is found. CourseNana.COM

Deciding which message to print

In general, when you are sure about the token you expect, and you encounter something else, we expect you to give the token of kind <expected> not found message. For example, there must always be a : after ) in function definition. CourseNana.COM

That said, there are cases where you don’t know necessarily what token should follow, but you know what tokens shouldn't follow. For example, when you have foo(x , you don’t necessarily expect ) after x, because also a , could follow. But, if you see another expression, e.g., y, then the string foo(x y has definitely a syntax error. And since there are two expressions, the parser assumes that a comma is missing between these two. CourseNana.COM

It could also assume that ) should be there, instead of y, but that isn’t very likely as an error from a programmer’s perspective, and the error messages' purpose is to help pinpoint the most probable root cause of the error. So it can help the programmer amend it. CourseNana.COM

As another example, consider the code: CourseNana.COM

if True:
    pass
    else:   <--- syntax error here
        pass

The if statement is followed by a block. Blocks should end with DEDENT. Now, the parser encounters else before a DEDENT is found, so token of kind TokenKind.DEDENT not found is a reasonable error to output. CourseNana.COM

However, in this example: CourseNana.COM

if True:
    pass
        else:   <--- syntax error here
        pass

Now, the parser encounters specifically INDENT before a DEDENT was found, so Unexpected indentation. is a reasonable error to output, as a more specific kind of error. CourseNana.COM

You can find more details on what the syntax errors look like and what is the expected behavior of your parser in tests/parser/syntax-errors. CourseNana.COM

Hint on grading: Task 2 is -- by design -- less documented, touches larger parts of the compiler, and is overall expected to be more difficult. Solving this task is seen as highly exceptional. If you feel you went even beyond what is expected in Task 2, please put at most five new test cases of less than 50 lines each into tests/parser/syntax-errors/outstanding-parsing-errors that demonstrate additional outstanding features your parser provides. CourseNana.COM

Implementation

You will be using GitHub to implement and submit this coursework. In particular, follow these five steps: CourseNana.COM

  1. Download this coursework
  2. Program your solution
  3. Test your changes locally
  4. Commit and push your changes to GitHub
  5. Check your points

Program your solutions

Inside the repository, you will find various source files. You should write your solutions for each task into the file indicated in the task description. CourseNana.COM

For the code to work, you need to install the required packages running: CourseNana.COM

cd <path-to-coursework-repo>
pip install -U -r requirements.txt
pip install -e .

If you're having errors during pip install about dependencies, please look at Dependency management with venv. CourseNana.COM

It would be convenient for you, if you used a modern IDE for Python. A popular choice, PyCharm, comes pre-installed in your DICE desktop environment. CourseNana.COM

If you decide to use PyCharm, in order to install the packages, you should open the embedded terminal (Alt+F12 by default) and follow the previous instructions using pip install -U -r requirements.txt. CourseNana.COM

Test your solutions

You can use lit to automatically test your code, which is include in the requirements.txt. CourseNana.COM

To run it locally, do: CourseNana.COM

lit -v tests/parser/

This will examine recursively all the files with valid formats inside the above directory. The -v flag adds a verbose output with more information in case some tests fail. You can also leverage the --timeout <seconds> flag, in order to bound the time allowed for your test cases to run. This way you can detect if your parser loops infinitely in some test cases. CourseNana.COM

For more details on the configuration of lit, see tests/lit.cfg. For more info on lit check the online documentation. CourseNana.COM

Commit and push your changes to GitHub

You are encouraged to commit your changes regularly. This allows you to track the history of your changes so that you can revert to an earlier version of your code if you need to. It also protects you from losing any of your work in the case of a computer failure. CourseNana.COM

Furthermore, every time you commit and push your changes in the main branch, your points are updated, giving you continuous feedback on your implementation. CourseNana.COM

Check your points

This badge shows how many successful test cases you've had so far CourseNana.COM

These points are provisional and are not related to the final grade. The number of successful test cases is only an indicator of how strong your implementation is, with respect to how many language features it covers. Your parser will be tested also on other test cases, so it is important that you write your own tests to ensure that your code is thoroughly tested. CourseNana.COM

Once you have pushed all of your changes to GitHub and you are happy with your code and your points, you're finished! At the deadline, we will revoke your write access to this repository and grade your work. CourseNana.COM

Questions

If you have questions, you should consult the lecture slides and recordings. If you have questions about the coursework, please start by checking existing discussions on Piazza. If you can't find the answer to your question, start a new discussion. It is quite possible that other students will have encountered and solved the same problem and will be able to help you. The TA will also monitor Piazza and clarify things as necessary, after allowing time for student discussion to take place. CourseNana.COM

Dependency Management with venv

If pip install -U -r requirements.txt fails with dependency resolution errors, one can create an isolated python environment using venv to prevent such errors. To setup venv for the assignment, follow the steps below (a summary is given below the bulleted list): CourseNana.COM

  1. Setup a virtual environment for the assignment with python3 -m venv env. This creates a subdirectory called env in the current folder that creates an isolated version of Python.
  2. Activate the virtual environment by sourceing the activation file: source env/bin/activate
  3. Confirm that you are in the virtual environment by running which python. The output should be /path/to/coursework/env/bin/python.
  4. Run pip install -U -r requirements.txt to install dependencies within the virtual environment.
  5. Run pip install -e . to install the ChocoPy compiler as a package.
  6. If you are using PyCharm, please configure PyCharm to work with the environment by following the instructions on this page of the PyCharm manual: Configure a virtual environment.

Get in Touch with Our Experts

WeChat (微信) WeChat (微信)
Whatsapp WhatsApp
UK代写,University of Edinburgh代写,INFR10065代写,Compiling Techniques代写,Implement a ChocoPy parser代写,Python代写,AST代写,UK代编,University of Edinburgh代编,INFR10065代编,Compiling Techniques代编,Implement a ChocoPy parser代编,Python代编,AST代编,UK代考,University of Edinburgh代考,INFR10065代考,Compiling Techniques代考,Implement a ChocoPy parser代考,Python代考,AST代考,UKhelp,University of Edinburghhelp,INFR10065help,Compiling Techniqueshelp,Implement a ChocoPy parserhelp,Pythonhelp,ASThelp,UK作业代写,University of Edinburgh作业代写,INFR10065作业代写,Compiling Techniques作业代写,Implement a ChocoPy parser作业代写,Python作业代写,AST作业代写,UK编程代写,University of Edinburgh编程代写,INFR10065编程代写,Compiling Techniques编程代写,Implement a ChocoPy parser编程代写,Python编程代写,AST编程代写,UKprogramming help,University of Edinburghprogramming help,INFR10065programming help,Compiling Techniquesprogramming help,Implement a ChocoPy parserprogramming help,Pythonprogramming help,ASTprogramming help,UKassignment help,University of Edinburghassignment help,INFR10065assignment help,Compiling Techniquesassignment help,Implement a ChocoPy parserassignment help,Pythonassignment help,ASTassignment help,UKsolution,University of Edinburghsolution,INFR10065solution,Compiling Techniquessolution,Implement a ChocoPy parsersolution,Pythonsolution,ASTsolution,