Project #2 - Parser
CSE3341 (Spring 2023)
Due Date: Sunday, Feb. 12 by 11:59pm
Guidelines
- The grader will NOT overcome compiler or debugging errors for you.
- All make-ups for lab must be accompanied by a documented and verifiable excuse well before the deadline. Given the severity of the emergency please inform me as soon as possible.
- Lab submissions will NOT be accepted via email to me or the TA.
- Work on the lab on your own. No group work!
Preliminaries
The goal of this project is to build a parser for the Fun language (refer to Project 1). At the end of this handout is the grammar you should follow for this project. Every valid input program for this language should be parsed, and every invalid input program for this language should be rejected with a meaningful error message. This project should be completed using Java or Python. Your submission should compile and run on stdlinux. It is your responsibility to ensure your code works there.
Constraints on your implementation:
- Use only the standard libraries of Java or Python.
- Your submission should compile and run in the standard linux environment the CSE department provides (stdlinux). I highly encourage you to do your development locally, but as a final step before submitting your code please make sure it works on stdlinux.
- Use the subscribe command - make sure you are subscribed to JDK-CURRENT or PYTHON-3.7. The TA/grader will not spend any time fixing/debugging your code - if it does not compile on stdlinux, your project will not be graded and you will get a 0.
- Please leave yourself enough time to ensure your solution runs on the stdlinux environment. If you wait until the last minute you may need to submit late with penalty.
Testing
I highly encourage you to develop/write your code using Unit Testing and TDD (Test Driven Development). You are not required to use a Unit Testing framework, but feel free to use one.
I have provided the “tester.sh” script which works similar to the script from Project 1. It will use the diff command to compare your output to the expected output.
The test cases are weak. You should do additional testing with your own test cases. Feel free to create and post additional test cases on piazza.
Main
Your project should have a main procedure in a file called Main.java or Main.py, which does the following in order:
- Instantiate the scanner with the file name given as a command line argument.
- Generate a parse tree for the input Fun program using recursive descent.
- Perform semantic checks on the parse tree.
- Use recursive descent to print the Fun program from the parse tree.
Make sure your parser can be ran using one of these commands on stdlinux so
that it will interact with my tester.sh and grading scripts:
- javac *.java java Main Correct/1.code
- python3.7 Main.py Correct/1.code
Input to the project
The input will be identical to the input for project 1. The scanner will process the sequence of ASCII characters in the file and produce a sequence of tokens as input to the parser.
Parse Tree Representation
The parser performs syntax analysis of the input token stream and builds a parse tree using the top-down recursive descent approach (see the textbook assigned readings and class notes). You may represent the parse tree however you like. In the “Suggestions” section I give a recommendation on how to do this, not a requirement.
Output from your project
All output should go to stdout. This includes error messages - do not print to stderr. The parser functions should only produce output in the case of an error. The printer functions should produce “pretty” code with the appropriate indentation, i.e. statement sequences for if/while statements should be indented, and nested statements should be further indented. I do not have strict format requirements here, the printer functions are mainly to help you verify you generated a correct parse tree and help you with your debugging. NOTE: THE PRINT FUNCTIONS SHOULD ONLY BE CALLED AFTER THE ENTIRE PROGRAM HAS BEEN PARSED! The purpose of the print functions is to verify you have a complete parse tree that has accurately stored all the information we will need to interpret the program in project 3. The print functions should have no interaction with the scanner.
Invalid Input and Semantic Checks
Your parser should recognize and reject invalid input. For any error, you should catch it and print a message. The message should have the form “ERROR: some description” (ERROR in uppercase). The description should be a few words and should accurately describe the source of the problem. You do not need to have sophisticated error messages that denote exactly where the problem is in the input file. After printing the error message to stdout, just exit back to the OS. But make sure you catch the error conditions! When given invalid input, your parser should not “crash” with a segmentation fault, uncaught exceptions, etc. Up to 20% of your score will depend on the handling of incorrect input and on printing useful error messages.
Scanner Errors
The scanner should make sure that the input stream of characters represents a valid sequence of tokens. For example, characters such as ‘_’ and ’%’ are not allowed in the input stream. Your scanner from Project 1 should already be catching these. If the scanner finds an error, the parser should stop executing.
Parser Errors
The parser should make sure that the stream of tokens is correct with respect to the context-free grammar. If the parser detects a token in the wrong place, it should print a meaningful error message and stop parsing.
Semantic Errors
After building the parse tree there are additional checks that need to be done to ensure that the program “makes sense” (semantic checking). For this project, the following semantic checking needs to occur: Verify that every ID being used has been declared in that scope and prior to being used. Variables declared in the DeclSeq are “global” variables and can be accessed from any scope. Variables declared in a StmtSeq are “local” variables and are only in scope for that StmtSeq or any nested scope. For example, there are two semantic errors in this program: program { int x; begin () { x=1; int y; y=x; if x==y then { z=y; int z; }
y=z; } }
- “z=y;” is before “z” has been declared
- “y=z” is happening after “z” has gone out of scope
Verify that IDs were declared with the appropriate type (int or ref) for how they are being used. There are two places where you need to check that the variables being used are of the correct type:
- “id = new;”, the id here needs to have been declared as ref, not int
- “id = share id”, both id’s here need to have been declared as ref, not int
Check for “doubly-declared” variable names. All variables declared within the same scope should have unique names, but variables declared in different scopes can have the same name. For example, program { int x, y; int z, x; begin () { x=1; } } should result in an error because there are multiple x’s declared in the global scope, but program { int x, y; begin () { int z, x; if y==z then { int x; x=1; } } }
should NOT result in an error because the three x’s are all declared in different scopes.
Suggestions
There are many ways to approach this project. Here are some suggestions:
-
Spend some time making sure you understand the grammar on the last page.
-
Make sure you understand the recursive descent parser described in lecture.
-
Plan to spend a significant amount of time on the parser. Once it is working correctly, the semantic checking and printing should be simple.
-
Have separate functions for the parsing and semantic checks. Trying to do these simultaneously will increase the complexity of your code and increase the likelihood of errors.
Represent the parse tree by creating a class for each nonterminal. Instances of these classes will then represent the nodes of your parse tree. Each class should contain at least a field for each child, a parse method, and a print method.
- The parse tree does not need to store everything, many tokens can be discarded after checking them. For example, the root node does not need to store the PROGRAM, BEGIN, and END tokens, we just to need to verify those were present in the input.
- Develop your solution INCREMENTALLY. Pick a small subset of the language (e.g. only a few of the grammar productions) and implement a fully functioning parser for that subset. Do extensive testing. Add more grammar productions. Repeat.