Computer Science Department
Programming Methods Laboratory
(LAMP)
Ecole Polytechnique Federale de Lausanne
Concurrence et Compilation (Compilation)     summer semester 2001

Introduction
Installation
Scanner
Parser
Howto
Links
Main
After discussion with some of you and other people, I have decided to drop the project. I misestimated the amount of work involved in it, it simply is too big for this class.

Introduction

This is the programming project for the course. You can do the project in three different sizes. The base project implements a minimal subset of the language. The standard project implements the language as given and is what is recommended to do. The extended version implements even more features.

You can sometimes use different sizes for different phases of the compiler, but for example a standard semantic analysis will require at least a standard parser.

Installation

You have to install JLex and java_cup. The easiest way is to do the following:
  • Create a directory ~/compilation/classes
  • Copy JLex.jar and java_cup.jar to that directory.
  • If you want you can add the jar files to your classpath with
    setenv CLASSPATH $HOME/classes/JLex.jar:$HOME/classes/java_cup.jar;
    (for csh) or
    CLASSPATH=$HOME/classes/JLex.jar:$HOME/classes/java_cup.jar;
    export CLASSPATH;

    (for sh).
  • If you want to keep your old CLASSPATH as well, you can append it with :$CLASSPATH.
See also the local Java - HOWTO.

Assignment1: Scanner

Your first task is to write a scanner for the language "filter". We will use the tool JLex to generate a file Scanner.java from the file filter.lex. (Again, the local Java - HOWTO explains how to call JLex in more detail).

You can download the file filter.lex, in which some parts are already done. What is left to you is the addition of the patterns for the tokens and the corresponding actions. [What is done, is position management (counting of lines and columns), construction of the object for the token (which is handed to the parser) and a string representation of tokens, which is useful for debugging.]

For the token data this is already done. You have to do it similarly for all the other tokens. The specification, what kind of tokens, white space and comments should be recognized are in the language specification. The patterns that you can use are on slide 15ff of lexical analysis.

There are some additional files which you should also download: Position.java is useful for handling positions, Report.java is a simple error module, and Tokens.java is an interface, which contains the definition of an integer for each token. Finally, there is a class ScannerTest.java, which has a main method and allows you to test your Scanner on the standard input.

You can use byte.filter as a sample input file:
java filter.ScannerTest < byte.filter

JLex reacts weird to comments in the pattern section, so only put them inside commands. Also do not use spaces in patterns except between quotes " ", in sets [ ], or after a backslash \ .

Assignment2: Parser

The source file for the parser is filter.cup. Here, we describe what terminals and non-terminals we have, the precedence of the operators and the rules (productions) of the grammar. The given file has already some definitions, you have to add the remaining ones. The original grammar is given in the language specification. You will have to transform it into BNF for using it in filter.cup.

Now you can use java_cup to generate the two files Tokens.java and Parser.java from filter.cup (See also the local Java - HOWTO on using java_cup). Parser.java then contains the Java source-code for the parser and Tokens.java contains the token-ids that must also be known to the scanner.

There is also a given class ParserTest.java, which allows you to test your parser.:
java filter.ParserTest < byte.filter

Most likely java_cup will give you warnings or error-messages.

  • Warnings about declared but unused tokens. Ignore these!
  • Errors about undeclared non-terminals. You have to declare them!
  • Errors for shift-reduce conflicts. Often these show up for ambiguous grammars. For expressions you can fix this giving all precedences of operators. Otherwise, you have to resolve the ambiguities.

    Sometimes, there are shift-reduce conflicts in unambiguous grammars. In this case, try to make the grammar LL(1).

Try to test your parser also with other input files.

Related Links



Teaching
LAMP homepage
Last modified: 12.03.2001, Christoph Zenger <christoph.zenger@epfl.ch>