LAMP -- Advanced compiler construction course

Overview

The goal of your first assignment is to have a first encounter with the framework we give you. As true engineers, we believe in "learning by doing" so your job will be to make several small changes to the compiler.

A crashcourse on Darcs

We will use darcs to distribute the framework we provide. Darcs is a revision control system like CVS and SVN, with the notable difference that there is no central repository: every working copy is a repository in its own right. Darcs makes our life easier in case we need to publish patches (a.k.a. fix bugs in the framework), but it can also make yours easier: since you need to use it to get the distribution, you can as well use it throughout your project.

Before you can the framework, you need to install darcs on your machine. If you are working on the computers in INF1, you need not worry, it has been done already. Otherwise, follow the instructions on darcs' webpage.

Our darcs repository is found at http://lamp.epfl.ch/~schinz/act-project. To get a working copy, simply run the following command in the directory where you want to check out the project:

$ darcs get http://lamp.epfl.ch/~schinz/act-project
Copying patch 1 of 1... done!
Applying patch 1 of 1... done.
Finished getting.

You should now have a directory act-project which contains the source files for your project. Supposing we release a patch to fix some bug, you can apply it to your own repository by going to your local copy and pulling changes from the remote repository:

$ darcs pull http://lamp.epfl.ch/~schinz/act-project
Pulling from "http://lamp.epfl.ch/~schinz/act-project"...
No remote changes to pull in!

To learn more about darcs, we strongly encourage you to read the darcs manual.

Improve the compiler

The language we use this year is a dialect of Scheme. The compiler you receive uses predefined functions to implement some of the primitives. You can find them in predefined.scm. While this works fine for most cases, the logical operators 'and' and 'or' don't implement shortcut logic. Shortcut logic allows the programmer to write 'safe' code like this:

        (if (and (not (null? xs)) (cdr xs)) 
            (print-int (car (car xs)))
            (print-int 0))

This code should return the second element of xs, or 0 if the list has less than two elements. It should never fail with a null pointer dereference, and that relies on the fact that the right-hand side of 'and' is not evaluated when the left-hand side is false. However, when and is a function, in a strict language as minischeme, all arguments are evaluated prior to the function call and this example would crash for an empty list.

Your job is to modify the compiler such that both and and or implement shortcut logic. The easiest solution is to implement it as a tree transformation. For instance, whenever a call is made to and, replace it with an If node:

 (and tree1 tree2)

==>

   (if tree1 tree2 0)

The minischeme compiler

The compiler is written in Scala and it's structure is similar to what you have seen in the Compilation course. The language and compiler are described in the project introduction slides.

Note: The latest release candidate of the scala compiler (2.4.0 RC1) is necessary to compile the project. This is the version installed on computers in IN1 (accessible, like all software used by this course) under ~iclamp/soft/bin. The latest version of the Scala Eclipse plugin comes with the right version of the Scala compiler.

The minischeme compiler does it's job in several passes, which are applied on the abstract syntax tree. The AST is produced by the parser, and input to the name analyzer. This phase attaches unique symbols to identifiers, and can be ran multiple times. In principle, it should be ran after each phase that transforms the ASTs in some way, since new identifiers might have been generated, and their symbols need to be created.

Eta expansion

The first phase after parsing (and name analysis) is eta expansion. As described in the slides, primitive functions can not, in principle, be passed around as first-class values. The job of eta expansion is to transform primitive functions found in value positions into functions, for example:

 (map + xs)

==>

   (map (lambda (a b) (+ a b)) xs)

Since eta-expansion produces functions which are not top-level functions, you will not be able to enjoy it until you have closure conversion.

Generator and Interpreter

After eta expansion (and another round of name analysis) the backend comes into play. In can be either the code generator, which produces code for minivm, or the interpreter, which runs directly on trees. For this assignment you will be using the interpreter. The interpreter is handy when you debug code, since it performs dynamic type checking, so finding bugs is much easier (although it runs quite slow compared to the VM version).

Working with the compiler

The project comes with a Makefile which builds the compiler for you. It needs to know the whereabouts of a scala compiler, so if scalac is not accessible in your PATH you should either add it (for computers in IN1, prepend /home/iclamp/soft/bin to your PATH) or edit the corresponding line in the Makefile to point to your scalac:

...
# Commands
CC := gcc
SCALAC := /path/to/scalac
TOUCH := touch
MKDIR := mkdir
...

To run the compiler you can type:

$ scala ch.epfl.lamp.minischeme.Main tests/fact.scm tests/fact.asm
$ scala ch.epfl.lamp.minischeme.Main tests/fact.scm -int

The second command is running the interpreter on test.scm.

Minischeme warmup

In order to get acquainted with the wonderful world of minischeme, we ask you to complete an arithmetic parser evaluator. The skeleton is found in assignments/part0/calculator.scm. Your program will receive an arithmetic expression as input, and should produce it's value as output.

The expression language

The expression language consists of the four arithmetic operators, integer constants and variables. The eval function needs an evironment which holds values for variables which might appear in the expression. It is an error for a variable to appear in the expression and not be bound in the environment.

Expressions are represented as lists:

Integer constants:: ('c' <number>), where 'c' stands for the ASCII code of character c
Variables:: ('v' <character>), where 'v' stands for the ASCII code of character v
Terms:: (op <e1> <e2>), where op stands for the ASCII character of one of +, -, *, / and <e1,2> are expressions.

Environements are represented as associative lists, that is a list of pairs.

We provide you with several utility functions for creating lists (list-1 to list-4), testing wheather a given character is an operator (operator?), printing expressions (print-expr) and getting a value from the environment (assoc).

Your task is to fill in the eval function so that it returns the result of evaluating the given expression in the given environment.

Ecole polytechnique fédérale de Lausanne

Programming Methods Laboratory

Advanced compiler construction

Summary

News

Lecturer

Assistant

Questions