Updates

Note: Project GC, part 1.1, and Dynamic typing, have been modified to allow changes in the VM.

Bugfix: The interpreter has been patched to fix a bug in forward references in top level definitions (see mail on newsgroup). The patch can be applied as usual, from our darcs repository.

Introduction

The advanced project part of this course is meant to give you the chance to work independently on a more advanced topic covered by this course.

Organization

There are several proposed projects, each having associated a number of points according to our estimation of its difficulty. If your implementation is perfect, you get that number of points. Your grade for this project will be the number of points you gather from all your projects that you implement (but not more than 6, the maximum grade). So if you implement a project with weight 4 perfectly, you will get grade 5 (1 point is 'free'). You are allowed to choose more than one project, as long as the sum of their points does not exceed 8. It is perfectly fine if you choose to implement 3 simpler projects (and therefore get an overview of several topics), or implement just one, very difficult project.

Here is a short description of each project, and its associated points. Some projects are divided in several parts, later parts usually depending on earlier parts (but now always). Unless stated otherwise, you are allowed to pick just a specific part from a project.

During this project you are allowed to use C++ features in the VM code. The most interesting C++ feature is the standard template library itself (STL), which provides polymorphic collection classes.

Deliverables

You should submit your source code for the chosen projects, and a 2-5 pages report about what you did and what difficulties you encountered (it can be either in French or English). The source code you submit must run on the IN computers, as we won't test it on any other architecture.

Timeline

There is no deadline for choosing the projects that you do. However, we strongly encourage you to decide quickly and start working on them sooner, as these projects are more difficult and we provide less guidance than for the previous ones. The deadline for project submission is Friday, June 22.

1 Copying garbage collector

The aim of this project is to replace the conservative, mark & sweep GC of your virtual machine by a copying GC.

Part 1.1: copying GC (4 points)

The first part of this project consists in replacing the current mark & sweep GC of your VM with a simple copying one. This part can be accomplished in two steps:

  1. Without changing the GC yet, modify your compiler to make sure that integers can be distinguished from pointers at run-time. For this, we suggest you implement a tagging scheme as described in the course, where the integer n is represented as 2n+1. Notice that you should not have to modify your VM in any way for this first step, as the compiler alone can manage the tagging. You are allowed to modify the VM too.
  2. Replace the mark & sweep GC by a copying one. We suggest you use Cheney's algorithm, as it is elegant and uses little additional memory, but you are of course free to use another one.

After implementing the copying GC, you should measure how well it performs relative to the mark & sweep GC, using a set of benchmarks.

Part 1.2: improved copying GC (3 points)

(Depends on part 1.1)

Notice that even after the first part, the GC will still be more conservative than it should, since it will for example trace all pointers present in registers, even though some of them will never be used again by the compiler. The same is true of some locations in the activation frames of procedure.

Therefore, in this second part, we ask you to modify your compiler so that it communicates to the GC some information describing the set of registers and activation frame fields that are live at one point. For registers, a simple bitmap can do, in which bit n is set only if register Rn is live. For activation frames, a similar scheme has to be designed. Notice that all this information about which locations are live must only be present (and accurate) when a GC can actually happen.

After improving your copying GC, you should measure the gain offered by your changes on a set of benchmarks. You should measure both the gain in running time, and in the amount of memory collected.

References

2 Continuations

The aim of this project is to add the call-with-current-continuation (a.k.a. call/cc) primitive to minischeme.

Part 2.1: machine continuations (2 points)

As explained in the course, a continuation represents the state of the machine at a given time. Therefore, it is possible to add continuations to a language by saving and restoring that state, without having to transform the program to continuation-passing style.

The aim of this first part is to add call/cc to minischeme using this technique. Notice that this is made very easy by the fact that our compiler allocates all stack frames on the heap!

Once the implementation is done, you should demonstrate that your continuations work by writing some test programs. These tests should make use of continuations in non-trivial ways (e.g. by throwing a continuation several times).

Part 2.2: continuation passing style (3 points)

(Depends on part 2.1)

Another way to add continuations to a language is to transform the whole program to continuation-passing style (CPS). The call/cc primitive can then be implemented easily, since it simply needs to reify the current continuation that it recieves as argument.

The aim of this second part is to adapt the CPS transformation presented in the course to make it work for the whole minischeme language, and then implement it in the compiler. Once this is done, an implementation for call/cc should be made available to the programmer.

Once this second implementation is done, you should compare it with the first, in terms of performance of the generated code.

Part 2.3: improved continuation passing style (2 points)

(Depends on part 2.2)

The CPS transformation presented in the course is very simple but generates too much code. An improved version of the transformation is presented in the paper Representing Control by Olivier Danvy and Andrzej Filinski.

The aim of this third part is to transform your compiler to implement Danvy and Filinski's CPS transformation instead of the one given in the course.

Once this is done, you should compare the two CPS transformations in terms of code size and execution speed.

References


3 Super-instructions

The aim of this project is to try to speed up the VM using super-instructions.

There are basically two ways to add super-instructions to a VM: statically or dynamically. Static super-instructions are simply instructions that combine several other instructions on the ground that those instructions happen often in sequence. Dynamic super-instructions are created just before the program is run, and are therefore adapted to the program being executed.

Part 3.1: static super-instructions (2 points)

This first part consists in adding static super-instructions. In order to know which instructions should be combined into super-instructions, you should instrument the VM to detect frequently-occuring instruction sequences. After having performed the measurements on several benchmark programs, you should add the super-instructions that seem worthwile and modify the compiler or the VM to take advantage of them.

Once your implementation works, you should measure the gain obtained by the super-instructions you added, both in terms of code size and execution time, using benchmark programs.

Part 3.2: dynamic super-instructions (4 points)

The second part consists in adding dynamic super-instructions using the scheme described by Ian Piumarta and Fabio Riccardi in their paper Optimizing direct threaded code by selective inlining. Basically, their technique consists in producing one super-instruction per basic block of the program to interpret. The code for these super-instructions is obtained by concatenating the compiled C code corresponding to the instructions composing the basic-block.

Once your implementation works, you should measure its gain in terms of execution time using a set of benchmark programs. If you also completed part 3.1, you should additionally compare the gain offered by the two kinds of super-instructions.

References

4. Just In Time (JIT) compilation

The aim of this project is to add native compilation (just-in-time or JIT) capabilites to the virtual machine.

Part 4.1: JIT the whole program (3 points)

In this part you will JIT-compile the whole program at load time (sometimes called ahead-of-time compilation, to distinguish it from just-in-time compilation). We suggest you use GNU Lightning, a portable library for JIT compilation.

Most VM instructions are straight forward to implement, but special care has to be taken for labels: taking the address of a label is done by loading a pointer to code_mem. When code is JIT compiled these labels will have to point to the JIT compiled buffer.

Although we don't require a register allocator for this part, your code should register-allocate FP and R1 to machine registers, as they are the most common used registers.

Once your implementation works, you should test it for performance. Use several non-trivial programs and compare executino times with and without JITting.

Part 4.2: Proper JIT (4 points)

The first part compiled all the program before it even started. Now we would like to do a more realistic JIT: compile only those parts of the program that are hot. You need to add some performance monitors to the VM in order to detect code that is frequently executed.

Once hot spots are identified, you need to decide what is the granularity on which you JIT compile. Since the VM has no concept of function, one possibility is to JIT compile on basic blocks. Another would be to extend the VM to have this concept (and extend the compiler to generate proper code for that). In general, you are allowed to make reasonable extensions to the VM and the compiler.

Again, jumps are tricky to get right (jumps might be inside JIT compiled code, or into byte code). You could never JIT compile jumps, but then loops (which are always present in hot spots) will not be as efficient as they could be, since each loop will require to jump in and out of native code.

Entering and exiting native code sections needs special care: for instance, you probably need to update the R[..] register array, in case the native code uses native registers for some of the VM registers.

Once your implementation works, you should test it for performance. Use several non-trivial programs and compare executino times with and without JITting. If you did part 4.1, compare the two implementations in terms of performance. Try to explain the differences you observe.

Part 4.3 Register allocator (3 points)

Depends on Part 4.1 or 4.2

GNU Lightning does not provide as many registers as minivm does. In order to make your code run with maximum efficiency, you need a register allocator. You should implement linear-scan register allocation, as described in Poletto and Sarkar, Linear Scan Register Allocation. Spilling could be done by using the R register array in your VM.

Use a benchmark suite to compare execution times of the VM with and without register allocation.

References:

5. Dynamic type checking (3 points)

The aim of this project is to implement dynamic type checking in minischeme. This means that all operations should check at run time that the data they receive is of the right type.

For this project you add a phase which generates the right run-time checks to ensure that a program compiled in this way behaves like being run in the interpreter. The only difference allowed is that the stack trace is not required in case of a type error. The program can simply stop after printing a meaningful error message. A meaningful error message is not 'Type error', but something more explanatory, like: 'Wrong number of arguments when calling ...'. It would be nice to keep line numbers in error messages.

You are allowed to make small changes to the VM, in order to reduce the amount of code the compiler has to generate.

Here are some checks you should perform. For a complete list, check the interpreter implementation, which can be considered as an executable specification of this project:

In order to make this checks at run-time, values need to carry their type. Different representations are presented in the memory management lecture.

Once your implementation works, you should test it for performance as compared to the unchecked version. You should observe a performance degradation with regard to the unchecked version. Also, compare your implementation with the interpreter. Which one is faster?

References

6. Inlining (3 points)

The aim of this project is to implement inlining and constant folding in the minischeme compiler.

You can implement inlining as another phase of the compiler. Your compiler should be able to inline global functions and immediate lambda terms, like ((lambda (...) ...) arg1 ...).

Inlining usually generates some boilerplate code which needs some cleaning, so you should also implement a simple version of constant folding. Constant expressions are either

Constant expressions should be replaced by an integer which represents their value, as it would be computed by running the program.