LAMP: Advanced Project

Advanced Project

English only

Introduction

The advanced project part of this course is meant to give you the chance to work independently on a more advanced topic covered by this course.

Organization

There are several proposed projects, each having associated a number of points according to our estimation of its difficulty. If your implementation is perfect, you get that number of points. Your grade for this project will be the number of points you gather from all your projects that you implement (but not more than 6, the maximum grade). So if you implement a project with weight 4 perfectly, you will get grade 5 (1 point is 'free'). You are allowed to choose more than one project, as long as the sum of their points does not exceed 8. It is perfectly fine if you choose to implement 3 simpler projects (and therefore get an overview of several topics), or implement just one, very difficult project.

Here is a short description of each project, and its associated points. Some projects are divided in several phases, later phases usually depending on earlier phases (but now always). Unless stated otherwise, you are allowed to pick just a specific phase from a project.

During this project you are allowed to use C++ features in the VM code. The most interesting C++ feature is the standard template library itself (STL), which provides polymorphic collection classes.

Deliverables

You should submit your source code for the chosen projects, and a 2-5 pages report about what you did and what difficulties you encountered (it can be either in French or English).

Timeline

You should choose what projects you wish to do by Friday, June 2nd. However, we strongly encourage you to decide quickly and start working on them sooner, as these projects are more difficult and we provide less guidance than for the previous ones. The deadline for project submission is Friday, June 23.

Minischeme interpreter (1 point)

Write a simple interpreter for the mini scheme language. Your interpreter can run on the syntax trees so you can design it as a phase in your compiler.

The interpreter should check for run time errors (for instance, passing the wrong number of arguments to a function, or applying a value that is not a function).

JIT compilation

Phase 1: JIT the whole program (ahead-of-time) (4 points)

In this phase you will JIT-compile the whole program at load time. We suggest you use GNU Lightning, a portable library for JIT compilation.

In this first phase you may keep all VM registers in memory (R[]) so that no register allocator is required. Most VM instructions are straight forward to implement, but special care has to be taken for jumps. Since jumping is accomplished through CMOV to R31, you should take care of such occurences when you JIT compile. You should also take care of labels: taking the address of a label was actually loading a pointer to code_mem. When code is JIT compiled these labels will have to point to the JIT compiled buffer. In order to improve execution even more, you should hard code some of minivm's registers to machine registers. For instance, the PC and the frame pointer are used very often, so it would be beneficial to store them in registers instead of memory.

Proper JIT (5 points)

The first phase compiled all the program before it even started. Now we would like to do a more realistic JIT: compile only those parts of the program that are hot. You need to add some performance monitors to the VM in order to detect code that is frequently executed.

Once hot spots are identified, you need to decide what is the granularity on which you JIT compile. Since the VM has no concept of function, one possibility is to JIT compile on basic blocks. Another would be to extend the VM to have this concept (and extend the compiler to generate proper code for that). In general, you are allowed to make reasonable extensions to the VM and the compiler.

Again, jumps are tricky to get right (jumps might be inside JIT compiled code, or into byte code). You could never JIT compile jumps, but then loops (which are always present in hot spots) will not be as efficient as they could be, since each loop will require to jump in and out of native code.

You need to take care of entering and exiting native code sections. Here you probably need to update the R[..] register array (in case native code uses native registers for some of the VM registers), update R31, etc.

It would be nice if minivm registers were allocated to machine registers. Since there are only 6 of the general purpose ones, a reg allocator is needed. See project "Register allocator" for details.

References:

GNU Lightning info page

Precise garbage collection

The aim of this project is to replace the conservative, mark & sweep GC of your virtual machine by a precise, copying GC.

Phase 1 (4 points)

In a first phase, you can make the GC precise, by modifying your compiler so that pointers can be distinguished from integers at run time, for example using a tagging scheme.

Once this is done, replace the mark & sweep GC of your VM with a copying one. In order to minimise the memory requirements of your collector, we suggest that you use Cheney's algorithm, but you are free to choose another one.

Phase 2 (3 points)

Notice that even after the first phase, the GC will still be more conservative than it should, since it will for example trace all pointers present in registers, even though some of them will never be used again by the compiler. The same is true of some locations in the activation frames of procedure.

Therefore, in a second phase, we ask you to modify your compiler so that it communicates to the GC some information describing the set of live registers and activation frame live at one point. For registers, a simple bitmap would do, in which bit N would be set only if register N is live. For activation frames, you have to design a similar scheme yourself, and implement it. Notice that this liveness information must only be present (and accurate) when a GC can actually happen.

This phase cannot be chosen without the first one.
References:

Richard Jones and Rafael Lins, Garbage Collection : Algorithms for Automatic Dynamic Memory Management
Uniprocessor Garbage Collection Techniques
http://www.memorymanagement.org/

Optimisations

Implement several optimizations in the mini scheme compiler.

Inlining (2 points)

Since functions are values, the compiler does not always know which function is actually called at a call site. A control flow analysis is needed to approximate the set of possible call targets for a function application. If this set contains just one function, it can be inlined, or the call can be made more efficient by a 'static' jump to it.

Your optimization pass should at least take care of calls to global functions and calls to an immediate lambda (((lambda (...) ...) arg1 ...)).

Classical (imperative) optimisations (4 points)

Implement constant propagation, dead code elimination, copy propagation. All these are similar in that they use a data flow analysis to derive the necessary facts about the program.

You will need to write a control-flow like representation for functions, so that the algorithms presented in the course can be readily applied.

These optimizations may give back better results after inlining is performed.

References:

the course
Steven Muchnick, Advanced Compiler Design and Implementation

Continuations

The aim of this project is to add the call-with-current-continuation (also known as call/cc) primitive to your minischeme language.

Phase 1 (3 points)

One way of implementing that primitive is to transform the whole program to continuation-passing style, as explained in the course. The call/cc primitive can then be implemented easily, since it simply needs to "reify" the current continuation.

The aim of this first phase is to adapt the CPS transformation presented in the course to make it work for the whole minischeme language, and then implement it in the compiler. Once this is done, an implementation for call/cc should be made available to the programmer.

Phase 2 (2 points)

The CPS transformation presented in the course is very simple but generates too much code. An improved version of the transformation is presented in the paper Representing Control by Olivier Danvy and Andrzej Filinski.

The aim of this second phase is to transform your compiler to implement Danvy and Filinski's CPS transformation instead of the one given in the course.

Phase 3 (2 points)

As explained in the course, a contiunation represents the state of the machine at a given time. Therefore, it is also possible to add continuations to minischeme by saving and restoring that state, without having to transform the program to CPS.

The aim of this third phase is to use such a solution to implement continuations and then compare that approach with the one based on the CPS transformation. This comparison should be made in terms of implementation complexity, and in terms of run-time efficiency.

References:

The aim of this project is to improve the way registers are used by the compiler. Phase 1 can be completed alone, and is very useful when combined with JIT compilation, as the GNU lightning library only exposes very few registers.

Phase 1 (4 points)

The aim of the first phase is to improve the register allocator of your compiler. Currently, the register allocator is very simple and aborts compilation when not enough registers are available. You should change this by replacing the current register allocator with a linear-scan allocator which handles spilling.

Phase 2 (2 points)

In this phase, we ask you to modify your compiler so that it uses registers to store local variables instead of slots in the current activation frame.

This modification increases the register pressure, and puts the register allocator to good use.

This phase cannot be chosen without the first one.
References:

http://www.research.ibm.com/jalapeno/papers/toplas99.pdf

Safe execution (4 points)

The aim of this project is to ensure that the execution of the programs compiled by your compiler is safe. This means that all operations should check at run time that the data they receive is of the right kind.

For example, all arithmetic primitives should verify that their arguments are integers. Function application should check that what appears in the position of the function really is a closure, and that the number of arguments it receives matches what it expects. And so on...

Notice that all these checks should be performed at run time, since we explicitely do not want to add a type system to the language. This implies that the various kinds of data (integers, closures, arrays, etc.) should be distinguishable at run-time, for example by being tagged.

Super instructions (4 points)

Implement super instructions in the VM. For a simple instruction set like ours, most of the execution time is spent in instruction dispatch. If instructions did more 'real work', the relative cost of dispatch to meaningful computation would decrease. The idea is to create new VM instructions consisting of sequences of common instructions, and replace them by the new opcodes.

There are two ways in which this can be achieved: have a fixed set of 'super-instructions', or dynamically create super-instructions at run time. The first option means you look at the VM code and notice a number of code patterns that appear very often. Then you modifiy the VM to recognize and replace them by specialized opcodes. We prefer the other solution, in which the VM is smart enough to create these opcodes at run time.

The idea is that each basic block should be turned into a super instruction. There are some opcodes which cannot, because they contain C function calls, but other than that there should be no problem to achive that. Super instructions are created by simply copying the code that implements each instructions to a new location and storing a pointer to this address in the code area. This implies that the VM is already using 'threaded code'.

Special care should be taken for jump targets. Since code will be moved around, jumps in the code area have to be updated. Minivm has no jump instructions, instead a CMOV is used. You need to be able to identify jumps, and more important, the address where they jump, so that it can be updated. All jumps require at some point a 'LINT label' instruction. An easy way to know where pointers are handled is to keep a list of these instructions at load-time (when labels are resolved) and later consult this list for knowing whether a number should be updated or not.

References:

Ian Piumarta: Optimizing direct threaded code by selective inlining