Optimisations
The aim of this project is to optimise the code produced by the misc compiler as much as possible. As a starting point, we give you a new, working implementation of the misc compiler. Up to type analysis, this compiler is mostly identical to the one of the first project, but the later phases are quite different. The back-end of the compiler (everything after type analysis) is now structured as follows:
The output of the last phase is an executable RISC program, as usual, but it requires a new version of the interpreter, as explained in the next section. The new compiler is also different in that it outputs textual labels to refer to functions and locations in the code, instead of numerical offsets. This makes the generated code easier to debug, but is the reason a new RISC interpreter is needed. This new version is installed in INF; people working on their own computers need to download it. The new RISC interpreter also includes the following new instructions, which make code generation easier:
Finally, this interpreter has a new option, called
In order to help you (and us) evaluate the effectiveness of the optimisations you introduce into the compiler, we provide you with the following set of benchmarks:
For this project, we strongly encourage you to use some kind of version-control system like CVS, Subversion, Meta-CVS, darcs, OpenCM or arch. This will help you work with your group partner, and ease the merging of our changes with yours (we will certainly have to fix some bugs in our code as the course progresses).

Here we will suggest some optimisations and give some hints on how to implement them.

Dead code elimination

Given the current framework, dead code elimination is probably one of the easiest optimisations to implement. To implement it, you can start by looking at how Rewrite (in LinearScan) works. By writing a new version of Rewrite for DeadCode, you can probably get working code quite quickly and easily. (Our implementation of DeadCode is 56 lines.)

What dead code elimination needs to do is to traverse the code of each basic block backwards and create new code for the basic block by emitting only those instructions that are live, as well as those instructions that have a side effect, such as memory writes, function calls, returns, syscalls, breaks, and instructions manipulating the stack. We will soon provide a new version of Instruction.java and Opcode.java that provides you with the function sideEffect().

To get really interesting results, you probably have to apply dead code elimination iteratively. This can easily be done with a flag:

```java
DeadCode dc = new DeadCode();
while (dc.changed)
    dc.removeDeadCode(cfgs[i]);
```

So DeadCode.java could look something like:

```java
package miscc;

import java.util.*;

class DeadCode {
    private Liveness liveness;
    private CFG cfg;
    public boolean changed = true;

    public void removeDeadCode(CFG cfg) {
        this.cfg = cfg;
        liveness = new Liveness(cfg);
        changed = false;
        cfg.visitRPO(new Rewrite());
    }

    private class Rewrite extends CFG.EmptyVisitor {
        public void pre(CFG.Node node) {
            removeDeadInstrs(node);
        }
    }

    private void removeDeadInstrs(CFG.Node bb) {
        // SOME INTERESTING STUFF HERE
        // WHICH TRAVERSES bb.code BACKWARDS,
        // EMITTING ONLY NON-DEAD INSTRUCTIONS
        // AS DETERMINED BY liveness.liveOut(bb)
        // AND sideEffect()
    }
}
```

Register usage in the code generator

The current code generator sometimes reuses the same targetReg both for a subtree and for the result of the operation of a node. This produces some false dependencies in the code. The code would be more SSA-like if each expression used its own register.

Inline expansion

Since function calls are quite expensive, inline expansion of functions can be a very worthwhile optimisation. It is especially interesting because it opens the door to many further optimisations, like constant folding and propagation, dead code elimination, etc. To implement inlining, one has to solve basically two problems:
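To make the backward traversal concrete, here is a self-contained sketch of the idea on a simplified instruction representation. The `Instr` class, its `sideEffect` flag and the `liveOut` set below are stand-ins for the compiler's actual Instruction, Opcode and Liveness classes, so this is an illustration of the technique rather than drop-in code:

```java
import java.util.*;

// Simplified model: an instruction defines at most one register and
// uses some registers; sideEffect marks stores, calls, syscalls, etc.
class Instr {
    final String def;            // register written, or null
    final List<String> uses;     // registers read
    final boolean sideEffect;    // must never be removed
    Instr(String def, List<String> uses, boolean sideEffect) {
        this.def = def; this.uses = uses; this.sideEffect = sideEffect;
    }
}

public class DeadCodeSketch {
    // Traverse the block backwards, keeping an instruction only if it
    // has a side effect or defines a register that is live afterwards.
    static List<Instr> removeDead(List<Instr> block, Set<String> liveOut) {
        Set<String> live = new HashSet<>(liveOut);
        LinkedList<Instr> kept = new LinkedList<>();
        for (int i = block.size() - 1; i >= 0; i--) {
            Instr in = block.get(i);
            if (in.sideEffect || (in.def != null && live.contains(in.def))) {
                kept.addFirst(in);
                if (in.def != null) live.remove(in.def); // def kills liveness
                live.addAll(in.uses);                    // uses become live
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Instr> block = List.of(
            new Instr("a", List.of(), false),     // a = 1
            new Instr("b", List.of(), false),     // b = 2   (dead)
            new Instr("c", List.of("a"), false),  // c = a + 1
            new Instr(null, List.of("c"), true)); // store c (side effect)
        System.out.println(removeDead(block, Set.of()).size()); // prints 3
    }
}
```

Note that removing one instruction can make an earlier definition dead in turn, which is exactly why the iterative driver above keeps running until nothing changes.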
There is no general method to decide when to perform inlining, and therefore one has to rely on heuristics. A first (necessary) condition for inlining is that the function being called must be known at compilation time, otherwise it is not possible to know what to inline. Then, one must take care not to make the compiler loop by repeatedly inlining recursive functions; we therefore suggest that you do not consider recursive calls as candidates for inlining. Finally, the size of the called function must somehow be taken into account, to avoid code-size explosion. For this project, you can simply write a function to compute the "size" of a tree (for example by counting the number of nodes it contains) and only inline functions whose body is smaller than some threshold. We propose that you compute this size after performing inlining inside the body of the function itself.

Notice that many heuristics other than the one suggested can be used. With our solution, a function is either inlined every time it is used, or never. This is due to the fact that we base our decision entirely on the definition of the function, not on its use(s). More realistic heuristics make an individual decision for each use of a function.

We suggest that you perform inlining directly on the tree, as this seems to be the easiest approach. That is, your inlining phase should be placed between type analysis and code generation. If you follow this recommendation, then the basic algorithm can be outlined as follows:
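The size heuristic described above can be sketched in a few lines. The `Tree` class and the threshold value below are hypothetical — the compiler's real tree classes and a sensible threshold will differ — but the node-counting idea is exactly the one suggested:

```java
import java.util.*;

// Hypothetical tree node; only the shape matters for the heuristic.
class Tree {
    final String op;
    final List<Tree> children;
    Tree(String op, Tree... children) {
        this.op = op; this.children = List.of(children);
    }
}

public class InlineHeuristic {
    static final int MAX_INLINE_SIZE = 20; // tuning parameter, chosen arbitrarily

    // "Size" of a tree = number of nodes it contains.
    static int size(Tree t) {
        int n = 1;
        for (Tree c : t.children) n += size(c);
        return n;
    }

    // All-or-nothing decision, based only on the function's body;
    // the recursion check is assumed to happen elsewhere.
    static boolean shouldInline(Tree body) {
        return size(body) <= MAX_INLINE_SIZE;
    }

    public static void main(String[] args) {
        Tree small = new Tree("+", new Tree("x"), new Tree("1"));
        System.out.println(size(small));         // prints 3
        System.out.println(shouldInline(small)); // prints true
    }
}
```

Computing the size after inlining inside the body, as suggested, means that a small function whose body itself inlines other functions is judged on its final, expanded size.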
Currently, leaf functions (functions containing no function call) save the return address like all other functions, even though they never overwrite it; omitting this save and the corresponding restore is another easy optimisation.

The following three optimisations are relatively easy to implement, and can provide a modest but interesting improvement in code quality.
These three optimisations can be performed directly on the control-flow graph, but constant folding can also be performed on the tree, before code generation. All of them can make some code dead, and should therefore be run before dead code elimination.

With the current compiler, every "cons" operation produces a system call to allocate memory for the cell. Since system calls are expensive, a better solution is to perform only one allocation for all the "cons" operations appearing in a single basic block, and then to divide the allocated memory among all of them.