LAMP -- Advanced compiler construction course

Overview

This week we continue the series of ungraded exercise with a small assignment regarding your VM. As you might recall, the minischeme compiler can generate bytecode for a special VM, which we named minivm. To get you introduced to the VM, you will implement most of the instructions (it's not a big set), and the threading function.

Get the source

To get the (incomplete) source code of minivm, go to your working copy of the project, and run darcs:

act-project $ darcs pull --dry-run
Pulling from "http://lamp.epfl.ch/~schinz/act-project"...
Would pull the following changes:
Thu Mar 22 07:59:44 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Added virtual machine

Tue Mar 20 19:50:08 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Added missing "not" primitive (SVN 3549)

Tue Mar 20 19:49:19 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Fixed empty string parser problem (SVN 3549)

You see a list of changes to our repository, next you apply these changes to your local copy. Hopefully, there won't be any conflicts.

act-project $ darcs pull

Dracs will ask you for each patch whether you want to pull it or not. Answer yes to all to get all our changes.

To learn more about darcs, we strongly encourage you to read the darcs manual.

Using the vm

After you recompile your project, you can test that your minivm runs:

act-project $ vm/minivm --help

minivm (c) LAMP 2007
Usage: minivm [options] <input_file>

  -c,   --code          Set code memory size (default 131072)
  -e,   --heap          Set heap memory size (default 1048576)
  -h,   --help          Show this help screen
  -v,   --version       Print version information
  -i,   --interactive   Interactive execution mode
  -g,   --log           Log level (0 - ERROR .. 3 - DEBUG)

The code and heap options allow one to specify how much memory should the VM use for code and heap. In most cases these are enough, but for long running programs without a garbage collector, you might need to increase the size of the heap. The interactive mode is shortly described in the following section, and the logging specifies the amount of detail you want the VM to generate (the higher the more output you get) in minivm.log.

To test the VM, compile a minischeme program and then feed it into the VM. Note that the compiler needs to be passed explicitly the predefined.scm file, which contains basic functinos like null?, car, cdr, etc. If you don't specify an output file for the compiler (recognized by ending in .asm), the code will be printed to stdout.

act-project/src $ scala -cp classes/ ch.epfl.lamp.minischeme.Main libraries/predefined.scm ../tests/fact.scm fact.asm
act-project/src $ vm/minivm fact.asm
not implemented yet!

The minivm architecture

The parser

The VM accepts as input an assembler-like file. The syntax is very simple:

program     ::= line*
line        ::= comment | labeldef | instruction
comment     ::= ";" <char>* "\n"
labeldef    ::= ident ":" "\n"
instruction ::= opcode3reg reg reg reg "\n"
              | opcode2reg reg reg "\n"
              | LINT reg (reg|ident) "\n"
              | LOAD reg reg (int|ident) "\n"
              | STOR reg reg (int|ident) "\n"
ident       ::= <scheme_startident> <char>* 
reg         ::= (R|r) int
int         ::= [1..9][0..9]*

The parser makes the following assumptions:

Each instruction fits on exacly one line of input
If the first character of a line is a whitespace, the line is considered to be an instruction. If it's semicolon, it's considered to be a comment and ignored. Otherwise it's considered to be a label.

The parser is also responsible for building the code binary image used by the emulator. Each instruction is represented in two or three words, as follows:

1 word for the opcode
1 word for the three registers, 5 bits/register, starting with the most signifficant bit
for LOAD/STOR and LINT, an additional word to store the integer argument

All memory references (both into heap-allocated memory and the code segment) are kept as "hard" memory pointers: that means they point inside the VM address space. For example, the program counter is initialized to point at the beginning of the code memory variable, instead of 0. When labels are resolved, during parsing, they are also replaced by the memory address they point to.

The execution engine

The interpreter is a classical threaded code interpreter, with one twist: The VM supports a special mode called interactive, used for debugging. This allows step by step execution, inspecting the registers or disassemble code found at the current PC. To support this mode, there are two versions of each instruction: one that simply executes the instruction, and another one which first prompts the user and waits for input before executing (by delegating to the 'fast' version).

Memory

minivm provides two separate memory spaces: one is for code, and one is for heap. During parsing, code is loaded into code_mem. During execution, program requests for memory, issued by ALLOC are satisfied from the heap, pointed to by heap_mem. Even though there is no memory manager right now (or rather there is, but a very limited one), your code should only request memory blocks through the heap_alloc function. When a GC is added, it will hook on this function and perform garbage collection when no more memory is available.

Task

Your job is to fill in the missing pieces in the interpreter. They are marked by the error message Not implemented yet!. There are two places that need implmementation:

the VM instructions, in function threaded_exec inside src/vm/vm.c. This should be straight forward for most instructions. For ALLOC you need to allocate memory using the heap_alloc.
instruction threading, in function thread_code inside the same file, src/vm/vm.c. This function should iterate through code, identify opcodes and replace them by the corresponding memory address, found using the function parameter labels. The only tricky bit is to make sure you correctly identify opcodes, since instructions have different sizes.

Ecole polytechnique fédérale de Lausanne

Programming Methods Laboratory

Advanced compiler construction

Summary

News

Lecturer

Assistant

Questions