Ecole Polytechnique Fédérale de Lausanne
Getting started
English only

The goal of your first assignment is to have a first encounter with the framework you receive. As true engineers, we believe in "learning by doing" so your job will be to make several small changes to the compiler and the virtual machine.

In order to test that your changes really work, we propose you to write a simple function that will make use of the missing features. Write a takeWhile function, that given a list l and a predicate will return the longest prefix of l for which the predicate is true.

For example:

        (define takeWhile ...)

        (define greater-than-2
          (lambda (x)
            (> x 2)))

        (define print-list
          (lambda (l)
            (if (null? l)
              (let ()
                (print-int (car l))
                (print-list (cdr l))))))

           (take-while (cons 4 (cons 3 (cons 1 nil)))
should print "43"

The virtual machine you receive implements the specification you saw in the course, with one exception: CMOV is not recognized. Your job is to add it.

Before describing the task you have to do, we'll have a more detailed look at the VM.

The VM accepts as input an assembler-like file. The syntax is very simple:

program     ::= line*
line        ::= comment | labeldef | instruction
comment     ::= ";" <char>* "\n"
labeldef    ::= ident ":" "\n"
instruction ::= opcode3reg reg reg reg "\n"
              | opcode2reg reg reg "\n"
              | LINT reg (reg|ident) "\n"
              | LOAD reg reg (int|ident) "\n"
              | STOR reg reg (int|ident) "\n"
ident       ::= <scheme_startident> <char>* 
reg         ::= (R|r) int>
int         ::= [1..9][0..9]*

The parser makes the following assumptions:

  • Each instruction fits on exacly one line of input
  • If the first character of a line is a whitespace, the line is considered to be an instruction. If it's semicolon, it's considered to be a comment and ignored. Otherwise it's considered to be a label.

The loader will be called by the parser for each instruction found in the input file, and it is responsible for building the code binary image used by the emulator. Each instruction is represented in two or three words, as follows:

  • 1 word for the opcode
  • 1 word for the three registers, 5 bits/register, starting with the most signifficant bit
  • for LOAD/STOR and LINT, an additional word to store the integer argument

All memory references (both into heap-allocated memory and the code segment) are kept as "hard" memory pointers: that means they point inside the VM address space. For example, the program counter (register R31) is initialized to point at the beginning of the code memory variable, instead of 0. When labels are resolved, during parsing, they are also replaced by the memory address they point to.

The execution loop is a simple infinite loop which switches on the current opcode. Besides the normal sanity checks (memory pointers being inside the allocated memory, division by zero) there is not much to talk about.

The emulator may run in 'interactive' mode. In this mode it will wait user input after each instruction. The interface is very limited, it consists of only step, disassemble, print registers and quit.


To add a new instruction to the VM, you need to modify several components of the VM:

  • Modify the parser to understand the new opcode
  • Modify the loader to treat the new token
  • Modify the emulator to implement the new instruction
  • Modify the disassembler, to be able to print back the new instruction in interactive mode

Once this is done, you can (and we advice you to) start testing your changes with simple Scheme programs.

The language we use this year is a dialect of Scheme. The compiler you receive uses predefined functions to implement some of the primitives. You can find them in predefined.scm. While this works fine for most cases, the logical operators 'and' and 'or' don't implement shortcut logic. Shortcut logic allows the programmer to write 'safe' code like this:

        (if (and xs (cdr xs)) 
            (print-int (car (car xs)))
            (print-int 0))
This code should return the second element of xs, or 0 if the list has less than two elements. It should never fail with a null pointer dereference, and that relies on the fact that the right-hand side of 'and' is not evaluated when the left-hand side is false. However, when and is a function, in a strict language as minischeme, all arguments are evaluated prior to the function call and this example would crash for an empty list.

Your job is to modify the compiler such that both and and or implement shortcut logic. The easiest solution is to implement it as a tree transformation. For instance, whenever a call is made to and, replace it with an If node:

 (and tree1 tree2) 
   (if tree1 tree2 0)