Overview
        This week we continue the series of ungraded exercise with a
        small assignment regarding your VM. As you might recall, the
        minischeme compiler can generate bytecode for a
        special VM, which we named minivm. To get you
        introduced to the VM, you will implement most of the
        instructions (it's not a big set), and the threading
        function.
      
Get the source
        To get the (incomplete) source code of minivm, go
        to your working copy of the project, and run darcs:
      
Pulling from "http://lamp.epfl.ch/~schinz/act-project"...
Would pull the following changes:
Thu Mar 22 07:59:44 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Added virtual machine
Tue Mar 20 19:50:08 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Added missing "not" primitive (SVN 3549)
Tue Mar 20 19:49:19 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Fixed empty string parser problem (SVN 3549)
You see a list of changes to our repository, next you apply these changes to your local copy. Hopefully, there won't be any conflicts.
Dracs will ask you for each patch whether you want to pull it or not. Answer yes to all to get all our changes.
To learn more about darcs, we strongly encourage you to read the darcs manual.
Using the vm
        After you recompile your project, you can test that your
        minivm runs:
      
minivm (c) LAMP 2007 Usage: minivm [options] <input_file> -c, --code Set code memory size (default 131072) -e, --heap Set heap memory size (default 1048576) -h, --help Show this help screen -v, --version Print version information -i, --interactive Interactive execution mode -g, --log Log level (0 - ERROR .. 3 - DEBUG)
        The code and heap options allow one
        to specify how much memory should the VM use for code and
        heap. In most cases these are enough, but for long running
        programs without a garbage collector, you might need to
        increase the size of the heap. The interactive mode is shortly
        described in the following section, and
        the logging specifies the amount of detail you want the VM to
        generate (the higher the more output you get) in
        minivm.log.
      
        To test the VM, compile a minischeme program
        and then feed it into the VM. Note that the compiler needs to
        be passed explicitly the predefined.scm file,
        which contains basic functinos like null?, car,
        cdr, etc. If you don't specify an output file for the
        compiler (recognized by ending in .asm), the code
        will be printed to stdout.
      
act-project/src $ vm/minivm fact.asm
not implemented yet!
The minivm architecture
The parser
The VM accepts as input an assembler-like file. The syntax is very simple:
program     ::= line*
line        ::= comment | labeldef | instruction
comment     ::= ";" <char>* "\n"
labeldef    ::= ident ":" "\n"
instruction ::= opcode3reg reg reg reg "\n"
              | opcode2reg reg reg "\n"
              | LINT reg (reg|ident) "\n"
              | LOAD reg reg (int|ident) "\n"
              | STOR reg reg (int|ident) "\n"
ident       ::= <scheme_startident> <char>* 
reg         ::= (R|r) int
int         ::= [1..9][0..9]*
The parser makes the following assumptions:
- Each instruction fits on exacly one line of input
- If the first character of a line is a whitespace, the line is considered to be an instruction. If it's semicolon, it's considered to be a comment and ignored. Otherwise it's considered to be a label.
The parser is also responsible for building the code binary image used by the emulator. Each instruction is represented in two or three words, as follows:
- 1 word for the opcode
- 1 word for the three registers, 5 bits/register, starting with the most signifficant bit
- for LOAD/STOR and LINT, an additional word to store the integer argument
All memory references (both into heap-allocated memory and the code segment) are kept as "hard" memory pointers: that means they point inside the VM address space. For example, the program counter is initialized to point at the beginning of the code memory variable, instead of 0. When labels are resolved, during parsing, they are also replaced by the memory address they point to.
The execution engine
The interpreter is a classical threaded code interpreter, with one twist: The VM supports a special mode called interactive, used for debugging. This allows step by step execution, inspecting the registers or disassemble code found at the current PC. To support this mode, there are two versions of each instruction: one that simply executes the instruction, and another one which first prompts the user and waits for input before executing (by delegating to the 'fast' version).
Memory
      minivm provides two separate memory spaces: one is
      for code, and one is for heap. During parsing, code is loaded
      into code_mem. During execution, program requests
      for memory, issued by ALLOC are satisfied from the
      heap, pointed to by heap_mem. Even though there is
      no memory manager right now (or rather there is, but a very
      limited one), your code should only request memory blocks
      through the heap_alloc function. When a GC is
      added, it will hook on this function and perform garbage
      collection when no more memory is available.
    
Task
Your job is to fill in the missing pieces in the interpreter. They are marked by the error message Not implemented yet!. There are two places that need implmementation:
- the VM instructions, in function threaded_execinside src/vm/vm.c. This should be straight forward for most instructions. ForALLOCyou need to allocate memory using theheap_alloc.
- instruction threading, in function thread_codeinside the same file, src/vm/vm.c. This function should iterate through code, identify opcodes and replace them by the corresponding memory address, found using the function parameterlabels. The only tricky bit is to make sure you correctly identify opcodes, since instructions have different sizes.