Overview
This week we continue the series of ungraded exercise with a
small assignment regarding your VM. As you might recall, the
minischeme
compiler can generate bytecode for a
special VM, which we named minivm
. To get you
introduced to the VM, you will implement most of the
instructions (it's not a big set), and the threading
function.
Get the source
To get the (incomplete) source code of minivm
, go
to your working copy of the project, and run darcs:
Pulling from "http://lamp.epfl.ch/~schinz/act-project"...
Would pull the following changes:
Thu Mar 22 07:59:44 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Added virtual machine
Tue Mar 20 19:50:08 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Added missing "not" primitive (SVN 3549)
Tue Mar 20 19:49:19 CET 2007 Michel Schinz <Michel.Schinz@epfl.ch>
* Fixed empty string parser problem (SVN 3549)
You see a list of changes to our repository, next you apply these changes to your local copy. Hopefully, there won't be any conflicts.
Dracs will ask you for each patch whether you want to pull it or not. Answer yes to all to get all our changes.
To learn more about darcs, we strongly encourage you to read the darcs manual.
Using the vm
After you recompile your project, you can test that your
minivm
runs:
minivm (c) LAMP 2007 Usage: minivm [options] <input_file> -c, --code Set code memory size (default 131072) -e, --heap Set heap memory size (default 1048576) -h, --help Show this help screen -v, --version Print version information -i, --interactive Interactive execution mode -g, --log Log level (0 - ERROR .. 3 - DEBUG)
The code
and heap
options allow one
to specify how much memory should the VM use for code and
heap. In most cases these are enough, but for long running
programs without a garbage collector, you might need to
increase the size of the heap. The interactive mode is shortly
described in the following section, and
the logging specifies the amount of detail you want the VM to
generate (the higher the more output you get) in
minivm.log
.
To test the VM, compile a minischeme
program
and then feed it into the VM. Note that the compiler needs to
be passed explicitly the predefined.scm
file,
which contains basic functinos like null?, car,
cdr
, etc. If you don't specify an output file for the
compiler (recognized by ending in .asm
), the code
will be printed to stdout
.
act-project/src $ vm/minivm fact.asm
not implemented yet!
The minivm architecture
The parser
The VM accepts as input an assembler-like file. The syntax is very simple:
program ::= line* line ::= comment | labeldef | instruction comment ::= ";" <char>* "\n" labeldef ::= ident ":" "\n" instruction ::= opcode3reg reg reg reg "\n" | opcode2reg reg reg "\n" | LINT reg (reg|ident) "\n" | LOAD reg reg (int|ident) "\n" | STOR reg reg (int|ident) "\n" ident ::= <scheme_startident> <char>* reg ::= (R|r) int int ::= [1..9][0..9]*
The parser makes the following assumptions:
- Each instruction fits on exacly one line of input
- If the first character of a line is a whitespace, the line is considered to be an instruction. If it's semicolon, it's considered to be a comment and ignored. Otherwise it's considered to be a label.
The parser is also responsible for building the code binary image used by the emulator. Each instruction is represented in two or three words, as follows:
- 1 word for the opcode
- 1 word for the three registers, 5 bits/register, starting with the most signifficant bit
- for LOAD/STOR and LINT, an additional word to store the integer argument
All memory references (both into heap-allocated memory and the code segment) are kept as "hard" memory pointers: that means they point inside the VM address space. For example, the program counter is initialized to point at the beginning of the code memory variable, instead of 0. When labels are resolved, during parsing, they are also replaced by the memory address they point to.
The execution engine
The interpreter is a classical threaded code interpreter, with one twist: The VM supports a special mode called interactive, used for debugging. This allows step by step execution, inspecting the registers or disassemble code found at the current PC. To support this mode, there are two versions of each instruction: one that simply executes the instruction, and another one which first prompts the user and waits for input before executing (by delegating to the 'fast' version).
Memory
minivm
provides two separate memory spaces: one is
for code, and one is for heap. During parsing, code is loaded
into code_mem
. During execution, program requests
for memory, issued by ALLOC
are satisfied from the
heap, pointed to by heap_mem
. Even though there is
no memory manager right now (or rather there is, but a very
limited one), your code should only request memory blocks
through the heap_alloc
function. When a GC is
added, it will hook on this function and perform garbage
collection when no more memory is available.
Task
Your job is to fill in the missing pieces in the interpreter. They are marked by the error message Not implemented yet!. There are two places that need implmementation:
- the VM instructions, in function
threaded_exec
inside src/vm/vm.c. This should be straight forward for most instructions. ForALLOC
you need to allocate memory using theheap_alloc
. - instruction threading, in function
thread_code
inside the same file, src/vm/vm.c. This function should iterate through code, identify opcodes and replace them by the corresponding memory address, found using the function parameterlabels
. The only tricky bit is to make sure you correctly identify opcodes, since instructions have different sizes.