Logo EPFL
LAMP
Ecole Polytechnique Fédérale de Lausanne
Part One: Threaded Interpreter
English only

The goal of this project is to implement and evaluate a direct threaded interpreter for minivm. You get a new framework which fixes some bugs you discovered last time, and implements the missing features (CMOV and shortcutting and/or).

As described in the course, threaded code execution is a more efficient interpretation technique than the "big switch" implementation. It consists in replacing each opcode in the program code with the address of the code that implements it. This way, moving from one instruction to the next can be done in a single jump to the address found at PC, as opposed to two, in the case of the switch: one to jump to the end of the switch, and another that dispatches on the opcode value at PC.

Let's have a look at a short example to see how code is currently represented in minivm and how it should be, to implement threaded-code execution.

      LINT R1 100
      LINT R2 1
      ADD  R3 R2 R1

The code shown above will be transformed by the loader in a sequence of binary data, according to the layout described in vm/loader/alignloader.c:

PC Code Comment
00 0x00000001 OP_LINT
04 0x04000000 R1 (R0 R0)
08 0x00000064 100 in Hexa
12 0x00000001 OP_LINT
16 0x08000000 R2 (R0 R0)
20 0x00000001 1 in Hexa
24 0x00000002 OP_ADD
28 0x18820000 R3 R2 R1

What we want to obtain, is an image where each opcode is replaced by the address of the code that implements it. It would look something like this:

PC Code Comment
00 0x0a878661 OP_LINT
04 0x04000000 R1 (R0 R0)
08 0x00000064 100 in Hexa
12 0x0a878661 OP_LINT
16 0x08000000 R2 (R0 R0)
20 0x00000001 1 in Hexa
24 0x0a87866f OP_ADD
28 0x18820000 R3 R2 R1

(the addresses in red are, of course, just an example. The actual address is to be determined at runtime).

The trick that enables us to retrieve the address of each opcode implementation is an extension of gcc known as labels as values. The address of a label can be stored and passed around as any other value, and later jumped to by using goto:

  l_ADD: 
    R[c] = R[a] + R[b];
    // ...

    void **label_add = &&l_ADD
    // ...
    goto **label_add;

We suggest you start by writing a new execution loop which does not use a switch to dispatch, but uses labels for each opcode implementation. At the end of each such block of code, you jump explicitely to the next instruction by calling goto **(R[31]). We suggest to write a NEXT_OP macro so that additional operations can be added to each instruction easily.

Note that since the opcode will be replaced by a pointer to the code implementing it, you cannot call decode anymore. To understand why, we need to see what decode expects. Given an address that points to an instruction, it decodes the opcode, and the arguments of that instruction. Not all instructions take the same number and type of arguments (for example, arithmetic instructions take three registers, while LINT takes a register and a word). This function, based on the opcode that it found, will return the right arguments back to the caller.

Since the opcode is not there anymore, decode will not work anymore. Instead, you should use decode_args, which given the address and the current opcode, decodes only the arguments.

The output of the loader is still the binary image we saw in the beginning of the assignment, so you need to add a pass which iterates over the code block and replaces each occurence of an opcode by the proper address. This step we call patching.

The easiest way to do it is to collect all label addresses in an array, indexed by the opcode, so we have the following property: labels[OP_LINT] == &&l_LINT, where we use l_LINT to denote the label for the implementation of LINT.

For threading the code, you need to know the beginning and the end of the code segment. These limits are found in code_mem and end_program. The first one points to the first instruction in the program, while the last one points immediately after the last instruction of the program.

Important:

  • Pointer arithmetic in C is very dangerous and sometimes misleading. For example:
            // ex. 1
            void* p = .. // some address
            p++;
    
            // ex. 2
            int* p = .. // some address
            p++
    
            // ex. 3
            struct header* p = ..// some address
            p++
    
    each example leaves the pointer p at a different address relative to the beginning! Example 1 increments p by 1 byte, the second by 4 bytes, and the last by the size of the structure header. Generally, incrementing a pointer is done with the size of the underlying type.
    That's why we strongly encourage you to use the macros defined in vm/util/pointers.h for both pointer arithmetic (INC_POINTER) and for load/stores from memory (LOAD_BYTE, LOAD_SHORT, LOAD_WORD and STORE_BYTE, STORE_SHORT, STORE_WORD)

To trigger threaded mode interpretation, you can pass -t on the command line. Type minivm --help to get a list of all options. When this option is set, the threaded_mode flag is set to 1. This option is currently not taken into account by your vm so you need to modify init_vm to call your threaded execution loop when this flag is set.

You can now test your vm by running it with -t appended on the command line: minivm -t tests/hello.asm. When debugging, it is useful to have a look in the log file that is generated by default. You find it in the current directory, under the name minivm.log.

Write some programs in minischeme (like factorial, prime numer generator, etc.) that take a long time to execute, and compare running times between normal and threaded interpretation. You should turn off logging so that execution time is not dominated by input/output (by passing --log 1). You might need to increase the heap size for your programs, which can be done like this: minivm --heap 2000000 to allocate ~2 MB of heap.

If you tried to run your threaded interpreter in the interactive mode (-i) you noticed that disassemble does not work anymore. This function relies on the decode function to print back the executing code. Since we replace opcodes by memory addresses, it is not able to understand what instructions are in the code.

Have a look in vm/emulator/vm.c at the function disassemble. You should add a check for the interactive mode, and if that's the case, replace the decoded opcode (which is wrong) with the correct one. You need to write a find_opcode function which given the address found in the code (which points to the implementation of the given opcode) returns the opcode we used to have. This is easily achieved if you have the array of labels as we suggested earlier: simply search through the array for the given address, and return the index where it is found!