Introduction

The goal of this first graded assignment is to write a conservative mark&sweep garbage collector for the virtual machine. This garbage collector should replace the current memory module, which does not free memory.

To begin this assignment, you can either use your current version of the project, or start anew with our own minischeme.zip that includes the solution to the two ungraded assignments.

The mark phase

The mark phase identifies the set of reachable memory blocks, and marks them so that the sweep phase can then free unmarked blocks.

Traversing the object graph requires identifying pointers to objects. Since the minischeme compiler does not produce the information necessary to differentiate pointers from integers, a conservative approach has to be taken: everything that looks like a pointer to a heap block is treated as such, which means that the block it refers is considered as reachable.

To identify pointers, the mark phase must repeatedly look at values and determine whether they represent the address of the beginning of an allocated block, or not. This should be done as efficiently as possible. One technique is to manage a bitmap with one bit per heap address, in which a bit is set if and only if the corresponding address designates the beginning of an allocated block.

Once an allocated block has been identified as (conservatively) reachable, it must be marked so that the sweep phase knows it cannot be freed. This mark bit can conveniently be stored in the least-significant bit of the size, in the block header. This is because block size is expressed in bytes, and the size of any block is at least a multiple of two.

As we have seen in the course, when a block is marked, it must be searched for pointers, and the blocks designated by these pointers must be recursively marked. For this, you are allowed to write a recursive mark function, although this solution is far from optimal.

The sweep phase

The sweep phase traverses the whole heap, block after block, and puts those that are not marked in the free list. Adjacent free blocks are coalesced. The mark bit of blocks is also cleared in the process, to guarantee that all blocks are unmarked at the beginning of the next garbage collection.

Notice that the free list is rebuilt from scratch in this phase.

Tips and tricks

Your garbage collector must at least run on the IN machines, which have a 64 bits architecture. Ideally, your code should be completely independent of the architecture size, but this might be difficult to achieve in the limited time at your disposal. Therefore, we strongly recommend that you do all your testing and debugging on a 64 bits architecture.

To test your garbage collector, write long-running programs that consume memory. To help you, we provide a single compiled program, bignums.asm, that computes the factorial of 200.

Solution submission

Your solution must be submitted as a ZIP archive containing exactly the following files - not one more, not one less:

  README  (short description of your solution)
  vm/engine.h
  vm/engine_switch.c
  vm/fail.c
  vm/fail.h
  vm/loader.c
  vm/loader.h
  vm/main.c
  vm/memory.h
  vm/memory_marksweepgc.c  (your GC)
  vm/opcode.h

The README file should contain a short description of your solution and its peculiarities, as well as the name of all its authors.

Except for vm/memory_marksweepgc.c, the C files should be identical to the ones you received initially. If they are not, please explain why in the README. Compiling all those C files together should produce a working, garbage-collected VM.

To help you check the contents of the archive, we provide you with a Scala script called SubCheck3.scala. When run with your archive as argument, this script should not report any problem.

The archive containing your solution should be attached to an e-mail sent to michel.schinz+acc3@gmail.com, before 23:59, April 3, 2008.

Requirements summary

To get the maximum number of points, your solution must at least satisfy all of the following constraints:

  1. When run on the IN machines, your VM must be able to compute the 30th element of the Fibonacci series using the naïve, recursive algorithm, and a heap size of 50'000 bytes (i.e. with argument -m 50000).
  2. When run on the IN machines, your VM must be able to successfully execute the bignums.asm test file we provide with the default heap size (1'000'000 bytes).
  3. The technique used to determine whether pointers designate allocated heap blocks must be at least as efficient as the bitmap-based one presented above.
  4. Your garbage collector must not allocate additional memory to store its internal data structures - e.g. the bitmap.
  5. The sweep phase must coalesce adjacent blocks.
  6. Your code must be submitted before the deadline, in the format specified above.

Solution

The file memory_marksweepgc.o contains a working GC for minivm. This object file is compiled for the IN machines.