LAMP - Type Systems, Ex.1: Arithmetic expressions

Exercise 1: Arithmetic expressions

For handing in, please follow this procedure:

pack your project files (source files and makefile or build.xml) using zip (or tar/gzip/bzip2, but please don't use rar or any other compressor)
email your project to Iulian or Stephane, specifying the members of your group.

In this exercise, you are asked to familiarize yourself with Scala and the combinator parsing library.

The first part will not be graded, and consists of extending an existing arithmetic evaluator with conditionals and comparison operators. (package arithmetic in the provided framework)
The second part is your assignment, consisting of implementing (variations of) the NB language from chapter 3 of the TAPL book (package tapl1 in the provided framework).

The framework you will be using can be downloaded as zip or tar.gz archive.

The API documentation for this exercise is available online.

Technicalities

The programming assignments of this course will be done in Scala.

Scala should be installed on all computers in the exercise room, but if you plan to use your own, you can download it here. Scala has an Eclipse plugin, downloadable here and comes with support for several editors such as Emacs or Vi.

The Scala configuration on Linux and Windows is described in a separated documentation page.

The provided framework contains configuration files for Ant and Eclipse.

Ant: build.xml
Eclipse: .classpath and .project (use "File" → "Import.." → "General" → "Existing Projects into Workspace" to import the extracted framework)

1. Arithmetic expressions

In this first part you will modify an existing parser for arithmetic expressions. The goal is to familiarize yourself with the combinator parsing library and Scala, so your imagination can run wild and implement as many features as you want! The given framework can parse and evaluate simple arithmetic expressions, and we'd like you to (at least) extend it with booleans and comparison expressions.

Here is the grammar which is currently accepted by the parser:

  Expr ::= Term { ('+' Term) | ('-' Term) }

  Term ::= Factor { ('*' Factor) | ('/' Factor) }

  Factor ::= <number> | '(' Expr ')'

A quick introduction to the combinator parsing library

The combinator parsing library comes in its own package, not surprisingly named combinatorParsing. It defines several classes, among which is GenericParser and TokenizingParser. The parsers you will write using this library will have to extend one of these classes. TokenizingParser extends GenericParser with functionality to recognize common tokens, like identifiers or string and integer literals, so this is what you will use most often.

You're are encouraged to have a look at the following API documentation of the combinator parsing library.

Writing a parser for the above grammar is straight forward using the combinator parsing library: nonterminals become methods, terminals are either keywords or numbers, and to link them all together one calls the methods provided by the library:

& for sequence
| for alternative
rep for repetition
opt for optional parts

For example, here is how the first production should look:

  /** Expr ::= Term { ('+' Term) | ('-' Term) }   */
  def Expr: Parser = production(Term & rep(kw("+") & Term | kw("-") & Term)

You probably noticed that the whole production is wrapped into a call to method production. This instructs the library to call a user-defined method when this grammar production has matched, passing the parsed tokens (see below).

Terminals deserve a few more words: the parsing library has to stay general, so it cannot decide beforehand what are the terminals of your grammar. It does, however, provide three stock options: identifiers, and string and numeric literals, and two configurable ones: delimiters and keywords. Delimiters are characters which can break a text into tokens even without a separating space. A common example is parenthesis. Keywords are strings which are not identifiers. Both of them can be specified by listing them in a call to 'incl' before any parsing gets done, as for example:

  delimiters.incl('(', ')', '+', '-', '*', '/')
  keywords.incl("for", "while", "do")

(these two sets are inherited from the GenericParser class)

The BuildingParser mixin defines an abstract type called Output, which is the type of values your parser will produce. In this example, it is set to Int since evaluation is done during parsing, and therefore it produces ints. In a more elaborate setting, this would probably be an abstract syntax tree.

The parsing library calls method build each time a grammar production is matched. This is the hook normally used to build up an Abstract Syntax Tree, but in this simple example it is the place where evaluation takes place. This method gets a list of tokens, and a position. One thing to note is that delimiters will be passed to this method as instances of the keyword (KW) token.

Once the parser recognizes the extensions of your grammar, you will need to update this method with the new cases.

The parsing library is well commented and the code is not very large, so we encourage you to read the code and try to understand how it works (be warned though, it makes heavy use of higher order functions and call by name parameters, so you should feel comfortable with Scala before diving in). And of course, experiment with it.

Extend our example

Extend the grammar with another expression form, if t1 then t2 else t3, and the three comparison operators <, >, =. Here are some examples that should be accepted by your modified parser:

10 + (if (2) then 2 else 3)
if (2 < 3) then 2 else 3
10 + (if (2*3 = 3*2) then 2 else 3)

and one example which it shouldn't:

0 > 0 = 0

In order to keep things simple, we keep the Output type of the parser (Int) and encode booleans to 1 and 0.

Assignment 1: The NB language

Hand in: November 8 (in 2 weeks).

The cryptic acronym stands for Numbers and Booleans and comes from the course book. This simple language is defined in Chapter 3 of the the TAPL book.

t  ::= "true"                   terms
     | "false"
     | "if" t1 "then" t2 "else" t3
     | "0"
     | "succ" t
     | "pred" t
     | "iszero" t

v  ::= "true"                   values
     | "false"
     | nv

nv ::= 0                        numeric values
     | "succ" nv

This language has three syntactic forms: terms, which is the most general form, and two kinds of values: numeric values and boolean values. The evaluation rules are as follows:

Computation Congruence

if true then t1 else t2 → t1

if false then t1 else t2 → t2

isZero zero → true

isZero succ NV → false

pred zero → zero

pred succ NV → NV

t1 → t1'

if t₁ then t₂ else t₃ → if t1' then t2 else t3

t → t'

isZero t → isZero t'

t → t'

pred t → pred t'

t → t'

succ t → succ t'

Big step semantics

The other style of operation semantics commonly in use is called big step sematics. Instead of defining evaluation in terms of a one step reduction, it formulates the notion of a term that evaluates to a final value, written "t ⇓ v". Here is how the big step evaluation rules would look for this language:

v ⇓ v

(B-VALUE)

t1 ⇓ true t2 ⇓ v2

if t1 then t2 else t3 ⇓ v2

(B-IFTRUE)

t1 ⇓ false t3 ⇓ v3

if t1 then t2 else t3 ⇓ v3

(B-IFFALSE)

t1 ⇓ nv1

succ t1 ⇓ succ nv1

(B-SUCC)

t1 ⇓ 0

pred t1 ⇓ 0

(B-PREDZERO)

t1 ⇓ succ nv1

pred t1 ⇓ nv1

(B-PREDSUCC)

t1 ⇓ 0

iszero t1 ⇓ true

(B-ISZEROZERO)

t1 ⇓ succ nv1

iszero t1 ⇓ false

(B-ISZEROSUCC)

What we'd like you to do is:

Write a parser that recognizes this language, using the combinator library
Write a reduce method which performs one step of the evaluation, according to the rules listed above
Write an eval method which implements a big step evaluator (one which evaluates a term down to a value, or it gets stuck when no rule applies)
Extend the language with numeric literals which are encoded to succ succ ... 0 during parsing.

Implementation hints

For this project we encourage you to use an abstract syntax tree. The standard way to do this in Scala is to define case classes for each form a term can get, and construct the tree in the build method. The provided framework should give you an idea about how things should look.

The parsing library will parse the digit zero to a numeric literal. If you want to allow only 0, but none of the other numeric literals, you can add the following method to the object tapl1.Arithmetic:

  def zero = token("0", t => t match {
    case NumericLit("0") => true
    case _ => false
  })
  def Expr: Parser = production( //...

The method zero will create a parser which accepts only numeric zero (of course, later on, when implementing the last bullet in the list, you will not need this method anymore, but deal with any numeric literal).

Besides the reduce and eval methods, your code should provide two testing methods which take a string and evaluate the program.

The one for reductions should print each intermediary step until it finds a normal form.
The other one should print either a value (if evaluation ended successfully), or an error message together with the stuck term (a term which is not a value, but for which there are no rules to apply).

Ecole polytechnique fédérale de Lausanne

Laboratoire des méthodes de programmation

Type Systems Course 2006–2007

Exercises

Summary