Exercise 1: Arithmetic expressions

For handing in, please follow this procedure:

In this exercise, you are asked to familiarize yourself with Scala and the combinator parsing library.

The framework you will be using can be downloaded as zip or tar.gz archive.

The API documentation for this exercise is available online.

Technicalities

The programming assignments of this course will be done in Scala.

Scala should be installed on all computers in the exercise room, but if you plan to use your own, you can download it here. Scala has an Eclipse plugin, downloadable here and comes with support for several editors such as Emacs or Vi.

The Scala configuration on Linux and Windows is described in a separated documentation page.

The provided framework contains configuration files for Ant and Eclipse.

1. Arithmetic expressions

In this first part you will modify an existing parser for arithmetic expressions. The goal is to familiarize yourself with the combinator parsing library and Scala, so your imagination can run wild and implement as many features as you want! The given framework can parse and evaluate simple arithmetic expressions, and we'd like you to (at least) extend it with booleans and comparison expressions.

Here is the grammar which is currently accepted by the parser:

  Expr ::= Term { ('+' Term) | ('-' Term) }

  Term ::= Factor { ('*' Factor) | ('/' Factor) }

  Factor ::= <number> | '(' Expr ')'

A quick introduction to the combinator parsing library

The combinator parsing library comes in its own package, not surprisingly named combinatorParsing. It defines several classes, among which is GenericParser and TokenizingParser. The parsers you will write using this library will have to extend one of these classes. TokenizingParser extends GenericParser with functionality to recognize common tokens, like identifiers or string and integer literals, so this is what you will use most often.

You're are encouraged to have a look at the following API documentation of the combinator parsing library.

Writing a parser for the above grammar is straight forward using the combinator parsing library: nonterminals become methods, terminals are either keywords or numbers, and to link them all together one calls the methods provided by the library:

For example, here is how the first production should look:

  /** Expr ::= Term { ('+' Term) | ('-' Term) }   */
  def Expr: Parser = production(Term & rep(kw("+") & Term | kw("-") & Term)

You probably noticed that the whole production is wrapped into a call to method production. This instructs the library to call a user-defined method when this grammar production has matched, passing the parsed tokens (see below).

Terminals deserve a few more words: the parsing library has to stay general, so it cannot decide beforehand what are the terminals of your grammar. It does, however, provide three stock options: identifiers, and string and numeric literals, and two configurable ones: delimiters and keywords. Delimiters are characters which can break a text into tokens even without a separating space. A common example is parenthesis. Keywords are strings which are not identifiers. Both of them can be specified by listing them in a call to 'incl' before any parsing gets done, as for example:

  delimiters.incl('(', ')', '+', '-', '*', '/')
  keywords.incl("for", "while", "do")

(these two sets are inherited from the GenericParser class)

The BuildingParser mixin defines an abstract type called Output, which is the type of values your parser will produce. In this example, it is set to Int since evaluation is done during parsing, and therefore it produces ints. In a more elaborate setting, this would probably be an abstract syntax tree.

The parsing library calls method build each time a grammar production is matched. This is the hook normally used to build up an Abstract Syntax Tree, but in this simple example it is the place where evaluation takes place. This method gets a list of tokens, and a position. One thing to note is that delimiters will be passed to this method as instances of the keyword (KW) token.

Once the parser recognizes the extensions of your grammar, you will need to update this method with the new cases.

The parsing library is well commented and the code is not very large, so we encourage you to read the code and try to understand how it works (be warned though, it makes heavy use of higher order functions and call by name parameters, so you should feel comfortable with Scala before diving in). And of course, experiment with it.

Extend our example

Extend the grammar with another expression form, if t1 then t2 else t3, and the three comparison operators <, >, =. Here are some examples that should be accepted by your modified parser:

10 + (if (2) then 2 else 3)
if (2 < 3) then 2 else 3
10 + (if (2*3 = 3*2) then 2 else 3)
and one example which it shouldn't:
0 > 0 = 0

In order to keep things simple, we keep the Output type of the parser (Int) and encode booleans to 1 and 0.

Assignment 1: The NB language

Hand in: November 8 (in 2 weeks).

The cryptic acronym stands for Numbers and Booleans and comes from the course book. This simple language is defined in Chapter 3 of the the TAPL book.

t  ::= "true"                   terms
     | "false"
     | "if" t1 "then" t2 "else" t3
     | "0"
     | "succ" t
     | "pred" t
     | "iszero" t

v  ::= "true"                   values
     | "false"
     | nv

nv ::= 0                        numeric values
     | "succ" nv

This language has three syntactic forms: terms, which is the most general form, and two kinds of values: numeric values and boolean values. The evaluation rules are as follows:

Computation Congruence
if true then t1 else t2 → t1
if false then t1 else t2 → t2
isZero zero → true
isZero succ NV → false
pred zero → zero
pred succ NV → NV
t1 → t1'
if t1 then t2 else t3 → if t1' then t2 else t3
t → t'
isZero t → isZero t'
t → t'
pred t → pred t'
t → t'
succ t → succ t'

Big step semantics

The other style of operation semantics commonly in use is called big step sematics. Instead of defining evaluation in terms of a one step reduction, it formulates the notion of a term that evaluates to a final value, written "t ⇓ v". Here is how the big step evaluation rules would look for this language:

v ⇓ v
(B-VALUE)
t1 ⇓ true     t2 ⇓ v2
if t1 then t2 else t3 ⇓ v2
(B-IFTRUE)
t1 ⇓ false     t3 ⇓ v3
if t1 then t2 else t3 ⇓ v3
(B-IFFALSE)
t1 ⇓ nv1
succ t1 ⇓ succ nv1
(B-SUCC)
t1 ⇓ 0
pred t1 ⇓ 0
(B-PREDZERO)
t1 ⇓ succ nv1
pred t1 ⇓ nv1
(B-PREDSUCC)
t1 ⇓ 0
iszero t1 ⇓ true
(B-ISZEROZERO)
t1 ⇓ succ nv1
iszero t1 ⇓ false
(B-ISZEROSUCC)

What we'd like you to do is:

Implementation hints

For this project we encourage you to use an abstract syntax tree. The standard way to do this in Scala is to define case classes for each form a term can get, and construct the tree in the build method. The provided framework should give you an idea about how things should look.

The parsing library will parse the digit zero to a numeric literal. If you want to allow only 0, but none of the other numeric literals, you can add the following method to the object tapl1.Arithmetic:

  def zero = token("0", t => t match {
    case NumericLit("0") => true
    case _ => false
  })
  def Expr: Parser = production( //...

The method zero will create a parser which accepts only numeric zero (of course, later on, when implementing the last bullet in the list, you will not need this method anymore, but deal with any numeric literal).

Besides the reduce and eval methods, your code should provide two testing methods which take a string and evaluate the program.