  Compilation A Summer Semester 2003 School of Computer and Communication Sciences Programming Methods Laboratory (LAMP)  Home Staff Research Publications Events Teaching   CompilationA     Exam Jobs  # The Exam

The Exam is now corrected, the results can be found on the door to room INR 315. We have also made a PostScript file with examples of correct answers to the questions.

1. Showing up
2. >= 35 points in total
3. >= 50 points in total
4. >= 60 points in total
5. >= 75 points in total
6. >= 95 points in total

The section "Cheat Sheet" below will show you what a typical cheat sheet might look like. Such a sheet will be provided with the exam.

The section "Example Exam Questions" shows some typical exam questions that might show up on the exam. It is not the same composition or the same questions as on the real exam, the exam will contain fewer but similar questions. (On the exam each question will also show how many points it is worth, since you have 105 minutes for the exam and need 100 points you should spend approximately 1 minute per point on the exam.)

# Cheat Sheet

Here are some extra information that can be useful when solving the exam questions.

## BNF

Backus-Naur Form was developed by J. Backus and P. Naur for Algol 60.

A production (rule) consists of a left-hand-side (lhs) and a right-hand-side (rhs). The rhs contains terminals and non-terminals. We use | for alternatives. We use juxtaposition for concatenation. Concatenation binds stronger than |.

Names introduced on the left-hand-side can be used on the right-hand-side. A BNF grammar can be recursive, that is, the rhs of a production can refer to the symbol that is being defined in that production (or to another production which in turn refers to the symbol being defined).

## EBNF

EBNF allows us to use:
• ( ) for grouping.
• ε for the empty word.
• [ E ] for one or zero occurrences of E.
• { E } for zero or more occurrences of E.

## Regular expressions

A regular expression can be described by a number of EBNF rules without recursion. There are alternative ways of writing regular expressions. The most common is:
• E* for {E}.
• E? for [E].
• E+ for E{E}.
• [ac] for (a|c).
• [a-d] for (a|b|c|d).
• [^0-9] for anything but a digit.

## JLex

In the regexp of JLex | * ? + () [] have their usual reg-exp meaning:
• E* for {E}.
• E? for [E].
• E+ for E{E}.
• [ac] for (a|c).
• [a-d] for (a|b|c|d).
• [^0-9] for anything but a digit.
In Jlex
`Name = E `
is used to define macros.
`{name}`
refers to the macro name. And you can use . as a shorthand for [^\n]

# Example Exam Questions

## The Structure of a Compiler

Why is a compiler divided up in several stages? What stages are a compiler usually divided up into? What does these stages do?

## Glossary

Explain the expressions:
1. Native code
2. Runtime
3. Compiler
4. Interpreter
5. Virtual Machine
6. Scanner
7. Parser
8. Syntactic Analysis
9. Semantic Analysis

## Lexical Analysis

The following description of integer constants in C++ appears on page 480 of "The C++ Programming Language".

An integer constant consisting of a sequence of digits is taken to be decimal (base 10) unless it begins with 0 (digit zero). A sequence of digits starting with 0 is taken to be an octal integer (base 8). The digits 8 and 9 are not octal digits. A sequence of digits preceded by 0x or 0X is taken to be a hexadecimal integer (base 16). The hexadecimal digits include a or A through f or F. ... The type of an integer depends on its ... suffix. [an integer may have no suffix.] ... If [the integer] is suffixed by u or U, its type is ... If it is suffixed by l or L , its type is ...

Give a regular expression for C++ integer constants. Use only the operations |, *, and concatenation in your expression (no special JLex notation). You may also use parentheses and ε (which are not operations). For convenience, you may name a regular expression R by writing name = R.

## Tokens

Split the following "Expressive"-code into a sequence of tokens. For each token note if it has any associated value, and in that case what the value is.
```    // Read two integers
max := 0.
/* Return the max of the two */
if (x>y) { max := x.} else { max := y.}.
print "Max: ". print Max. printnl.
```

## Let E be an eff

1. Write a non-recursive EBNF grammar for the language of international telephone numbers. A telephone number may start with a + and a country code then there might be an area code within parentheses followed by the subscriber number. Spaces are allowed between the country code and the area code and between the area code and the subscriber number, and between any digits in the subscriber number but nowhere else. Examples: +46 (18) 51 13 29, (021)311 87 53, 693 7593, +41(21)6936660.
2. Updated 08/07/03:
Write a recursive EBNF grammar for the language which consists of a sequence of the character a followed by a sequence of as many ba words. Examples: aba, aababa, aaaababababa.

## An Ambigous Grammar

Here is a context free grammar for some regular expressions:
```    R ::= RR
R ::= R '*'
R ::= a
```
1. Rewrite the grammar to an equivalent unambiguous grammar. Motivate why it is unambiguous. Give some example of how regular expressions are parsed by your grammar. Give precedence and associativity for the diffrent contstructs.
2. Rewrite the grammar so that it can be parsed by a predictive parser.

## Drop the E

1. Convert the following EBNF grammar to a BNF grammar. (That is, get rid of "{}" and "[]".)
```            S ::= 'a' { B } [ C ]
B ::= 'b' | S
C ::= 'd' | S
```
2. Where in a compiler are BNF grammars useful.

## Grandma

[Note: This question is quite hard, the questions on the exam will probably be easier than this.]

Given this grammar G0 with the start symbol S:
```    S ::= if C then S
S ::= if C then S else S
S ::= ε
C ::= E == E
E ::= E = E
E ::= E + E
E ::= id
E ::= ( E )
```
Where == is comparison and = is assignment.

1. The grammar G0 is ambiguous. Explain what this means and give an example that shows that G0 is ambiguous.
2. Assume that + should be left associative, = should be right associative, and + should have a higher precedence than =. Rewrite G0 to a grammar G1 that expresses these rules.
3. Rewrite G1 to a grammar G2 which is suited for a top-down (predictive, LL(1)) parser.

## Lexical or Syntactical

What is the difference between the lexical and the syntactical analysis of a compiler? What kind of information flows between the phases?

## (Un)ambiguous

1. What does it mean that a grammar is ambiguous?
2. How can one tell that a grammar is ambiguous?
3. What imapct does it have on a compiler if the grammar is ambiguous?
4. Mention two methods to cope with ambiguity in a parser generator like JLex.
5. Give the meaning of the words "associativity" and "priority".

## Take it from the top

Write a top down parser in Java for the following Grammar. You may assume that there are:

• A `Token nextToken()` function, which returns the next token.
• A function `void error(String str)` which signals an error.
• And that the integer constants `IDENT`, `LBRACKET`, `RBRACKET`, `LPAR`, `RPAR`, and `NUMLIT` are defined.

Grammar:
```      E ::= IDENT { LBRACKET E RBRACKET }
|  LPAR E RPAR
|  NUMLIT
```
Code:
```      void E() {
...
}
```

## Abstract Syntax Tree

Consider the following grammar:
```      E ::= IDENT { LBRACKET E RBRACKET }
|   E PLUS E
|   E MINUS E
|   LPAR E RPAR
|   NUMLIT
```
1. Design an abstract syntax for the grammar.
2. Given your abstract syntax draw the abstract syntax tree for the expression "`x [y [z + (5-a)]] `".

## A Concrete Abstract Syntax

Here is a grammar for a statements in an imperative programming language.
```    Statement ::= Assignment | LoopStatement
Statement ::= CaseStatement | ForStatement
Assignment ::= Designator ':=' Expression
Designator ::= id { '[' Expression ']' }
LoopStatement ::= loop StatementSequence end
StatementSequence ::= Statement { ';' Statement }
CaseStatement ::= case Expression of Case { '||' Case }
Case ::= CaseLabelList ':' StatementSequence
CaseLabelList ::= ConstExpression { ',' ConstExpression }
```
Define a suitable Java representation for the abstract syntax tree.

## The Object Formerly Known as Prince

What is a symbol table used for in a compiler? What type of information is stored in a symbol table?

## Can you cope with scope?

Assume a language with static scope rules allowing nested scopes. How can you implement an imperative symbol table for the semantic analyzer in the compiler for such a language. The symbol table should be a global structure that is updated through side effects. Describe especially how scope is handled, both in the implementation of the symbol table and in the semantic analysis.

## Semantic Analysis

Assume a procedural language with integers and floating point numbers (floats) as basic types and with the following operations:
+
The arguments should have the same type, the result type should be the same as the argument type.
xor
The argument should be integers, the result is an integer
/
The arguments should be floating point numbers, the result is a floating point number
:=
The expression should have the same type as the variable.

Sketch on a Java-solution for type cheking of expressions in this language which returns the type of the expression.

## Intermediate Representation

What does "Intermediate Representation" mean? What is the adavatage of using an "Intermediate Representation" in a compiler?

## Back-end

Explain the meaning of the following words in the context of compiler back-ends and interpretation:

1. Runtime
2. Environment
3. Virtual Machine
4. Static
5. Dynamic

## Can you interpret this?

Assume that you have a compiler for a small imperative language. This compiler compiles source code to an abstract syntax tree (AST), and performs name and type analysis. The language has only two types integer and boolean. The type analysis has annotated the AST with the type of each node. At runtime the boolean false is represented by 0 (zero) and the boolean true by any other integer.

Now you are writing a interpreter in Java for this language which uses the visitor pattern over the AST. Since there is only one type at runtime (integers) you have decided that the visitor should always return an integer.

You have the following code in the AST:

```       /** Tree node representing if E then Ss Else Ss. */
public static class If extends Stat {
/** Condition. */
public final Expr cond;
/** Then part of the statement. */
public final Stat[] thenp;
/** Else part of the statement. */
public final Stat[] elsep;
public If(int pos, Expr cond, Stat[] thenp, Stat[] elsep) {
super(pos);
this.cond = cond;
this.thenp = thenp;
this.elsep = elsep;
}
public void apply(Visitor v) {
v.caseIf(this);
}
}
```

And you have already written this code in the interpreter:

```       /** Interpret and return the result from a tree node. */
public int interpret(Tree tree) {
int tmp = result;
tree.apply(this);
int res2 = result;
result = tmp;
return res2;
}

/** Interpret an array of trees. */
public void interpret(Tree[] trees) {
for (int i = 0; i < trees.length; i++)
interpret(trees[i]);
}
```
Finish the implementation of the interpreter for the 'if'-node.
```      public void caseIf(If tree) {

}
``` [an error occurred while processing this directive] 