Semester projects
- Scala benchmark suite
- MultiSets Scala collection
- Distributed Collections for Nephele/PACTs
- Distributed Collections for Nephele/PACTs
- Extending Scaladoc to emit API docs in .NET format
- Prefuse for Scala
- SHotDraw
- Debugger for Scala Parser Combinators
Scala benchmark suite
keywords: scala, benchmarking
Goal
The goal of this project would be to develop a benchmark suite that can be used to assess how changes to the Scala compiler and libraries affect the space and time usage of various operations, programming patterns and idioms, etc.
offered by Aleksandar Prokopec.
MultiSets Scala Collection
The Scala collection library has sequences, sets, maps, and several other collection types, but it does not currently have MultiSets. The goal of the project is the write a complete set of APIs and implementations for MultiSets that fit seamlessly in the Scala collection library. See Programming in Scala, chapters 17, 24 and 25 for a discussion of the Scala collections API and its architecture and how to add new collection types to it.
The project can be a semester project for 2 people or a master project for one.
offered by LAMP
Distributed Collections for Nephele/PACTs
The Distributed Collections for Scala [1] currently process data by using Hadoop framework [2] as the processing engine. Goal of this project is to extend Distributed Collections to use Nephele/PACTs [3] framework as the processing back-end.
Achiveing this goal requires:- Learning and experimental deployment of Nephele/PACTs system
- Modification of Distributed Collections base API that enables easy compilation to Nephele/PACTs
- Development of the execution back-end for Nephele. This step would reuse significant parts of existing code
[1] Distributed Collections for Scala https://github.com/scala-incubator/distributed-collections [2] Hadoop http://hadoop.apache.org/ [3] Nephele/PACTs http://www.stratosphere.eu
offered by Vojin Jovanovic.
Extending Scaladoc to emit API docs in .NET format
Sandcastle [1] is “an open source documentation generator from Microsoft that reads your assemblies (DLL or EXE files) and their XML Comments and auto- matically generates HTML documentation.” [2]
Sandcastle carries out the very last phase of documentation generation, ensuring that the output meets the visual cues and file formats that developers expect of API docs on .NET. This division of labor bears similarities with the workflow of Scaladoc tools on JVM, where the final generation stage can be customized. That way, two goals can be achieved simultaneously:
- platform-friendliness (for example, the new .NET Lightweight style can be supported without additional effort);
- reuse of the Scaladoc front-end, which takes charge of parsing Scala sources and computing the document model of the resulting API documentation.
For comparison, the functionality of the Scaladoc front-end is usually em- bedded in .NET compilers (for example, as in C# [3] or F# [4]). In our case, we need not modify the Scala.NET compiler to emit documentation attributes in the output assemblies; and the required XML-comment file (with pointers back to source locations) is to be emitted by a new Scaladoc backend (a modular component) to be developed.
- Architecture of Scaladoc: http://wiki.scala-lang.org/display/SW/Scaladoc, in particular the design description, http://wiki.scala-lang.org/display/SW/Implementation+Details
- Understanding Sandcastle: http://en.wikipedia.org/wiki/Sandcastle_(software)
A nice to have: Mono support
Regarding Mono, it’s not clear whether the equivalent of Sandcastle (Monodoc) understands all the XML tags of Sandcastle. Therefore, it’s not a requirement for this project to generate Monodoc-compatible XML files (it’s still desirable, for example in case just a few tags can’t be handled by Monodoc we can acco- modate that). If our output works for Monodoc, fine. Quoting from [5]: Mono generally uses Monodoc. As I understand it, this is a some- what different format - but it can extract XML documentation from C# code as normal.
- Monodoc, http://www.mono-project.com/Monodoc
- Monodoc’s generating documentation: http://www.mono-project.com/Generating_Documentation
offered by Miguel Garcia
Prefuse for Scala
keywords: scala, type systems, reactive programming, visualization, GUI,
Prefuse is a visualization toolkit for Java based on Java2D and Swing. It has a relatively clean separation between model classes and view classes. They communicate with each other through events and listeners. Due to Java's weaker type system compared to Scala and the fact that prefuse is implemented without generics, the framework's numerous model classes (trees, tables, graphs etc) are largely untyped.
Your task for this project is to port (a substantial part of) Prefuse to Scala with the following points in mind:- Investigate how Scala's type system can help improve the type safety of the framework itself and of programs written for Prefuse.
- Replace the event/listener layer and other event logic by reactive abstractions from our Scala.React framework.
We can adapt the tasks according to your prior experience and interests. For example, you could just address the first point and leave the second for a later project. You should be experienced with Java generics. Experience with more advanced features of Scala, such as variance, path dependent types or higher-kinded types is a plus. This can be a semester project for one if you are an experienced programmer, a semester project for two or a Master's project.
References: http://prefuse.org https://github.com/prefuse/Prefuse http://lamp.epfl.ch/~imaier/pub/DeprecatingObserversTR2010.pdfoffered by Ingo Maier.
SHotDrawkeywords: scala, reactive programming, design patterns, GUI
JHotDraw is a Java application and framework for structured graphics. It provides a basic version of drawing facilities found in presentation software such as MS PowerPoint or Apple Keynote or diagram editors such as Dia or Omnigraffle. JHotDraw - a port of HotDraw, originally written in Smalltalk - is a showcase for good object oriented (OO) design and implements a variety of design patterns.
Your task for this project is to write a Scala version of JHotDraw in which you- replace the event handling logic based on listeners by reactive abstraction from our Scala.React framework, and
- identify OO design patterns used in JHotDraw and investigate how they can be replaced with more idiomatic Scala patterns using a mix of OOP, functional and reactive programming.
offered by Ingo Maier.
Debugger for Scala Parser Combinatorskeywords: Scala, debugger, parser combinators, GUI
Scala has a powerful parser combinators library which enables programmers to design language grammar using familiar Scala methods and fields. Therefore instead of spending time on learning (often powerful) tools from scratch like ANTL, JavaCC or Beaver which is often a steep curve, they can focus on the design. Parser combinators can be powerful but they also often cause troubles, especially for beginners, when the defined grammar doesn't behave as expected and get only cryptic messages. This isn't such a big issue for toy languages but people use Scala parser combinators for full blown projects and finding bugs there is a real issue.
What one would ideally like to have is a specialized debugger for parser combinators that takes the defined grammar and visualizes it in an interactive way. For example we would like to see how literals are consumed, where parser combinators backtrack because it couldn't match the pattern etc. At the moment the only way to debug bugs in your grammar is through primitive println statements which is hardly sufficient for normal Scala programmers.
Student can pick whatever visualization library he wants (for example Scala Swing wrappers). If you find it suitable you can also use Scala.React framework. Successful project would be definitely appreciated by the Scala programmers and could be also used during teaching for our courses. Prior knowledge of Scala parser combinators is not necessary if student has a good knowledge of Scala.
This can be a challenging semester project or maybe even a Master's project (depending on the scope). References: http://lamp.epfl.ch/teaching/foundations_of_software/docs/combinator_parsing.pdf http://www.cs.kuleuven.be/publicaties/rapporten/cw/CW491.pdf http://lamp.epfl.ch/~imaier/pub/DeprecatingObserversTR2010.pdf http://lamp.epfl.ch/teaching/foundations_of_software/docs/week01.pdf(second part of slides contains a really quick introduction to Scala parser combinators)offered by Hubert Plociniczak.
Master projects
- Distributed Collections DSL
- Compiler Regression Testing
- Integrating Workstealing Abstractions in the Parallel Collection Framework
- Parallel Collection Extensions
- Java Virtual Machine(JVM) Micro Benchmarks
Distributed Collections DSL
keywords: scala, delite, dsl, collections, parallelism, distribution, large scale data processing, domain specific languages
Contemporary large scale data processing frameworks either provide higher level abstractions that increase expressiveness while sacrificing performance or provide low level APIs that require significant development effort to program and maintain. Goal of this project is to provide a domain specific language (DSL) with a high level API, similar to Scala Collections [1], for distributed data parallel processing. Although having a high level API it would use domain knowledge to generate highly optimized code for existing large scale processing frameworks like (Hadoop [2] or Nephele/PACTs [3]). It would be implemented as a building block for other DSLs inside the Delite framework [4]. DSL implementation would use/extend the interface from existing Delite Collections DSL, use existing optimization techniques and then generate code that targets a specific processing framework. This project would innovate in following aspects:
- Show that the DSL based approach will allow peak performance while keeping high level of abstraction in large scale data processing
- Provide code portability between different processing frameworks with minimal development effort
Requirements this project requires good prior knowledge of Scala.
[1] Scala Collections http://docs.scala-lang.org/overviews/collections/introduction.html [2] Hadoop http://hadoop.apache.org/ [3] Nephele/PACTs http://www.stratosphere.eu [4] Delite http://stanford-ppl.github.com/Delite/index.html
offered by Vojin Jovanovic.
Compiler Regression Testing
keywords: object-oriented programming, regression testing, compilers
The Scala compiler has an extensive test suite that we use to detect regressions. Certain files should compile, others must result in specific error messages, and yet another category of tests (indirectly) checks the code generated by the compiler by compiling and running tests and comparing their output to the expected result.
To test the type checker and other crucial, yet under-tested, parts of the compiler, more efficiently and more precisely, we'd like you to design and implement a new kind of test. The following Scala file -- note the comment -- illustrates how these tests could work:
object Test {
def f /*@expectedType: Int */ = 1
}
The comment (or, depending on your research on how to best design this, an annotation) specifies the expected type at a certain position in a Scala program. This allows for more direct checking of the type checker, which is currently quite complicated[1]. Moreover, we'd like you to generalise this idea to implicit resolution (which implicit value gets inferred?) or [type|term]-completion in the presentation compiler (which completions are found?), and any other application you can think of!
This is a reasonably challenging project, which requires prior knowledge of Scala programming, and preferably (but not necessarily) some background in compilers. However, these tests will be used several times a day to ensure the quality of a widely used language!
[1]: http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/test/files/run/constrained-types.scala
offered by Adriaan Moors
Integrating Workstealing Abstractions in the Parallel Collection Framework
keywords: work stealing, schedulers
All languages come with collection support in the form of arrays, hash tables, linked lists etc. Scala parallel collections framework is a parallel extension to the standard collection framework in Scala. It parallelizes bulk operations like collection traversal, transformation or filtering. Currently, it includes parallel arrays, parallel hash maps and hash sets, as well as their persistent (immutable) counterparts. The framework is generic and can be further extended with new data structures.
Goal
The goal of this project is to extend the parallel collection framework with the workstealing abstraction on the level of the framework and develop a new scheduler based on this abstraction. The workstealing abstraction comes in the form of a new method called "steal" for parallel iterators, which simply returns a part of the set being iterated. It should then be shown that this approach leads to better load balancing and a better performance.
offered by Aleksandar Prokopec.
Parallel Collection Extensions
keywords:parallel, collections, data structures
All languages come with collection support in the form of arrays, hash tables, linked lists etc. Scala parallel collections framework is a parallel extension to the standard collection framework in Scala. It parallelizes bulk operations like collection traversal, transformation or filtering. Currently, it includes parallel arrays, parallel hash maps and hash sets, as well as their persistent (immutable) counterparts. The framework is generic and can be further extended with new data structures.
Goal
Your goal is to extend the parallel collection framework with binomial heaps, parallel ropes, conc-lists and parallel ordered maps and sets.
offered by Aleksandar Prokopec.
Java Virtual Machine(JVM) Micro Benchmarks
keywords: jvm, bytecode, optimization, benchmarking
The goal of this project is to provide an overview of how different changes to the bytecode generated by the Scala compiler impact the execution speed in different implementations of the Java Virtual Machine (JVM). The conclusions of this study can be further used to fine-tune the optimization and code generation phases within the Scala compiler.
Some of the questions the study should answer:
- What is the impact of large method bodies? Would it be useful to split large methods into smaller ones? Could this be decided automatically?
- What is the impact of duplicate code in the JVM?
- What is the impact of adding many function-local variables in the generated code?
- How do jumps affect execution speed? (maybe affecting caching?)
- What is the time impact of throwing and catching exceptions? What is the runtime cost of not hitting the first layer of exception handlers?
- How long does it take for the JVM to create a simple exception instance?
- How does running multiple times (hot code) change the above metrics?
- How does this change when moving from one JVM to another? (openjdk, sun, different versions, client vs server, ...)
offered by Vlad Ureche.