Parallelizing Machine Learning – Functionally: A Framework and
Abstractions for Parallel Graph Processing
Paper [PDF]
Presented at the Scala Workshop 2011.
Philipp Haller and Heather Miller.
Abstract
Implementing machine learning algorithms for large data, such as
the Web graph and social networks, is challenging. Even though
much research has focused on making sequential algorithms more
scalable, their running times continue to be prohibitively long.
Meanwhile, parallelization remains a formidable challenge for
this class of problems, despite frameworks such as MapReduce,
which hide much of the associated complexity. We present a framework
for implementing parallel and distributed machine learning
algorithms on large graphs, flexibly, through the use of
functional programming abstractions. Our aim is a system that
allows researchers and practitioners to quickly and easily
implement (and experiment with) their algorithms in a parallel
or distributed setting. We introduce functional combinators for
the flexible composition of parallel, aggregation, and
sequential steps. To the best of our knowledge, our system is
the first to avoid inversion of control in a (bulk) synchronous
parallel model.
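To illustrate the idea of composing supersteps as values rather than overriding a framework callback (the inversion of control that Pregel-style systems impose), here is a minimal sketch. All names (`MenthorSketch`, `Step`, `vertexStep`, `thenDo`, `iterate`) are hypothetical and not the authors' actual API; vertices are evaluated sequentially here, where a real implementation would evaluate them in parallel within each synchronous superstep.

```scala
// Hypothetical sketch of combinator-based synchronous graph steps.
// The names and types below are assumptions for illustration only.
object MenthorSketch {
  type VertexId = Int
  type Graph  = Map[VertexId, List[VertexId]] // adjacency lists
  type Values = Map[VertexId, Double]         // per-vertex state

  // A Step maps the entire current state to the next state; applying
  // one Step corresponds to one synchronous superstep.
  type Step = (Graph, Values) => Values

  // Per-vertex update: each vertex computes its next value from its
  // neighbors' current values (evaluated in parallel in a real system).
  def vertexStep(f: (VertexId, List[Double]) => Double): Step =
    (g, vs) => vs.map { case (id, _) => id -> f(id, g(id).map(vs)) }

  // Sequential composition of two steps.
  def thenDo(a: Step, b: Step): Step = (g, vs) => b(g, a(g, vs))

  // Repeat a step for n synchronous supersteps.
  def iterate(n: Int, s: Step): Step =
    (g, vs) => (1 to n).foldLeft(vs)((acc, _) => s(g, acc))
}
```

Because steps are ordinary values, the caller retains control of the overall computation: iteration, aggregation, and sequential phases are composed with combinators such as `thenDo` and `iterate` instead of being driven by a `compute()` callback the framework invokes.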
Project Page
See the project page for up-to-date information on our
implementation, examples, and documentation.