Views | Contents |
Collections have quite a few methods that construct new collections. Examples are map, filter or ++. We call such methods transformers because they take at least one collection as their receiver object and produce another collection in their result.
There are two principal ways to implement transformers. One is strict, that is a new collection with all its elements is constructed as a result of the transformer. The other is non-strict or lazy, that is one constructs only a proxy for the result collection, and its elements get constructed only as one demands them.
As an example of a non-strict transformer consider the following implementation of a lazy map operation:
def lazyMap[T, U](coll: Iterable[T], f: T => U) = new Iterable[T] {
def iterator = coll.iterator map f
}
Scala collections are by default strict in all their transformers, except for Stream, which implements all its transformer methods lazily. However, there is a systematic way to turn every collection into a lazy one and vice versa, which is based on collection views. A view is a special kind of collection that represents some base collection, but implements all transformers lazily.
To go from a collection to its view, you can use the view method on the collection. If xs is some collection, then xs.view is the same collection, but with all transformers implemented lazily. To get back from a view to a strict collection, you can use the force method.
Let's see an example. Say you have a vector of Ints over which you want to map two functions in succession:
scala> val v = Vector(1 to 10: _*)
v: scala.collection.immutable.Vector[Int] =
Vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> v map (_ + 1) map (_ * 2)
res5: scala.collection.immutable.Vector[Int] =
Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
scala> (v.view map (_ + 1) map (_ * 2)).force
res12: Seq[Int] = Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
scala> val vv = v.view
vv: scala.collection.SeqView[Int,Vector[Int]] =
SeqView(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Applying the first map to the view gives:
scala> vv map (_ + 1)
res13: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)
scala> res13 map (_ * 2)
res14: scala.collection.SeqView[Int,Seq[_]] = SeqViewMM(...)
scala> res14.force
res15: Seq[Int] = Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
One detail to note is that the static type of the final result is a Seq, not a Vector. Tracing the types back we see that as soon as the first delayed map was applied, the result had static type SeqViewM[Int, Seq[_]]. That is, the "knowledge" that the view was applied to the specific sequence type Vector got lost. The implementation of a view for some class requires quite a lot of code, so the Scala collection libraries provide views mostly only for general collection types, but not for specific implementations.5
There are two reasons why you might want to consider using views. The first is performance. You have seen that by switching a collection to a view the construction of intermediate results can be avoided. These savings can be quite important. As another example, consider the problem of finding the first palindrome in a list of words. A palindrome is a word which reads backwards the same as forwards. Here are the necessary definitions:
def isPalindrome(x: String) = x == x.reverse
def findPalidrome(s: Seq[String]) = s find isPalindrome
findPalindrome(words take 1000000)
findPalindrome(words.view take 1000000)
The second use case applies to views over mutable sequences. Many transformer functions on such views provide a window into the original sequence that can then be used to update selectively some elements of that sequence. To see this in an example, let's suppose you have an array arr:
scala> val arr = (0 to 9).toArray
arr: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
scala> val subarr = arr.view.slice(3, 6)
subarr: scala.collection.mutable.IndexedSeqView[
Int,Array[Int]] = IndexedSeqViewS(...)
scala> def negate(xs: collection.mutable.Seq[Int]) =
for (i <- 0 until xs.length) xs(i) = -xs(i)
negate: (xs: scala.collection.mutable.Seq[Int])Unit
scala> negate(subarr)
scala> arr
res4: Array[Int] = Array(0, 1, 2, -3, -4, -5, 6, 7, 8, 9)
After having seen all these nifty uses of views you might wonder why have strict collections at all? One reason is that performance comparisons do not always favor lazy over strict collections. For smaller collection sizes the added overhead of forming and applying closures in views is often greater than the gain from avoiding the intermediary data structures. A probably more important reason is that evaluation in views can be very confusing if the delayed operations have side effects.
Here's an example which bit a few users of versions of Scala before 2.8. In these versions the Range type was lazy, so it behaved in effect like a view. People were trying to create a number of actors like this:
val actors = for (i <- 1 to 10) yield actor { ... }
val actors = (1 to 10) map (i => actor { ... })
To avoid surprises like this, the Scala 2.8 collections library has more regular rules. All collections except streams and views are strict. The only way to go from a strict to a lazy collection is via the view method. The only way to go back is via force. So the actors definition above would behave as expected in Scala 2.8 in that it would create and start 10 actors. To get back the surprising previous behavior, you'd have to add an explicit view method call:
val actors = for (i <- (1 to 10).view) yield actor { ... }
In summary, views are a powerful tool to reconcile concerns of efficiency with concerns of modularity. But in order not to be entangled in aspects of delayed evaluation, you should restrict views to two scenarios. Either you apply views in purely functional code where collection transformations do not have side effects. Or you apply them over mutable collections where all modifications are done explicitly. What's best avoided is a mixture of views and operations that create new collections while also having side effects.
Next: Iterators
Views | Contents |