Scala Saturday – Vector.splitAt

Vector.splitAt splits a vector into two parts at the index you specify. The first part ends just before the element at the given index; the second part starts with the element at the given index. Some examples will help:

Some Examples

The most obvious use case is bisecting a vector, i.e., splitting it into two (nearly) equal parts. To do that, halve the length, and use it as your splitting index:

val xs = Vector(1,2,3,4,5,6)
val mid = xs.length / 2 // 3
val (left, right) = xs splitAt mid
// left: Vector[Int] = Vector(1, 2, 3)
// right: Vector[Int] = Vector(4, 5, 6)

But what happens if your vector contains an odd number of elements? No sweat! Because of the way integer division works, the quotient length ÷ 2 is truncated. In other words, when the length is odd, the left vector has one fewer element than the right vector:

val xs = Vector(1,2,3,4,5)
val mid = xs.length / 2 // 2
val (left, right) = xs splitAt mid
// left: Vector[Int] = Vector(1, 2)
// right: Vector[Int] = Vector(3, 4, 5)

Of course, you don’t have to cut the thing in half. You can split it anywhere:

val xs = Vector(1,2,3,4,5)
val (left, right) = xs splitAt 1
// left: Vector[Int] = Vector(1)
// right: Vector[Int] = Vector(2, 3, 4, 5)

Or …

val xs = Vector(1,2,3,4,5)
val (left, right) = xs splitAt 4
// left: Vector[Int] = Vector(1, 2, 3, 4)
// right: Vector[Int] = Vector(5)

What happens if you split at index 0?

val xs = Vector(1,2,3,4,5)
val (left, right) = xs splitAt 0
// left: Vector[Int] = Vector()
// right: Vector[Int] = Vector(1, 2, 3, 4, 5)

Ah, the left vector is empty! That’s because Vector.splitAt starts the right vector at the given index. There’s nothing before the given index, so the only thing left to return for left is an empty vector.

The reverse happens if you split at the length: The right vector is empty while the left vector contains the entire input vector:

val xs = Vector(1,2,3,4,5)
val (left, right) = xs splitAt 5
// left: Vector[Int] = Vector(1, 2, 3, 4, 5)
// right: Vector[Int] = Vector()

But what happens if you try to split at an index greater than the length?

val xs = Vector(1,2,3,4,5)
val (left, right) = xs splitAt 6
// left: Vector[Int] = Vector(1, 2, 3, 4, 5)
// right: Vector[Int] = Vector()

Well, that’s interesting. So then, if you split at any index greater than the length of the input vector, the left vector contains the entire input vector while the right vector is empty.

You get the reverse if you split on a negative number:

val xs = Vector(1,2,3,4,5)
val (left, right) = xs splitAt -1
// left: Vector[Int] = Vector()
// right: Vector[Int] = Vector(1, 2, 3, 4, 5)
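
One way to remember all of these edge cases: splitAt(n) gives the same pair as calling take(n) and drop(n) separately. The following is a small sketch that checks that equivalence for the indices used above; the names indices and allAgree are only for illustration.

```scala
val xs = Vector(1, 2, 3, 4, 5)

// Indices covering the cases above: negative, zero, middle,
// exactly the length, and past the length.
val indices = List(-1, 0, 2, 5, 6)

// For each index, compare splitAt(n) with the pair (take(n), drop(n)).
val allAgree = indices.forall { n =>
  xs.splitAt(n) == ((xs.take(n), xs.drop(n)))
}
// allAgree: Boolean = true
```

Since take and drop both clamp out-of-range arguments instead of throwing, splitAt behaves the same way.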

Merge Sort with Vector.splitAt

You can use Vector.splitAt in performing a merge sort. Vector.splitAt is, admittedly, a pretty small piece of the puzzle. Merge sort is a divide-and-conquer algorithm, and Vector.splitAt just performs the divide part. Nevertheless it comes in handy for that part.

Speaking of which, start by using Vector.splitAt to define a bisect function that splits a vector in half:

def bisect[A](xs: Vector[A]) = {
  val mid = xs.length / 2
  xs splitAt mid
}

A merge sort breaks a vector down into single-element (or empty) vectors and then puts them back together, sorting the elements of each block as it combines them. Start with the merge function that puts the blocks back together after you’ve broken them down:

import scala.annotation.tailrec

def merge(left: Vector[Int], right: Vector[Int]) = {
  @tailrec
  def mergeWith(l: Vector[Int], r: Vector[Int], acc: Vector[Int]): Vector[Int] = {
    (l.isEmpty, r.isEmpty) match {
      // If either side is empty, just add
      // the non-empty side to the accumulator.
      case (true, _) => acc ++ r
      case (_, true) => acc ++ l

      // Compare the head elements, and add
      // the lesser head value to the
      // accumulator. Then call recursively.
      case _ =>
        val lh = l.head
        val rh = r.head
        val (next, l2, r2) =
          if (lh < rh) (lh, l.tail, r)
          else (rh, l, r.tail)
        mergeWith(l2, r2, acc :+ next)
    }
  }

  mergeWith(left, right, Vector[Int]())
}

Now mergeSort can use bisect and merge to break down the vector and then merge it back together:

def mergeSort(as: Vector[Int]): Vector[Int] =
  as match {
    case Vector() => as
    case Vector(a) => as
    case _ =>
      val (l, r) = bisect(as)
      merge(mergeSort(l), mergeSort(r))
  }

val xs: Vector[Int] = Vector(43,48,3,23,28,6,25,43,16)
val sorted: Vector[Int] = mergeSort(xs)
// sorted: Vector[Int] = 
//   Vector(3, 6, 16, 23, 25, 28, 43, 43, 48)
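
The functions above are fixed to Vector[Int]. As a sketch of how the same divide-and-conquer shape might be generalized, the version below takes an implicit Ordering so that it can sort any element type that has one. The names mergeGeneric and mergeSortGeneric are hypothetical and not part of the code above.

```scala
import scala.annotation.tailrec

// Hypothetical generic variant of merge: combines two sorted vectors
// into one sorted vector, using the supplied Ordering to compare heads.
def mergeGeneric[A](left: Vector[A], right: Vector[A])(implicit ord: Ordering[A]): Vector[A] = {
  @tailrec
  def loop(l: Vector[A], r: Vector[A], acc: Vector[A]): Vector[A] =
    if (l.isEmpty) acc ++ r
    else if (r.isEmpty) acc ++ l
    // Taking from the left on ties keeps equal elements in input order.
    else if (ord.lteq(l.head, r.head)) loop(l.tail, r, acc :+ l.head)
    else loop(l, r.tail, acc :+ r.head)

  loop(left, right, Vector.empty[A])
}

// Hypothetical generic variant of mergeSort: splitAt does the "divide",
// mergeGeneric does the "conquer".
def mergeSortGeneric[A](xs: Vector[A])(implicit ord: Ordering[A]): Vector[A] =
  if (xs.length <= 1) xs
  else {
    val (l, r) = xs.splitAt(xs.length / 2)
    mergeGeneric(mergeSortGeneric(l), mergeSortGeneric(r))
  }

val sortedWords = mergeSortGeneric(Vector("pear", "apple", "fig"))
// sortedWords: Vector[String] = Vector(apple, fig, pear)
```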

A quick announcement: I’ve got some instructional material to develop. Unfortunately it’s going to take up a fair amount of my time. This is probably the last Scala Saturday post for a little while, but I hope to pick back up in a month or two.

Scala Saturday – Stream.groupBy

Sometimes you have a collection of items that you want to group according to some common property or key. Stream.groupBy can do that job for you. It takes that collection and returns a map keyed by that grouping key. The value for each key is the sequence of all the items that fall into that group.

That’s a little hard to follow. So what’s it useful for?

Well, maybe you need to group a list of names by initial. No sweat:

val names = Stream(
  "Rehoboam",
  "Abijah",
  "Asa",
  "Jehoshaphat",
  "Jehoram",
  "Ahaziah")

val groupedByInitial = names.groupBy(_.head)
// groupedByInitial: Map[Char,Stream[String]] =
//   Map(
//     J -> Stream(Jehoshaphat, Jehoram),
//     A -> Stream(Abijah, Asa, Ahaziah),
//     R -> Stream(Rehoboam) )

And of course, if you want to sort those groups, convert the resulting map to a stream of tuples, and throw in a call to Stream.sortBy to sort by the first element in the tuple:

val groupedByInitialAndSorted =
  groupedByInitial.toStream.sortBy(_._1)
// groupedByInitialAndSorted: Stream[(Char, Stream[String])] =
//   Stream(
//     A -> Stream(Abijah, Asa, Ahaziah),
//     J -> Stream(Jehoshaphat, Jehoram),
//     R -> Stream(Rehoboam) )

Another example: Maybe you want to group some test scores according to grade. That is, all the students who scored in the 90s are grouped together, then all those scoring in the 80s, and so on:

case class TestScore(name: String, score: Int)

val grades = Stream(
  TestScore("Anna", 74),
  TestScore("Andy", 76),
  TestScore("Brenda", 70),
  TestScore("Bobby", 90),
  TestScore("Charlotte", 98),
  TestScore("Chuck", 83),
  TestScore("Deborah", 88),
  TestScore("Dan", 66),
  TestScore("Ellie", 80),
  TestScore("Ed", 61),
  TestScore("Frannie", 89),
  TestScore("Frank", 96) )

val grouped = grades.groupBy(_.score / 10 * 10)
// grouped: Map[Int,Stream[TestScore]] =
//   Map(
//     80 -> Stream(
//             TestScore(Chuck,83),
//             TestScore(Deborah,88),
//             TestScore(Ellie,80),
//             TestScore(Frannie,89) ),
//     70 -> Stream(
//             TestScore(Anna,74),
//             TestScore(Andy,76),
//             TestScore(Brenda,70) ),
//     60 -> Stream(
//             TestScore(Dan,66),
//             TestScore(Ed,61) ),
//     90 -> Stream(
//             TestScore(Bobby,90),
//             TestScore(Charlotte,98),
//             TestScore(Frank,96) )
//   )

You can take it another couple of steps to produce a histogram by counting the number of students in each group and sorting by the group key (i.e., grade level):

val histogram = grouped.map {
  case (grade, scores) => grade -> scores.length
}.toStream.sortBy(_._1).reverse
// histogram: Stream[(Int, Int)] =
//   Stream((90,3), (80,4), (70,3), (60,2))

One more example, and this one I borrowed from one of Steven Proctor’s Ruby Tuesday posts. You can find anagrams in a list of words by sorting the characters in each word and grouping on that:

val anagrams = Stream(
  "tar", "rat", "bar",
  "rob", "art", "orb"
).groupBy(_.sorted)
// anagrams: Map[String,Stream[String]] = 
//   Map(
//     abr -> Stream(bar),
//     art -> Stream(tar, rat, art),
//     bor -> Stream(rob, orb) )
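
To make the description at the top of this post more concrete, here is a rough sketch of what a grouping function does, written as a fold. It is not how the standard library implements groupBy, and the name groupBySketch is made up for this example; it only illustrates the idea of building a map from each key to the items that produced it, in their original order.

```scala
// Hypothetical helper: walk the items once, appending each one to the
// vector stored under its key (creating the entry on first sight).
def groupBySketch[A, K](xs: Seq[A])(key: A => K): Map[K, Vector[A]] =
  xs.foldLeft(Map.empty[K, Vector[A]]) { (groups, x) =>
    val k = key(x)
    groups.updated(k, groups.getOrElse(k, Vector.empty[A]) :+ x)
  }

val kings = Vector(
  "Rehoboam", "Abijah", "Asa",
  "Jehoshaphat", "Jehoram", "Ahaziah")

val byInitial = groupBySketch(kings)(_.head)
// byInitial('A') is Vector(Abijah, Asa, Ahaziah)
// byInitial('J') is Vector(Jehoshaphat, Jehoram)
// byInitial('R') is Vector(Rehoboam)
```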

Scala Saturday – Code That Looks Like Math

Something that Scala and most other modern languages allow these days is variable names that contain what you might think of as non-traditional characters from the Unicode character set, e.g., Greek symbols such as π and τ. If you’re a C programmer, you have to settle for spelling out the name of the character:

const double PI = 3.141592654;
double delta = x1 - x2;

But that’s OK, right? What’s the difference, really? The value π is one thing: it’s a universally recognized constant. But even with the example delta above, don’t you want to name it something more descriptive, like marginOfError anyway?

Well, yes, many times instead of using the characters verbatim from your physics textbook …

val f = m * a

… you spell it out so that the code is clearer:

val force = mass * acceleration

Likewise, even though Scala allows you to write the following:

val ω = 2 * math.Pi * f

… it’s probably better practice to write …

val angularVelocity = 2 * math.Pi * frequency

What’s the point of this post then? Sure, you can use “special” characters in variable names, but so far, I’ve discouraged you from doing it!

Nevertheless there are times when it is appropriate. If you are coding up an algorithm that consists of a series of well-known equations in a certain field of study, then the more your code looks like those equations, the easier it is to check it against the literature.
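
As a small warm-up illustration (my own example, with a made-up function name), take the textbook formulas for the population mean and standard deviation: μ is the sum of the values divided by N, and σ is the square root of the average squared distance from μ. With Greek identifiers, the code reads almost like the formulas:

```scala
// Hypothetical example: population standard deviation.
// Assumes xs is non-empty; an empty input would divide by zero.
def populationStdDev(xs: Seq[Double]): Double = {
  val N = xs.length
  val μ = xs.sum / N                                  // mean
  val σ2 = xs.map(x => math.pow(x - μ, 2)).sum / N    // variance
  math.sqrt(σ2)
}

val σ = populationStdDev(Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0))
// σ: Double = 2.0
```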

Consider the following—a series of values and equations for converting latitude and longitude to universal polar stereographic (UPS) coordinates, a way of representing coordinates at the earth’s poles:

[Figure: Converting Latitude and Longitude to Universal Polar Stereographic (UPS) Coordinates. Values and equations for converting geodetic coordinates (latitude and longitude) into UPS coordinates.]

UPS coordinates consist of a hemisphere—either northern or southern—and two distance components, easting and northing, both in meters:

object Hemisphere extends Enumeration {
  type Hemisphere = Value

  val Northern = Value('N')
  val Southern = Value('S')

  def fromLatitude(lat: Double): Hemisphere =
    if (lat < 0) Southern else Northern
}

case class UniversalPolarStereographic(
    northing: Double,
    easting: Double,
    hemisphere: Hemisphere.Value)

Now compare the code below to the equations from the literature above:

def latLonToUps(
    lat: Double, 
    lon: Double): UniversalPolarStereographic = {

  val hemisphere = Hemisphere.fromLatitude(lat)

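  // math.sin, math.cos, and math.tan work in radians, so lat and lon
  // are assumed to be given in radians here.
  val φ = lat.abs
  val λ = lon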
    
  val π = math.Pi

  val FN = 2000000.0
  val FE = 2000000.0

  val a = 6378137.0
  val f = 1 / 298.257223563

  val e_2 = f * (2 - f)
  val e = math.sqrt(e_2)
  val eOver2 = e / 2
  val Cₒ = ((2 * a) / math.sqrt(1 - e_2)) *
           math.pow((1 - e) / (1 + e), eOver2)
  val kₒ = 0.994
  val πOver4 = π / 4

  val esinφ = e * math.sin(φ)
  val φOver2 = φ / 2

  val tanZOver2 = 
    math.pow((1 + esinφ) / (1 - esinφ), eOver2) *
    math.tan(πOver4 - φOver2)
  val R = kₒ * Cₒ * tanZOver2
  val Rcosλ = R * math.cos(λ)
  val Rsinλ = R * math.sin(λ)

  val N = hemisphere match {
    case Hemisphere.Northern => FN - Rcosλ
    case Hemisphere.Southern => FN + Rcosλ
  }
  val E = FE + Rsinλ

  UniversalPolarStereographic(N, E, hemisphere)
}

It’s not perfect: you still cannot set numerators above denominators, for instance. But isn’t that easier to compare to the literature than if we had to write out RsinLambda or eSinPhi?

(Note: UPS coordinates are only valid for latitudes near the poles. For simplicity, the code above does not check to make sure that the input latitude falls within those bounds. I mean, it’s complex enough as it is for the sake of exemplifying the point of this post.)
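
For what it’s worth, such a bounds check could look something like the sketch below. It is not part of the conversion code above. It assumes the latitude is given in degrees and uses the commonly cited UPS limits (north of 84°N and south of 80°S); the function name is made up.

```scala
// Hypothetical guard, assuming latitude in degrees and the commonly
// cited UPS limits of 84°N and 80°S.
def isInUpsZone(latDegrees: Double): Boolean = {
  val φ = latDegrees
  (φ >= 84.0 && φ <= 90.0) || (φ >= -90.0 && φ <= -80.0)
}

val nearNorthPole = isInUpsZone(89.0)   // true
val midLatitude = isInUpsZone(45.0)     // false
```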

(Note: I’m aware that some of the characters in the code don’t show up correctly on all browsers, e.g., the subscript “O” and perhaps the φ. I’m working to correct that. Nevertheless you should be able to use such symbols in your source code.)

Scala Saturday – Array.last and Array.lastOption

Last week, we looked at List.headOption. If you don’t need the first item in a list, but rather the last item, the counterpart of List.head is List.last. Likewise, the counterpart of List.headOption is List.lastOption.

Recall the code example from last week:

case class Salesman(name: String, sales: BigDecimal)

def findTopSalesman (salesmen : List[Salesman]) = {
  salesmen.filter { _.sales >= 10000 }
          .sortBy { -_.sales } // descending
          .headOption
}

val sales = List(
  Salesman("Joe Bob", 9500),
  Salesman("Sally Jane", 18500),
  Salesman("Betty Lou", 11800),
  Salesman("Sammy Joe", 6500)
)

val top = findTopSalesman(sales)
// top: Option[Salesman] = 
//   Some(Salesman(Sally Jane,18500))

The List.sortBy call above sorts the sales records in descending order so that the first record is the top salesman. What if it makes more sense to you to sort the records in ascending order and take the last record? With List.lastOption, you can:

case class Salesman(name: String, sales: BigDecimal)

def findTopSalesman (salesmen : List[Salesman]) = {
  salesmen.filter { _.sales >= 10000 }
          .sortBy { _.sales } // ascending
          .lastOption
}

val sales = List(
  Salesman("Joe Bob", 9500),
  Salesman("Sally Jane", 18500),
  Salesman("Betty Lou", 11800),
  Salesman("Sammy Joe", 6500)
)

val top = findTopSalesman(sales)
// top: Option[Salesman] = 
//   Some(Salesman(Sally Jane,18500))

This is a trivial example: I mean, you can sort the records however you wish; they’re your records! But what if you receive the recordset from elsewhere (an API that is outside your control, for instance) and it is already sorted in an ascending fashion? It is probably better to accept the recordset as is and just take the last item rather than to sort it again. Which brings me to a couple of other points …

This post claims to be about Array.last and Array.lastOption. Why all the talk about List.lastOption?

First, as you have probably noticed already, just about any method available on one sequential collection is available on them all. That is, if there’s a List.last, for example, then there’s also a Seq.last, an Array.last, and a Stream.last.

Second, I want to point out a potential pitfall of using last and lastOption. Both the size and type of the collection can affect the performance of your program or even crash it.

Arrays give you O(1) access to their elements. (In case you’re not familiar with it, that’s called “Big O notation.” It’s a way of expressing how an algorithm’s running time grows with the size of its input.) That is, arrays give you nearly instant access to any element: first, last, somewhere in the middle, it doesn’t matter.

Lists and streams, on the other hand, give you O(n) access to their elements. That is, the more items in the list or stream, the longer it takes to get to the one you want because the machine always has to start at the first element and iterate through every single one until it gets to the one you want. No big deal if there are only 100 elements, but if there are 10,000,000 elements, fetching the last element will take a while.

Furthermore, streams can be infinite. If you call Stream.last or Stream.lastOption on an infinite stream, your program will crash:

// This sequence starts at one and just keeps going
val ns = Stream.from(1)
val notGood = ns.last
// java.lang.OutOfMemoryError: GC overhead limit exceeded

You don’t have to eschew last and lastOption. Just take into account what kind of collection you’re calling them on. Array.last and Array.lastOption are perfectly safe. (Well, do remember that Array.last throws an exception if the array is empty, but with regard to performance, it’s fine.) But before you call last or lastOption on a list or a stream, make sure you know how big it is, or you could, as they say, shoot yourself in the foot.
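
Here is a short sketch that pulls those points together. The values are made up for illustration. It shows last and lastOption on an array, what happens with an empty array, and one way to stay safe with an infinite stream: bound it with take before asking for the last element.

```scala
import scala.util.Try

val totals = Array(9500, 11800, 18500)

val highest = totals.last                      // 18500
val maybeHighest = totals.lastOption           // Some(18500)

// On an empty array, lastOption returns None, while last throws.
val nothing = Array.empty[Int].lastOption      // None
val failure = Try(Array.empty[Int].last)       // a Failure wrapping the exception

// An infinite stream has no last element, but a bounded prefix does.
val bounded = Stream.from(1).take(1000).last   // 1000
```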

Scala Saturday – List.headOption

Sometimes you need to get the first element of a list. No problem: List.head to the rescue, right? But what happens when you call List.head on an empty list?

val xs = List[Int]()
val h = xs.head
// java.util.NoSuchElementException: head of empty list

Well that’s not good. You could get around that little wart with this:

val xs = List[Int]()
val h = xs match {
  case h :: _ => h
  case Nil => -1
}

That’s not great. Alternatively, there’s this:

val xs = List[Int]()
val h = if (xs.isEmpty) -1 else xs.head
// h: Int = -1

Yeah, I’m not wild about those options either.

Speaking of options, what if you had a method that returns a None if you ask for the head of an empty list? If the list is not empty, it could return a Some containing the value of the head element. List.headOption does just that.

val empty = List[Int]()
val nonempty = (9 to 17).toList

val nuthin = empty.headOption
// nuthin: Option[Int] = None

val sumthin = nonempty.headOption
// sumthin: Option[Int] = Some(9)

Now you can use Option.getOrElse on the result of a call to List.headOption in order to return a default value in the event of an empty list:

val empty = List[Int]()
val nonempty = (9 to 17).toList

val head = nonempty.headOption getOrElse -1
// head: Int = 9

val fallback = empty.headOption getOrElse -1
// fallback: Int = -1

Now when might you actually use something like this? Perhaps you want to determine the top salesman each day, but only if the salesman has reached a certain threshold, say, $10,000. You can filter out the salesmen who don’t reach the threshold, sort the list of salesmen according to end-of-day sales totals, and then try to take the head element. If no one makes the cut, then the filter operation returns an empty list, which ultimately yields a None.

case class Salesman(name: String, sales: BigDecimal)

def findTopSalesman (salesmen : List[Salesman]) = {
  salesmen.filter { _.sales >= 10000 }
    .sortBy { -_.sales } // descending
    .headOption
}

So then, if Monday’s sales are as follows, then no one gets the prize because no one has broken $10,000:

val monday = List(
  Salesman("Joe Bob", 9500),
  Salesman("Sally Jane", 8500),
  Salesman("Betty Lou", 9800),
  Salesman("Sammy Joe", 6500)
)

val mondayTop = findTopSalesman(monday)
// mondayTop: Option[Salesman] = None

On Tuesday, though, there are two contenders. Alas, there can be only one winner, and Sally Jane (who, ironically, is from British Columbia) takes the prize:

val tuesday = List(
  Salesman("Joe Bob", 9500),
  Salesman("Sally Jane", 18500),
  Salesman("Betty Lou", 11800),
  Salesman("Sammy Joe", 6500)
)

val tuesdayTop = findTopSalesman(tuesday)
// tuesdayTop: Option[Salesman] = 
//   Some(Salesman(Sally Jane,18500))
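
Once you have the Option in hand, pattern matching is one straightforward way to handle both outcomes. The following is only a sketch; the announce function and its messages are invented for this example.

```scala
case class Salesman(name: String, sales: BigDecimal)

// Hypothetical helper: turn the result of findTopSalesman into a message.
def announce(top: Option[Salesman]): String = top match {
  case Some(Salesman(name, sales)) => s"$name wins with sales of $sales"
  case None => "No winner today"
}

val tuesdayMessage = announce(Some(Salesman("Sally Jane", 18500)))
// tuesdayMessage: String = Sally Jane wins with sales of 18500

val mondayMessage = announce(None)
// mondayMessage: String = No winner today
```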

Scala Saturday – Stream.collect

Filtering over a sequence of values omits values that do not meet certain criteria. Mapping over a sequence of values transforms each value into another value. What if you could do both at the same time—filter out unwanted values, but transform the ones that are left? You can with Stream.collect. But first, you need to know about partial functions.

Partial Functions

A partial function is a function that has a limited domain, i.e., is not defined for every possible value of its input type, but only a subset.

The classic example is division. Division is undefined for a divisor of zero. In other words, m ÷ n is valid unless n = 0. So then, division is not defined for every number n. In this particular example, that’s not a big limitation on the domain, but it is nevertheless a limitation that prevents us from saying that division is defined for every possible n.

Scala has a PartialFunction type that allows you to represent a function that is only valid for a limited domain. Here is how you could represent integer division:

val divide = new PartialFunction[(Int,Int), Int] {
  override def isDefinedAt(x: (Int, Int)) = x._2 != 0
  override def apply(x: (Int, Int)) = x._1 / x._2
}

val quotient = divide(12, 4)
// quotient: Int = 3

Partial functions have the apply method that other functions have so that you can execute them with parentheses: divide(12, 4). They also have an isDefinedAt method so that you can ask the partial function, “Hey, can you handle this input?” That way, you can use an if-else expression to return a default or some other value:

val fine = if (divide.isDefinedAt(12, 4)) {
  divide(12, 4)
} else Int.MaxValue
// fine: Int = 3

val meh = if (divide.isDefinedAt(12, 0)) {
  divide(12, 0)
} else Int.MaxValue
// meh: Int = 2147483647

In fact, this is such a common pattern, that PartialFunction has applyOrElse that takes an input and a default function that is executed if the partial function is not defined for the given input:

val default = Function.const(Int.MinValue) _  // lifted
val fine = divide.applyOrElse((12, 4), default)
// fine: Int = 3
val meh = divide.applyOrElse((12, 0), default)
// meh: Int = -2147483648

Now just because a partial function has a limited domain doesn’t mean that Scala prevents you from calling it on inputs that are outside its domain:

val quotient = divide(12, 0)
// java.lang.ArithmeticException: / by zero

Therefore, remember to check the domain of a partial function before applying it to a given input. A responsibly crafted API that accepts partial functions from you will verify that an input is in the partial function’s domain before applying it.

You may be thinking, “That’s great, but it’s got a lot of boilerplate.” That’s true. Scala is nice enough to let you use pattern matching syntax to define a partial function in a terser fashion:

val divide: PartialFunction[(Int,Int), Int] = {
  case (num, den) if den != 0 => num / den 
}

val quotient = divide(12, 4)
// quotient: Int = 3
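
If you would rather get an Option back than check isDefinedAt yourself, PartialFunction also has a lift method that turns the partial function into an ordinary function returning Some for inputs in the domain and None otherwise. A minimal sketch (the name safeDivide is just for illustration):

```scala
val divide: PartialFunction[(Int, Int), Int] = {
  case (num, den) if den != 0 => num / den
}

// lift converts the partial function into a total function
// from the input type to Option of the result type.
val safeDivide: ((Int, Int)) => Option[Int] = divide.lift

val defined = safeDivide((12, 4))
// defined: Option[Int] = Some(3)

val undefined = safeDivide((12, 0))
// undefined: Option[Int] = None
```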

Finally, perhaps a single partial function is not defined for the entire set of possible inputs, but you can use multiple partial functions that together cover the entire input range. It’s a contrived example, but you can take one partial function that is defined for even integers and another one that is defined for odds and then compose them together with the orElse method to get a partial function that does cover the entire set of possible inputs:

val square: PartialFunction[Int,Int] = {
  case x if x % 2 == 0 => x * x
}
val cube: PartialFunction[Int,Int] = {
  case x if x % 2 != 0 => x * x * x // != 0 also covers negative odds, where x % 2 is -1
}
val transform = square orElse cube

val squared = transform(4)
// squared: Int = 16

val cubed = transform(3)
// cubed: Int = 27

Collect: Filter and Map in One

Whereas Stream.filter takes a predicate—a function that takes a value and returns a Boolean—Stream.collect takes—you guessed it—a partial function. Stream.collect checks each element of the stream to see whether it is in the partial function’s domain. If the partial function is not defined for the input element, then Stream.collect discards it. If the input is within the partial function’s domain, then Stream.collect applies the partial function to the input element and returns the result as the next element in the output sequence.

val squaredEvens = (4 to 7).toStream.collect {
  case n if n % 2 == 0 => n * n
}
// squaredEvens: Stream[Int] = Stream(16, 36)

The following graphic illustrates what is going on in the code above:

[Figure: Collecting Items from a Stream. Stream.collect applies a partial function to the inputs for which it is defined and filters out any values outside the partial function’s domain.]
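
To check that reading of the code, the sketch below compares Stream.collect with the two-step version, a filter followed by a map. The value names are made up; toList is only there to force the streams so the results can be compared.

```scala
val inputs = (4 to 7).toStream

// One pass: keep the evens and square them.
val viaCollect = inputs.collect {
  case n if n % 2 == 0 => n * n
}.toList
// viaCollect: List[Int] = List(16, 36)

// Two steps: filter first, then map.
val viaFilterMap = inputs
  .filter(_ % 2 == 0)
  .map(n => n * n)
  .toList
// viaFilterMap: List[Int] = List(16, 36)
```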

OK, so Stream.collect performs a filter and a map all in one. Why not just call Stream.filter and then Stream.map? One example I’ve seen is when you’re pattern matching and destructuring and then only using one/some of the potential match cases. Perhaps you have a trait and some case classes representing orders that were either fulfilled or cancelled before fulfillment:

sealed trait Order
case class Fulfilled(id: String, total: BigDecimal) extends Order
case class Cancelled(id: String, total: BigDecimal) extends Order

You’d like to know how many dollars you “lost” in cancelled orders. Use Stream.collect to extract the dollar value of each cancelled order, and then sum them:

val orders = Stream(
  Fulfilled("fef3356074b4", BigDecimal("28.50")),
  Fulfilled("2605c9988f1d", BigDecimal("88.25")),
  Cancelled("94edac47971f", BigDecimal("22.01")),
  Fulfilled("2a1ff57b8f46", BigDecimal("39.30")),
  Fulfilled("9ee0a3e3da3a", BigDecimal("27.97")),
  Cancelled("db5dc439ad93", BigDecimal("99.49")),
  Fulfilled("08d58811ed36", BigDecimal("53.72")),
  Cancelled("63ebd07475ca", BigDecimal("93.66")),
  Cancelled("12d16ae9c112", BigDecimal( "7.79")),
  Fulfilled("c5ecedaedb0e", BigDecimal("87.21")) )

val cancelledDollars = orders.collect {
  case Cancelled(_, dollars) => dollars
}.sum
// cancelledDollars: BigDecimal = 222.95

Scala Saturday – The Stream.grouped Method

Another method Stream offers is Stream.grouped, which divides a stream’s elements into groups of a given size.

To take an example, if you have a stream of twelve elements and call Stream.grouped to turn it into groups of three, you’ll get an iterator over four sequences, each three elements in size:

val xs = (1 to 12).toStream
val grouped = xs.grouped(3)
// grouped: Iterator[Stream[Int]] =
//   Iterator(
//     Stream(1, 2, 3), Stream(4, 5, 6), 
//     Stream(7, 8, 9), Stream(10, 11, 12))

What happens if you use a group size that does not divide evenly into the size of your input stream? No sweat! The last group just contains any remaining elements, however many they may be:

val xs = (1 to 10).toStream
val grouped = xs.grouped(3)
// grouped: Iterator[Stream[Int]] =
//   Iterator(
//     Stream(1, 2, 3), Stream(4, 5, 6), 
//     Stream(7, 8, 9), Stream(10))

Where is this useful? Well, you can take my paging example from my Scala Saturday post on Stream.drop and make it slightly clearer without the (page - 1) * perPage arithmetic:

case class Book(title: String, author: String)
 
val books = Stream(
  Book("Wuthering Heights", "Emily Bronte"),
  Book("Jane Eyre", "Charlotte Bronte"),
  Book("Agnes Grey", "Anne Bronte"),
  Book("The Scarlet Letter", "Nathaniel Hawthorne"),
  Book("Silas Marner", "George Eliot"),
  Book("1984", "George Orwell"),
  Book("Billy Budd", "Herman Melville"),
  Book("Moby Dick", "Herman Melville"),
  Book("The Great Gatsby", "F. Scott Fitzgerald"),
  Book("Tom Sawyer", "Mark Twain")
)

val perPage = 3
val page = 3
val records = books.grouped(perPage)
                .drop(page - 1)
                .next
// records: scala.collection.immutable.Stream[Book] = 
//   Stream(Book(Billy Budd,Herman Melville), 
//     Book(Moby Dick,Herman Melville), 
//     Book(The Great Gatsby,F. Scott Fitzgerald))

This time, instead of having to calculate the number of elements to skip in order to skip page - 1 pages, you first use Stream.grouped to turn the stream into a paged recordset; each “page” is perPage records long. Then drop page - 1 pages in order to get to the page of records you want. Finally, calling Iterator.next is necessary because, remember, Stream.grouped turns a flat stream into an iterator over streams.

I will admit that I find it irritating that Stream.grouped returns something that does not have a head method. Calling Iterator.next, while just as easy, is inconsistent with collection semantics. It seems to me that Stream.grouped ought to return a collection rather than an iterator. Perhaps there was once a reason for returning an iterator instead of a collection, but it would be nice if we could fix that.
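
One practical consequence of getting an iterator back: calling next on an exhausted iterator throws, so asking for a page past the end needs a guard. Here is a sketch of a small helper that returns an Option instead. The name pageOf is made up, and it assumes perPage is positive and page starts at 1.

```scala
// Hypothetical helper: returns the requested page, or None if the
// page number is past the end. Assumes perPage > 0 and page >= 1.
def pageOf[A](items: Stream[A], perPage: Int, page: Int): Option[Stream[A]] = {
  val pages = items.grouped(perPage).drop(page - 1)
  if (pages.hasNext) Some(pages.next()) else None
}

val xs = (1 to 10).toStream

val lastPage = pageOf(xs, 3, 4).map(_.toList)
// lastPage: Option[List[Int]] = Some(List(10))

val pastTheEnd = pageOf(xs, 3, 5).map(_.toList)
// pastTheEnd: Option[List[Int]] = None
```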

Scala Saturday – The Stream.distinct Method

Scala Saturday today is short and sweet: Stream.distinct. Stream.distinct removes any duplicate members of a stream, leaving only unique values.

One way to remove duplicates is to turn your stream into a set with Stream.toSet:

val noDupes = Stream(3,5,6,3,3,7,1,1,7,3,2,7).toSet
// noDupes: scala.collection.immutable.Set[Int] = 
//   Set(5, 1, 6, 2, 7, 3)

That’s fine if you don’t care about preserving the order of the items in the input stream.

But if you do want to preserve the order, Stream.distinct is the ticket:

val noDupesOrdered =
    Stream(3,5,6,3,3,7,1,1,7,3,2,7).distinct
// noDupesOrdered: scala.collection.immutable.Stream[Int] = 
//   Stream(3, 5, 6, 7, 1, 2)
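
To see why the order survives, here is a rough sketch of the idea: walk the items once, remember which values have already been seen, and keep only the first occurrence of each. This is not the library’s implementation, and distinctSketch is a made-up name.

```scala
// Hypothetical helper: keeps the first occurrence of each value,
// tracking already-seen values in a set.
def distinctSketch[A](xs: Seq[A]): Vector[A] = {
  val (_, kept) = xs.foldLeft((Set.empty[A], Vector.empty[A])) {
    case ((seen, acc), x) =>
      if (seen(x)) (seen, acc)   // duplicate: skip it
      else (seen + x, acc :+ x)  // first occurrence: keep it
  }
  kept
}

val firstOccurrences = distinctSketch(Seq(3, 5, 6, 3, 3, 7, 1, 1, 7, 3, 2, 7))
// firstOccurrences: Vector[Int] = Vector(3, 5, 6, 7, 1, 2)
```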

Scala Saturday – The Stream.dropWhile Method

Just as the analog to Stream.take is Stream.drop, the analog to Stream.takeWhile is Stream.dropWhile. That is, you use it when you care not so much about dropping a certain number of items as about dropping a certain kind of item.

Stream.dropWhile starts at the beginning of the stream and applies a predicate to each item, one by one. It does not start returning items in a new stream until it reaches an item that does not meet the predicate. Then it stops checking elements against the predicate and returns every item in the stream from that point on:

[Figure: Dropping Items in a Sequence While a Predicate Holds. Stream.dropWhile skips items at the beginning of a sequence until it reaches an item that does not meet the given predicate.]

Assume you have the same temperature sensor as the one in my post on Stream.takeWhile. This time, instead of once per minute, assume that it feeds you temperature readings once per second. Add to that the idea that the sensor has a few seconds of boot-up time in which it sends you -1000.0—the indication that the current reading is invalid—until it has fully booted and can start sending good temperature data.

import java.time.{LocalDateTime, Month}
case class Reading(temperature: Double, timestamp: LocalDateTime)

val readings = Stream(
  Reading(-1000.0, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 0)),
  Reading(-1000.0, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 1)),
  Reading(-1000.0, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 2)),
  Reading(-1000.0, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 3)),
  Reading(-1000.0, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 4)),
  Reading(-1000.0, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 5)),
  Reading(90.1, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 6)),
  Reading(90.2, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 7)),
  Reading(90.2, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 8)),
  Reading(90.3, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 9)),
  Reading(90.2, LocalDateTime.of(2015, Month.JULY, 19, 10, 0, 10))
)

To drop all readings until the thermometer starts returning valid data, use Stream.dropWhile:

val valid = readings dropWhile (_.temperature == -1000.0)

// valid: scala.collection.immutable.Stream[Reading] = 
//   Stream(Reading(90.1,2015-07-19T10:00:06), 
//     Reading(90.2,2015-07-19T10:00:07), 
//     Reading(90.2,2015-07-19T10:00:08), 
//     Reading(90.3,2015-07-19T10:00:09), 
//     Reading(90.2,2015-07-19T10:00:10))

Finally, like Stream.takeWhile, Stream.dropWhile doesn’t balk if it never reaches an element that fails to meet the predicate. You just get an empty stream:

val none = Stream(1,3,4,7) dropWhile { _ < 10 }
// none: scala.collection.immutable.Stream[Int] = Stream()
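
One thing worth keeping in mind is that Stream.dropWhile only removes the leading run of matching items. If a matching item shows up again later, it stays. The sketch below uses made-up temperature values with a stray -1000.0 in the middle and contrasts dropWhile with filterNot, which removes every match.

```scala
val temps = Stream(-1000.0, -1000.0, 90.1, -1000.0, 90.2)

// Only the leading invalid readings are dropped.
val afterBoot = temps.dropWhile(_ == -1000.0).toList
// afterBoot: List[Double] = List(90.1, -1000.0, 90.2)

// filterNot removes every invalid reading, wherever it occurs.
val allValid = temps.filterNot(_ == -1000.0).toList
// allValid: List[Double] = List(90.1, 90.2)
```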

Scala Saturday – The Stream.drop Method

The opposite of Stream.take is Stream.drop. Stream.drop, as the name suggests, skips the first n items of the sequence and returns a new sequence that starts with element n + 1:

val xs = (1 to 10).toStream
val dropped5 = xs drop 5
// dropped5: scala.collection.immutable.Stream[Int] =
//   Stream(6, 7, 8, 9, 10)

One of the most obvious applications of Stream.drop is to pair it with Stream.take to page through a set of records. Perhaps you have a list of books:

case class Book(title: String, author: String)

val books = Stream(
  Book("Wuthering Heights", "Emily Bronte"),
  Book("Jane Eyre", "Charlotte Bronte"),
  Book("Agnes Grey", "Anne Bronte"),
  Book("The Scarlet Letter", "Nathaniel Hawthorne"),
  Book("Silas Marner", "George Eliot"),
  Book("1984", "George Orwell"),
  Book("Billy Budd", "Herman Melville"),
  Book("Moby Dick", "Herman Melville"),
  Book("The Great Gatsby", "F. Scott Fitzgerald"),
  Book("Tom Sawyer", "Mark Twain")
)

If each page shows three books, and the user wants to see the records on page three, skip the first two pages’ worth of records, and take the next three records:

val perPage = 3
val page = 3
val records = books.drop((page - 1) * perPage)
                .take(perPage)
// records: scala.collection.immutable.Stream[Book] = 
//   Stream(Book(Billy Budd,Herman Melville), 
//     Book(Moby Dick,Herman Melville), 
//     Book(The Great Gatsby,F. Scott Fitzgerald))

Fortunately, like Stream.take, Stream.drop does not complain if you ask for more elements than the stream contains. You simply get an empty stream:

val empty = (1 to 5).toStream drop 6
// empty: scala.collection.immutable.Stream[Int] =
//   Stream()
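
As an aside, the drop-then-take combination used for paging above can also be written as a single call to slice, which takes a start index (inclusive) and an end index (exclusive). A small sketch with made-up numbers, showing that the two give the same page:

```scala
val xs = (1 to 10).toStream
val perPage = 3
val page = 3
val from = (page - 1) * perPage   // 6 elements to skip

val viaDropTake = xs.drop(from).take(perPage).toList
// viaDropTake: List[Int] = List(7, 8, 9)

val viaSlice = xs.slice(from, from + perPage).toList
// viaSlice: List[Int] = List(7, 8, 9)
```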