Categories
Tech

F# Friday – Seq.groupBy

Sometimes you have a collection of items that you want to group according to some common property or key. Seq.groupBy can do that job for you. It takes that collection and returns a sequence of pairs. The first element of each pair is the grouping key; the second is the sequence of all the items that fall into that group.

That’s a little hard to follow. So what’s it useful for?

Well, maybe you need to group a list of names by initial. No sweat:

let names = 
    seq [ "Rehoboam"
          "Abijah"
          "Asa"
          "Jehoshaphat"
          "Jehoram"
          "Ahaziah" ]

let groupedByInitial =
    names
    |> Seq.groupBy (fun s -> s.[0])
// val groupedByInitial : seq<char * seq<string>> =
//   seq [ ('R', seq ["Rehoboam"])
//         ('A', seq ["Abijah"; "Asa"; "Ahaziah"])
//         ('J', seq ["Jehoshaphat"; "Jehoram"]) ]

And of course, if you want to sort those groups, throw in a call to Seq.sortBy to sort by the first element in the tuple:

let groupedByInitialAndSorted =
    names
    |> Seq.groupBy (fun s -> s.[0])
    |> Seq.sortBy fst
// val groupedByInitialAndSorted : seq<char * seq<string>> =
//   seq [ ('A', seq ["Abijah"; "Asa"; "Ahaziah"])
//         ('J', seq ["Jehoshaphat"; "Jehoram"])
//         ('R', seq ["Rehoboam"]) ]

Another example: Maybe you want to group some test scores according to grade. That is, all the students who scored in the 90s are grouped together, then all those scoring in the 80s, and so on:

type TestScore =
    { Name : string
      Score : int }

let grades = 
    seq [
        { Name = "Anna"; Score = 74 }
        { Name = "Andy"; Score = 76 }
        { Name = "Brenda"; Score = 70 }
        { Name = "Bobby"; Score = 90 }
        { Name = "Charlotte"; Score = 98 }
        { Name = "Chuck"; Score = 83 }
        { Name = "Deborah"; Score = 88 }
        { Name = "Dan"; Score = 66 }
        { Name = "Ellie"; Score = 80 }
        { Name = "Ed"; Score = 61 }
        { Name = "Frannie"; Score = 89 }
        { Name = "Frank"; Score = 96 }
    ]

let grouped = 
    grades
    |> Seq.groupBy (fun s -> s.Score / 10 * 10)
// val grouped : seq<int * seq<TestScore>> =
//   seq [
//     (70, seq [ {Name = "Anna"; Score = 74}
//                {Name = "Andy"; Score = 76}
//                {Name = "Brenda"; Score = 70} ] )
//     (90, seq [ {Name = "Bobby"; Score = 90}
//                {Name = "Charlotte"; Score = 98}
//                {Name = "Frank"; Score = 96} ] )
//     (80, seq [ {Name = "Chuck"; Score = 83}
//                {Name = "Deborah"; Score = 88}
//                {Name = "Ellie"; Score = 80}
//                {Name = "Frannie"; Score = 89} ] )
//     (60, seq [ {Name = "Dan"; Score = 66}
//                {Name = "Ed"; Score = 61} ] )
//   ]

You can take it another couple of steps to produce a histogram by counting the number of students in each group and sorting by the group key (i.e., grade level):

let histogram = 
    grouped
    |> Seq.map (fun pair -> 
                    let score = fst pair
                    let count = snd pair
                                |> Seq.length
                    score, count)
    |> Seq.sort
    |> Seq.rev
// val histogram : seq<int * int> =
//   seq [(90, 3); (80, 4); (70, 3); (60, 2)]

One more example, and this one has a little pitfall in it. I borrowed this one from one of Steven Proctor’s Ruby Tuesday posts. You can find anagrams in a list of words by sorting the characters in each word and grouping on that:

let anagrams =
    seq ["tar"; "rat"; "bar"; "rob"; "art"; "orb"]
    |> Seq.groupBy (fun word -> word |> Seq.sort)
// val anagrams : seq<seq<char> * seq<string>> =
//   seq [
//     (seq ['a'; 'r'; 't'], seq ["tar"])
//     (seq ['a'; 'r'; 't'], seq ["rat"])
//     (seq ['a'; 'b'; 'r'], seq ["bar"])
//     (seq ['b'; 'o'; 'r'], seq ["rob"])
//     (seq ['a'; 'r'; 't'], seq ["art"])
//     (seq ['b'; 'o'; 'r'], seq ["orb"])
//   ]

Well, that’s not what I expected. I expected “rat,” “tar,” and “art” to be grouped together, but they’re not. Thanks to the good folks on Stack Overflow, I know what the problem is now: seq is just an alias for a .NET IEnumerable. IEnumerables are not structurally equivalent. That is, two IEnumerable instances are not compared according to their contents. They are compared according to reference value. In other words, if two IEnumerables are not the very same instance, they are not equal.

In order to get Mr. Proctor’s example to work in F#, you have to add one more step to convert the grouping key, currently a seq, to something that is a type that supports structural equivalence, such as an array:

let anagrams =
    seq ["tar"; "rat"; "bar"; "rob"; "art"; "orb"]
    |> Seq.groupBy (fun word -> word 
                                |> Seq.sort 
                                |> Seq.toArray)
// val anagrams : seq<char [] * seq<string>> =
//   seq [
//       ([|'a'; 'r'; 't'|], seq ["tar"; "rat"; "art"])
//       ([|'a'; 'b'; 'r'|], seq ["bar"])
//       ([|'b'; 'o'; 'r'|], seq ["rob"; "orb"])
//   ]

Now, that’s better.

2 replies on “F# Friday – Seq.groupBy”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.