Sometimes you have a collection of items that you want to group according to some common property or key. Seq.groupBy
can do that job for you. It takes that collection and returns a sequence of pairs. The first element of each pair is the grouping key; the second is the sequence of all the items that fall into that group.
That’s a little hard to follow. So what’s it useful for?
Well, maybe you need to group a list of names by initial. No sweat:
let names = seq [ "Rehoboam" "Abijah" "Asa" "Jehoshaphat" "Jehoram" "Ahaziah" ] let groupedByInitial = names |> Seq.groupBy (fun s -> s.[0]) // val groupedByInitial : seq<char * seq<string>> = // seq [ ('R', seq ["Rehoboam"]) // ('A', seq ["Abijah"; "Asa"; "Ahaziah"]) // ('J', seq ["Jehoshaphat"; "Jehoram"]) ]
And of course, if you want to sort those groups, throw in a call to Seq.sortBy
to sort by the first element in the tuple:
let groupedByInitialAndSorted = names |> Seq.groupBy (fun s -> s.[0]) |> Seq.sortBy fst // val groupedByInitialAndSorted : seq<char * seq<string>> = // seq [ ('A', seq ["Abijah"; "Asa"; "Ahaziah"]) // ('J', seq ["Jehoshaphat"; "Jehoram"]) // ('R', seq ["Rehoboam"]) ]
Another example: Maybe you want to group some test scores according to grade. That is, all the students who scored in the 90s are grouped together, then all those scoring in the 80s, and so on:
type TestScore = { Name : string Score : int } let grades = seq [ { Name = "Anna"; Score = 74 } { Name = "Andy"; Score = 76 } { Name = "Brenda"; Score = 70 } { Name = "Bobby"; Score = 90 } { Name = "Charlotte"; Score = 98 } { Name = "Chuck"; Score = 83 } { Name = "Deborah"; Score = 88 } { Name = "Dan"; Score = 66 } { Name = "Ellie"; Score = 80 } { Name = "Ed"; Score = 61 } { Name = "Frannie"; Score = 89 } { Name = "Frank"; Score = 96 } ] let grouped = grades |> Seq.groupBy (fun s -> s.Score / 10 * 10) // val grouped : seq<int * seq<TestScore>> = // seq [ // (70, seq [ {Name = "Anna"; Score = 74} // {Name = "Andy"; Score = 76} // {Name = "Brenda"; Score = 70} ] ) // (90, seq [ {Name = "Bobby"; Score = 90} // {Name = "Charlotte"; Score = 98} // {Name = "Frank"; Score = 96} ] ) // (80, seq [ {Name = "Chuck"; Score = 83} // {Name = "Deborah"; Score = 88} // {Name = "Ellie"; Score = 80} // {Name = "Frannie"; Score = 89} ] ) // (60, seq [ {Name = "Dan"; Score = 66} // {Name = "Ed"; Score = 61} ] ) // ]
You can take it another couple of steps to produce a histogram by counting the number of students in each group and sorting by the group key (i.e., grade level):
let histogram = grouped |> Seq.map (fun pair -> let score = fst pair let count = snd pair |> Seq.length score, count) |> Seq.sort |> Seq.rev // val histogram : seq<int * int> = // seq [(90, 3); (80, 4); (70, 3); (60, 2)]
One more example, and this one has a little pitfall in it. I borrowed this one from one of Steven Proctor’s Ruby Tuesday posts. You can find anagrams in a list of words by sorting the characters in each word and grouping on that:
let anagrams = seq ["tar"; "rat"; "bar"; "rob"; "art"; "orb"] |> Seq.groupBy (fun word -> word |> Seq.sort) // val anagrams : seq<seq<char> * seq<string>> = // seq [ // (seq ['a'; 'r'; 't'], seq ["tar"]) // (seq ['a'; 'r'; 't'], seq ["rat"]) // (seq ['a'; 'b'; 'r'], seq ["bar"]) // (seq ['b'; 'o'; 'r'], seq ["rob"]) // (seq ['a'; 'r'; 't'], seq ["art"]) // (seq ['b'; 'o'; 'r'], seq ["orb"]) // ]
Well, that’s not what I expected. I expected “rat,” “tar,” and “art” to be grouped together, but they’re not. Thanks to the good folks on Stack Overflow, I know what the problem is now: seq
is just an alias for a .NET IEnumerable
. IEnumerable
s are not structurally equivalent. That is, two IEnumerable
instances are not compared according to their contents. They are compared according to reference value. In other words, if two IEnumerable
s are not the very same instance, they are not equal.
In order to get Mr. Proctor’s example to work in F#, you have to add one more step to convert the grouping key, currently a seq
, to something that is a type that supports structural equivalence, such as an array:
let anagrams = seq ["tar"; "rat"; "bar"; "rob"; "art"; "orb"] |> Seq.groupBy (fun word -> word |> Seq.sort |> Seq.toArray) // val anagrams : seq<char [] * seq<string>> = // seq [ // ([|'a'; 'r'; 't'|], seq ["tar"; "rat"; "art"]) // ([|'a'; 'b'; 'r'|], seq ["bar"]) // ([|'b'; 'o'; 'r'|], seq ["rob"; "orb"]) // ]
Now, that’s better.
2 replies on “F# Friday – Seq.groupBy”
[…] F# Friday – Seq.groupBy – Brad Collins […]
[…] function you want is groupBy. There are versions for each of the basic F# collection types (e.g. List.groupBy, Seq.groupBy, […]