In a recent project I had to coalesce a significant amount of data in the following way. To simplify it for this post, consider the following two lists.

    val x = List("a", "b", "c", "a")
    val y = List(1, 2, 6, 9)

We want to write a function that returns the following list as the result.

    val result = List(("a", 10), ("b", 2), ("c", 6))

Basically, it sums the values that share the same category. See, for instance, "a" in the above example: its two values, 1 and 9, are combined into 10.

Languages that come with a **REPL** inherently provide a very nice way to try out different expressions and work toward the expected outcome. Since we are using Scala here, we can use REPL-driven development quite conveniently, as illustrated below.

- Define the lists.

    scala> val x = List("a", "b" , "c", "a")
    x: List[String] = List(a, b, c, a)
    scala> val y = List(1,2,6,9)
    y: List[Int] = List(1, 2, 6, 9)

- Zip them.

    scala> val z = x zip y
    z: List[(String, Int)] = List((a,1), (b,2), (c,6), (a,9))

- Group them based on the values of `x`.

    scala> val grps = z groupBy (_._1)
    grps: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(b -> List((b,2)), a -> List((a,1), (a,9)), c -> List((c,6)))

- Map over the values of `grps` and reduce each group to compute the sum.

    scala> val res = grps.values.map {_.reduce((i,j) => (i._1, (i._2+j._2)))}
    res: Iterable[(String, Int)] = List((b,2), (a,10), (c,6))

- Sort `res` based on the 1st value of the tuple.

    scala> res.toList.sorted
    res23: List[(String, Int)] = List((a,10), (b,2), (c,6))

Thus, the function can be simply written as follows:
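
A minimal sketch that chains the REPL steps above into a single function. The name `coalesce` and the wrapping object `Coalesce` are assumptions made for illustration; they do not come from the original listing.

```scala
object Coalesce {
  // Hypothetical helper: sums the values in `ys` whose positions share a category in `xs`.
  // If the lists differ in length, `zip` silently drops the extra elements.
  def coalesce(xs: List[String], ys: List[Int]): List[(String, Int)] =
    xs.zip(ys)                                       // pair each category with its value
      .groupBy(_._1)                                 // Map[String, List[(String, Int)]]
      .values                                        // keep only the grouped pairs
      .map(_.reduce((i, j) => (i._1, i._2 + j._2)))  // sum the values within each group
      .toList
      .sorted                                        // tuple ordering: category first, then sum
}
```

Calling `Coalesce.coalesce(List("a", "b", "c", "a"), List(1, 2, 6, 9))` should yield `List((a,10), (b,2), (c,6))`. The `reduce` is safe here because `groupBy` never produces an empty group.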

Thus we get the expected result.
