Coalesce data functionally

In a recent project I had to coalesce a fairly large amount of data in the following way. To simplify it for this post, consider the following two lists.

val x = List("a", "b", "c", "a")

val y = List(1, 2, 6, 9)

We are about to write a function which would return the following list as the result.

val result = List(("a",10), ("b",2), ("c",6))

Basically, it coalesces the values that share the same category. See for instance “a” in the above example, whose values 1 and 9 are summed to 10.

A language that comes with a REPL inherently provides a very nice way to try out different expressions and work toward the expected outcome. As we are using Scala here, we can use REPL-driven development quite conveniently, as illustrated below.

  • Define the Lists:
scala> val x = List("a", "b" , "c", "a")
x: List[String] = List(a, b, c, a)

scala> val y = List(1,2,6,9)
y: List[Int] = List(1, 2, 6, 9)
  • Zip them.
scala> val z = x zip y
z: List[(String, Int)] = List((a,1), (b,2), (c,6), (a,9))
  • Group them based on the values of x.
scala> val grps = z groupBy (_._1)
grps: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(b -> List((b,2)), a -> List((a,1), (a,9)), c -> List((c,6)))

  • Map over the values of grps and reduce each group to compute the sum.
scala> val res = grps.values.map {_.reduce((i,j) => (i._1, (i._2+j._2)))}
res: Iterable[(String, Int)] = List((b,2), (a,10), (c,6))
  • Sort res based on the 1st value of the tuple.
scala> res.toList.sorted
res23: List[(String, Int)] = List((a,10), (b,2), (c,6))

Thus, the function can be simply written as follows:
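
A minimal sketch that chains the REPL steps above (zip, group, sum, sort) into one function. The function name `coalesce` and the wrapper object `Coalesce` are illustrative assumptions, not names from the project mentioned above.

```scala
object Coalesce {
  // Pair each key with its value, group the pairs by key,
  // sum the values within each group, and sort the result by key.
  def coalesce(keys: List[String], values: List[Int]): List[(String, Int)] =
    (keys zip values)
      .groupBy(_._1)
      .map { case (key, pairs) => (key, pairs.map(_._2).sum) }
      .toList
      .sorted
}
```

Calling `Coalesce.coalesce(x, y)` with the two lists defined earlier should yield the same list as the REPL session. On Scala 2.13 or later, `groupMapReduce(_._1)(_._2)(_ + _)` could replace the separate `groupBy` and `map` steps.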

Thus we get the expected result.

Interactive Scala Development with SBT and JRebel

Problem Statement

The Scala REPL is a great tool for developing software incrementally. However, it is quite slow, as noted here. An alternative is to use Simple Build Tool, aka SBT. SBT, via tasks such as compile, run, and console, facilitates a quick feedback cycle. The command sbt ~compile further provides a continuous/incremental compilation model: whenever a .scala file changes, it recompiles the file and generates the .class files.

The problem is that in order to reload these changed class files, one has to restart sbt console. In my humble opinion, that is a bit inconvenient.

Solution

JRebel provides dynamic reloading of modified *.class files. In combination with sbt ~compile, this setup leads to an interactive development experience as follows.

  • We run sbt ~compile in one console window
  • In the other one, we run sbt console.

Thus, when we modify Scala source code, the first SBT instance compiles it and generates class files. If we then invoke any command in the second SBT instance, JRebel reloads the modified class files before the command gets executed, as shown below.

Note that JRebel is free for personal use, and it is definitely worth taking a look. For more information about the development flow using JRebel, SBT, and Eclipse/IDEA, please have a look at this article, which describes the setup process in detail, or simply leave a comment.

Certificate of #ProgFun of @Coursera

Yay! Finally, I have received the certificate for the Functional Programming Principles in Scala course. Thanks @coursera! It has been a great pleasure, and I look forward to future courses (e.g., Discrete Optimization).

In a previous post, I mentioned how awesome this course is, how much I enjoyed it, and that I am looking forward to the next, more advanced course on this topic. It’s great to know that around 70% of students ‘Absolutely’ share the same feeling. So, Prof. @oderskey, please hurry up. We are waiting!

Once again, thanks to Prof. @oderskey and his team for this excellent course.

Retrospective of #ProgFun

Yay! Just finished Functional Programming Principles in Scala (with 100% score :D), taught by Prof. Martin Odersky et al. at Coursera. It has been an excellent experience due to its great content, amazing teacher, and interesting assignments (e.g., implementing a solver for Bloxorz).

This course focuses on providing a deeper understanding of the FP paradigm, and demonstrates how elegantly it excels compared to its contemporaries. Function composition, recursion, pattern matching, persistent data structures, and lazy evaluation are among the important concepts that are emphasized and exemplified over the seven weeks of the course. The application of term rewriting to reduce expressions, and to reason about them, seemed simply awesome.

Overall, it has been a wonderful experience and I highly recommend it to anyone interested in learning the FP paradigm. As further motivation, note that upon successful completion, a certificate from Coursera is provided ;).

Thanks @oderskey, his team, and @coursera for this excellent course, and for this great opportunity. Looking forward to its advanced part.