Spellchecker in Scala vs Kotlin

Recently, code for a spellchecker in Kotlin was posted on a blog. What is interesting about the code is that it can almost be converted to Scala using only find and replace (“fun” -> “def”, “<” -> “[”, “>” -> “]”).

More interesting is that idiomatic Scala code is much less verbose than the Kotlin version. The Scala version (21 lines) is almost half the size of the Kotlin version (40 lines). The Java version published on the blog linked above is 43 lines.

Scala version here:

package com.wordpress.capecoder

import scala.io.Source

object ScalaSpellChecker {

    private val DICTIONARY_WORDS = "/usr/share/dict/words";

    def checkSpelling(input : String) = {
        val words = readDictionary()
        !input.toLowerCase().split(" ").exists(!words.contains(_))
    }

    private def readDictionary() = Source.fromFile(DICTIONARY_WORDS).getLines().toSet + "scala" // add scala to dictionary

    def main(args : Array[String]): Unit = {
       val defaultInput = "scala fuses functional and OO programming"
       val valid = checkSpelling(if (args.size > 0) args(0) else defaultInput)
       println("Is the text valid? "+valid)
    }
}

Kotlin version here:

package com.richardlog.spellcheck

import java.io.*
import java.io.File
import java.util.Set

class KotlinSpellChecker {

    private val DICTIONARY_WORDS = File("/usr/share/dict/words");

    fun checkSpelling(input : String) : Boolean {
        val words = readDictionary()
        for (word in input.toLowerCase().split(" ")) {
            if (!words.contains(word)) {
                println("$word is not in the dictionary");
                return false;
            }
        }
        return true;
    }

    private fun readDictionary() : Set<String> {
        val words = hashSet("kotlin") // add kotlin to dictionary
        val stream = FileInputStream(DICTIONARY_WORDS).buffered();
        try {
            val reader = InputStreamReader(stream, "UTF-8");
            reader.forEachLine( { words.add(it)} )
        } finally {
            stream.close();
        }
        return words;
    }
}

fun main(args : Array<String>) {
    val defaultInput = "Kotlin is an island"
    val valid = KotlinSpellChecker().checkSpelling(if (args.size > 0) args[0] else defaultInput)
    println("Is the text valid? $valid")
}

Note that I knocked the Scala version together in about 5 minutes so there may still be some minor syntax errors/typos.
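To make the comparison concrete, here is a rough sketch of what a near-literal translation of the Kotlin checkSpelling loop looks like in Scala, next to the shorter idiomatic form. The object name and the small in-memory dictionary are assumptions made purely so the example runs without /usr/share/dict/words. Beyond the find and replace, the loop header needs "<-" instead of "in" and the string template needs an s prefix, which is why the conversion is only "almost" mechanical.

```scala
object TranslatedSpellChecker {

  // Hypothetical in-memory dictionary so the sketch runs without a words file.
  private val words: Set[String] = Set("kotlin", "is", "an", "island")

  // Near-literal translation of the Kotlin loop. Newer Scala compilers may
  // warn about the return inside the for loop (it is a non-local return).
  def checkSpelling(input: String): Boolean = {
    for (word <- input.toLowerCase().split(" ")) {
      if (!words.contains(word)) {
        println(s"$word is not in the dictionary")
        return false
      }
    }
    true
  }

  // The same check written the idiomatic way: every word must be in the set.
  def checkSpellingIdiomatic(input: String): Boolean =
    input.toLowerCase().split(" ").forall(words.contains)

  def main(args: Array[String]): Unit = {
    println(checkSpelling("Kotlin is an island"))
    println(checkSpellingIdiomatic("Kotlin is a peninsula"))
  }
}
```

The two methods return the same result for any input; the translated one additionally reports the first unknown word, which is where most of the extra lines in the Kotlin listing come from.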


Language popularity: It’s not about search engine result counts

Recently there has been a lot of noise about the TIOBE index, in which search engine result counts are compared for various programming languages. Counting search engine results is at best a very rough proxy for popularity. One problem with such a method is that the mere mention of a programming language on any web page (regardless of context) is interpreted as “popularity”. There is no notion of how old a web page is: if C++ gets mentioned in a blog post from 1997, that counts towards current “popularity” even if the author of the blog post no longer uses C++. Search results including “I hate XYZ programming” and “XYZ programming sucks” also get counted as “popularity”.

What we should really be measuring is which languages are actively being used. How do you measure usage? The first idea which usually springs to mind is to see how many open source projects are using language X on GitHub or SourceForge. This logic is deeply flawed, as a great deal of code being written today is not open source. Focusing only on open source projects excludes vast quantities of code being churned out by paid developers working on projects and internal systems which will never be open sourced.

We need to measure the number of developers actively writing code in a particular language *today*. What do programming languages all have in common? They all have developers trying to solve real problems. Typically, when a developer has a problem they can’t solve, they go to a site like stackoverflow.com and ask for advice. If you’re asking questions about how to do something in a programming language, there’s a very high probability that you are actively using that language.

Looking at stackoverflow.com data for the last week, we get a picture which is very different from the TIOBE index. The first thing that stands out is that Java, C#, JavaScript and PHP feature much higher in the rankings than C. This should not be surprising. While C is suited to many tasks such as operating systems and device driver development, the vast majority of code being churned out by Joe Developer is not written in C.

The next thing that stands out is that the next generation of JVM languages (Scala, Groovy, Clojure) feature well ahead of languages such as Ada, NXT-G and Logo, which are ranked surprisingly high in naive search engine result counts. Scala is in fact getting very close to breaking into the mainstream group.

[Figure: stackoverflow questions per week by language]
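The measurement described above can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the Question record, the idea of using a question's tags as the language label, and the sample entries are all made up for the example and are not real stackoverflow.com data.

```scala
import java.time.{Duration, Instant}

object QuestionsPerWeek {

  // Hypothetical record shape: the tags on a question and when it was asked.
  final case class Question(tags: Set[String], askedAt: Instant)

  // Count questions per tag asked in the seven days before `now`,
  // most frequent tag first (ties broken alphabetically).
  def weeklyCounts(questions: Seq[Question], now: Instant): Seq[(String, Int)] = {
    val cutoff = now.minus(Duration.ofDays(7))
    questions
      .filter(q => q.askedAt.isAfter(cutoff)) // old mentions do not count
      .flatMap(_.tags)
      .groupBy(identity)
      .map { case (tag, hits) => tag -> hits.size }
      .toSeq
      .sortBy { case (tag, count) => (-count, tag) }
  }

  def main(args: Array[String]): Unit = {
    // Arbitrary reference time and invented sample questions.
    val now = Instant.parse("2012-04-15T00:00:00Z")
    def daysAgo(n: Long): Instant = now.minus(Duration.ofDays(n))
    val sample = Seq(
      Question(Set("java"), daysAgo(1)),
      Question(Set("java", "scala"), daysAgo(2)),
      Question(Set("c"), daysAgo(3)),
      Question(Set("c++"), daysAgo(5000)) // a years-old question, ignored
    )
    weeklyCounts(sample, now).foreach { case (tag, n) => println(s"$tag: $n") }
  }
}
```

The cutoff filter is the key difference from a search engine count: a question only contributes if it was asked within the window, so a mention from 1997 has no effect on this week's ranking.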

New Scala books

One of the signs of a healthy language is the large number of books written on the subject. Last month saw the release of Scala for the Impatient, an excellent introduction to the language for programmers who already know a language like Java, C# or Ruby. If you’ve been thinking about learning Scala this is a no-nonsense introduction.

Next month will see the release of Scala in Depth, which makes for an excellent second book on the subject, focusing on best practices, not unlike Josh Bloch’s Effective Java.

A third new book being published soon is Scala in Action, which is slated for release in July.

[EDIT: See the comments section for details on more Scala books about to be published which I missed!]

The existing collection of books on Scala is already impressive and includes:
Programming in Scala (Odersky, 2nd Ed.): an excellent book to understand the “why” as much as the “how”
Programming Scala (O’Reilly)
Programming Scala (Pragmatic Series)

It would be nice to see an updated or new book coming out on version 2 of the Play Framework.

Another opportunity for a new book would be a Scala cookbook including examples using the standard libraries as well as third party libraries.