Scala 2.11: AnyRefMap – benchmarking a high performance map implementation

In a previous blog post I benchmarked different Scala map implementations alongside java.util.HashMap. One of the clear conclusions was that the mutable map implementation in Scala lagged j.u.HashMap in terms of performance. This limitation was easily worked around by simply using the Scala conversion support to wrap the Java implementation. Fast-forward 18 months to 2014 and we see the emergence of two new high performance map implementations in Scala 2.11. The first is a new specialized map implementation for using Long values as keys, the other general purpose map is
AnyRefMap. The specialized collection offers very high performance for keys of type Long. The focus of this blog post is on AnyRefMap, which is likely to see very widespread use in Scala codebases.

I’ve revisited the benchmark I used in 2012 and incorporated AnyRefMap, benchmarking it alongside java.util.HashMap and collection.mutable.HashMap. I benchmarked the insertion of 3 million random (pre-generated) strings into each map, followed by looking up every one of the 3 million strings 3 times. I ran each test repeatedly to “warm up” the JVM, only taking the last 10 runs for each map. The tests were run on an Intel i3 system using a recent JDK release and Scala 2.11 M8.

The results were nothing short of impressive for AnyRefMap. The new map implementation significantly out-performed mutable.HashMap for both insertions and lookups. AnyRefMap also outperformed java.util.HashMap for insertions, and matching it for lookup performance. I took the median value of the last 10 runs for each map implementation. The results are shown in the figure below.

MapBenchmark

import collection.mutable
import collection.mutable._
import java.lang.System.currentTimeMillis
import java.util

/**
 * Code rough, but serves its purpose.
 */
object SBenchmarkMutable3 extends App {
  val NumElements = 3000000
  val RandStringLength = 10
  val OuterIterations = 40
  val WarmUpIterations = 30

  private var strings: Array[String] = _
  private var map: mutable.Map[String, String] = _
  private var jmap: util.HashMap[String, String] = _

  loadStrings

  println("collection.mutable.HashMap")
  for (i <- 1 to OuterIterations) {

    val mutInsert = benchInsertion(false, false, false)
    val mutLookup = benchLookup

    Thread.sleep(1000)

    if (i > WarmUpIterations)
      println(s"$mutInsert,$mutLookup")
  }

  println("Java HashMap")
  for (i <- 1 to OuterIterations) {

    val jInsert = benchInsertion(true, false)
    val jLookup = benchLookup

    Thread.sleep(1000)
    if (i > WarmUpIterations)
      println(s"$jInsert,$jLookup")
  }

  println("collection.mutable.AnyRefMap:")
  for (i <- 1 to OuterIterations) {

    val anyRefInsert = benchInsertion(false, false, true)
    val anyRefLookup = benchLookup

    Thread.sleep(1000)

    if (i > WarmUpIterations)
      println(s"$anyRefInsert,$anyRefLookup")
  }

  def benchInsertion(useJavaConversion: Boolean, recycle: Boolean, useAnyRefMap: Boolean = false): Long = {
    insertion(false, useJavaConversion, recycle, useAnyRefMap)
  }

  def mapPresize = (RandStringLength / 0.74).toInt

  def benchLookup = {
    val timing = lookup(false)
    map = null
    jmap = null
    timing
  }

  def insertion(warmup: Boolean, useJava: Boolean, recycle: Boolean, useAnyRefMap: Boolean): Long = {
    val start = currentTimeMillis
    if (recycle) {
      if (map != null) {
        map.clear()
      } else {
        jmap.clear()
      }
    }
    else {
      if (useJava) {
        jmap = new util.HashMap[String, String](mapPresize)
      }
      else {
        if (useAnyRefMap) {
          map = new mutable.AnyRefMap[String, String](mapPresize)
        } else {
          map = new mutable.HashMap[String, String]()
        }
      }
    }

    if (jmap == null) {
      var i = 0
      while (i < strings.size) {
        val s = strings(i)
        map.put(s, s)
        i += 1
      }
    } else {
      var i = 0
      while (i < strings.size) {
        val s = strings(i)
        jmap.put(s, s)
        i += 1
      }
    }
    currentTimeMillis - start
  }

  def lookup(warmup: Boolean) = {
    var empty = false
    val start = currentTimeMillis

    var j = 0
    while (j < 3) {
      if (jmap == null) {
        var i = 0
        while (i < strings.length) {
          empty = map(strings(i)).length != 0
          i += 1
        }
      } else {
        var i = 0
        while (i < strings.length) {
          empty = jmap.get(strings(i)).length != 0
          i += 1
        }
      }
      j += 1
    }
    currentTimeMillis - start
  }

  // Generate a bunch of random strings
  def loadStrings {
    val lines = new ArrayBuffer[String]
    val random = new scala.util.Random()

    var i = 0
    while (i < NumElements) {
      lines += random.nextString(RandStringLength)
      i += 1
    }
    strings = lines.toArray
  }
}

Many new Scala books being published

I haven’t updated my blog for a few months now, mostly because I’ve been too busy writing Scala code! The number of high quality Scala books being published has continued to swell since the last time I blogged about this topic, which bodes well for the Scala community. I’ve updated the list of Scala books to include ten new books: Scala Cookbook, Testing in Scala, Learning Scala, Atomic Scala, Akka in Action, Effective Akka, Akka Essentials, Lift Cookbook, Instant Lift Web Applications How-to, and Functional Programming Patterns in Scala and Clojure. The Scala Cookbook form O’Reilly is particularly good, a favourite of mine alongside Scala for the Impatient and Odersky & Co.’s Programming in Scala. As an aside be sure to follow the Scala subreddit: http://www.reddit.com/r/scala as traffic is starting to pick up and it is a great source of Scala news and discussion.

This brings the list of books published on Scala to at least 24 (roughly in order of preference):

  1. Scala for the Impatient perhaps the best book for learning Scala, many of the chapters available as a free sample from TypeSafe.
  2. Scala Cookbook An excellent resource covering a wide range of topics in Scala. Doubles as an excellent introduction to the language too.
  3. Programming in Scala Detailed coverage by Martin Odersky himself and Co-authors
  4. Learning Scala Available for pre-order.
  5. Scala in Action
  6. Functional Programming in Scala
  7. Testing in Scala Unit testing of Scala code, tools, libraries etc.
  8. Akka in Action detailed coverage of Akka with examples.
  9. Atomic Scala, first 100 pages available as a free PDF
  10. Play for Scala (covers version 2)
  11. Effective Akka A shorter book discussing best practices when using Akka
  12. Akka Essentials
  13. Functional Programming Patterns in Scala and Clojure
  14. Scala in Depth
  15. Programming Scala Scalability = Functional Programming + Objects (Animal Guide)
  16. Programming Scala Tackle Multi-Core Complexity on the Java Virtual Machine
  17. Actors in Scala
  18. Lift in Action: The Simply Functional Web Framework for Scala
  19. Lift Cookbook
  20. Instant Lift Web Applications How-to
  21. The Definitive Guide to Lift: A Scala-based Web Framework
  22. Introduction to the Art of Programming Using Scala (Textbook)
  23. Steps in Scala: An Introduction to Object-Functional Programming
  24. Beginning Scala

Not strictly a book, but there’s also some good documentation of Akka (including an ebook) on the Akka website: http://akka.io/docs/

If I’ve forgotten to include any new books on Scala be sure to comment about it in the comments section. Also, what is your favourite Scala book and why?

Downloads of Scala plugin for IntelliJ rocket upwards

JetBrains’ IntelliJ IDE (http://www.jetbrains.com/) is widely regarded as the best dev. environment for Scala. I’ve been very happy with the combination of the Apache-licensed free version of IntelliJ + the Scala plugin. The plugin download centre of IntelliJ provides cumulative download numbers for their various plugins. Over the last year I’ve noticed a very interesting trend. Between mid January and mid March of 2012 there were approximately 6,000 new downloads of the Scala plugin. The rate at which the cumulative download count is growing has rocketed up dramatically over the last several months. Between mid August and mid October the Scala plugin was downloaded 90,000 times. This increase represents a 15x increase in the download rate vs the same period just 7 months earlier. With the total number of downloads to date sitting at over 318,000 the Scala plugin is now the most popular plugin for IntelliJ by a very wide margin. 

I’ve plotted some of the data I gather manually from their update centre (note that April is missing as I didn’t take any samples during that month). The dramatic increase in the download rate is very clear:

Image

Turbocharge your Scala code by using scala.collection.JavaConversions

Update (2014): Scala 2.11 is set to include a very fast Hashtable implementation (AnyRefMap), see: AnyRefMap

Update (2014), part 2: For more current map benchmark data rather refer to: The updated benchmark

As noted in a previous blog post, Scala’s mutable map class is currently significantly less efficient than the corresponding HashMap class that ships with the JDK. The good news is that Scala makes it really easy to transparently use Java collections as if they were Scala collections, supporting all of the bells and whistles of the Scala collections framework.

import scala.collection.JavaConversions.mapAsScalaMap
var map: mutable.Map[String, String] = new java.util.HashMap[String, String]
map("MyKey") = "MyValue"
map += "blah" -> "blah"

I decided to benchmark the various map implementations in Scala and Java (JDK 7u5, Scala 2.10 Milestone 5). The results are rather interesting (but consistent with previous observations in real world usage). It was a pretty straightforward task. One million strings were inserted into the map/hash to measure insertion time, these strings were then each looked up in the map. The times reported are for steady state after repeated runs (to factor out any JIT effects).

What I found interesting was the extent to which using java.util.HashMap via the javaconversions package gives a major performance boost over using the default Scala Map collections. The performance gain was between 2-3x when switching from mutable.Map to java.util.HashMap for both lookups and insertions. As is to be expected using immutable.Map was even slower than mutable.Map. Given the fact that using j.u.HashMap results in a serious performance boost over the default mutable.Map one wonders why the Scala guys don’t simply use java.util.HashMap under to hood as the default mutable map (or come up with a more efficient alternative).

Ruby (1.8.7 and 1.9.3) were also benchmarked to provide a reference point as a language with rather pedestrian performance, to put the relative performance differences in perspective. Not surprisingly using java.util.HashMap from Scala using implicit javaconversions is 30-40x faster than Ruby.

Image

As requested by Andriy in the comments I’ve put the source below. Keep in mind that the code will probably need some editing as I’ve used comments to switch between various collection types. The code is designed to run the benchmark rather than targeting elegance etc. I could neaten it up a bit when I get the time, but it should be sufficient to give an idea of that tests I ran. Everything was run with default VM settings. Only the -server switch was added. I can’t find the link for the dictionary of words I used, but pretty much any 100k unique words from any dictionary will do (feel free to substitute a text file with random strings).

Java code:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;

import static java.lang.System.currentTimeMillis;
import static java.lang.System.out;

/**
 */
public class JBenchmark {

    private final int WARMUP_ITERATIONS = 100;
    private String[] strings;
    private Map<String, String> map;

    public static void main(String[] args) {
        JBenchmark bench = new JBenchmark();
        bench.loadStrings();
        out.println("bench HashMap");
        bench.benchInsertion(false, false);
        bench.benchLookup();
        out.println("bench HashMap, recycle map");//to avoid rehashing
        bench.benchInsertion(false, true);
        bench.benchLookup();
        out.println("bench TreeMap");//to avoid rehashing
        bench.benchInsertion(true, false);
        bench.benchLookup();
        out.println("bench TreeMap, recycling map");//to avoid rehashing
        bench.benchInsertion(true, true);
        bench.benchLookup();
        
    }

    private void benchInsertion(boolean useTree, boolean recycle) {
        //warmup
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            try {
                insertion(true, useTree, recycle);
                Thread.sleep(50);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        insertion(false, useTree, recycle);
    }

    private void benchLookup() {
        //warmup
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            try {
                lookup(true);
                Thread.sleep(50);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        lookup(false);
    }

    private void insertion(boolean warmup, boolean useTree, boolean recycle) {
        long start = currentTimeMillis();
        if (recycle) {
            if (map != null) {
                map.clear();
            }
        } else {
            if (useTree) {
                map = new TreeMap<>();
            } else {
                map = new HashMap<>();
            }
        }
        for (String s : strings) {
            map.put(s, s);
        }
        if (!warmup) {
            out.println("insertion millis: " + (currentTimeMillis() - start));
        }
    }

    private void lookup(boolean warmup) {
        boolean empty = false;
        long start = currentTimeMillis();
        for (int i = 0; i < strings.length; i++) {
            empty = map.get(strings[i]).length() != 0;
        }
        if (!warmup) {
            out.println("lookup millis: " + (currentTimeMillis() - start));
            out.println(empty);  //just in case the compiler tries to optimize away all the work
        }
    }

    //load dictionary from disk
    private void loadStrings() {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(new File("pathToDictionaryWith100kWords"))) {
            while (sc.hasNext()) {
                lines.add(sc.nextLine());
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        for (int i = 0; i < 900000; i++) {
            lines.add(i + "");
        }
        strings = lines.toArray(new String[0]);
    }

}

Scala (may require some editing)

import collection.{immutable, mutable}
import io.Source
import collection.mutable._
import java.lang.System.currentTimeMillis
import java.util
import scala.collection.JavaConversions.mapAsScalaMap

/**
 * Code is pretty rough, but serves its purpose. May require some editing
 */
object SBenchmarkMutable extends App {
  val WarmUpIterations = 100

  loadStrings
  println("bench Map")
  benchInsertion(false, false)
  benchLookup
  
  println("bench Map, recycling")
  benchInsertion(false, true)
  benchLookup
  
  println("bench java Map")
  benchInsertion(true, false)
  benchLookup
  
  println("bench java Map, recycling")
  benchInsertion(true, true)
  benchLookup
  

  def benchInsertion(useJavaConversion: Boolean, recycle: Boolean) {
    var i = 0
    while (i < WarmUpIterations) {
      insertion(true, useJavaConversion, recycle)
      Thread.sleep(50)
      i += 1
    }
    insertion(false, useJavaConversion, recycle)
  }

  def benchLookup {
    var i = 0
    while (i < WarmUpIterations) {
      lookup(true)
      Thread.sleep(50)
      i += 1
    }
    lookup(false)
  }

  def insertion(warmup: Boolean, useJavaConversion: Boolean, recycle: Boolean) {
    val start = currentTimeMillis
    if (recycle) {
      if (map != null) {
         map.clear
      }
    }
    else {
      if (useJavaConversion) {
        map = new util.HashMap[String, String]()
      }
      else {
                map = mutable.Map()
//                map = immutable.Map()
      }
    }
    var i = 0
    //    while (i < strings.size) {
    //      val s = strings(i)
    for (s <- strings) {
      map += s -> s //map(s) = s
      //      i += 1
    }
    //    }
    if (!warmup) {
      println("insertion millis: " + (currentTimeMillis - start))
    }
  }

  def lookup(warmup: Boolean) {
    var empty = false
    val start = currentTimeMillis
    //    var i = 0
        for (s <- strings) {
          empty = map(s).length != 0
        }
//    var i = 0
//    while (i < strings.length) {
//      empty = map(strings(i)).length != 0
//      i += 1
//    }
    if (!warmup) {
      println("lookup millis: " + (currentTimeMillis - start))
    }
  }

  def loadStrings {
    val lines = new ArrayBuffer[String]
    Source.fromFile("pathTo100kWords").getLines.foreach{lines += _}

    var i = 0
    while (i < 900000) {
      lines += (i + "")
      i += 1
    }
    strings = lines.toArray
  }

    private var strings: Array[String] = _
//  private var map: immutable.Map[String, String] = _
    private var map: mutable.Map[String, String] = _
}

Scala books being published at quite a rate

Since it emerged that Twitter was using Scala the number of books on the subject has risen dramatically. Today there are no fewer than 14 books either published or about to be published on the topic of Scala. I’ve previously blogged about Scala books, but since then I’ve found that the list I provided was incomplete and that new Scala books are being published at quite a rate, so I missed some.

The following list of 14 Scala books [update:added another book] should be exhaustive, but if you know of any Scala books that I left out feel free to add a note in the comments or discuss which is your favourite.

  1. Scala for the Impatient perhaps the best book for learning Scala
  2. Programming in Scala Detailed coverage by Martin Odersky himself
  3. Play for Scala (covers version 2)
  4. Scala in Depth
  5. Programming Scala Scalability = Functional Programming + Objects (Animal Guide)
  6. Programming Scala Tackle Multi-Core Complexity on the Java Virtual Machine
  7. Actors in Scala
  8. Scala in Action
  9. Lift in Action: The Simply Functional Web Framework for Scala
  10. The Definitive Guide to Lift: A Scala-based Web Framework
  11. Introduction to the Art of Programming Using Scala (Textbook)
  12. Functional Programming in Scala
  13. Steps in Scala: An Introduction to Object-Functional Programming
  14. Beginning Scala

I didn’t count this one, but note that there’s also a book on Play (2) for Java

Real world feedback from a Java dev using Scala

It’s no secret that a large proportion of Scala programmers originate from the Java developer community. I’ve been writing Java code since the late 90’s and have recently starting doing some dev work in Scala. Since I fall squarely within the target audience for Scala I’ve decided to share my experiences and impressions using Scala from a Java programmer’s perspective. Overall I’m very impressed by Scala and plan to use it more, but do have plenty of constructive criticism which I provide at the end of this blog post.

The first thing that struck me when reading up on Scala was just how many times I thought “that’s a really good way of doing things, it’s much cleaner than the way Java does it”. I’d been reading up on the language for a few months and experimenting with code, but I’ve now used Scala for actual dev work and I’ve learned a lot. The standard approach for learning to code in Scala is to start using it as “java without semicolons“. This was how I started a few months ago and I’ve gradually learned more and written more idiomatic Scala. The result of writing code the Scala way is code which is much cleaner (and more compact). At first it seems like you’re simply saving yourself a bit of typing over Java and not asking the IDE to generate code all the time. That doesn’t seem like much, but as I learned and coded more Scala I began to realise just how much easier it was to express ideas. Everything in Scala just seems to fit together better. Strings, Arrays, Lists, StringBuilders etc. all “feel” the same with common syntax and methods for common tasks such as accessing values, and adding items. Methods like groupBy, filter and map make common tasks such as processing a list of items surprisingly simple.

Overall I was very happy my decision to use Scala. The code was clear, it ran as fast as Java code and simply put the best word to describe Scala is elegant. For the first time in about many years I find myself wanting to switch my primary dev language. In terms of tooling and documentation I strongly recommend the IntelliJ Scala plugin (intelliJ has an open source community edition of their IDE these days and the plugin is free $0). In terms of books to learn Scala I found Scala for the Impatient (published a month or two ago) to be by far the best introduction to the language.

I was working on a Java codebase after the Scala work, and it struck me how dissatisfied working with Scala had made me with Java the language. It was like taking a giant leap backwards after being spoiled by the elegance of Scala.

As stated in the introduction I’m very pleased with Scala, but do have some constructive criticism to provide. It’s quite a long list, so before anyone tries to twist this as a “anti-Scala” post let me make it clear that if I were to come up with a similar list of issues for the other languages I use that list would be significantly longer. Here goes:

[Update: check Simon's response in the comments. Very informative!]

  1. The syntax for providing package level access to a method (a very common need when writing unit tests) is rather nasty: private[mypackagename] def doStuff = {…} Even the Java equivalent of void doStuff() looks better.
  2. Some Scala standard library classes are slower than their Java counterparts e.g. HashMap and StringBuilder. Scala’s mutable Map doesn’t seem to provide a constructor that lets me specify initial capacity (so I’m forced to pay the cost of rehashing). This is easily worked around by using Java collections together with the implicit conversion classes provided by Scala (which work transparently and fantastically well)
  3. Scala API documentation seems very terse and unfriendly. It’s not uncommon for a method to come with virtually no understandable information on what it actually does or a nice example. I very frequently found myself on StackOverflow to figure out how to use classes. [update: to fast-track better API docs perhaps providing a wiki-fied version would enable more community contributions]. Due to alphabetical sorting of method names overloaded operators are always at the top of the method listings. This makes the docs for a class very intimidating for beginners. Just move the operator docs to the end of the listing so that we’re not faced with the same storm of punctuation at the top the table each time.
  4. Compiling takes a lot longer than Java. For a small project I would estimate (roughly) around a 5x difference. I’m willing to put up with a slower compiler for a better language, but would appreciate it if it were much faster.
  5. There are some inconsistencies regarding which collections support the readOnly method to provide the Scala equivalent of java.util.Collections.unmodifiableList(). Do I really need to call toMap to make my mutable.Map an immutable one? I should probably check the source to see if the toMap call is expensive or not.
  6. The various collections interfaces were a source of confusion. In Java the common class to use is Collection. In Scala I’m not so sure there’s Iterable, Traversable, GenTraversable, TraversableLike, GenTraversableLike, TraversableOnce, GenTraversableOnce, Iterator etc. There seems to be an endless list of options. If I invested a day reading up on the differences I could probably figure it out. For now Traversable or Iterable seem to be reasonable choices for a java.util.Collection equivalent. [update: it seems the non-intuitively named GenTraversable is the preferred choice]
  7. Looking at my profiling data implicit conversions do appear to generate a small amount of overhead (apparently this is to be fixed in 2.10)
  8. for loops seem to introduce more overhead when iterating compared to while loops or Java for loops. There was some talk of optimizing this in 2.10