Turbocharge your Scala code by using scala.collection.JavaConversions

Update (2014): Scala 2.11 is set to include a very fast hash table implementation (AnyRefMap); see: AnyRefMap

Update (2014), part 2: For more current map benchmark data, refer instead to: The updated benchmark

As noted in a previous blog post, Scala’s mutable map class is currently significantly less efficient than the corresponding HashMap class that ships with the JDK. The good news is that Scala makes it really easy to transparently use Java collections as if they were Scala collections, supporting all of the bells and whistles of the Scala collections framework.

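import scala.collection.mutable
import scala.collection.JavaConversions.mapAsScalaMap

// The implicit mapAsScalaMap conversion wraps the Java map in a Scala mutable.Map view
var map: mutable.Map[String, String] = new java.util.HashMap[String, String]
map("MyKey") = "MyValue"
map += "blah" -> "blah"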

I decided to benchmark the various map implementations in Scala and Java (JDK 7u5, Scala 2.10 Milestone 5). The results are rather interesting (but consistent with previous observations in real-world usage). The task was pretty straightforward: one million strings were inserted into the map to measure insertion time, and each of these strings was then looked up in the map. The times reported are for steady state after repeated runs (to factor out any JIT effects).
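
The warm-up-then-measure idea can be sketched compactly in Java. This is a minimal illustration rather than the code used for the reported numbers: the class and method names (SteadyStateTimer, timeAfterWarmup, makeKeys) are made up, the keys are plain numeric strings instead of a dictionary, the warm-up count of 20 is arbitrary, and System.nanoTime is used in place of currentTimeMillis. Note that it clears and reuses one map, so it corresponds to the "recycle" variant of the benchmark.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original benchmark code.
class SteadyStateTimer {

    // Runs the task warmupRuns times untimed, then once more with timing,
    // and returns the elapsed nanoseconds of that final run.
    static long timeAfterWarmup(Runnable task, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT a chance to compile the hot paths
        }
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Builds n distinct keys; stands in for the dictionary plus numeric strings.
    static String[] makeKeys(int n) {
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = Integer.toString(i);
        }
        return keys;
    }

    public static void main(String[] args) {
        final String[] keys = makeKeys(1_000_000);
        final Map<String, String> map = new HashMap<>();

        long insertNanos = timeAfterWarmup(() -> {
            map.clear();
            for (String k : keys) {
                map.put(k, k);
            }
        }, 20);

        final int[] hits = new int[1]; // printed below so the lookups are not dead code
        long lookupNanos = timeAfterWarmup(() -> {
            for (String k : keys) {
                if (map.get(k) != null) {
                    hits[0]++;
                }
            }
        }, 20);

        System.out.println("insertion millis: " + insertNanos / 1_000_000);
        System.out.println("lookup millis: " + lookupNanos / 1_000_000);
        System.out.println("hits: " + hits[0]);
    }
}
```

Only the final run is timed, which is what "steady state" means here: the earlier runs exist purely to let the JIT settle.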

What I found interesting was the extent to which using java.util.HashMap via the JavaConversions implicits gives a major performance boost over using the default Scala Map collections. Switching from mutable.Map to java.util.HashMap made both lookups and insertions 2-3x faster. As is to be expected, using immutable.Map was even slower than mutable.Map. Given that j.u.HashMap delivers such a serious performance boost over the default mutable.Map, one wonders why the Scala guys don’t simply use java.util.HashMap under the hood as the default mutable map (or come up with a more efficient alternative).

Ruby (1.8.7 and 1.9.3) was also benchmarked to provide a reference point as a language with rather pedestrian performance, putting the relative performance differences in perspective. Not surprisingly, using java.util.HashMap from Scala via the implicit JavaConversions is 30-40x faster than Ruby.

[Image: chart of the benchmark results]

As requested by Andriy in the comments, I’ve put the source below. Keep in mind that the code will probably need some editing, as I’ve used comments to switch between various collection types. The code is designed to run the benchmark rather than to be elegant. I could neaten it up a bit when I get the time, but it should be sufficient to give an idea of the tests I ran. Everything was run with default VM settings, apart from the added -server switch. I can’t find the link for the dictionary of words I used, but pretty much any 100k unique words from any dictionary will do (feel free to substitute a text file with random strings).
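
If no dictionary is at hand, a substitute word list of random strings can be generated along these lines. This is only a sketch under assumptions: the class name WordListSketch, the default file name words.txt, the seed, and the word lengths (3 to 12 lowercase letters) are arbitrary choices, not details of the original setup. Using letters only keeps the words distinct from the numeric strings the benchmark adds later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical generator for a stand-in dictionary file.
class WordListSketch {

    // Returns `count` unique pseudo-random lowercase words; the same seed
    // always yields the same list.
    static List<String> randomWords(int count, long seed) {
        Random rnd = new Random(seed);
        Set<String> words = new LinkedHashSet<>(); // drops duplicates, keeps order
        while (words.size() < count) {
            int len = 3 + rnd.nextInt(10); // 3 to 12 characters
            StringBuilder sb = new StringBuilder(len);
            for (int i = 0; i < len; i++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            words.add(sb.toString());
        }
        return new ArrayList<>(words);
    }

    public static void main(String[] args) throws IOException {
        // Output path is an assumption; pass another path as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "words.txt");
        Files.write(out, randomWords(100_000, 42L), StandardCharsets.UTF_8);
        System.out.println("wrote " + out.toAbsolutePath());
    }
}
```

The resulting file has one word per line, which is the layout both loadStrings methods below expect.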

Java code:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;

import static java.lang.System.currentTimeMillis;
import static java.lang.System.out;

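/**
 * Times insertion into and lookup from java.util.HashMap and java.util.TreeMap
 * after a number of untimed warm-up runs.
 */
public class JBenchmark {

    private static final int WARMUP_ITERATIONS = 100;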
    private String[] strings;
    private Map<String, String> map;

    public static void main(String[] args) {
        JBenchmark bench = new JBenchmark();
        bench.loadStrings();
        out.println("bench HashMap");
        bench.benchInsertion(false, false);
        bench.benchLookup();
        out.println("bench HashMap, recycle map"); // reuse the grown table to avoid rehashing
        bench.benchInsertion(false, true);
        bench.benchLookup();
        out.println("bench TreeMap");
        bench.benchInsertion(true, false);
        bench.benchLookup();
        out.println("bench TreeMap, recycling map"); // reuse the map instance (a TreeMap never rehashes)
        bench.benchInsertion(true, true);
        bench.benchLookup();
        
    }

    private void benchInsertion(boolean useTree, boolean recycle) {
        //warmup
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            try {
                insertion(true, useTree, recycle);
                Thread.sleep(50);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        insertion(false, useTree, recycle);
    }

    private void benchLookup() {
        //warmup
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            try {
                lookup(true);
                Thread.sleep(50);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        lookup(false);
    }

    private void insertion(boolean warmup, boolean useTree, boolean recycle) {
        long start = currentTimeMillis();
        if (recycle) {
            if (map != null) {
                map.clear();
            }
        } else {
            if (useTree) {
                map = new TreeMap<>();
            } else {
                map = new HashMap<>();
            }
        }
        for (String s : strings) {
            map.put(s, s);
        }
        if (!warmup) {
            out.println("insertion millis: " + (currentTimeMillis() - start));
        }
    }

    private void lookup(boolean warmup) {
        boolean empty = false;
        long start = currentTimeMillis();
        for (int i = 0; i < strings.length; i++) {
            empty = map.get(strings[i]).length() != 0;
        }
        if (!warmup) {
            out.println("lookup millis: " + (currentTimeMillis() - start));
            out.println(empty);  //just in case the compiler tries to optimize away all the work
        }
    }

    // Load the dictionary from disk, then pad it with numeric strings
    // so that roughly one million keys are available.
    private void loadStrings() {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(new File("pathToDictionaryWith100kWords"))) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        for (int i = 0; i < 900000; i++) {
            lines.add(Integer.toString(i));
        }
        strings = lines.toArray(new String[0]);
    }

}
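
The "recycle map" runs in the Java code above reuse a map that has already grown to its final size, so insertion no longer pays for rehashing. A related option, which was not part of the benchmark, is to size the HashMap up front. The sketch below assumes HashMap's default load factor of 0.75; the names PresizedMapSketch, capacityFor and fill are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of the original benchmark.
class PresizedMapSketch {

    // An initial capacity large enough that `expected` entries stay below
    // the resize threshold, assuming the default load factor of 0.75.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    // Inserts every key, either into a presized map or into a default one
    // that has to grow (and rehash) along the way.
    static Map<String, String> fill(String[] keys, boolean presize) {
        Map<String, String> map;
        if (presize) {
            map = new HashMap<String, String>(capacityFor(keys.length));
        } else {
            map = new HashMap<String, String>();
        }
        for (String k : keys) {
            map.put(k, k);
        }
        return map;
    }

    public static void main(String[] args) {
        String[] keys = new String[1_000_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = Integer.toString(i);
        }
        // Both maps end up with identical contents; only the growth path differs.
        System.out.println("default size:  " + fill(keys, false).size());
        System.out.println("presized size: " + fill(keys, true).size());
    }
}
```

To compare the two variants fairly, they would need the same warm-up treatment as the rest of the benchmark; a single unwarmed run says little.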

Scala code (may require some editing):

import collection.{immutable, mutable}
import io.Source
import collection.mutable._
import java.lang.System.currentTimeMillis
import java.util
import scala.collection.JavaConversions.mapAsScalaMap

/**
 * Code is pretty rough, but serves its purpose. May require some editing
 */
object SBenchmarkMutable extends App {
  val WarmUpIterations = 100

  loadStrings
  println("bench Map")
  benchInsertion(false, false)
  benchLookup
  
  println("bench Map, recycling")
  benchInsertion(false, true)
  benchLookup
  
  println("bench java Map")
  benchInsertion(true, false)
  benchLookup
  
  println("bench java Map, recycling")
  benchInsertion(true, true)
  benchLookup
  

  def benchInsertion(useJavaConversion: Boolean, recycle: Boolean) {
    var i = 0
    while (i < WarmUpIterations) {
      insertion(true, useJavaConversion, recycle)
      Thread.sleep(50)
      i += 1
    }
    insertion(false, useJavaConversion, recycle)
  }

  def benchLookup {
    var i = 0
    while (i < WarmUpIterations) {
      lookup(true)
      Thread.sleep(50)
      i += 1
    }
    lookup(false)
  }

  def insertion(warmup: Boolean, useJavaConversion: Boolean, recycle: Boolean) {
    val start = currentTimeMillis
    if (recycle) {
      if (map != null) {
         map.clear
      }
    }
    else {
      if (useJavaConversion) {
        map = new util.HashMap[String, String]()
      }
      else {
        map = mutable.Map()
        // map = immutable.Map()
      }
    }
    var i = 0
    //    while (i < strings.size) {
    //      val s = strings(i)
    for (s <- strings) {
      map += s -> s //map(s) = s
      //      i += 1
    }
    //    }
    if (!warmup) {
      println("insertion millis: " + (currentTimeMillis - start))
    }
  }

  def lookup(warmup: Boolean) {
    var empty = false
    val start = currentTimeMillis
    for (s <- strings) {
      empty = map(s).length != 0
    }
//    var i = 0
//    while (i < strings.length) {
//      empty = map(strings(i)).length != 0
//      i += 1
//    }
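    if (!warmup) {
      println("lookup millis: " + (currentTimeMillis - start))
      println(empty) // as in the Java version, just in case the compiler tries to optimize away all the work
    }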
  }

  def loadStrings {
    val lines = new ArrayBuffer[String]
    Source.fromFile("pathTo100kWords").getLines.foreach{lines += _}

    var i = 0
    while (i < 900000) {
      lines += (i + "")
      i += 1
    }
    strings = lines.toArray
  }

  private var strings: Array[String] = _
  //  private var map: immutable.Map[String, String] = _
  private var map: mutable.Map[String, String] = _
}