What is the most efficient Java Collections library?

The question is (now) about storing lots of data in a Map, where the data can be represented with primitive types like int. In my opinion, some of the answers here are very misleading. Let’s see why.

I modified the trove benchmark to measure both runtime and memory consumption. I also added PCJ, another collections library for primitive types (which I use extensively). The ‘official’ trove benchmark does not compare IntIntMaps to the Java Collections’ Map<Integer, Integer>, probably because storing Integers and storing ints are not the same from a technical point of view. But a user might not care about this technical detail; he wants to store data representable with ints efficiently.

First the relevant part of the code:

new Operation() {

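     // Approximate heap usage. System.gc() is only a hint to the JVM,
     // so the resulting numbers are approximate.
     private long usedMem() {
        System.gc();
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
     }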

     // trove
     public void ours() {
        long mem = usedMem();
        TIntIntHashMap ours = new TIntIntHashMap(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
           ours.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("trove " + mem + " bytes");
        ours.clear();
     }

     public void pcj() {
        long mem = usedMem();
        IntKeyIntMap map = new IntKeyIntOpenHashMap(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
           map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("pcj " + mem + " bytes");
        map.clear();
     }

     // java collections
     public void theirs() {
        long mem = usedMem();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
           map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("java " + mem + " bytes");
        map.clear();
     }

     // (remaining Operation methods omitted)
};

I assume the data comes as primitive ints, which seems sane. But this implies a runtime penalty for java.util because of autoboxing, which is not necessary for the primitive collections frameworks.
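To make the boxing cost concrete, here is a small sketch of what the `theirs()` loop does behind the scenes (the class name `BoxingSketch` and the method `fillBoxed` are made up for illustration). Each `map.put(i, i)` is compiled into `map.put(Integer.valueOf(i), Integer.valueOf(i))`, so for most keys every entry carries two wrapper objects in addition to the map’s own entry object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: what the java.util benchmark loop does under the hood. */
public class BoxingSketch {

    // Fill a HashMap the way the benchmark does; every put boxes two ints.
    static Map<Integer, Integer> fillBoxed(int n) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(n);
        for (int i = n; i-- > 0; ) {
            // The source says map.put(i, i); the compiler emits this:
            map.put(Integer.valueOf(i), Integer.valueOf(i));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = fillBoxed(100000);
        // get() takes a boxed key and returns a boxed value, unboxed here.
        int v = map.get(Integer.valueOf(99999)).intValue();
        System.out.println(map.size() + " entries, value for 99999 = " + v);
        // Only values in -128..127 are guaranteed to share cached Integer
        // instances; larger values typically cost a separate object each.
        System.out.println(Integer.valueOf(100) == Integer.valueOf(100));
    }
}
```

Running `main` prints `100000 entries, value for 99999 = 99999` followed by `true`.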

The runtime results (without gc() calls, of course) on WinXP, jdk1.6.0_10:

                      100000 put operations      100000 contains operations 
java collections             1938 ms                        203 ms
trove                         234 ms                        125 ms
pcj                           516 ms                         94 ms

While this might already seem drastic, it is not the reason to use such a framework.

The reason is memory performance. The results for a Map containing 100000 int entries:

java collections        oscillates between 6644536 and 7168840 bytes
trove                                      1853296 bytes
pcj                                        1866112 bytes

Java Collections needs more than three times the memory of the primitive collection frameworks. That is, you can keep three times as much data in memory without resorting to disk I/O, which lowers runtime performance by orders of magnitude. And this matters; read highscalability to find out why.
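The memory totals above are easier to compare per entry. The following is plain arithmetic on the reported numbers; nothing is re-measured, and `PerEntryCost` is a hypothetical helper name.

```java
import java.util.Locale;

/** Hypothetical helper: converts the measured totals into bytes per entry. */
public class PerEntryCost {

    static final int ENTRIES = 100000;

    static double bytesPerEntry(long totalBytes) {
        return (double) totalBytes / ENTRIES;
    }

    public static void main(String[] args) {
        // Figures copied from the memory results above (jdk1.6.0_10, WinXP).
        long javaLow = 6644536L, javaHigh = 7168840L;
        long trove = 1853296L, pcj = 1866112L;

        // Locale.ROOT keeps '.' as the decimal separator on every system.
        System.out.printf(Locale.ROOT, "java.util: %.1f to %.1f bytes/entry%n",
                bytesPerEntry(javaLow), bytesPerEntry(javaHigh));
        System.out.printf(Locale.ROOT, "trove:     %.1f bytes/entry%n", bytesPerEntry(trove));
        System.out.printf(Locale.ROOT, "pcj:       %.1f bytes/entry%n", bytesPerEntry(pcj));
        System.out.printf(Locale.ROOT, "java.util / trove: %.2f to %.2f%n",
                (double) javaLow / trove, (double) javaHigh / trove);
    }
}
```

This works out to roughly 66.4 to 71.7 bytes per entry for java.util, against about 18.5 (trove) and 18.7 (pcj), a factor of about 3.6 to 3.9. For comparison, the raw payload of one int-to-int entry is 8 bytes (two 4-byte ints).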

In my experience high memory consumption is the biggest performance issue with Java, which of course results in worse runtime performance as well. Primitive collection frameworks can really help here.
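Why are primitive maps so much smaller? A rough illustration is the hypothetical `IntIntOpenMap` below. It is not the actual Trove or PCJ code; I am only assuming their internals are similar in spirit. Keys and values are stored in flat `int[]` arrays using open addressing with linear probing, so there are no per-entry objects and no boxing. Removal is deliberately left out to keep the sketch short.

```java
/**
 * Hypothetical minimal int -> int hash map (open addressing, linear probing).
 * Not Trove's or PCJ's implementation; removal is not supported.
 */
public class IntIntOpenMap {
    private int[] keys;
    private int[] values;
    private boolean[] used;   // marks occupied slots
    private int size;

    public IntIntOpenMap(int expected) {
        int cap = 16;
        while (cap < expected * 2) cap <<= 1;   // power of two, load factor <= 0.5
        keys = new int[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    // Spread the key's bits so consecutive ints do not cluster.
    private static int mix(int k) {
        int h = k * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    /** Returns the slot holding key, or the empty slot where it would go. */
    private int slot(int key) {
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;   // linear probing
        }
        return i;
    }

    public void put(int key, int value) {
        int i = slot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
        if (size * 2 > keys.length) grow();
    }

    public boolean containsKey(int key) {
        return used[slot(key)];
    }

    /** Returns the mapped value, or defaultValue if the key is absent. */
    public int get(int key, int defaultValue) {
        int i = slot(key);
        return used[i] ? values[i] : defaultValue;
    }

    public int size() {
        return size;
    }

    // Double the arrays and re-insert every occupied slot.
    private void grow() {
        int[] oldKeys = keys, oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) put(oldKeys[i], oldValues[i]);
        }
    }

    public static void main(String[] args) {
        IntIntOpenMap map = new IntIntOpenMap(100000);
        for (int i = 100000; i-- > 0; ) {
            map.put(i, i);   // plain int arguments, no boxing
        }
        System.out.println(map.size() + " " + map.get(4711, -1) + " " + map.containsKey(100000));
    }
}
```

Running `main` prints `100000 4711 false`. The trade-off shows in the API: `get` needs an explicit default value because there is no `null` to signal a missing key.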

So: No, java.util is not the answer. And “adding functionality” to Java collections is not the point when asking about efficiency. Also the modern JDK collections do not “out-perform even the specialized Trove collections”.

Disclaimer: The benchmark here is neither complete nor perfect. It is meant to drive home a point I have experienced in many projects: primitive collections are useful enough to tolerate a fishy API, if you work with lots of data.
