How does Java do string concatenation using “+”?

If you combine literal strings (literally "foo" + "bar"), the compiler does it at compile-time, not at runtime.
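
One way to observe the compile-time folding is with a reference comparison. This is a minimal sketch, and the class and method names (ConstantFoldDemo, literalConcatIsFolded, variableConcatIsFolded) are made up for illustration. It relies on the fact that compile-time constant strings are interned, while a string produced by + at runtime is a newly created object.

```java
public class ConstantFoldDemo {
    // Both operands are literals, so "foo" + "bar" is a compile-time constant
    // and refers to the same interned object as the literal "foobar".
    static boolean literalConcatIsFolded() {
        String joined = "foo" + "bar";
        return joined == "foobar"; // deliberate reference comparison
    }

    // 'left' is an ordinary (non-final) variable, so this + is evaluated at
    // runtime and produces a new String object.
    static boolean variableConcatIsFolded() {
        String left = "foo";
        String joined = left + "bar";
        return joined == "foobar"; // deliberate reference comparison
    }

    public static void main(String[] args) {
        System.out.println(literalConcatIsFolded());  // true
        System.out.println(variableConcatIsFolded()); // false
    }
}
```

The == comparisons here are only a probe for object identity; real code should compare strings with equals.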

If you have two non-literal strings and join them with +, the compiler (Sun’s, anyway) will use a StringBuilder under the covers, but not necessarily in the most efficient way. So for instance, if you have this:

String repeat(String a, int count) {
    String rv;

    if (count <= 0) {
        return "";
    }

    rv = a;
    while (--count > 0) {
        rv += a;
    }
    return rv;
}

…what the Sun compiler will actually produce as bytecode looks something like this:

String repeat(String a, int count) {
    String rv;

    if (count <= 0) {
        return "";
    }

    rv = a;
    while (--count > 0) {
        rv = new StringBuilder().append(rv).append(a).toString();
    }
    return rv;
}

(Yes, really; see the disassembly at the end of this answer.) Note that it creates a new StringBuilder on every iteration and then converts the result to a String. This is inefficient (though it doesn’t matter unless you’re doing it a lot) because of all the temporary memory allocations. Each iteration allocates a StringBuilder and its buffer, quite possibly reallocates the buffer on the first append (if rv is more than 16 characters long, 16 being the default buffer capacity), and if not on the first then almost certainly on the second append, then allocates a String at the end. Then it does it all again on the next iteration.
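
The default capacity and the reallocation described above can be checked directly with StringBuilder.capacity(). This is a small sketch; the class and method names (CapacityDemo, defaultCapacity, appendForcesGrowth) are hypothetical and used only for illustration.

```java
public class CapacityDemo {
    // Capacity of a StringBuilder created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // True if appending s to a fresh default StringBuilder forced its
    // internal buffer to grow (that is, a reallocation happened).
    static boolean appendForcesGrowth(String s) {
        StringBuilder sb = new StringBuilder();
        int before = sb.capacity();
        sb.append(s);
        return sb.capacity() > before;
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());                              // 16
        System.out.println(appendForcesGrowth("testing "));                 // false: 8 chars fit
        System.out.println(appendForcesGrowth("testing testing testing ")); // true: 24 chars do not
    }
}
```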

You could gain efficiency, if necessary, by rewriting it to explicitly use a StringBuilder:

String repeat(String a, int count) {
    StringBuilder rv;

    if (count <= 0) {
        return "";
    }

    rv = new StringBuilder(a.length() * count);
    while (count-- > 0) {
        rv.append(a);
    }
    return rv.toString();
}

There we’ve used an explicit StringBuilder and also set its initial buffer capacity to be large enough to hold the result. That’s more memory-efficient, but of course, marginally less clear to inexperienced code maintainers and marginally more of a pain to write. So if you find a performance issue with a tight string concat loop, this might be a way to address it.
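
Before swapping one version for the other, it can be worth confirming that they agree. The sketch below puts both variants side by side and compares their results for a few counts, including zero and a negative value. The class and method names (RepeatCheck, repeatWithPlus, repeatWithBuilder) are made up for this example.

```java
public class RepeatCheck {
    // The original approach: += inside the loop.
    static String repeatWithPlus(String a, int count) {
        if (count <= 0) {
            return "";
        }
        String rv = a;
        while (--count > 0) {
            rv += a;
        }
        return rv;
    }

    // The rewrite: a single StringBuilder sized for the final result.
    static String repeatWithBuilder(String a, int count) {
        if (count <= 0) {
            return "";
        }
        StringBuilder rv = new StringBuilder(a.length() * count);
        while (count-- > 0) {
            rv.append(a);
        }
        return rv.toString();
    }

    public static void main(String[] args) {
        // Compare the two variants for a handful of counts.
        for (int count = -1; count <= 5; count++) {
            String expected = repeatWithPlus("testing ", count);
            String actual = repeatWithBuilder("testing ", count);
            if (!expected.equals(actual)) {
                throw new AssertionError("Mismatch at count " + count);
            }
        }
        System.out.println("Both versions agree");
    }
}
```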

You can see this under-the-covers StringBuilder in action with the following test class:

public class SBTest
{
    public static final void main(String[] params)
    {
        System.out.println(new SBTest().repeat("testing ", 4));
        System.exit(0);
    }

    String repeat(String a, int count) {
        String rv;

        if (count <= 0) {
            return "";
        }

        rv = a;
        while (--count > 0) {
            rv += a;
        }
        return rv;
    }
}

…which disassembles (using javap -c SBTest) like this:

Compiled from "SBTest.java"
public class SBTest extends java.lang.Object{
public SBTest();
Code:
   0: aload_0
   1: invokespecial  #1; //Method java/lang/Object."<init>":()V
   4: return

public static final void main(java.lang.String[]);
Code:
   0: getstatic   #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   3: new   #3; //class SBTest
   6: dup
   7: invokespecial  #4; //Method "<init>":()V
   10: ldc   #5; //String testing
   12: iconst_4
   13: invokevirtual  #6; //Method repeat:(Ljava/lang/String;I)Ljava/lang/String;
   16: invokevirtual  #7; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   19: iconst_0
   20: invokestatic   #8; //Method java/lang/System.exit:(I)V
   23: return

java.lang.String repeat(java.lang.String, int);
Code:
   0: iload_2
   1: ifgt  7
   4: ldc   #9; //String
   6: areturn
   7: aload_1
   8: astore_3
   9: iinc  2, -1
   12: iload_2
   13: ifle  38
   16: new   #10; //class java/lang/StringBuilder
   19: dup
   20: invokespecial  #11; //Method java/lang/StringBuilder."<init>":()V
   23: aload_3
   24: invokevirtual  #12; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   27: aload_1
   28: invokevirtual  #12; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   31: invokevirtual  #13; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   34: astore_3
   35: goto  9
   38: aload_3
   39: areturn

}

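Note how a new StringBuilder is created on each iteration of the loop, using the default buffer capacity. (This disassembly comes from an older compiler. Since Java 9, javac compiles + on non-constant strings to an invokedynamic call to StringConcatFactory rather than an inline StringBuilder chain, so current javap output looks different, although each pass through the loop still produces a new String.)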

All of this temporary allocation stuff sounds ugly, but again, it only matters if you’re dealing with substantial loops and/or substantial strings. Also, when the resulting bytecode is run, the JVM may well optimize it further. Sun’s HotSpot JVM, for instance, includes a very mature optimizing JIT compiler. Once it has identified the loop as a hot spot, it may well find a way to refactor it. Or not, of course. 🙂

My rule of thumb: I worry about it when I see a performance problem, or when I know I’m doing a lot of concatenation, it’s very likely to be a performance problem, and using a StringBuilder instead won’t significantly hurt maintainability. The rabid anti-premature-optimization league would probably disagree with me on the second of those. 🙂
