How does Java store Strings and how does substring work internally? [closed]

See the comments:

    String str = "abcd";  // string literal, interned in the pool
    String str1 = new String("abcd"); // new String, not interned: str1 != str
    String str2 = str.substring(0,2); // new String (a view on str before Java 7u6, a copy since)
    String str3 = str.substring(0,2); // same, and a distinct object: str3 != str2
    String str7 = str1.substring(0,str1.length()); // special case: str1 itself is returned

Notes:

  • Since Java 7u6, substring returns a new string with its own copy of the characters instead of a view on the original string (this makes no difference to the reference comparisons in the example above)
  • Special case: str1.substring(0,str1.length()) returns str1 itself, as the source code shows:

    public String substring(int beginIndex, int endIndex) {
        // bounds checks omitted here; they throw StringIndexOutOfBoundsException
        int subLen = endIndex - beginIndex;
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
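
To check the special case yourself, here is a minimal sketch. The class and method names are made up for illustration. The "distinct objects" result reflects what OpenJDK's implementation does; the API documentation does not promise it.

```java
final class SubstringIdentityDemo {

    /** True when substring(0, length()) hands back the very same object. */
    static boolean fullRangeReturnsSameObject(String s) {
        return s.substring(0, s.length()) == s;
    }

    /** True when two identical partial substring calls yield distinct objects. */
    static boolean partialRangeCreatesNewObjects(String s, int end) {
        return s.substring(0, end) != s.substring(0, end);
    }

    public static void main(String[] args) {
        String str1 = new String("abcd");
        System.out.println(fullRangeReturnsSameObject(str1));
        System.out.println(partialRangeCreatesNewObjects(str1, 2));
        // The contents are equal even though the references differ.
        System.out.println(str1.substring(0, 2).equals(str1.substring(0, 2)));
    }
}
```

Comparing with == is deliberate here: it tests object identity, which is exactly what the special case is about. Use equals() whenever you care about the contents.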
    

EDIT

What is a view?

Until Java 7u6, a String was basically a char[] holding the characters, plus an offset and a count (i.e. the string was made of count characters starting at position offset in the char[]).

Calling substring created a new string that shared the same char[] but had a different offset / count, which effectively made it a view on the original string (except when offset = 0 and count = length, as explained above).

Since Java 7u6, a new char[] is created every time, because the String class no longer has a count or offset field.
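
The difference between the two strategies can be sketched with a toy class. ViewString and its method names are hypothetical; this is a simplified model of the layout described above, not the actual JDK source.

```java
import java.util.Arrays;

/** Hypothetical model of the pre-7u6 layout: a char[] plus offset and count. */
final class ViewString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index of the first character inside value
    final int count;     // number of characters belonging to this string

    ViewString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    private void checkRange(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
    }

    /** Old style (before 7u6): share the array, only offset and count change. */
    ViewString substringAsView(int begin, int end) {
        checkRange(begin, end);
        if (begin == 0 && end == count) {
            return this;
        }
        return new ViewString(value, offset + begin, end - begin);
    }

    /** New style (since 7u6): copy the requested range into a fresh array. */
    ViewString substringAsCopy(int begin, int end) {
        checkRange(begin, end);
        if (begin == 0 && end == count) {
            return this;
        }
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new ViewString(copy, 0, copy.length);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

The view is O(1) to create, but it keeps the whole original array reachable for as long as the small substring is alive. The copy costs O(n) and lets the original array be collected independently.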

Where is the common pool stored exactly?

This is implementation specific, and the location has moved over time. In HotSpot, the pool lived in the permanent generation (PermGen) up to Java 6; since Java 7 it is stored on the main heap.

How is the pool managed?

Main characteristics:

  • String literals are stored in the pool
  • Interned strings are stored in the pool (new String("abc").intern();)
  • When a string S is interned (because it is a literal or because intern() is called), the JVM returns a reference to the string in the pool that is equal to S if there is one, and otherwise adds S to the pool (hence "abc" == "abc" always returns true).
  • Strings in the pool can be garbage collected (meaning that an interned string may be removed from the pool once nothing else references it, not because the pool is full)
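
The first three points can be checked with a few reference comparisons. This is a small sketch; the class and helper names are made up for illustration.

```java
final class StringPoolDemo {

    /** Reference comparison, spelled out to make the intent explicit. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abc";
        String copy = new String("abc"); // a fresh object outside the pool

        System.out.println(sameObject(literal, "abc"));         // true: both literals are the pooled instance
        System.out.println(sameObject(literal, copy));          // false: copy is a separate object
        System.out.println(sameObject(literal, copy.intern())); // true: intern() returns the pooled instance
        System.out.println(literal.equals(copy));               // true: same contents
    }
}
```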
