Size of initialisation string in Java

The length of a String literal (i.e. "...") is limited by the class file format’s CONSTANT_Utf8_info structure, which is referenced by the CONSTANT_String_info structure.

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

The limiting factor here is the length item, which is only 2 bytes (u2), so its maximum value is 65535.
This number is the number of bytes in the string's modified UTF-8 representation (which is almost CESU-8, except that the null character U+0000 is also encoded in two bytes).
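
A quick way to see the 65535-byte cap without writing a class file: java.io.DataOutputStream.writeUTF uses the same layout as the body of CONSTANT_Utf8_info (a big-endian u2 length followed by modified UTF-8 bytes) and throws UTFDataFormatException when the encoding would exceed 65535 bytes. The class name Utf8LengthLimit and its helper methods are only illustrative; String.repeat needs Java 11 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

public class Utf8LengthLimit {
    // Encodes s as writeUTF does (u2 length + modified UTF-8 bytes),
    // or returns null if the encoding would exceed 65535 bytes.
    static byte[] encodeOrNull(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (UTFDataFormatException e) {
            return null; // too long for a 2-byte length prefix
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Reads the big-endian u2 length prefix back out of an encoding.
    static int lengthPrefix(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] max = encodeOrNull("a".repeat(65535));
        System.out.println(lengthPrefix(max));                // 65535
        System.out.println(encodeOrNull("a".repeat(65536)));  // null
        System.out.println(encodeOrNull("\0").length);        // 4: prefix + 0xC0 0x80
    }
}
```

The last line shows the two-byte encoding of U+0000 mentioned above.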

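So a pure ASCII string literal can have up to 65535 characters. Characters from U+0080 to U+07FF (and U+0000) take 2 bytes each, and a string consisting of characters in the range U+0800 to U+FFFF (3 bytes each) can have only one third as many, i.e. 21845. Characters that are encoded as surrogate pairs in UTF-16 (i.e. U+10000 to U+10FFFF) take up 6 bytes each, because each surrogate is encoded separately as 3 bytes (real UTF-8 would need only 4), so at most 10922 of them fit.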
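
The per-character cost can be computed directly from these rules. This sketch (ModifiedUtf8 is a made-up helper name) counts modified UTF-8 bytes per UTF-16 code unit and compares a supplementary character against standard UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8 {
    // Bytes a string occupies in modified UTF-8 (class files, DataOutput.writeUTF).
    static int byteLength(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i); // a UTF-16 code unit; a surrogate pair takes two iterations
            if (c >= 0x0001 && c <= 0x007F) {
                total += 1; // ASCII except NUL
            } else if (c <= 0x07FF) {
                total += 2; // NUL and U+0080..U+07FF
            } else {
                total += 3; // U+0800..U+FFFF, including each surrogate half
            }
        }
        return total;
    }

    // How many repetitions of the given character fit into 65535 bytes.
    static int maxRepeats(String character) {
        return 65535 / byteLength(character);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600, a surrogate pair
        System.out.println(maxRepeats("a"));       // 65535
        System.out.println(maxRepeats("\u20AC"));  // 21845 (euro sign, 3 bytes)
        System.out.println(maxRepeats(emoji));     // 10922 (6 bytes)
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length); // 4 in standard UTF-8
    }
}
```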

(The same limit applies to identifiers, i.e. class, method and variable names, and to their type descriptors, since they are stored in the same structure.)

The Java Language Specification does not mention any limit for string literals:

A string literal consists of zero or more characters enclosed in double quotes.

So in principle a compiler could split a longer string literal into more than one CONSTANT_String_info structure and reconstruct it at runtime by concatenation (and .intern()-ing the result). I don't know whether any compiler actually does this.
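
To make the idea concrete, here is a sketch of the splitting step such a compiler might perform: cut the literal into pieces whose modified UTF-8 encoding fits in 65535 bytes, then rebuild the value at runtime by concatenating and interning. LiteralSplitter and its methods are hypothetical and not taken from any real compiler. Splitting between the two halves of a surrogate pair is harmless here, because modified UTF-8 encodes each surrogate separately and the concatenation restores the pair.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralSplitter {
    static final int MAX_UTF8_BYTES = 65535; // limit of one CONSTANT_Utf8_info

    // Modified UTF-8 size of one UTF-16 code unit (surrogates count 3 each).
    static int unitBytes(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2;
        return 3;
    }

    // Splits s into consecutive pieces whose encodings each fit in maxBytes
    // (maxBytes must be at least 3, so every code unit fits in some piece).
    static List<String> split(String s, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            int b = unitBytes(s.charAt(i));
            if (bytes + b > maxBytes) { // current piece is full: close it
                chunks.add(s.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += b;
        }
        chunks.add(s.substring(start)); // last piece (empty only if s is empty)
        return chunks;
    }

    // What the generated code would do at runtime: concatenate, then intern.
    static String reassemble(List<String> chunks) {
        return String.join("", chunks).intern();
    }

    public static void main(String[] args) {
        String big = "x".repeat(100_000);
        List<String> parts = split(big, MAX_UTF8_BYTES);
        System.out.println(parts.size());                      // 2
        System.out.println(reassemble(parts) == big.intern()); // true
    }
}
```

Interning keeps the usual guarantee that equal string literals are the same object, which a plain runtime concatenation would not give.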


It turned out that the problem is not related to string literals at all, but to array initializers.

When you pass an object to BMethod.invoke (and similarly to BConstructor.newInstance), it can be a BObject (a wrapper around an existing object; the wrapped object is then passed), a String (passed as is), or anything else. In the last case, the object is converted to a string with toString(), and this string is then interpreted as a Java expression.

To do this, BlueJ wraps the expression in a class and method and compiles it. In that method, the array initializer is simply converted into a long list of array element assignments, and for a large array this makes the method longer than the maximum bytecode size of a Java method:

The value of the code_length item must be less than 65536.

This is why it breaks for longer arrays.
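
To get a feel for how quickly an array initializer reaches that limit, here is a rough estimate. It assumes javac-style code generation, where every element becomes dup, push index, push value, iastore, and it assumes large constants are loaded with the 3-byte ldc_w. It ignores the rest of the generated wrapper method, so the real method is somewhat longer. InitializerSize is a hypothetical name.

```java
import java.util.Arrays;

public class InitializerSize {
    static final int MAX_CODE_LENGTH = 65535; // code_length must be < 65536

    // Approximate bytes of the instruction javac uses to push an int constant.
    static int pushIntBytes(int v) {
        if (v >= -1 && v <= 5) return 1;                            // iconst_m1 .. iconst_5
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return 2;   // bipush
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return 3; // sipush
        return 3;                                                   // ldc_w (assumed; ldc is 2)
    }

    // Approximate bytecode size of "new int[]{...}": create the array,
    // then dup / push index / push value / iastore for every element.
    static long storeBytes(int[] values) {
        long total = pushIntBytes(values.length) + 2; // array length + newarray int
        for (int i = 0; i < values.length; i++) {
            total += 1                       // dup (array reference)
                   + pushIntBytes(i)         // element index
                   + pushIntBytes(values[i]) // element value
                   + 1;                      // iastore
        }
        return total;
    }

    static boolean exceedsMethodLimit(int[] values) {
        return storeBytes(values) > MAX_CODE_LENGTH;
    }

    public static void main(String[] args) {
        int[] values = new int[10_000];
        Arrays.fill(values, 1000);
        System.out.println(storeBytes(values));         // 79871
        System.out.println(exceedsMethodLimit(values)); // true
    }
}
```

With most indices needing sipush, each element costs about 6 to 8 bytes, so a single initializer tops out somewhere around 8,000 to 11,000 int elements.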


So, to pass larger arrays, we have to find some other way to pass them to BMethod.invoke. The BlueJ extension API has no way to create or access arrays wrapped in a BObject.

One idea we found in chat is this:

  1. Create a new class inside the project (or in a new project, if they can interoperate), something like this:

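     import java.util.ArrayList;
     import java.util.List;

     public class IntArrayBuilder {
         // Must be initialized here, otherwise addElement throws a NullPointerException
         private final List<Integer> list = new ArrayList<>();

         // Called once per element from the extension
         public void addElement(int el) {
             list.add(el);
         }

         // Copies the collected elements into a plain int[]
         public int[] makeArray() {
             int[] array = new int[list.size()];
             for (int i = 0; i < array.length; i++) {
                 array[i] = list.get(i);
             }
             return array;
         }
     }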
    

    (This is for the case of creating an int[]; if you need other array types, too, it can
    also be made more generic. It could also be made more efficient by using an
    internal int[] as storage, growing it whenever it fills up, and having makeArray
    do a final arraycopy. This is just a sketch, so it uses the simplest implementation.)

  2. From our extension, create an object of this class
    and add elements to it by calling its addElement method.

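     import bluej.extensions.*; // BClass, BMethod, BObject, BPackage

     // pkg is the BPackage containing IntArrayBuilder; the BlueJ calls below
     // declare several checked exceptions, lumped together here as Exception.
     BObject arrayToBArray(BPackage pkg, int[] a) throws Exception {
         BClass builderClass = pkg.getBClass("IntArrayBuilder");
         BObject builder = builderClass.getConstructor(new Class<?>[0]).newInstance(new Object[0]);
         BMethod addMethod = builderClass.getMethod("addElement", new Class<?>[]{int.class});
         for (int e : a) {
             // e is boxed to an Integer, which BlueJ passes on via its toString() form
             addMethod.invoke(builder, new Object[]{ e });
         }
         // retrieve makeArray (not addElement) to build the final array
         BMethod makeMethod = builderClass.getMethod("makeArray", new Class<?>[0]);
         BObject bArray = (BObject) makeMethod.invoke(builder, new Object[0]);
         return bArray;
     }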
    

    (For efficiency, the BClass/BMethod objects could be retrieved once and cached, instead of once for each array conversion.)
    If you generate the array's contents with some algorithm, you can do the generation right here instead of first creating an intermediate int[].

  3. In our extension, call the method we actually want to call with the long array, passing our wrapped array:

     Object result = method.invoke(obj, new Object[] { bArray });
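
The more efficient builder mentioned in step 1 (an internal int[] that grows when full, plus a final copy in makeArray) could look roughly like this. It is a sketch that has not been tried inside BlueJ; GrowableIntArrayBuilder is a hypothetical name, and using it would mean changing the class name looked up in arrayToBArray.

```java
import java.util.Arrays;

public class GrowableIntArrayBuilder {
    private int[] data = new int[16]; // internal storage, grown as needed
    private int size = 0;             // number of elements added so far

    public void addElement(int el) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // double the capacity
        }
        data[size++] = el;
    }

    public int[] makeArray() {
        return Arrays.copyOf(data, size); // final copy trimmed to the exact length
    }
}
```

Doubling the capacity keeps the total copying work proportional to the number of elements, and avoids boxing each element into an Integer as the ArrayList version does.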
    
