Deep copy, shallow copy, clone

Unfortunately, “shallow copy”, “deep copy” and “clone” are all rather ill-defined terms.


In the Java context, we first need to make a distinction between “copying a value” and “copying an object”.

int a = 1;
int b = a;     // copying a value
int[] s = new int[]{42};
int[] t = s;   // copying a value (the object reference for the array above)

StringBuffer sb = new StringBuffer("Hi mom");
               // copying an object.
StringBuffer sb2 = new StringBuffer(sb);

In short, an assignment of a reference to a variable whose type is a reference type is “copying a value” where the value is the object reference. To copy an object, something needs to use new, either explicitly or under the hood.


Now for “shallow” versus “deep” copying of objects. Shallow copying generally means copying only one level of an object, while deep copying generally means copying more than one level. The problem is in deciding what we mean by a level. Consider this:

public class Example {
    public int foo;
    public int[] bar;
    public Example() { };
    public Example(int foo, int[] bar) { this.foo = foo; this.bar = bar; };
}

Example eg1 = new Example(1, new int[]{1, 2});
Example eg2 = ... 

The normal interpretation is that a “shallow” copy of eg1 would be a new Example object whose foo equals 1 and whose bar field refers to the same array as in the original; e.g.

Example eg2 = new Example(eg1.foo, eg1.bar);

The normal interpretation of a “deep” copy of eg1 would be a new Example object whose foo equals 1 and whose bar field refers to a copy of the original array; e.g.

Example eg2 = new Example(eg1.foo, Arrays.copy(eg1.bar));

(People coming from a C / C++ background might say that a reference assignment produces a shallow copy. However, that’s not what we normally mean by shallow copying in the Java context …)

Two more questions / areas of uncertainty exist:

  • How deep is deep? Does it stop at two levels? Three levels? Does it mean the whole graph of connected objects?

  • What about encapsulated data types; e.g. a String? A String is actually not just one object. In fact, it is an “object” with some scalar fields, and a reference to an array of characters. However, the array of characters is completely hidden by the API. So, when we talk about copying a String, does it make sense to call it a “shallow” copy or a “deep” copy? Or should we just call it a copy?


Finally, clone. Clone is a method that exists on all classes (and arrays) that is generally thought to produce a copy of the target object. However:

  • The specification of this method deliberately does not say whether this is a shallow or deep copy (assuming that is a meaningful distinction).

  • In fact, the specification does not even specifically state that clone produces a new object.

Here’s what the javadoc says:

“Creates and returns a copy of this object. The precise meaning of “copy” may depend on the class of the object. The general intent is that, for any object x, the expression x.clone() != x will be true, and that the expression x.clone().getClass() == x.getClass() will be true, but these are not absolute requirements. While it is typically the case that x.clone().equals(x) will be true, this is not an absolute requirement.”

Note, that this is saying that at one extreme the clone might be the target object, and at the other extreme the clone might not equal the original. And this assumes that clone is even supported.

In short, clone potentially means something different for every Java class.


Some people argue (as @supercat does in comments) that the Java clone() method is broken. But I think the correct conclusion is that the concept of clone is broken in the context of OO. AFAIK, it is impossible to develop a unified model of cloning that is consistent and usable across all object types.

Leave a Comment