Why is string a reference type?

In addition to the reasons posted by Dan:

Value types are, by definition, those types which store their values in themselves, rather than referring to a value somewhere else. That’s why value types are called “value types” and reference types are called “reference types”. So your question is really “why does a string refer to its contents rather than simply containing its contents?”

It’s because value types have the nice property that every instance of a given value type is of the same size in memory.
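
To make that concrete, here is a minimal sketch. Point is a hypothetical struct invented for this illustration, and the class name ValueTypeSlots is likewise an assumption; neither comes from the question.

```csharp
using System;

public static class ValueTypeSlots
{
    // Hypothetical value type, used only for illustration.
    public struct Point
    {
        public int X;
        public int Y;
    }

    public static void Main()
    {
        // Built-in value types have a fixed size known at compile time.
        Console.WriteLine(sizeof(int));   // 4
        Console.WriteLine(sizeof(long));  // 8

        // A new array of a value type holds default values, never null.
        Point[] points = new Point[3];
        Console.WriteLine(points[1].X);   // 0

        // Writing an element overwrites its slot in place;
        // the array itself is neither moved nor resized.
        points[1] = new Point { X = 10, Y = 20 };
        Console.WriteLine(points[1].X);   // 10
        Console.WriteLine(points.Length); // 3
    }
}
```

Every Point takes up the same amount of room as every other Point, so the array can be laid out as three equally sized slots.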

So what? Why is this a nice property? Well, suppose strings were value types that could be of any size and consider the following:

string[] mystrings = new string[3];

What are the initial contents of that array of three strings? There is no “null” for value types, so the only sensible thing to do is to create an array of three empty strings. How would that be laid out in memory? Think about that for a bit. How would you do it?

Now suppose you say

string[] mystrings = new string[3];
mystrings[1] = "hello";

Now we have “”, “hello” and “” in the array. Where in memory does the “hello” go? How large is the slot that was allocated for mystrings[1] anyway? The memory for the array and its elements has to go somewhere.

This leaves the CLR with the following choices:

  • resize the array every time you change one of its elements, copying the entire thing, which could be megabytes in size
  • disallow creating arrays of value types of unknown size
  • disallow creating value types of unknown size

The CLR team chose the last one. Making strings into reference types means that every slot in an array of strings holds a reference of fixed size, so you can create arrays of them efficiently.
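
Here is a short sketch of how the same array behaves with string as the reference type it actually is. The names StringSlots and Fill are hypothetical helpers made up for this example.

```csharp
using System;

public static class StringSlots
{
    // Hypothetical helper: allocates the array from the example above
    // and stores strings of very different lengths in it.
    public static string[] Fill()
    {
        // Each slot holds a reference, and every slot starts out as null.
        string[] mystrings = new string[3];

        // Assignment stores a reference in the slot. The characters live
        // elsewhere on the heap, so the array is not resized or copied.
        mystrings[1] = "hello";
        mystrings[2] = new string('x', 1_000_000);
        return mystrings;
    }

    public static void Main()
    {
        string[] mystrings = Fill();
        Console.WriteLine(mystrings[0] is null);  // True
        Console.WriteLine(mystrings[1]);          // hello
        Console.WriteLine(mystrings[2].Length);   // 1000000
        Console.WriteLine(mystrings.Length);      // 3
    }
}
```

The slot for mystrings[2] is exactly as large as the slot for mystrings[1], even though one string has five characters and the other has a million.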
