How is std::string implemented?

Virtually every compiler I’ve used provides source code for its runtime library, so whether you’re using GCC, MSVC, or something else, you can look at the implementation yourself. However, most or all of std::string will be implemented as template code, which can make for very difficult reading.

Scott Meyers’ book, Effective STL, has a chapter on std::string implementations that’s a decent overview of the common variations: “Item 15: Be aware of variations in string implementations”.

He talks about four variations, which fall into two broad groups:

  • several variations on a ref-counted implementation (commonly known as copy on write). When a string object is copied unchanged, the refcount is incremented but the actual string data is not copied. Both objects point to the same refcounted data until one of them modifies it, causing a ‘copy on write’ of the data. The variations differ in where things like the refcount, locks, etc. are stored.

  • a “short string optimization” (SSO) implementation. In this variant, the object contains the usual pointer to data, length, size of the dynamically allocated buffer, etc. But if the string is short enough, it will use that space inside the object itself to hold the characters instead of dynamically allocating a buffer.
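To make the SSO idea concrete, here is a minimal sketch of my own, not taken from the book or from any real standard library. The class name `SsoString`, the `HeapRep` struct, and the 15-character threshold are all assumptions for illustration; real implementations choose their own layouts and limits. It only shows how the space that would hold the pointer and capacity can instead hold the characters of a short string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy illustration only: NOT the layout of any particular std::string.
class SsoString {
public:
    // Hypothetical threshold; real libraries pick their own value.
    static constexpr std::size_t kInlineCapacity = 15;

    explicit SsoString(const char* s)
        : size_(std::strlen(s)), inline_buf_{} {
        if (is_inline()) {
            // Short string: characters live inside the object, no allocation.
            std::memcpy(inline_buf_, s, size_ + 1);
        } else {
            // Long string: allocate a buffer and reuse the same storage
            // to remember the pointer and the capacity.
            char* p = new char[size_ + 1];
            std::memcpy(p, s, size_ + 1);
            heap_ = HeapRep{p, size_};
        }
    }

    // Copying is left out to keep the sketch short.
    SsoString(const SsoString&) = delete;
    SsoString& operator=(const SsoString&) = delete;

    ~SsoString() {
        if (!is_inline()) delete[] heap_.ptr;
    }

    const char* c_str() const { return is_inline() ? inline_buf_ : heap_.ptr; }
    std::size_t size() const { return size_; }
    bool is_inline() const { return size_ <= kInlineCapacity; }

private:
    struct HeapRep {
        char* ptr;
        std::size_t capacity;
    };

    std::size_t size_;
    union {
        char inline_buf_[kInlineCapacity + 1];  // used when the string is short
        HeapRep heap_;                          // used when the string is long
    };
};
```

With this sketch, `SsoString("hello")` stores its characters inside the object, while a string longer than the assumed 15-character limit falls back to a heap allocation.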

Also, Herb Sutter’s “More Exceptional C++” has an appendix (Appendix A: “Optimizations that aren’t (in a Multithreaded World)”) that discusses why copy-on-write, refcounted implementations often have performance problems in multithreaded applications due to synchronization issues. That article is also available online, though I’m not sure if it’s exactly the same as what’s in the book.
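As a rough illustration of where that synchronization cost comes from, here is a minimal copy-on-write sketch. It is my own toy code, not Sutter’s and not any real library’s: `CowString`, `Rep`, `set_char`, and `shares_buffer_with` are hypothetical names, and it uses default (sequentially consistent) atomics for simplicity. The thing to notice is that every copy, every destruction, and every potentially modifying access touches the shared atomic counter, whether or not another thread is actually involved.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy illustration only: NOT the implementation of any real std::string.
class CowString {
public:
    explicit CowString(const char* s) : rep_(new Rep(s)) {}

    // Copying shares the buffer and bumps the refcount (an atomic operation).
    CowString(const CowString& other) : rep_(other.rep_) { ++rep_->refs; }

    CowString& operator=(const CowString&) = delete;  // omitted for brevity

    ~CowString() { release(); }

    const char* c_str() const { return rep_->data; }

    // Any mutating access must first make sure the buffer is not shared.
    void set_char(std::size_t i, char c) {
        assert(i < rep_->size);
        detach();
        rep_->data[i] = c;
    }

    bool shares_buffer_with(const CowString& other) const {
        return rep_ == other.rep_;
    }

private:
    struct Rep {
        std::atomic<long> refs;
        std::size_t size;
        char* data;

        explicit Rep(const char* s)
            : refs(1), size(std::strlen(s)), data(new char[size + 1]) {
            std::memcpy(data, s, size + 1);
        }
        ~Rep() { delete[] data; }
    };

    // The "copy on write" step: if the buffer is shared, take a private copy.
    void detach() {
        if (rep_->refs.load() > 1) {
            Rep* fresh = new Rep(rep_->data);
            release();
            rep_ = fresh;
        }
    }

    // Drop our reference; the last owner frees the buffer.
    void release() {
        if (--rep_->refs == 0) delete rep_;
    }

    Rep* rep_;
};
```

For example, after `CowString b(a);` both objects share one buffer, and calling `b.set_char(0, 'j')` gives `b` its own copy while leaving `a` unchanged.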

Both those chapters would be worthwhile reading.
