Examples of when a bitwise swap() is a bad idea?

This is not specifically about swap, but it is an example showing that low-level optimizations are often not worth the trouble. The compiler usually figures it out anyway.

Of course, this is my favorite example, one where the compiler is exceptionally lucky. Even so, we shouldn’t assume that compilers are stupid and that we can easily improve on the generated code with a few simple tricks.

My test code constructs a std::string and then copies it:

std::string whatever = "abcdefgh";
std::string whatever2 = whatever;

The first constructor looks like this

  basic_string(const value_type* _String,
               const allocator_type& _Allocator = allocator_type() ) : _Parent(_Allocator)
  {
     const size_type _StringSize = traits_type::length(_String);

     if (_MySmallStringCapacity < _StringSize)
     {
        _AllocateAndCopy(_String, _StringSize);
     }
     else
     {
        traits_type::copy(_MySmallString._Buffer, _String, _StringSize);

        _SetSmallStringCapacity();
        _SetSize(_StringSize);
     }
  }

The generated code is

   std::string whatever = "abcdefgh";
000000013FCC30C3  mov         rdx,qword ptr [string "abcdefgh" (13FD07498h)]  
000000013FCC30CA  mov         qword ptr [whatever],rdx  
000000013FCC30D2  mov         byte ptr [rsp+347h],0  
000000013FCC30DA  mov         qword ptr [rsp+348h],8  
000000013FCC30E6  mov         byte ptr [rsp+338h],0  

Here traits_type::copy contains a call to memcpy, which is optimized into a single register copy of the whole string (the literal was carefully selected to fit in one 64-bit register). The compiler also folds the call to strlen into the compile-time constant 8.
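To try the two effects in isolation, something like the following sketch could be compiled with optimizations on and inspected in the disassembly. The helper names (literal_length, copy8, roundtrip) are made up for illustration and are not part of my string class. Whether a particular compiler really emits one load and one store is something to check, not something the code guarantees.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// The size of a string literal is known at compile time in every C++ standard.
static_assert(sizeof("abcdefgh") - 1 == 8, "literal chosen to be exactly 8 chars");

// strlen on a literal: optimizing compilers typically fold this to the constant 8.
inline std::size_t literal_length() {
    return std::strlen("abcdefgh");
}

// memcpy with a constant size of 8: optimizing compilers typically lower this
// to one 64-bit load and one 64-bit store instead of a library call.
inline void copy8(char* dst, const char* src) {
    std::memcpy(dst, src, 8);
}

// Hypothetical helper: push the literal through copy8 and return the result.
inline std::string roundtrip() {
    char buffer[8];
    copy8(buffer, "abcdefgh");
    return std::string(buffer, 8);
}
```

Calling roundtrip() and looking at the generated code for copy8 is a quick way to see whether your own compiler does the same thing mine did.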

Then we copy it into a new string. The copy constructor looks like this

  basic_string(const basic_string& _String)
     : _Parent(std::allocator_traits<allocator_type>::select_on_container_copy_construction(_String._MyAllocator))
  {
     if (_MySmallStringCapacity < _String.size())
     {
        _AllocateAndCopy(_String);
     }
     else
     {
        traits_type::copy(_MySmallString._Buffer, _String.data(), _String.size());

        _SetSmallStringCapacity();
        _SetSize(_String.size());
     }
  }

and results in just 4 machine instructions:

   std::string whatever2 = whatever;
000000013FCC30EE  mov         qword ptr [whatever2],rdx  
000000013FCC30F6  mov         byte ptr [rsp+6CFh],0  
000000013FCC30FE  mov         qword ptr [rsp+6D0h],8  
000000013FCC310A  mov         byte ptr [rsp+6C0h],0  

Note that the optimizer remembers that the chars are still in register rdx and that the string length must be the same, 8.
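For readers who have not seen a small-string buffer before, the branch that both constructors take can be sketched roughly as below. This is a simplified, hypothetical TinyString, not my actual implementation: the 15-char capacity, the member names and the separate heap pointer are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical small-string class; names and layout are illustrative only.
class TinyString {
public:
    static constexpr std::size_t kSmallCapacity = 15;  // assumed buffer size

    explicit TinyString(const char* text) {
        assert(text != nullptr);
        size_ = std::strlen(text);
        store(text);
    }

    TinyString(const TinyString& other) : size_(other.size_) {
        store(other.data());
    }

    TinyString& operator=(const TinyString&) = delete;  // kept out of the sketch

    ~TinyString() { delete[] heap_; }  // heap_ is nullptr for short strings

    const char* data() const { return is_heap() ? heap_ : small_; }
    std::size_t size() const { return size_; }
    bool is_heap() const { return kSmallCapacity < size_; }

private:
    // Copy size_ chars plus the terminator into the buffer that fits.
    void store(const char* text) {
        if (is_heap()) {
            heap_ = new char[size_ + 1];             // long string: allocate
            std::memcpy(heap_, text, size_ + 1);
        } else {
            std::memcpy(small_, text, size_ + 1);    // short string: in-object buffer
        }
    }

    std::size_t size_ = 0;
    char small_[kSmallCapacity + 1] = {};
    char* heap_ = nullptr;
};
```

With an 8-char literal only the short branch is ever taken, so once the length is known at compile time the optimizer has nothing left to do but store the bytes and the bookkeeping fields.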

It is after seeing things like this that I like to trust my compiler, and avoid trying to improve code with bit fiddling. It doesn’t help, unless profiling finds an unexpected bottleneck.
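As for the question in the title, and assuming the XOR swap is the kind of bit fiddling meant, here is a small sketch of one way that trick goes wrong: when both arguments refer to the same object, the value is wiped out. The helper names are hypothetical and exist only for the demonstration.

```cpp
#include <utility>

// Classic XOR swap: no temporary, but it breaks when a and b alias each other.
inline void xor_swap(unsigned& a, unsigned& b) {
    a ^= b;  // if a and b are the same object, this already sets it to 0
    b ^= a;
    a ^= b;
}

// Hypothetical helper: swap a value with itself using the XOR trick.
inline unsigned self_swap_with_xor(unsigned v) {
    xor_swap(v, v);
    return v;
}

// Hypothetical helper: the same self-swap through std::swap keeps the value.
inline unsigned self_swap_with_std(unsigned v) {
    std::swap(v, v);
    return v;
}
```

For two distinct variables both versions give the same result, so the trick buys nothing in correctness and adds a failure mode that std::swap does not have.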

(featuring MSVC 10 and my std::string implementation)
