Does the C++ standard mandate poor performance for iostreams, or am I just dealing with a poor implementation?

Not answering the specifics of your question so much as the title: the 2006 Technical Report on C++ Performance has an interesting section on IOStreams (p.68). Most relevant to your question is in Section 6.1.2 (“Execution Speed”):

Since certain aspects of IOStreams processing are
distributed over multiple facets, it
appears that the Standard mandates an
inefficient implementation. But this
is not the case — by using some form
of preprocessing, much of the work can
be avoided. With a slightly smarter
linker than is typically used, it is
possible to remove some of these
inefficiencies. This is discussed in
§6.2.3 and §6.2.5.

Since the report was written in 2006 one would hope that many of the recommendations would have been incorporated into current compilers, but perhaps this is not the case.

As you mention, facets may not feature in write() (but I wouldn’t assume that blindly). So what does feature? Running GProf on your ostringstream code compiled with GCC gives the following breakdown:

  • 44.23% in std::basic_streambuf<char>::xsputn(char const*, int)
  • 34.62% in std::ostream::write(char const*, int)
  • 12.50% in main
  • 6.73% in std::ostream::sentry::sentry(std::ostream&)
  • 0.96% in std::string::_M_replace_safe(unsigned int, unsigned int, char const*, unsigned int)
  • 0.96% in std::basic_ostringstream<char>::basic_ostringstream(std::_Ios_Openmode)
  • 0.00% in std::fpos<int>::fpos(long long)

So the bulk of the time is spent in xsputn, which eventually calls std::copy() after lots of checking and updating of cursor positions and buffers (have a look in c++\bits\streambuf.tcc for the details).

My take on this is that you’ve focused on the worst-case situation. All the checking that is performed would be a small fraction of the total work done if you were dealing with reasonably large chunks of data. But your code is shifting data in four bytes at a time, and incurring all the extra costs each time. Clearly one would avoid doing so in a real-life situation – consider how negligible the penalty would have been if write was called on an array of 1m ints instead of on 1m times on one int. And in a real-life situation one would really appreciate the important features of IOStreams, namely its memory-safe and type-safe design. Such benefits come at a price, and you’ve written a test which makes these costs dominate the execution time.

Leave a Comment