Compelling examples of custom C++ allocators?

As I mention here, I’ve seen Intel TBB’s custom STL allocator significantly improve performance of a multithreaded app simply by changing a single

std::vector<T>

to

std::vector<T,tbb::scalable_allocator<T> >

(this is a quick and convenient way of switching the allocator to use TBB’s nifty thread-private heaps; see page 7 in this document)
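
To show how small that change is, here is a minimal sketch that compiles without TBB installed. `CountingAllocator`, `g_allocations`, `Vec`, and `fill_and_sum` are hypothetical names made up for illustration; the allocator is only a stand-in that forwards to `malloc`/`free` and counts calls, not TBB's implementation. The point is that the allocator is chosen in a single alias, so swapping it is a one-line edit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical counter, used only to make the custom allocator's work visible.
static std::atomic<std::size_t> g_allocations{0};

// Hypothetical stand-in allocator (NOT part of TBB): forwards to malloc/free
// and counts how many times the container asks for memory.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// The allocator is stateless, so any two instances compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// The single point of change. With TBB available, the assumption is that you
// would include <tbb/scalable_allocator.h> and write instead:
//   template <typename T> using Vec = std::vector<T, tbb::scalable_allocator<T>>;
template <typename T>
using Vec = std::vector<T, CountingAllocator<T>>;

// Code using Vec<T> is written exactly as it would be for std::vector<T>.
int fill_and_sum(int n) {
    assert(n >= 0);
    Vec<int> v;
    for (int i = 0; i < n; ++i) v.push_back(i);
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}
```

Calling `fill_and_sum(100)` goes through the stand-in allocator every time the vector grows, while the calling code never mentions the allocator. Whether the TBB allocator actually helps depends on how allocation-heavy and how contended your threads are, so measure before and after.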