What data structure is inside std::map in C++?

Step debug into g++ 6.4 stdlibc++ source

Did you know that on Ubuntu’s 16.04 default g++-6 package or a GCC 6.4 build from source you can step into the C++ library without any further setup?

By doing that we easily conclude that a Red-black tree used in this implementation.

This makes sense, since std::map, unlike std::unordered_map, can be traversed in key order, which would not be efficient in if a hash map were used.

main.cpp

#include <cassert>
#include <map>

int main() {
    std::map<int, int> m;
    m.emplace(1, -1);
    m.emplace(2, -2);
    assert(m[1] == -1);
    assert(m[2] == -2);
}

Compile and debug:

g++ -g -std=c++11 -O0 -o main.out main.cpp
gdb -ex 'start' -q --args main.out

Now, if you step into s.emplace(1, -1) you immediately reach /usr/include/c++/6/bits/stl_map.h:

556       template<typename... _Args>
557     std::pair<iterator, bool>
558     emplace(_Args&&... __args)
559     { return _M_t._M_emplace_unique(std::forward<_Args>(__args)...); }

which clearly just forwards to _M_t._M_emplace_unique.

So we open the source file in vim and find the definition of _M_t:

    typedef _Rb_tree<key_type, value_type, _Select1st<value_type>,
            key_compare, _Pair_alloc_type> _Rep_type;

    /// The actual tree structure.
    _Rep_type _M_t;

So _M_t is of type _Rep_type and _Rep_type is a _Rb_tree.

OK, now that is enough evidence for me. If you don’t believe that _Rb_tree is a Black-red tree, step a bit further and read the algorithm

unordered_map uses hash table

Same procedure, but replace map with unordered_map on the code.

This makes sense, since std::unordered_map cannot be traversed in order, so the standard library chose hash map instead of Red-black tree, since hash map has a better amortized insert time complexity.

Stepping into emplace leads to /usr/include/c++/6/bits/unordered_map.h:

377       template<typename... _Args>
378     std::pair<iterator, bool>
379     emplace(_Args&&... __args)
380     { return _M_h.emplace(std::forward<_Args>(__args)...); }

So we open the source file in vim and search for the definition of _M_h:

      typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc>  _Hashtable;
      _Hashtable _M_h;

So hash table it is.

std::set and std::unordered_set

Analogous to std::map vs std::unordered_map: What is the underlying data structure of a STL set in C++?

Performance characteristics

You could also infer the data structure used by timing them:

enter image description here

Graph generation procedure and Heap vs BST analysis and at: Heap vs Binary Search Tree (BST)

Since std::map is analogous to std::set we clearly see for:

  • std::map, a logarithmic insertion time
  • std::unordered_map, a more complex hashmap pattern:

    • on the non-zoomed plot, we clearly see the backing dynamic array doubling on huge one off linearly increasing spikes
    • on the zoomed plot, we see that the times are basically constant and going towards 250ns, therefore much faster than the std::map, except for very small map sizes

      Several strips are clearly visible, and their inclination becomes smaller whenever the array doubles.

      I believe this is due to average linearly increasing linked list walks withing each bin. Then when the array doubles, we have more bins, so shorter walks.

Leave a Comment