Is it possible to map string to int faster than using hashmap?

A hashtable[1] is in principle the fastest way.

You could however compile a Perfect Hash Function given the fact that you know the full domain ahead of time.

With a perfect hash, there need not be a collision, so you can store the hash table in a linear array!

With proper tweaking you can then

  • fit all of the hash elements in a limited space, making direct addressing a potential option
  • have a reverse lookup in O(1)

The ‘old-school’ tool for generating Perfect Hash functions would be gperf(1). The wikipedia lists more resources on the subject.

Because of all the debate I ran a demo:

Downloading NASDAQ ticker symbols and getting 100 random samples from that set, applying gperf as follows:

gperf -e ' \015' -L C++ -7 -C -E -k '*,1,$' -m 100 selection > perfhash.cpp

Results in a hash-value MAX_HASH_VALUE of 157 and a direct string lookup table of as many items. Here’s just the hash function for demonstration purposes:

inline unsigned int Perfect_Hash::hash (register const char *str, register unsigned int len) {
  static const unsigned char asso_values[] = {
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156,  64,  40,   1,  62,   1,
       41,  18,  47,   0,   1,  11,  10,  57,  21,   7,
       14,  13,  24,   3,  33,  89,  11,   0,  19,   5,
       12,   0, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
      156, 156, 156, 156, 156, 156, 156, 156, 156
    };
  register int hval = len;

  switch (hval) {
      default: hval += asso_values[(unsigned char)str[4]];   /*FALLTHROUGH*/
      case 4:  hval += asso_values[(unsigned char)str[3]];   /*FALLTHROUGH*/
      case 3:  hval += asso_values[(unsigned char)str[2]+1]; /*FALLTHROUGH*/
      case 2:  hval += asso_values[(unsigned char)str[1]];   /*FALLTHROUGH*/
      case 1:  hval += asso_values[(unsigned char)str[0]];   break;
  }
  return hval;
}

It really doesn’t get much more efficient. Do have a look at the full source at github: https://gist.github.com/sehe/5433535

Mind you, this is a perfect hash, too, so there will be no collisions


Q. […] it’s obviosly “DELL”. Such lookup must be significantly faster than “hashmap lookup”.

A: If you use a simple std::map the net effect is prefix search (because lexicographical string comparison shortcuts on the first character mismatch). The same thing goes for binary search in a sorted container.


[1] PS. For 100 strings, a sorted array of string with std::search or std::lower_bound would potentially be as fast/faster due to the improved Locality of Reference. Consult your profile results to see whether this applies.

Leave a Comment