A hashtable[1] is in principle the fastest way.
You could however compile a Perfect Hash Function given the fact that you know the full domain ahead of time.
With a perfect hash, there need not be a collision, so you can store the hash table in a linear array!
With proper tweaking you can then
- fit all of the hash elements in a limited space, making direct addressing a potential option
- have a reverse lookup in O(1)
The ‘old-school’ tool for generating Perfect Hash functions would be gperf(1). The wikipedia lists more resources on the subject.
Because of all the debate I ran a demo:
Downloading NASDAQ ticker symbols and getting 100 random samples from that set, applying gperf as follows:
gperf -e ' \015' -L C++ -7 -C -E -k '*,1,$' -m 100 selection > perfhash.cpp
Results in a hash-value MAX_HASH_VALUE of
157
and a direct string lookup table of as many items. Here’s just the hash function for demonstration purposes:inline unsigned int Perfect_Hash::hash (register const char *str, register unsigned int len) { static const unsigned char asso_values[] = { 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 64, 40, 1, 62, 1, 41, 18, 47, 0, 1, 11, 10, 57, 21, 7, 14, 13, 24, 3, 33, 89, 11, 0, 19, 5, 12, 0, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156, 156 }; register int hval = len; switch (hval) { default: hval += asso_values[(unsigned char)str[4]]; /*FALLTHROUGH*/ case 4: hval += asso_values[(unsigned char)str[3]]; /*FALLTHROUGH*/ case 3: hval += asso_values[(unsigned char)str[2]+1]; /*FALLTHROUGH*/ case 2: hval += asso_values[(unsigned char)str[1]]; /*FALLTHROUGH*/ case 1: hval += asso_values[(unsigned char)str[0]]; break; } return hval; }
It really doesn’t get much more efficient. Do have a look at the full source at github: https://gist.github.com/sehe/5433535
Mind you, this is a perfect hash, too, so there will be no collisions
Q. […] it’s obviosly “DELL”. Such lookup must be significantly faster than “hashmap lookup”.
A: If you use a simple std::map
the net effect is prefix search (because lexicographical string comparison shortcuts on the first character mismatch). The same thing goes for binary search in a sorted container.
[1] PS. For 100 strings, a sorted array of string with std::search
or std::lower_bound
would potentially be as fast/faster due to the improved Locality of Reference. Consult your profile results to see whether this applies.