Hash Tables as a means of fast lookup in STL / C++.

Introduction

In a previous life I was involved in the design of routing optimization software for the telecoms industry. Finding the least cost route for a traffic demand between communicating network sites necessitates a search for all the tariffs provided by all of the carriers. There are a huge number of them and they are frequently subject to change. Of concern during that time was the problem of creating a mapping between strings representing telephone numbers (geographic, mobile, Freephone, premium rate etc) and the numbers representing the tariffs for using those calls in ways that were accurate and quick.

When trying to look up tariffs for tens or even hundreds of thousands of numbers on large networks, using a brute force search for anything but trivial examples is impractical for a commercial least cost routing package. Binary searches on sorted lists are a big improvement, giving a good performance even on very large lists, akin to an average of O( log n ).

Hash tables are among the fastest means of lookup, usually executing in constant time O( 1 ). A hash table is a software data structure based on the idea of using a indexing lookup table to greatly speed up the mapping. Given that the telephone numbers are usually represented by strings and the tariffs represented by floating point numbers, we cannot directly use the strings to index an array. What we can do however, is use a special hash function to convert the string into an index.

To summarize: we calculate the string’s index value using a hash function, and use this index value to access the array containing the pre-defined tariffs that we want.

Real-world hashing

The hashing just described is known as perfect hashing and is usually less than straightforward to implement. What happens in real life is that the hashing function can often map two or more different strings on to the same index value, known as collisions. To get around this, our hash table needs to map each string onto a linked list of potential choices, not just a single one, and by searching this candidate list we are then able to find the string we are after, as well as the tariff value it contains.

When using brute force searches (for loops) the average number of comparisons needed to find what we are after is n/2. When using hash tables, the average number of comparions required is one, plus one hash function calculation.

Defining a hash table class

Here is the class definition of the class to implement the hash table. The hash table itself is simply an indexed array of linked lists. Most of these linked lists will contain none or one entry. When collisions occur with more than one string getting hashed to the same index, the entry gets added to the linked list via the Add function:

#include "List.h"

const int TableSize = 200;

class HashTable
{
private:
	int hash( const char* str ) const;
	LinkedList list[ TableSize ];

public:
	LinkedList const & FindList( const char* str ) const;
	void Add( const char* str, int id );
};

Here is the choice of hash function, which is based on a shift-and-add algorithm as means of randomizing the strings.

// Hashing function to convert string into integer index
int HashTable::hash( const char* str ) const
{
	assert( str != NULL && str[ 0 ] != NULL );

	unsigned h = str[ 0 ];

	for ( int i = 1; str[ i ] != NULL; ++i )
	{
		h = (h << 4) + str[ i ];
	}

	// Return remainder
	return h % TableSize;
}

To find the appropriate linked list in the hash table it is a simple matter of calculating the hash value and using this value to index the correct linked list:

// Return the linked list containing one or more items
LinkedList const & HashTable::FindList( const char* str ) const
{
	int h = hash( str );
	return list[ h ];
}

Example usage:

Here is some code outlining how to use the HashTable APIs to add new items and search for items using the hashing algorithm:

HashTable hTable;
int value = 0;	

// Add some entries
hTable.Add( "01316530969", 11 );
hTable.Add( "01314405220", 12 );
hTable.Add( "02920591318", 10 );
hTable.Add( "02920565967", 14 );

// Get the value from the return linked list
bool success = hTable.Get( "01314405220", value );

Example Visual Studio project downloadable from here.

This same topic is also posted over at The Code Project®.