Mapping Words to Line Numbers in Text Files in STL / C++

Following on from the previous post, this example shows how to use an STL multimap to track the line number(s) associated with each word in a text file.

The program reads the text line by line, replacing all punctuation and other non-alphabetic characters with spaces so that only words remain. Each (word, line number) pair is then added to the multimap container using its insert function.

As with the previous post, which counted the frequency of words in a file, this example uses the sample Hamlet.txt file.

Code listing as follows:

#include <cctype>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

using namespace std;

int main()
{
	const string path = "/home/andy/NetBeansProjects/Hamlet.txt"; // Linux
	//const string path = "C:\\Dump\\Hamlet.txt"; // Windows
	ifstream input( path.c_str() );

	if ( !input )
	{
		cerr << "Error opening file." << endl;
		return 1;
	}

	multimap< string, int, less<string> >  words;
	int line;
	string word;

	// For each line of text; the loop ends when no further line can be read
	string text;
	for ( line = 1; getline( input, text ); line++ )
	{
		// Replace every non-alphabetic character with a space, leaving only words
		for ( string::size_type n = 0; n < text.size(); n++ )
		{
			if ( !isalpha( static_cast<unsigned char>( text[ n ] ) ) )
				text[ n ] = ' ';
		}

		istringstream i( text );

		// Test the extraction itself, so a failed read never re-inserts the previous word
		while ( i >> word )
		{
			words.insert( pair<const string,int>( word, line ) );
		}
	}

	input.close();

	// Output results
	multimap< string, int, less<string> >::iterator it1;
	multimap< string, int, less<string> >::iterator it2;

	for ( it1 = words.begin(); it1 != words.end(); )
	{
		it2 = words.upper_bound( (*it1).first );

		cout << (*it1).first << " : ";

		for ( ; it1 != it2; it1++ )
		{
			cout << (*it1).second << " ";
		}
		cout << endl;
	}
	
	return 0;
}

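The program prints each word followed by the line numbers on which it occurs. Notice that a word appearing more than once on the same line is mapped once per occurrence, so that line number is repeated.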

Related post: Counting the Number of Words in a Text File in STL / C++