Following on from the previous post, this example shows how to use an STL multimap to track the line number(s) associated with each word in a text file.
The program reads the text line by line, replacing punctuation and every other non-alphabetic character with a space so that only words remain. Each (word, line number) pair is then inserted into the multimap container using the insert function.
Like the previous post, which counted the frequency of words in a file, this example uses the sample Hamlet.txt file.
The code listing is as follows:
#include <cctype>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    const string path = "/home/andy/NetBeansProjects/Hamlet.txt"; // Linux
    //const string path = "C:\\Dump\\Hamlet.txt";                 // Windows
    ifstream input( path.c_str() );
    if ( !input )
    {
        cerr << "Error opening file." << endl;
        return 1;
    }

    multimap< string, int, less<string> > words;
    string text;
    string word;
    int line = 0;

    // For each line of text. Testing the read itself ends the loop cleanly
    // at end of file, so no extra line number is recorded.
    while ( getline( input, text ) )
    {
        line++;

        // Replace every non-alphabetic character with a space, leaving only words
        for ( string::size_type n = 0; n < text.size(); n++ )
        {
            if ( !isalpha( static_cast<unsigned char>( text[ n ] ) ) )
                text[ n ] = ' ';
        }

        // Test each extraction, so the last word on a line is not inserted twice
        istringstream i( text );
        while ( i >> word )
        {
            words.insert( pair<const string,int>( word, line ) );
        }
    }
    input.close();

    // Output each word once, followed by every line number recorded for it
    multimap< string, int, less<string> >::iterator it1;
    multimap< string, int, less<string> >::iterator it2;
    for ( it1 = words.begin(); it1 != words.end(); )
    {
        // it2 marks the first entry whose key is greater than the current word
        it2 = words.upper_bound( (*it1).first );
        cout << (*it1).first << " : ";
        for ( ; it1 != it2; it1++ )
        {
            cout << (*it1).second << " ";
        }
        cout << endl;
    }
    return 0;
}
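Running the program lists each word followed by the line numbers on which it appears. Notice that a word occurring more than once on the same line is mapped once per occurrence, so that line number is repeated.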
Related post: Counting the Number of Words in a Text File in STL / C++
First of all, great work, I appreciate it.
I would like to point out a few bugs in the program.
1) For the last word on every line of the text file, the line number is printed twice.
For example: “admirable” on line 9 has the number 9 printed twice, yet there is only one “admirable” on that line.
admirable : 9 9
2) The last word on the last line prints an extra line number.
For example: “neither”, which is on line 13, prints an extra line number 14 with it.
neither : 13 13 14
I hope there is a solution for this, thank you.