Problem

The field of information retrieval is concerned with finding relevant electronic documents based on a query. For example, given a group of keywords, a search engine retrieves Web pages (documents) and displays them in order, with the most relevant documents listed first. This technology requires a way to compare a document with the query to see which is most relevant to the query.

A simple way to make this comparison is to compute the binary cosine coefficient. The coefficient is a value between 0 and 1, where 1 indicates that the query is very similar to the document and 0 indicates that the query has no keywords in common with the document. This approach treats each document as a set of words. For example, consider the following sample document:

“Cows are big. Cows go moo. I love cows.”

This document is parsed into keywords, ignoring case and discarding punctuation, which yields the set {cows, are, big, go, moo, i, love}. The query is processed in the same way.
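The parsing step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name toWordSet is hypothetical, the text is assumed to be plain ASCII, and every non-alphanumeric character is treated as a separator (so an apostrophe would split a word in two).

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not from the textbook): turn raw text into a set of
// lowercase words. Any character that is not a letter or digit ends a word.
std::set<std::string> toWordSet(const std::string& text) {
    std::set<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.insert(current);   // a set ignores words it already holds
            current.clear();
        }
    }
    if (!current.empty()) {
        words.insert(current);       // flush the last word if the text ends mid-word
    }
    return words;
}
```

Applied to the sample document, this helper should produce the seven-word set listed above, because the three occurrences of "cows" collapse into one element.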

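Once we have a query Q represented as a set of words and a document D represented as a set of words, the similarity between the query and document is computed by

\[ \mathrm{sim}(Q, D) = \frac{|Q \cap D|}{\sqrt{|Q|}\,\sqrt{|D|}} \]

For example, if D = {cows, are, big, go, moo, i, love} and Q = {love, holstein, cows} then Q ∩ D = {cows, love}, |Q| = 3, |D| = 7, and

\[ \mathrm{sim}(Q, D) = \frac{2}{\sqrt{3}\,\sqrt{7}} \approx 0.436 \]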
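As a rough check of the arithmetic, the binary cosine coefficient (the number of words shared by Q and D, divided by the product of the square roots of the two set sizes) can be sketched as below. The function name binaryCosine is hypothetical, returning 0 for an empty set is an assumption the problem does not address, and the shared words are counted with a plain membership loop rather than the set_intersection call the exercise asks for.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: shared-word count divided by sqrt(|Q|) * sqrt(|D|).
// Assumption: an empty query or document yields a similarity of 0.
double binaryCosine(const std::set<std::string>& q,
                    const std::set<std::string>& d) {
    if (q.empty() || d.empty()) {
        return 0.0;
    }
    std::size_t shared = 0;
    for (const std::string& word : q) {
        shared += d.count(word);     // count() is 1 if d holds the word, else 0
    }
    return static_cast<double>(shared) /
           (std::sqrt(static_cast<double>(q.size())) *
            std::sqrt(static_cast<double>(d.size())));
}
```

For the sets in the example, two words are shared, so the result is 2 / (sqrt(3) * sqrt(7)), roughly 0.436.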

Write a program that allows the user to input a set of strings that represents a document and a set of strings that represents a query. (If you are more ambitious, you could write a program that parses an actual text file and computes the set of unique strings.) Represent the document and query as an STL set of strings. Then compute and print out the similarity between the query and document using the binary cosine coefficient. The sqrt function is in cmath. Use the generic set_intersection function to compute the intersection of Q and D.
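One possible way to collect the user's input, offered as a sketch rather than the expected solution: read one line of whitespace-separated words and insert each into a set. The helper name readWordSet and the one-line-per-set input format are assumptions, since the problem does not fix how the strings are entered.

```cpp
#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: read a single line from the stream and return its
// whitespace-separated words as a set. Assumes one set per input line.
std::set<std::string> readWordSet(std::istream& in) {
    std::set<std::string> words;
    std::string line;
    if (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string word;
        while (tokens >> word) {
            words.insert(word);      // repeated words are stored only once
        }
    }
    return words;
}
```

A program could call readWordSet(std::cin) once for the document and once for the query, after prompting for each.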

Here is an example of set_intersection to intersect set A with B and store the result in C, where all sets are sets of strings:

#include <algorithm>  // set_intersection
#include <iterator>   // inserter
#include <set>
#include <string>

. . .
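A call of this kind is commonly written with std::inserter as the output iterator, because a set has no push_back. The following is a minimal sketch and not necessarily the textbook's own listing; the wrapper name intersect is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical wrapper: return the words that appear in both a and b.
// std::set keeps its elements sorted, which set_intersection requires.
std::set<std::string> intersect(const std::set<std::string>& a,
                                const std::set<std::string>& b) {
    std::set<std::string> c;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::inserter(c, c.begin()));
    return c;
}
```

The size of the returned set gives the numerator of the similarity, and the sizes of the two input sets give the terms under the square roots.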

Solutions For Problems in Chapter 19