Question

Recursion and Trees Application – Building a Word Index Make sure you have read and understood...

Recursion and Trees Application – Building a Word Index

Make sure you have read and understood

·         lesson modules week 10 and 11

·         chapters 9 and 10 of our text

·         module - Lab Homework Requirements

before submitting this assignment. Hand in only one program, please.

Background:

In many applications, the composition of a collection of data items changes over time. Not only are new data items added and existing ones removed, but data items may be duplicated. A list data structure is a member of the general category of abstract data types called containers, whose purpose is to hold other objects. In C++, lists are provided in the Standard Template Library. However, for this assignment you will design and write your own linked list tree based implementation to support the ADT operations specified below.

Objective:

Working from a problem description design the data structures and operations needed to develop an application to solve the specified task.

Requirements:

Define a linked list tree based container and develop a set of operations for creating a word index that satisfies the application requirement specifications.

Problem Description:

A publisher is in need of an index to be produced for an upcoming data structures textbook publication. You have agreed to take on the task.

The first step in this process is to decide which words should go in the index. The second step is to generate a list of the pages where each word occurs.

The publisher has aided us on our way with this task by providing a list of all unique words used in the manuscript and their frequency of occurrence. Our job now becomes to review the list and choose which words to include in the index.

Discussion:

The primary goal in this problem is to identify a word with its associated frequency. Therefore, the first thing we must do is to define a “word”.

Looking through our own data structures textbook, you may want to consider what is a tentative definition of “word” in this context? Considerations may include “something between two blanks” or a “character string between two blanks”. The definition that works here is for you to decide. You want a definition that works for most words in an index. However, all words before “.” and “,” would have the “.” and “,” attached. Also, words surrounded by quotes would cause a problem. You need to create a working definition to take care of the problem at hand.

A suggestion could be:  

A word is a string of alphanumeric characters between markers where markers are whitespace and all punctuation marks.

Yes, let’s decide this is a feasible working definition of the kind of word that would be a candidate for an index term in this project. We can use function isalnum, available in <cctype>, to determine if a character is an alphanumeric character. We can skip leading nonnumeric characters and store and read characters until we encounter a nonalphanumeric character (isalnum returns false) or inFile goes into the fail state. If we do not encounter any alphanumeric characters we return the empty string.

This process ignores quotation marks, but leaves contractions as a problem. A possible strategy here is to examine a few candidate words and see if a solution can be found. For instance, the common contractions “let’s,” “couldn’t,” “can’t,” and “that’s” all have only one letter after the single quote. The algorithm would return the characters up to the single quote as one word and the character following the single quote as one word. Hmmm..what we want is to do is ignore the characters after the single quote. One idea is to require that words must be at least three characters long to be considered for the index. Besides, ignoring words of fewer than three letters also removes from consideration such words as “a,” “is,” “to,” “do,” and “by” that do not belong in an index.

Brainstorming:

It is now time to list objects that will prove to be useful in solving this problem. Scanning the problem description, the following nouns can be identified: publisher, index, text, word, list, pages, manuscript, frequency, and occurrence. It is now our job to select nouns that would serve as part of the solution. Suggestions here to pick include manuscript, word, list, and frequency.

Scenarios:

At this point a single scenario for this problem can be defined to allow us to begin designing our solution to build this index. Here we go: Read a file (i.e. the manuscript), break it into words, process the words, and output the results. To process each word, check its length. If it is three characters or longer, then check whether it is a word that has been processed already. If it is, increment its frequency; if not, add it to the list of words with a frequency of 1. In order to allow the publisher the additional flexibility to consider ‘qualifying words’ let the user enter the minimum number of characters.

Looking closer, the noun frequency can be seen as a property of a word in this case. This suggests to combine word and frequency into a WordType object. A container object (list) in which to store the items of WordType will be needed. To have the output file list the words in alphabetical order, a Sorted List ADT makes sense.  

This is a good time to introduce a commonly used brainstorming tool used in the design of object-oriented software called class-responsibility-collaboration (CRC) cards.  

The WordType card should look something like this:

Classname

Word Type

Responsibilities

Initialize(word)

Increment frequency

GetWord

GetFrequency

Collaborators

The ListType card should look something like this:

Classname

ListType

Responsibilities

Initialize

PutOrIncrement

Print its contents in alphabetical order

Collaborators

WordType

String

Before going any further, it is time to decide on an implementation structure for ListType. No limit has been placed on the number of items in the list, so a linked implementation is appropriate. For each word, the list must be searched, either to insert a new word or to increment the frequency of an already identified word. A tree-based list would be the most efficient because its search has O(log2 N) complexity. A struct can be used to define a tree node.  

In our original discussion, WordType was developing into a class with member functions to initialize itself, compare itself, and increment its frequency. Since the ‘container’ class is being designed especially for this problem, make WordType astruct rather than a class and let the list be responsible for the processing.

We are now ready to draw out the algorithms to develop our solution:

Algorithms:

Main

Open input file

Open output file

Get file label

Print file label on output file

Get minimum word length

Set letters to GetString(input file)

while more data

    if letters.GetLength() >= minimum word length

        list.InsertOrIncrement(tree, letters)

    Set letters to GetString(input file)

list.Print(output file)

GetString(inFile) returns string

Set letters to empty string

Get a letter

while (NOT isalnum(letter) AND inFile)

     Get a letter

if (NOT inFile)

    return letters

else

   do

    Set letter to tolower(letter);

    Set letters to letters + letter;

    Get a letter

   while (isalnum(letter) AND inFile);

PutOrIncrement

if tree is NULL

Get a new node for tree to point to

Set word memberof Info(tree) to letters

Set count member of Info(tree) to 1

Set Left(tree) to NULL

Set Right(tree) to NULL

else if word memberof Info(tree) is equal to letters

Increment count member of Info(tree)

else if letters is less than the word member of Info(tree)

PutOrIncrement(Left(tree), letters)

else

PutOrincrement(Right(tree), letters)

Print

if tree is not NULL

Print(tree, outFile)

word member of Info(tree).PrintToFile(TRUE, outFile)

outFile << word

Print(tree, outFile)

You are now ready to code the algorithms.

Notes:

Use a defined const MAX_LETTERS to set the maximum word length to 20 characters.

Optional: Your solution can be in one file index.cpp to include data structures, operations and driver or in separate files.

Test Run Requirements:

As a test plan for this program use the provided index.in file to manually calculate the words and frequencies. Write the test results to the a4indexFirstnameLastname file (for FirstnameLastname your Firstname and Lastname).

Grading Criteria:

ListType class is correctly defined and implemented.

structs WordType and TreeNode are correctly defined.

Defined const used to set maximum word length to 20 characters.

Program compiles and runs.

Implementation supports the operations given in the algorithm functional requirements.

A test driver is included to satisfy the test run demonstration.

A copy of your test run output display is included in an output file named a4indexFirstnameLastname

Be sure to include (at minimum) 2 separate files:

index.cpp (complete solution in one file) or (separate files) AND

a4indexFirstnameLastname

0 0
Add a comment Improve this question Transcribed image text
Answer #1

Solution: #include«iostream.h> #include<conio.h> #include<stdlib.h> #include-C stdio.h> #inchude<string.h> #include StrType.

struct TreeNode WordType info TreeNode left, TreeNode right; string.GetStringFile(true, ALPHA NUM, inFile); while (inFile) if

else if (tree->info.word- string) tree->info.count+ else if (string < tree->info.word) Process(tree->left, string); else Proc

int mainO clrscr0: using namespace std; ListType list; string inFileName; string outFileName string outputLabel; ifstream inF

Add a comment
Know the answer?
Add Answer to:
Recursion and Trees Application – Building a Word Index Make sure you have read and understood...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • Question 2: Finding the best Scrabble word with Recursion using java Scrabble is a game in...

    Question 2: Finding the best Scrabble word with Recursion using java Scrabble is a game in which players construct words from random letters, building on words already played. Each letter has an associated point value and the aim is to collect more points than your opponent. Please see https: //en.wikipedia.org/wiki/Scrabble for an overview if you are unfamiliar with the game. You will write a program that allows a user to enter 7 letters (representing the letter tiles they hold), plus...

  • Assignment 1 In this assignment you will be writing a tool to help you play the...

    Assignment 1 In this assignment you will be writing a tool to help you play the word puzzle game AlphaBear. In the game, certain letters must be used at each round or else they will turn into rocks! Therefore, we want to create a tool that you can provide with a list of letters you MUST use and a list of available letters and the program returns a list of all the words that contain required letters and only contain...

  • For this week's lab, you will use two of the classes in the Java Collection Framework:...

    For this week's lab, you will use two of the classes in the Java Collection Framework: HashSet and TreeSet. You will use these classes to implement a spell checker. Set Methods For this lab, you will need to use some of the methods that are defined in the Set interface. Recall that if set is a Set, then the following methods are defined: set.size() -- Returns the number of items in the set. set.add(item) -- Adds the item to the...

  • For this week's lab, you will use two of the classes in the Java Collection Framework:...

    For this week's lab, you will use two of the classes in the Java Collection Framework: HashSet and TreeSet. You will use these classes to implement a spell checker. Set Methods For this lab, you will need to use some of the methods that are defined in the Set interface. Recall that if set is a Set, then the following methods are defined: set.size() -- Returns the number of items in the set. set.add(item) -- Adds the item to the...

  • please write a C++ code Problem A: Word Shadow Source file: shadow.cpp f or java, or.cj...

    please write a C++ code Problem A: Word Shadow Source file: shadow.cpp f or java, or.cj Input file: shadow.in in reality, when we read an English word we normally do not read every single letter of that word but rather the word's "shadow" recalls its pronunciation and meaning from our brain. The word's shadow consists of the same number of letters that compose the actual word with first and last letters (of the word) in their original positions while the...

  • SCREENSHOTS OF CODE ONLY!! PLEASE DON'T POST TEXT!! C++ CODE! PLEASE DON'T REPOST OLD POSTS! Objective...

    SCREENSHOTS OF CODE ONLY!! PLEASE DON'T POST TEXT!! C++ CODE! PLEASE DON'T REPOST OLD POSTS! Objective To gain experience with the operations involving binary search trees. This data structure as linked list uses dynamic memory allocation to grow as the size of the data set grows. Unlike linked lists, a binary search tree is very fast to insert, delete and search. Project Description When an author produce an index for his or her book, the first step in this process...

  • You are given a set of ABC cubes for kids (like the one shown below) and a word. On each side of ...

    You are given a set of ABC cubes for kids (like the one shown below) and a word. On each side of each cube a letter is written 7 FI 0 You need to find out, whether it it possible to form a given word by the cubes For example, suppose that you have 5 cubes: B1: MXTUAS B2:OQATGE ВЗ: REwMNA B4: MBDFAC В5: IJKGDE (here for each cube the list of letters written on its sides is given) You...

  • You will be writing a simple Java program that implements an ancient form of encryption known...

    You will be writing a simple Java program that implements an ancient form of encryption known as a substitution cipher or a Caesar cipher (after Julius Caesar, who reportedly used it to send messages to his armies) or a shift cipher. In a Caesar cipher, the letters in a message are replaced by the letters of a "shifted" alphabet. So for example if we had a shift of 3 we might have the following replacements: Original alphabet: A B C...

  • Indexing With Trees Andrew Rosen Abstract a this lab you ate an ind r the wkd...

    Indexing With Trees Andrew Rosen Abstract a this lab you ate an ind r the wkd W Sa e (gt00.tat) althogh yos c tr w t yor ablity to pet together hiple deta trate and ots Cle ah fe to se what you t glete Your Assignment We will reate an Indertree, aspecial type of Binary Sdh Tr The IndeTree dos a bit moee than your staadand tree, as w wll to bald teatlook wee eah topic is lided in...

  • CSC110 Lab 6 (ALL CODING IN JAVA) Problem: A text file contains a paragraph. You are to read the contents of the file, store the UNIQUEwords and count the occurrences of each unique word. When the fil...

    CSC110 Lab 6 (ALL CODING IN JAVA) Problem: A text file contains a paragraph. You are to read the contents of the file, store the UNIQUEwords and count the occurrences of each unique word. When the file is completely read, write the words and the number of occurrences to a text file. The output should be the words in ALPHABETICAL order along with the number of times they occur and the number of syllables. Then write the following statistics to...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT