Question


Copy the file HP.txt [1] on the course web page to Readme.txt in your BlueJ project. Develop and test a Java program, AnalyzeText, that reads Readme.txt and reports the following:

a) The 3 most common words that are not in this stop-list:

a, am, an, and, any, are, as, a, be, by, he, her, hers, him, his, i, if, in, into, is, it, its, me, my, no, nor, not, of, on, or, she, than, that, the, their, them, then, there, these, they, to, too, us, was, we, were, what, when, where, which, while, who, whom, why, you

An example of output:

    Word      Frequency
    wand      21
    potion    10
    wizard    9

b) The 3 most common names that are in this list of names:

Harry, Dumbledore, Voldemort, Snape, Sirius, Hermione, Ron, Draco, Hagrid, Neville, Dobby, Moody, Lupin, Bellatrix, McGonagall, Grindelwald, Tina

An example of output:

    Name      Frequency
    Harry     15
    Snape     8
    Dobby     5

Regarding Part A: Consider the following approach to Part A. For Part A you determine the vocabulary of the text, and for each word you determine its frequency. On paper, the vocabulary and frequencies comprise two parallel lists, one for words and one for frequencies:

    word      frequency
    star      5
    potion    10
    wizard    9
    wand      21
    etc.

An algorithm in pseudocode to create the two lists:

    For each token in Readme.txt:
        get the token (as lowercase) with punctuation removed
        if the token is not in the stop list
            if the token is in the word list
                increase its frequency count
            otherwise
                add the token to the word list
                set its frequency to 1

    // determine the 3 most frequent words
    Repeat the following 3 times:
        find the largest frequency, and then:
            report the corresponding word
            set that word's frequency to -1

With the above methods, the code to process a single token is:

    String token = getToken(s);
    if (!stopList.contains(token)) {
        int i = find(word, token);
        if (i >= 0) {
            incrementFrequency(freq, i);
        } else {
            addNewWord(word, freq, token);
        }
    }

[1] Rowling, J. K. Harry Potter and the Deathly Hallows. New York, NY: Arthur A. Levine Books, 2007. http://hpread.scholastic.com/HP_Book7_Chapter_Excerpt.pdf
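The two phases of the pseudocode (build the parallel lists, then pick the largest frequency three times and mark each pick with -1) can be sketched as one small program. This is a sketch under stated assumptions: the class name `ParallelListsDemo`, the sample sentence and the three-word stop list are hypothetical, and `List.indexOf` stands in for the `find` method the question mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelListsDemo {
    public static void main(String[] args) {
        // Hypothetical sample text standing in for Readme.txt.
        String text = "The wand, the potion and the wand. A wizard's wand!";
        // Hypothetical, shortened stop list.
        List<String> stopList = Arrays.asList("a", "and", "the");

        // Phase 1: build the two parallel lists (word.get(i) occurs freq.get(i) times).
        List<String> word = new ArrayList<>();
        List<Integer> freq = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            // Lowercase the token and remove punctuation.
            String token = raw.toLowerCase().replaceAll("\\W", "");
            if (token.isEmpty() || stopList.contains(token)) {
                continue;
            }
            int i = word.indexOf(token);          // stands in for find(word, token)
            if (i >= 0) {
                freq.set(i, freq.get(i) + 1);     // seen before: increase its count
            } else {
                word.add(token);                  // new word: add it with count 1
                freq.add(1);
            }
        }

        // Phase 2: repeat 3 times: find the largest frequency, report it, set it to -1.
        System.out.println("Word Frequency");
        for (int k = 0; k < 3; k++) {
            int best = -1;                        // no candidate yet
            for (int j = 0; j < freq.size(); j++) {
                if (freq.get(j) > 0 && (best < 0 || freq.get(j) > freq.get(best))) {
                    best = j;
                }
            }
            if (best < 0) {
                break;                            // fewer than 3 distinct words
            }
            System.out.println(word.get(best) + " " + freq.get(best));
            freq.set(best, -1);
        }
    }
}
```

For this sample the program prints `wand 3`, `potion 1` and `wizards 1` under the header. Ties go to the word that was added to the list first, because only a strictly larger frequency replaces the current best.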

Answer #1

//----------- AnalyzeText.java ----------

import java.io.*;
import java.util.*;
class AnalyzeText
{
   // Stop words must match tokens exactly: lowercase, with no surrounding spaces.
   private final ArrayList<String> stopList = new ArrayList<>(
       Arrays.asList("a","am","an","and","any","are","as","be"
       ,"by","he","her","hers","him","his","i","if","in","into","is"
       ,"it","its","me","my","no","nor","not","of","on","or","she"
       ,"than","that","the","their","them","then","there","these"
       ,"they","to","too","us","was","we","were","what","when"
       ,"where","which","while","who","whom","why","you"));
      
   private final ArrayList<String> names = new ArrayList<>(
       Arrays.asList("harry","dumbledore","voldemort","snape","sirius","hermione"
       ,"ron","draco","hagrid","neville","dobby","moody","lupin","bellatrix"
       ,"mcgonagall","grindelwald","tina"));
      
   public String getToken(String str)
   {
       // Convert the string to lower case, then remove every non-word
       // character (anything other than a letter, digit or underscore).
       return str.toLowerCase().replaceAll("\\W", "");
   }
  
   public int find(ArrayList<String> word, String token)
   {
       for(int i = 0;i<word.size();i++)
       {
           if(word.get(i).equals(token))
           {
               return i;
           }
       }
       return -1;
   }
  
   public void incrementFrequency(ArrayList<Integer> freq, int i)
   {
       freq.set(i,freq.get(i) + 1);
   }
  
   public void addNewWord(ArrayList<String> word, ArrayList<Integer> freq, String token)
   {
       word.add(token);
       freq.add(1);
   }
  
   public void analyzeText(String fileName)
   {
       // Parallel lists: word.get(i) occurs freq.get(i) times.
       ArrayList<String> word = new ArrayList<>();
       ArrayList<Integer> freq = new ArrayList<>();
      
       File f = new File(fileName);
       if(!f.exists())
       {
           System.err.println("File Not Found: "+fileName);
           return;
       }
       // try-with-resources closes the reader even if an exception is thrown.
       try(BufferedReader br = new BufferedReader(new FileReader(f)))
       {
           String line;
           while((line = br.readLine()) != null)
           {
               // Split on whitespace; getToken then lowercases each token and strips
               // its punctuation, as the assignment's pseudocode describes.
               for(String s: line.trim().split("\\s+"))
               {
                   String token = getToken(s);
                   if(!token.isEmpty() && !stopList.contains(token))
                   {
                       int i = find(word, token);
                       if(i >= 0)
                       {
                           incrementFrequency(freq, i);
                       }
                       else
                       {
                           addNewWord(word, freq, token);
                       }
                   }
               }
           }
       }
       catch(IOException e)
       {
           System.err.println("Unable to read file: "+fileName);
           return;
       }
      
       generateReports(word,freq);
   }
   public void generateReports(ArrayList<String> words, ArrayList<Integer> freq)
   {
       // Part (a): words that are not names (this answer reports names separately).
       printTop3(words, freq, false, "Top 3 Words", "Word");
       // Part (b): words that are in the list of names.
       printTop3(words, freq, true, "Top 3 Names", "Name");
   }

   // Repeats 3 times: among the words whose membership in the names list equals
   // wantNames, find the largest frequency, report that word, then set its
   // frequency to -1 so it is not picked again (this modifies freq).
   private void printTop3(ArrayList<String> words, ArrayList<Integer> freq,
                          boolean wantNames, String title, String header)
   {
       System.out.printf("%n%s%n%n", title);
       System.out.printf("%-20s %-20s%n%n", header, "Frequency");
       for(int i = 0;i<3;i++)
       {
           // Start with "no candidate" instead of assuming index 0, so that word 0
           // is compared like every other word and an empty list does not throw.
           int index = -1;
           int maxFreq = 0;
           for(int j = 0;j<words.size();j++)
           {
               if(names.contains(words.get(j)) == wantNames && freq.get(j) > maxFreq)
               {
                   maxFreq = freq.get(j);
                   index = j;
               }
           }
           if(index < 0)
           {
               break; // fewer than 3 matching words
           }
           System.out.printf("%-20s %-20d%n", words.get(index), maxFreq);
           freq.set(index,-1);
       }
   }
   public static void main(String[] args)
   {
       AnalyzeText tester = new AnalyzeText();
       tester.analyzeText("Readme.txt");
   }
}
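How a line is cut into tokens affects the counts. The sketch below, with hypothetical class and method names and a made-up sample line, compares splitting on every run of non-word characters (the regex `\W+`) with splitting on whitespace and then stripping punctuation, which is what the question's pseudocode describes.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeDemo {
    // Splits on every run of non-word characters, so "Harry's" -> "harry", "s".
    static List<String> splitOnNonWord(String line) {
        List<String> out = new ArrayList<>();
        for (String s : line.split("\\W+")) {
            if (!s.isEmpty()) {
                out.add(s.toLowerCase());
            }
        }
        return out;
    }

    // Splits on whitespace, then lowercases and strips punctuation,
    // so "Harry's" -> "harrys".
    static List<String> splitOnWhitespace(String line) {
        List<String> out = new ArrayList<>();
        for (String s : line.trim().split("\\s+")) {
            String t = s.toLowerCase().replaceAll("\\W", "");
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "Harry's wand -- it's Snape's!";
        System.out.println(splitOnNonWord(line));    // prints [harry, s, wand, it, s, snape, s]
        System.out.println(splitOnWhitespace(line)); // prints [harrys, wand, its, snapes]
    }
}
```

The first approach keeps "harry" and "snape" intact but produces stray "s" tokens, which are not in the stop-list and could rank among the most common words. The second matches the pseudocode but turns possessives such as "Harry's" into "harrys", which no longer matches the name list.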

//Since the file could not be downloaded from the URL in the question, I used the question text itself as the input.

//You can test the program with any input file.
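The parallel lists and the linear `find` follow the approach the assignment suggests. A common alternative, not the one the assignment asks for, is a `HashMap<String, Integer>` from word to count, which avoids scanning the word list for every token. A minimal sketch, assuming a hypothetical class `MapCountDemo`, a made-up sample sentence and a shortened stop set:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MapCountDemo {
    // Counts lowercase, punctuation-free tokens that are not in the stop set.
    static Map<String, Integer> count(String text, Set<String> stop) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : text.split("\\s+")) {
            String token = raw.toLowerCase().replaceAll("\\W", "");
            if (!token.isEmpty() && !stop.contains(token)) {
                counts.merge(token, 1, Integer::sum); // 1 for a new word, else +1
            }
        }
        return counts;
    }

    // Returns up to n entries, highest count first; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "and"));
        Map<String, Integer> counts = count("The wand and the potion. A wand, a wizard.", stop);
        for (Map.Entry<String, Integer> e : top(counts, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Here `Map.merge` inserts 1 for a new word and adds 1 to an existing count, and sorting the entries replaces the repeat-3-times selection. With the stop words in a `HashSet`, each stop-word check is also constant time on average.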

//Sample input file: Readme.txt -------------

Copy the file, HP.txt1 , on the course web page to Readme.txt in your BlueJ project.
Develop and test a Java program, AnalyzeText, that reads Readme.txt and reports the following
: a) The 3 most common words that are not in this stop-list:
a,am,an,and,any,are,as,a,be,by,he,her,hers,him,his,i,if,in,into,is,
it,its, me,my,no,nor,not,of,on,or,she,than,that,the,their,them,then,
there,these, they,to,too,us,was,we,were,what,when,where,which,while,who,whom,why,you
An example of output: Word Frequency wand 21 potion 10 wizard 9 b) The 3 most common names
that are in this list of names: Harry, Dumbledore, Voldemort, Snape, Sirius, Hermione, Ron,
Draco, Hagrid, Neville, Dobby, Moody, Lupin, Bellatrix, McGonagall, Grindelwald,
Tina An example of output: Name Frequency Harry 15 Snape 8 Dobby 5 Regarding Part
A: Consider the following approach to Part A. For Part A you determine the vocabulary
of the text, and for each word you determine the frequency. On paper the vocabulary
and frequencies comprises two parallel lists, one for words and one for frequency.
word frequency star 5 potion 10 wizard 9 wand 21 etc An algorithm in pseudocode to
create the two lists: For each token in Readme.txt: get the token (as lowercase) with
punctuation removed if the token is not in the stop list if the token is in the word list
increase its frequency count otherwise add the token to the word list set its frequency to 1.
// // determine 3 most frequent words Repeat the following 3 times:
Find the largest frequency and then: report the corresponding word set that word’s
frequency to -1 1 Rowling, J. K. Harry Potter And the Deathly Hallows. New York, NY :
Arthur A. Levine Books, 2007. http://hpread.scholastic.com/HP_Book7_Chapter_Excerpt.pdf
With the above methods the code to process a single token is String token = getToken(s);
if (! stopList.contains(token)){ int i = find(word, token); if (i>=0) { incrementFrequency(freq, i); } else { addNewWord(word, freq, token);

//SAMPLE OUTPUT
