Copy the file, HP.txt [1], on the course web page to Readme.txt in your BlueJ project. Develop and test a Java program, AnalyzeText, that reads Readme.txt and reports the following:

a) The 3 most common words that are not in this stop-list:

a,am,an,and,any,are,as,a,be,by,he,her,hers,him,his,i,if,in,into,is,it,its,me,my,no,nor,not,of,on,or,she,than,that,the,their,them,then,there,these,they,to,too,us,was,we,were,what,when,where,which,while,who,whom,why,you

An example of output:

Word      Frequency
wand      21
potion    10
wizard    9

b) The 3 most common names that are in this list of names:

Harry, Dumbledore, Voldemort, Snape, Sirius, Hermione, Ron, Draco, Hagrid, Neville, Dobby, Moody, Lupin, Bellatrix, McGonagall, Grindelwald, Tina

An example of output:

Name      Frequency
Harry     15
Snape     8
Dobby     5

Regarding Part A: Consider the following approach to Part A. For Part A you determine the vocabulary of the text, and for each word you determine the frequency. On paper the vocabulary and frequencies comprise two parallel lists, one for words and one for frequencies.

word      frequency
star      5
potion    10
wizard    9
wand      21
etc.

An algorithm in pseudocode to create the two lists:

For each token in Readme.txt:
    get the token (as lowercase) with punctuation removed
    if the token is not in the stop list
        if the token is in the word list
            increase its frequency count
        otherwise
            add the token to the word list
            set its frequency to 1

// determine 3 most frequent words
Repeat the following 3 times:
    Find the largest frequency and then:
        report the corresponding word
        set that word’s frequency to -1

With the above methods the code to process a single token is

String token = getToken(s);
if (! stopList.contains(token)){
    int i = find(word, token);
    if (i>=0) {
        incrementFrequency(freq, i);
    } else {
        addNewWord(word, freq, token);
    }
}

[1] Rowling, J. K. Harry Potter And the Deathly Hallows. New York, NY: Arthur A. Levine Books, 2007. http://hpread.scholastic.com/HP_Book7_Chapter_Excerpt.pdf
//----------- AnalyzeText.java ----------
import java.io.*;
import java.util.*;
class AnalyzeText
{
    // Words excluded from the word count (the stop-list given in the question).
    private final ArrayList<String> stopList = new ArrayList<>(Arrays.asList(
        "a","am","an","and","any","are","as","a","be","by","he","her","hers",
        "him","his","i","if","in","into","is","it","its","me","my","no","nor",
        "not","of","on","or","she","than","that","the","their","them","then",
        "there","these","they","to","too","us","was","we","were","what","when",
        "where","which","while","who","whom","why","you"));

    // Names to report in Part b, stored in lowercase because tokens are lowercased.
    private final ArrayList<String> names = new ArrayList<>(Arrays.asList(
        "harry","dumbledore","voldemort","snape","sirius","hermione","ron",
        "draco","hagrid","neville","dobby","moody","lupin","bellatrix",
        "mcgonagall","grindelwald","tina"));

    public String getToken(String str)
    {
        // Convert the string to lowercase, then remove every non-word
        // character (anything other than a letter, digit, or underscore).
        return str.toLowerCase().replaceAll("\\W", "");
    }

    // Returns the index of token in the word list, or -1 if it is absent.
    public int find(ArrayList<String> word, String token)
    {
        for (int i = 0; i < word.size(); i++)
        {
            if (word.get(i).equals(token))
            {
                return i;
            }
        }
        return -1;
    }

    public void incrementFrequency(ArrayList<Integer> freq, int i)
    {
        freq.set(i, freq.get(i) + 1);
    }

    // Appends to both parallel lists so that index i refers to the same word.
    public void addNewWord(ArrayList<String> word, ArrayList<Integer> freq, String token)
    {
        word.add(token);
        freq.add(1);
    }

    public void analyzeText(String fileName)
    {
        ArrayList<String> word = new ArrayList<String>();
        ArrayList<Integer> freq = new ArrayList<Integer>();
        File f = new File(fileName);
        if (!f.exists())
        {
            System.err.println("File Not Found: " + fileName);
            return;
        }
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(f)))
        {
            String line = br.readLine();
            while (line != null)
            {
                // split on non-word characters, then normalize each piece
                String[] words = line.split("\\W");
                for (String s : words)
                {
                    String token = getToken(s);
                    if (!token.equals("") && !stopList.contains(token))
                    {
                        int i = find(word, token);
                        if (i >= 0)
                        {
                            incrementFrequency(freq, i);
                        }
                        else
                        {
                            addNewWord(word, freq, token);
                        }
                    }
                }
                line = br.readLine();
            }
        }
        catch (IOException e)
        {
            System.err.println("Unable to read file: " + fileName);
            return; // the counts are incomplete, so skip the reports
        }
        generateReports(word, freq);
    }

    public void generateReports(ArrayList<String> words, ArrayList<Integer> freq)
    {
        printTop3(words, freq, false, "Word"); // Part a: words that are not names
        printTop3(words, freq, true, "Name");  // Part b: names only
    }

    // Finds and prints up to 3 of the most frequent entries. When wantNames is
    // true only entries in the names list qualify; otherwise only the rest do.
    private void printTop3(ArrayList<String> words, ArrayList<Integer> freq,
                           boolean wantNames, String label)
    {
        ArrayList<String> top = new ArrayList<>();
        ArrayList<Integer> topFreq = new ArrayList<>();
        for (int i = 0; i < 3; i++)
        {
            int index = -1;
            int maxFreq = 0; // reported entries are set to -1, so they never qualify
            // start at 0 so that the first word in the list is also considered
            for (int j = 0; j < words.size(); j++)
            {
                if (names.contains(words.get(j)) == wantNames && freq.get(j) > maxFreq)
                {
                    maxFreq = freq.get(j);
                    index = j;
                }
            }
            if (index == -1)
            {
                break; // fewer than 3 qualifying entries
            }
            top.add(words.get(index));
            topFreq.add(maxFreq);
            freq.set(index, -1);
        }
        System.out.println("\nTop 3 " + label + "s\n");
        System.out.printf("%-20s %-20s\n\n", label, "Frequency");
        for (int i = 0; i < top.size(); i++)
        {
            System.out.printf("%-20s %-20d\n", top.get(i), topFreq.get(i));
        }
    }

    public static void main(String[] args)
    {
        AnalyzeText tester = new AnalyzeText();
        tester.analyzeText("Readme.txt");
    }
}
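
//Alternative sketch (not part of the assignment's parallel-list approach): the same counting can be done with a HashMap, which avoids the linear search in find(). The class name WordCountSketch, the methods countWords and topN, and the sample sentence are all hypothetical names chosen for illustration, and this version reads from a String rather than a file to stay short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: word counting with a HashMap instead of two parallel lists.
class WordCountSketch
{
    // Counts every lowercased token that is not in stopWords.
    static Map<String, Integer> countWords(String text, Set<String> stopWords)
    {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : text.split("\\W+"))
        {
            String token = raw.toLowerCase();
            if (token.isEmpty() || stopWords.contains(token))
            {
                continue;
            }
            // merge() inserts 1 for a new word, otherwise adds 1 to the old count
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Returns up to n words, highest count first; ties are broken alphabetically.
    static List<String> topN(Map<String, Integer> counts, int n)
    {
        List<String> keys = new ArrayList<>(counts.keySet());
        keys.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(keys.subList(0, Math.min(n, keys.size())));
    }

    public static void main(String[] args)
    {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "and"));
        Map<String, Integer> counts =
            countWords("The wand, the potion and a wand. Wizard! Wand?", stop);
        for (String w : topN(counts, 3))
        {
            System.out.printf("%-20s %d%n", w, counts.get(w));
        }
    }
}
```

//Unlike the parallel-list version, this sketch sorts a copy of the keys and leaves the counts unchanged, so the same map could be reused for the names report.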
//Because the URL posted in the question could not be opened from HomeworkLib, I used the text of the question itself as the input file.
//The program can be run on any input file. Leave a comment if you have any doubts.
//Sample input file (Readme.txt): the full text of the question above, copied verbatim.
//SAMPLE OUTPUT
Copy the file, HP.txt1 , on the course web page to Readme.txt in your BlueJ project....