Do writings by individual authors have statistical signatures?
They certainly do, and while such signatures say little about an
author's art, they can say something about literary styles
of an era, and can even help clarify historical controversies about
authorship. Statistical studies, for example, have shown that the
Iliad and the Odyssey were not written by a single
individual.
For this assignment you are to create a program that analyzes text
files -- novels perhaps, or newspaper articles -- and produces two
statistics about these texts: word size frequency, and average
sentence length.
In particular, you should write a two-class application that
produces such statistics. Your classes should be called WordMap
and WordMapDriver (in WordMap.java and WordMapDriver.java). The driver class should read
in the name of a file that holds a text, and then provide an
analysis for that text. Here is a sample:
> java WordMapDriver
enter name of a file
Analyzed text: /Users/moll/CS121/AliceInWonderland.txt
words of length 1: 7.257%
words of length 2: 14.921%
words of length 3: 24.073%
words of length 4: 20.847%
words of length 5: 12.769%
words of length 6: 7.374%
words of length 7: 6.082%
words of length 8: 3.012%
words of length 9: 1.812%
words of length 10: 0.820%
words of length 11: 0.501%
words of length 12: 0.236%
words of length 13: 0.134%
words of length 14: 0.083%
words of length 15 or larger: 0.001%
average sentence length: 16.917
Your job, then, is to code a solution to this problem that provides
these two statistics: word size percentage and average sentence
length. Thus, in the example given, 7.257 percent of the words are
of length 1, 14.921 percent of the words are of length 2, and so
forth, and the average sentence length is 16.917 words.
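The word size statistic is just a histogram turned into percentages: the percentage for length k is 100 × (number of words of length k) / (total number of words). A minimal sketch, assuming the counts have already been gathered into an array whose last bucket means "15 or larger"; the class and method names are hypothetical and not part of the required WordMap design.

```java
// Hypothetical helper: converts word-length counts into percentages of all words.
class LengthPercentages {

    // counts[i] holds the number of words of length i + 1 (last bucket assumed to be "15 or larger").
    // Returns the percentage of all words that fall into each bucket.
    static double[] toPercentages(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c; // total word count is the sum of all buckets
        }
        double[] pct = new double[counts.length];
        if (total == 0) {
            return pct; // no words at all: every percentage stays 0.0
        }
        for (int i = 0; i < counts.length; i++) {
            pct[i] = 100.0 * counts[i] / total;
        }
        return pct;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 1 word of length 1, 2 of length 2, 1 of length 3.
        double[] pct = toPercentages(new int[] {1, 2, 1});
        for (int i = 0; i < pct.length; i++) {
            System.out.printf("words of length %d: %.3f%%%n", i + 1, pct[i]);
        }
    }
}
```

With these counts the percentages are 25, 50, and 25; because every counted word lands in exactly one bucket, the percentages add up to 100 apart from rounding in the printed output.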
You can obtain interesting sample texts by, for example, visiting
the Gutenberg foundation website (Gutenberg.org), and downloading
books from there.
Tips
You should read external files by extending the Echo class from
Chapter 10.
An easy and acceptable way to calculate the average length of
sentences: count the number of words, count the number of
end-of-sentence markers (!, ., and ?), then divide the first count by the
second. Thus if a text has 21 words, 2 periods, and 1 question
mark, its average sentence length is 21 / 3 = 7.
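A minimal sketch of this shortcut, assuming the word count is computed separately; the class and method names are hypothetical. It counts every '.', '!' and '?' character and guards against a text with no markers at all.

```java
// Hypothetical helper illustrating the "words / end-of-sentence markers" shortcut.
class SentenceLength {

    // Counts end-of-sentence markers ('.', '!', '?') in a piece of text.
    static int countMarkers(String text) {
        int markers = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                markers++;
            }
        }
        return markers;
    }

    // Returns words / markers, or 0.0 when there are no markers (avoids dividing by zero).
    static double averageLength(int words, int markers) {
        return markers == 0 ? 0.0 : (double) words / markers;
    }

    public static void main(String[] args) {
        System.out.println(countMarkers("It rained. We stayed in. Why?")); // three markers
        // The worked example from the tip: 21 words, 2 periods and 1 question mark.
        System.out.printf("average sentence length: %.3f%n", averageLength(21, 3));
    }
}
```

Because the shortcut counts characters, an ellipsis ("...") counts as three markers and an abbreviation such as "Mr." counts as one, so the result is an approximation.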
Show percentages using printf, as described in Chapter 5 of the
text. Precision: as in the example above, 3 places to the right of the
decimal point. To include a % symbol in a format string, include
two percent symbols. The escape sequence \n starts a new line
(the format specifier %n does the same using the platform's line separator). This statement:
System.out.printf("%4.2f%%\n",40.23);
prints
40.23%
and then advances to the next line.
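The same conversions produce each line of the required report. A minimal sketch; the class and method names are hypothetical. It passes Locale.ROOT to String.format so the decimal separator is always a period, because printf and String.format otherwise use the default locale (which prints a comma in, for example, German).

```java
import java.util.Locale;

// Hypothetical demo of the format strings used in the report.
class PrintfDemo {

    // Builds one report line, e.g. "words of length 3: 24.073%" (%.3f = 3 decimals, %% = literal %).
    static String lengthLine(int length, double percent) {
        return String.format(Locale.ROOT, "words of length %d: %.3f%%", length, percent);
    }

    public static void main(String[] args) {
        System.out.printf("%4.2f%%\n", 40.23);                 // the statement from the tip
        System.out.println(lengthLine(3, 24.073));
        System.out.printf("average sentence length: %.3f%n", 16.917); // no %% here: not a percentage
    }
}
```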
Additional Requirements
You must use a try/catch harness for your
WordMapDriver code.
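One way to read this requirement is that the driver's main method does its work inside a try block and reports failures in catch blocks instead of crashing. A minimal sketch, assuming a file-reading method that declares FileNotFoundException; counting lines stands in for the real analysis, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.NoSuchElementException;
import java.util.Scanner;

// Hypothetical driver showing a try/catch harness around file reading.
class TryCatchHarness {

    // Counts the lines of the named file; throws FileNotFoundException if it cannot be opened.
    static int countLines(String filename) throws FileNotFoundException {
        int lines = 0;
        // try-with-resources closes the Scanner automatically
        try (Scanner in = new Scanner(new File(filename))) {
            while (in.hasNextLine()) {
                in.nextLine();
                lines++;
            }
        }
        return lines;
    }

    // Entry point: every step that can fail runs inside the try block.
    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        System.out.println("enter name of a file");
        try {
            String name = keyboard.nextLine();
            System.out.println("lines: " + countLines(name));
        } catch (FileNotFoundException e) {
            System.out.println("File not found: " + e.getMessage());
        } catch (NoSuchElementException e) {
            System.out.println("No file name was entered.");
        }
    }
}
```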
You must use the String method split to extract
the individual words on each line of the text you are examining. In
addition to the space symbol, use these characters as delimiters:
,.!?;:
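Since split takes a regular expression, all of these delimiters fit in one character class; inside square brackets, '.', '?' and '!' are literal characters. A minimal sketch; the class and method names are hypothetical. The trailing + treats a run such as ", " as a single delimiter, and empty tokens (produced by a leading delimiter or an empty line) are skipped so they are not counted as words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing split with the required delimiters.
class SplitDemo {

    // One or more of: space , . ! ? ; :
    static final String DELIMITERS = "[ ,.!?;:]+";

    // Returns the non-empty words on one line of text.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split(DELIMITERS)) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty first token
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words(" Why, Alice? Come here; now."));
    }
}
```

Other characters such as quotation marks and apostrophes are not delimiters here, so they stay attached to the word and count toward its length.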
You must comment your classes: add a one-line
comment for every method and for every instance variable. This
comment should clearly state the role of that Java constituent.
Copyable code:
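//Import files
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
//WordMap class: gathers word-length and sentence-length statistics for one text file
class WordMap
{
    //Number of end-of-sentence markers ('.', '!', '?') seen so far
    private int sentenceMarkerCount = 0;
    //Total number of words seen so far
    private int wordCount = 0;
    //wordLength[i] counts words of length i+1; wordLength[14] counts words of length 15 or more
    private int[] wordLength;
    //Name of the file being analyzed
    private String filename;
    //Constructor: remembers the file name and creates the (zero-filled) length counters
    public WordMap(String ip)
    {
        filename = ip;
        wordLength = new int[15]; //Java initializes every element to 0
    }
    //Reads the file line by line and updates all counters
    public void readFile()
    {
        //try-with-resources closes the Scanner even if an error occurs
        try (Scanner mapSan = new Scanner(new File(filename)))
        {
            while (mapSan.hasNextLine())
            {
                String line = mapSan.nextLine();
                //Count end-of-sentence markers on this line
                for (int kk = 0; kk < line.length(); kk++)
                {
                    char c = line.charAt(kk);
                    if (c == '.' || c == '!' || c == '?')
                        sentenceMarkerCount++;
                }
                //Split on runs of space , . ! ? ; : as the assignment requires
                String[] tp = line.split("[ ,.!?;:]+");
                for (String word : tp)
                {
                    //Skip the empty token produced by a leading delimiter or an empty line
                    if (word.isEmpty())
                        continue;
                    wordCount++;
                    int len = word.length();
                    if (len >= 15)
                        wordLength[14]++;
                    else
                        wordLength[len - 1]++;
                }
            }
        }
        catch (FileNotFoundException pxe)
        {
            System.out.println("Could not open file: " + filename);
        }
    }
    //Returns count as a percentage of all words (0 if no words were read)
    private double percent(int count)
    {
        return wordCount == 0 ? 0.0 : 100.0 * count / wordCount;
    }
    //Prints the word-length percentages and the average sentence length
    public void printStatisticsInfo()
    {
        System.out.println("Analyzed text: " + filename);
        for (int kk = 0; kk < 14; kk++)
        {
            System.out.printf("words of length %d: %.3f%%\n", kk + 1, percent(wordLength[kk]));
        }
        System.out.printf("words of length 15 or larger: %.3f%%\n", percent(wordLength[14]));
        //Average sentence length is a plain number of words, so no % sign is printed
        double average = sentenceMarkerCount == 0 ? 0.0 : (double) wordCount / sentenceMarkerCount;
        System.out.printf("average sentence length: %.3f\n", average);
    }
}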
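//WordMapDriver class: asks for a file name and prints the analysis of that file
public class WordMapDriver
{
    //main: entry point, wrapped in the required try/catch harness
    public static void main(String[] args)
    {
        try
        {
            Scanner tpSan = new Scanner(System.in);
            System.out.println("enter name of a file");
            String filename = tpSan.nextLine();
            WordMap wm = new WordMap(filename);
            wm.readFile();
            wm.printStatisticsInfo();
        }
        catch (Exception e)
        {
            System.out.println("Error while analyzing the file: " + e);
        }
    }
}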