How to find the most repeated bi-gram (pair of adjacent words) in a text using Java (without using HashMap)
Program:
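import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;

//Bigrams class: finds the most repeated bi-gram using plain arrays only
class Bigrams
{
    //main method
    public static void main(String[] args) throws IOException
    {
        //create an array of strings; it is grown on demand,
        //so files with more than 1000 words also work
        String[] text = new String[1000];
        int n = 0;
        //open the text file (try-with-resources closes it afterwards)
        try (Scanner sc = new Scanner(new File("bigrams.txt")))
        {
            while (sc.hasNext())
            {
                //read a String from the file, convert to lower case,
                //remove the punctuation
                String s = sc.next().toLowerCase().replaceAll("\\p{Punct}", "");
                //skip tokens that consisted of punctuation only, such as "-"
                if (s.isEmpty())
                    continue;
                if (n == text.length)
                    text = Arrays.copyOf(text, 2 * n);
                text[n++] = s;
            }
        }
        //a bi-gram needs at least two words
        if (n < 2)
        {
            System.out.println("No bi-grams found");
            return;
        }
        //declare the array of counts: n words give at most n-1 distinct bi-grams
        int[] count = new int[n - 1];
        //declare the array of bigrams
        String[][] bigrams = new String[n - 1][2];
        //m is the number of distinct bi-grams stored so far
        int m = 0, j;
        //processing
        for (int i = 0; i < n - 1; i++)
        {
            //check for existing bigrams (linear search)
            for (j = 0; j < m; j++)
            {
                if (text[i].equals(bigrams[j][0]) && text[i + 1].equals(bigrams[j][1]))
                {
                    count[j]++;
                    break;
                }
            }
            //for non-existing bigrams: j == m means the search found nothing
            if (j == m)
            {
                bigrams[m][0] = text[i];
                bigrams[m][1] = text[i + 1];
                count[m] = 1;
                m++;
            }
        }
        int max = 0;
        j = 0;
        //calculate maximum frequency; on a tie the bi-gram seen first wins
        for (int i = 0; i < m; i++)
        {
            if (count[i] > max)
            {
                max = count[i];
                j = i;
            }
        }
        //print the most repeated bi-gram
        System.out.println("Most repeated bi-grams: " + bigrams[j][0] + " " + bigrams[j][1]);
    }
}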
Input file (bigrams.txt):
The book I read was called A Wrinkle In Time. In the book there is a main character named Meg. Meg and her brother Charles Wallace and a guy named Calvin go on a trip across time and space. They are trying to save their father, a scientist. The dad has been captured by a creature in another galaxy. The kids save the dad and go home using a tesseract.
Output:
Most repeated bi-grams: the book
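The program above compares every word pair against all bi-grams stored so far, which can take on the order of n*n comparisons for n words. One alternative that still avoids HashMap is to build all pairs, sort them, and count runs of equal neighbours. Below is a minimal sketch of that idea, assuming the whole text fits in memory as one String. The names SortedBigrams and mostFrequent are illustrative and not part of the original answer. On a tie this version returns the alphabetically first bi-gram, whereas the program above returns the one seen first in the text.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original answer.
class SortedBigrams {

    // Returns the most frequent bi-gram as "first second", or "" when the
    // text has fewer than two words. Ties go to the alphabetically first
    // bi-gram because the scan runs over a sorted array.
    static String mostFrequent(String text) {
        // Same normalisation as the program above: lower case, no punctuation.
        String cleaned = text.toLowerCase().replaceAll("\\p{Punct}", "").trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        String[] words = cleaned.split("\\s+");
        if (words.length < 2) {
            return "";
        }

        // Build every adjacent pair as one string; words contain no spaces,
        // so joining with a single space is unambiguous.
        String[] pairs = new String[words.length - 1];
        for (int i = 0; i < pairs.length; i++) {
            pairs[i] = words[i] + " " + words[i + 1];
        }

        // After sorting, equal bi-grams sit next to each other.
        Arrays.sort(pairs);

        String best = pairs[0];
        int bestCount = 1;
        int run = 1; // length of the current run of equal bi-grams
        for (int i = 1; i < pairs.length; i++) {
            run = pairs[i].equals(pairs[i - 1]) ? run + 1 : 1;
            if (run > bestCount) {
                bestCount = run;
                best = pairs[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // prints "the cat"
        System.out.println(mostFrequent("The cat sat. The cat ran!"));
    }
}
```

For the sample bigrams.txt both "the book" and "the dad" occur twice; this sketch and the program above both report "the book", but for different reasons (alphabetical order here, first occurrence there).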