Question

    Find the most frequently used words in the book “Sense and Sensibility”. The book is in the sense_andsensibility.txt file.

  1. The words should not be case sensitive, meaning “Mother” and “mother” are considered the same word.
  2. Replace all the punctuation marks with a space.
  3. Use the “stopwords.txt” file to remove all the stop words in text. (Do NOT modify the stopwords.txt file)
  4. Create a histogram similar to the “histogram.jpg” file. The diagram should contain the ranking, the top 30 words, and the number of times they appeared in the book. The number of stars will be the number of appearances divided by 10. For example, “mother” appears 263 times, so 26 stars are displayed. (You may not get exactly the same result as in histogram.jpg.)

I would like the answer in Python 3 code.

Answer #1

#package for plotting
import matplotlib.pyplot as plt

#package to sort dictionary by value
import operator

#package to remove punctuation
import re

#initializing filenames
filename = "sense_andsensibility.txt"
stopwords_filename = "stopwords.txt"

#reading the book; the with-block closes the file automatically
with open(filename, "r") as f:
    text = f.read()

#converting all characters to lowercase
text = text.lower()

#replacing punctuation with a space
text = re.sub(r'[^\w\s]',' ',text)

#replacing digits and underscores with a space
text = re.sub(r'[0-9_]',' ',text)

#converting text(string) into words(list)
text = text.split()

#reading the stop words from stopwords.txt (the file itself is not modified)
with open(stopwords_filename, "r") as f:
    stop_words = set(f.read().lower().split())

#removing stop words
text = [txt for txt in text if txt not in stop_words]

#creating frequency dictionary
frequency = {}

#counting how many times each word appears
for word in text:
    #a word seen for the first time starts at 0; every occurrence adds 1
    frequency[word] = frequency.get(word, 0) + 1

#sorting (word, count) pairs by count, highest first
sorted_x = sorted(frequency.items(), key=operator.itemgetter(1),reverse=True)

#keeping only the top 30 words (or all of them if there are fewer than 30)
sorted_x = dict(sorted_x[:30])

#Displays the top 30 frequent words on the console (remove the triple quotes to use)
'''
print("Most Frequent 30 Words: \n")
for x,val in sorted_x.items():
    stars = int(val/10)
    print(x+" : "+str(stars)+" stars")
'''

#getting keys, values in lists
key = list(sorted_x.keys())
val = list(sorted_x.values())

#getting the star value (count divided by 10, rounded down) for each word
val = [count // 10 for count in val]

#plotting a bar
plt.bar(key,val)

#setting vertical style for xlabel
plt.xticks(rotation='vertical')

#adding ylabel as star
plt.ylabel("Stars")

#fixing bottom margin problem (called after all labels are set)
plt.tight_layout()

#displaying the plot
plt.show()

#Sample Output histogram
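
The question asks for a text histogram showing rank, word, count, and stars, whereas the code above draws a bar chart. A minimal sketch of the text version follows. `format_histogram` is a hypothetical helper, and only the count of 263 for "mother" comes from the question; the other sample counts are made up for illustration.

```python
def format_histogram(counts, top_n=30):
    """Return histogram lines: rank, word, count, and one star per 10 occurrences."""
    # Sort (word, count) pairs by count, highest first, and keep the top_n
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:top_n]
    lines = []
    for rank, (word, count) in enumerate(ranked, 1):
        stars = "*" * (count // 10)  # integer division drops the remainder
        lines.append(f"{rank:>2} {word:<12} {count:>5} {stars}")
    return lines

# "mother": 263 is taken from the question; the other counts are invented
sample = {"mother": 263, "sister": 45, "time": 9}
for line in format_histogram(sample):
    print(line)
```

With the real data, passing the `frequency` dictionary built earlier to this helper would produce one line per word for the top 30 words.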

Answer #2

To achieve this task, you can follow these steps using Python 3 code:

  1. Read the contents of the "sense_andsensibility.txt" file.

  2. Convert the text to lowercase to make it case-insensitive.

  3. Remove punctuation marks from the text.

  4. Read the stopwords from the "stopwords.txt" file.

  5. Tokenize the text into words.

  6. Remove stopwords from the list of words.

  7. Count the frequency of each word.

  8. Sort the words based on their frequency in descending order.

  9. Generate the histogram with the top 30 words and their frequencies, represented by stars.

Here is Python code to accomplish these steps:

import string

# Step 1: Read the contents of the "sense_andsensibility.txt" file
with open("sense_andsensibility.txt", "r", encoding="utf-8") as file:
    text = file.read()

# Step 2: Convert the text to lowercase
text = text.lower()

# Step 3: Replace each punctuation mark with a space
translator = str.maketrans(string.punctuation, " " * len(string.punctuation))
text = text.translate(translator)

# Step 4: Read the stopwords from the "stopwords.txt" file
with open("stopwords.txt", "r", encoding="utf-8") as file:
    stopwords = set(file.read().splitlines())

# Step 5: Tokenize the text into words
words = text.split()

# Step 6: Remove stopwords from the list of words
filtered_words = [word for word in words if word not in stopwords]

# Step 7: Count the frequency of each word
word_freq = {}
for word in filtered_words:
    word_freq[word] = word_freq.get(word, 0) + 1

# Step 8: Sort the words based on their frequency in descending order
sorted_words = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)

# Step 9: Generate the histogram with the top 30 words and their frequencies
print("Rank\tWord\t\tFrequency\tHistogram")
print("----------------------------------------------")
for rank, (word, freq) in enumerate(sorted_words[:30], 1):
    stars = "*" * (freq // 10)
    print(f"{rank}\t{word}\t\t{freq}\t\t{stars}")
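
Steps 7 and 8 can also be written with `collections.Counter` from the standard library. This is a minimal sketch, not a replacement for the code above; the short word list is made up and stands in for the filtered words from the book.

```python
from collections import Counter

# Hypothetical sample standing in for the filtered word list from the book
filtered_words = ["mother", "elinor", "mother", "marianne", "elinor", "mother"]

# Counter tallies occurrences; most_common(n) returns up to n
# (word, count) pairs ordered from most to least frequent
word_freq = Counter(filtered_words)
top_words = word_freq.most_common(30)

for rank, (word, freq) in enumerate(top_words, 1):
    print(f"{rank}\t{word}\t{freq}")
```

`most_common(30)` returns fewer than 30 pairs when the text has fewer distinct words, so no separate length check is needed.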

This code reads the text from the provided files, processes it as described, and prints a histogram of the top 30 words with their frequencies shown as stars. The output may not exactly match the "histogram.jpg" file, because the counts depend on how punctuation is handled, how the text is split into words, and which words are in the stop-word list.
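
One concrete source of differences in the counts is punctuation handling: deleting punctuation and replacing it with a space (as requirement 2 in the question asks) produce different words. A small sketch on a made-up sentence:

```python
import string

sample = "well-known, isn't it?"

# Deleting punctuation glues the neighbouring letters together
delete_table = str.maketrans("", "", string.punctuation)
deleted = sample.translate(delete_table).split()

# Replacing each punctuation mark with a space keeps the parts separate
space_table = str.maketrans(string.punctuation, " " * len(string.punctuation))
replaced = sample.translate(space_table).split()

print(deleted)   # ['wellknown', 'isnt', 'it']
print(replaced)  # ['well', 'known', 'isn', 't', 'it']
```

With replacement, fragments such as "t" and "isn" become separate tokens, so whether they are counted depends on whether the stop-word list contains them.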


answered by: Mayre Yıldırım