Question

Could you please do this in Python to run in Spark.

For your second assignment I would like you to write code that does a word and character count on some input:

URL
Input File
The word count should report not only all occurrences of an individual word, but also the total number of representations of that word and a frequency analysis of that word versus the total document.

Occurrences are matched by spelling, without regard to capitalization.

For example, "word Word word WoRd apple":
word 80% 4 total occurrences, 3 representations {word, Word, WoRd}
apple 20% 1 total occurrence, 1 representation {apple}
The character count similarly analyzes the occurrences, representations, and frequency of each character in the document.
Answer #1

Before running this code, make sure that the validators and requests packages are installed.

validators is used to check whether the given file_name is a URL or not.

remove_word(): this function removes every occurrence of the specific word from the list and returns the modified list along with the list of different representations of that word.

import validators
import requests as req

# DELIMITERS lists the special characters that are replaced by spaces before splitting.
# Update DELIMITERS as per your needs
DELIMITERS = ['.', ',', '\t', '\n']

def remove_word(data, word):
    # this list is used to store the different representations of the word
    representations = []
    # build a new list rather than removing items while iterating,
    # which would skip the element after each removal
    remaining = []
    for d in data:
        if d.lower() == word:
            if d not in representations:
                representations.append(d)
        else:
            remaining.append(d)
    return remaining, representations

def word_stats(data: str):
    # Replace all delimiters with space
    for char in DELIMITERS:
        data = data.replace(char, ' ')
    # converting the string into a list of words by splitting the string
    words = data.split()
    # lowercased word list, so that count() matches whole words and not substrings
    words_copy = [w.lower() for w in words]
    total_words = len(words)
    while len(words) != 0:
        word = words[0].lower()
        word_count = words_copy.count(word)
        words, representations = remove_word(words, word)
        print("{0}:{1}%, {2} total occurrences, {3} representations {4}".format(word, (word_count/total_words)*100, word_count, len(representations), str(representations)))

if __name__ == "__main__":
    file_name = input()
    data = None
    if validators.url(file_name):
        data = req.get(file_name).text
    else:
        # the with block closes the file after reading
        with open(file_name, mode='r') as f:
            data = f.read()

    word_stats(data)
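
The question also asks for a character count, which the code above does not cover. The following is a minimal sketch of one way to do it in plain Python, not part of the original answer. It assumes that whitespace characters are ignored and that characters are grouped case-insensitively, in the same way as words. The function name char_stats and its return shape are hypothetical choices made for this illustration.

```python
from collections import defaultdict


def char_stats(data: str):
    """Return {lowercased char: (count, sorted representations, percent)}.

    Assumption: whitespace is skipped and is not part of the total.
    """
    counts = defaultdict(int)   # occurrences per lowercased character
    reps = defaultdict(set)     # distinct original forms per lowercased character
    for ch in data:
        if ch.isspace():
            continue
        key = ch.lower()
        counts[key] += 1
        reps[key].add(ch)
    total = sum(counts.values())
    # The comprehension is empty when there are no characters, so total is never a zero divisor
    return {k: (counts[k], sorted(reps[k]), counts[k] / total * 100) for k in counts}


if __name__ == "__main__":
    sample = "word Word word WoRd apple"
    for ch, (count, forms, pct) in char_stats(sample).items():
        print("{0}:{1:.1f}%, {2} total occurrences, {3} representations {4}".format(
            ch, pct, count, len(forms), forms))
```

Returning a dictionary rather than printing directly keeps the counting separate from the output format, so the same result could be passed to the print statement used in word_stats.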
