Python program This assignment requires you to write a single large program. I have broken it...

Question

Python program

This assignment requires you to write a single large program. I have broken it into two parts below as a suggestion for how to approach writing the code. Please turn in one program file.

Sentiment Analysis is a Big Data problem which seeks to determine the general attitude of a writer given some text they have written. For instance, we would like to have a program that could look at the text "The film was a breath of fresh air" and realize that it was a positive statement while "It made me want to poke out my eye balls" is negative.

One algorithm that we can use for this is to assign a numeric value to any given word based on how positive or negative that word is and then score the statement based on the values of the words. But, how do we come up with our word scores in the first place?

That's the problem that we’ll solve in this assignment. You are going to search through a file containing movie reviews from the Rotten Tomatoes website which have both a numeric score as well as text. You’ll use this to learn which words are positive and which are negative.

The data file is movie_reviews.txt, and looks like this:

4 This quiet , introspective and entertaining independent is worth seeking .    
1 Aggressive self-glorification and a manipulative whitewash .    
4 Best indie of the year , so far .    
2 Nothing more than a run-of-the-mill action flick .    
2 Reeks of rot and hack work from start to finish .

Note that each review starts with a number 0 through 4 with the following meaning:

0 : negative
1 : somewhat negative
2 : neutral
3 : somewhat positive
4 : positive

You are going to write a program that prompts the user to enter a phrase and then indicates whether that phrase is generally "positive" or "negative", by using the sentiment data contained in the data file.

To begin, your program has to compute the average score for all words in the movie_reviews.txt file. You should do this by writing code to do the following:

Set up a new dictionary variable called 'words'
Iterate over every review in the text file.
Examine every word in every review. If this is the first time you have seen this word (i.e. it is not in your dictionary yet) you should add a new entry into your dictionary for that word (i.e. the word becomes a new key in the dictionary). The value to store at this key should be a list that contains two elements: the review's score and the number 1 (indicating that you've seen this word 1 time).
If you have seen the word before (i.e. it is already in your dictionary), then you should add the new score to the score already stored in your list and increase the number of times that you have seen this word. For example:
```
4 I loved it
1 I hated it
```
... might look like this as a dictionary:
```
words['i']     = [5,2]
words['loved'] = [4,1]
words['it']    = [5,2]
words['hated'] = [1,1]
```
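The update rule described above can be sketched as follows. This is a minimal illustration, not the required solution: the helper name `record` is hypothetical, and it assumes each review has already been split into a score and a list of lowercase words.

```python
def record(words, score, review_words):
    """Fold one review's words into the dictionary (hypothetical helper)."""
    for word in review_words:
        if word not in words:
            # First sighting: store [total score so far, times seen].
            words[word] = [score, 1]
        else:
            # Seen before: add this review's score and bump the count.
            words[word][0] += score
            words[word][1] += 1


words = {}
record(words, 4, ["i", "loved", "it"])
record(words, 1, ["i", "hated", "it"])
print(words)
```

The printed dictionary should hold the same four entries as the example above, with `[5, 2]` stored for both `'i'` and `'it'`.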
Report to the user that the analysis of the 'movie_reviews.txt' file has been completed. Report how many lines were processed, and how many unique words were recorded. Also give them a summary of how long this took (hint: import the time module and use time.time() to compute the current time before and after your analysis algorithm and then compute the difference). For example:
```
Initializing sentiment database.
Sentiment database initialization complete.
Read 8529 lines.
Total unique words analyzed: 16442
Analysis took 0.142 seconds to complete.
```
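The timing hint can be sketched like this. The summation is only a stand-in for the real analysis loop, and the variable names are illustrative assumptions.

```python
import time

start = time.time()              # timestamp before the work begins
total = sum(range(1_000_000))    # stand-in for the real analysis
elapsed = time.time() - start    # seconds spent on the work

# Three decimal places, matching the sample output format.
print(f"Analysis took {elapsed:.3f} seconds to complete.")
```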
What is a word? When designing a program like this, you need to make sure that you and the program's end users agree on what counts as a unique word. For this assignment:
- Ignore capitalization: "A" and "a" should be counted as the same word.
- Do not worry about punctuation symbols or numbers. Just put everything you find into your dictionary. You do not need to strip out punctuation. (So your dictionary will contain entries that are just symbols, like "." or ",", and also entries that are just numbers.)
- Be sure to strip out all whitespace. For example, you should not have words in your dictionary that contain a space or tab character ("\t").
Also: make sure you are not counting empty lines!
Also note: your analysis time may vary depending on your computer, but you should get the same number of lines and words as shown above.
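One way to satisfy these rules is sketched below. The function name `tokenize` is hypothetical, and the sketch assumes the leading score on each line is not itself counted as a word, in line with the small dictionary example earlier.

```python
def tokenize(line):
    """Return (score, words) for a review line, or None for a blank line."""
    # lower() ignores capitalization; split() with no argument splits on
    # any run of whitespace, so no token contains a space, tab, or newline.
    tokens = line.lower().split()
    if not tokens:
        return None  # blank line: the caller should not count it
    return int(tokens[0]), tokens[1:]


print(tokenize("4 Best indie of the year , so far .    \n"))
print(tokenize("   \n"))
```

Punctuation tokens such as `','` and `'.'` stay in the word list, as the rules above require.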

Now, your program should:

Repeatedly ask the user for a phrase to analyze.
Convert all words to lowercase for analysis. Also, ignore punctuation in the entered phrase except for apostrophes "'" and hyphens "-".
Analyze each word in this phrase and use your dictionary to compute the average score for each word, and report this to the user.
Compute whether the overall phrase is positive or negative by averaging together the scores for each word that is contained within the phrase. Anything less than 2 should be considered negative, and anything greater than 2 is positive. Note: any words that are not in the dictionary should not be counted when computing the score for the phrase.
Continue to prompt for phrases until the user types "quit", at which point your program should end.
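The phrase-scoring steps above might look like the sketch below. The names `clean_phrase` and `score_phrase` are hypothetical, the dictionary is the tiny two-review example rather than the real data, and dropping punctuation characters outright (instead of replacing them with spaces) is an assumption about what "ignore" means.

```python
def clean_phrase(phrase):
    """Lowercase the phrase and keep only letters, digits, whitespace, ' and -."""
    kept = [ch for ch in phrase.lower()
            if ch.isalnum() or ch.isspace() or ch in "'-"]
    return "".join(kept).split()


def score_phrase(words, phrase):
    """Average the per-word averages; return None if no word is known."""
    averages = []
    for word in clean_phrase(phrase):
        if word in words:  # unknown words are skipped, not counted
            total, count = words[word]
            averages.append(total / count)
    if not averages:
        return None
    return sum(averages) / len(averages)


words = {"i": [5, 2], "loved": [4, 1], "it": [5, 2], "hated": [1, 1]}
print(clean_phrase("I loved it, Pikachu!"))
print(score_phrase(words, "I loved it, Pikachu!"))
print(score_phrase(words, "pikachu charmander"))
```

With this toy dictionary the first phrase averages 2.5, 4.0 and 2.5 to give 3.0, while the second phrase contains no known words and yields `None`, which corresponds to the "Not enough words to determine sentiment." case.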

Here is an example session:

Initializing sentiment database.
Sentiment database initialization complete.
Read 8529 lines.
Total unique words analyzed: 16442
Analysis took 0.130 seconds to complete.

Enter a phrase to test: i loved it
* 'i' appears 383 times with an average score of 1.8302872062663185
* 'loved' appears 9 times with an average score of 2.6666666666666665
* 'it' appears 2405 times with an average score of 1.99002079002079
Average score for this phrase is: 2.1623248876512586
This is a POSITIVE phrase.

Enter a phrase to test: this movie was awful
* 'this' appears 994 times with an average score of 1.9657947686116701
* 'movie' appears 969 times with an average score of 1.8286893704850362
* 'was' appears 169 times with an average score of 1.621301775147929
* 'awful' appears 23 times with an average score of 1.0869565217391304
Average score for this phrase is: 1.6256856089959415
This is a NEGATIVE phrase.

Enter a phrase to test: pikachu is watching you
* 'pikachu' does not appear in any movie reviews.
* 'is' appears 2409 times with an average score of 2.0568700705687006
* 'watching' appears 80 times with an average score of 1.875
* 'you' appears 850 times with an average score of 2.050588235294118
Average score for this phrase is: 1.9941527686209397
This is a NEGATIVE phrase.

Enter a phrase to test: pikachu charmander
* 'pikachu' does not appear in any movie reviews.
* 'charmander' does not appear in any movie reviews.
Not enough words to determine sentiment.

Enter a phrase to test: happy birthday sad kitten
* 'happy' appears 17 times with an average score of 2.588235294117647
* 'birthday' appears 9 times with an average score of 2.7777777777777777
* 'sad' appears 33 times with an average score of 2.212121212121212
* 'kitten' appears 1 times with an average score of 2.0
Average score for this phrase is: 2.3945335710041595
This is a POSITIVE phrase.

Enter a phrase to test: it made me want to poke out my eyeballs
* 'it' appears 2405 times with an average score of 1.99002079002079
* 'made' appears 148 times with an average score of 1.945945945945946
* 'me' appears 81 times with an average score of 1.5802469135802468
* 'want' appears 67 times with an average score of 1.8208955223880596
* 'to' appears 2996 times with an average score of 1.9589452603471296
* 'poke' does not appear in any movie reviews.
* 'out' appears 298 times with an average score of 1.8187919463087248
* 'my' appears 83 times with an average score of 2.036144578313253
* 'eyeballs' appears 1 times with an average score of 1.0
Average score for this phrase is: 1.7688738696130188
This is a NEGATIVE phrase.

Enter a phrase to test: I would not, could not, Sam I Am
* 'i' appears 383 times with an average score of 1.8302872062663185
* 'would' appears 213 times with an average score of 1.6431924882629108
* 'not' appears 596 times with an average score of 1.919463087248322
* 'could' appears 155 times with an average score of 1.8838709677419354
* 'not' appears 596 times with an average score of 1.919463087248322
* 'sam' appears 2 times with an average score of 1.5
* 'i' appears 383 times with an average score of 1.8302872062663185
* 'am' appears 7 times with an average score of 2.7142857142857144
Average score for this phrase is: 1.90510621966498
This is a NEGATIVE phrase.

Enter a phrase to test: quit
Quitting.

Some notes:

You must use a dictionary to solve this problem, and you may only analyze the 'movie_reviews.txt' file ONE TIME. You CANNOT re-analyze the file over and over again (i.e. for the phrase 'happy birthday' you can't iterate over every movie review to find all occurrences of 'happy' and then repeat this process to find all occurrences of 'birthday'). You will lose points for inefficient code.
Important: this program will be tested automatically, so your output should match the examples I give in all formatting, and your sentiment scores should match the values I've computed to at least two decimal places.
We will also examine your code, so remember to put clear comments to explain what you're doing.

Answer 1

```
# Sentiment Analysis
# Each line of the data file is a movie rating followed by the critic's review.
# The program takes into account the rating and every individual word in the review.

import time

print('Initializing sentiment database.')
begin_time = time.time()

# set up empty dictionary to hold words: word -> [times seen, total of scores]
sentiment = {}
lines_read = 0

# open reviews and read the file exactly once
with open('movie_reviews.txt', 'r') as file_object:
    # examine every review in database
    for review in file_object:
        # lowercase, then split on any whitespace (drops spaces, tabs, newlines)
        words = review.lower().split()
        # skip empty lines so they are not counted
        if not words:
            continue
        lines_read += 1
        score = int(words[0])

        # examine every word in this review (the leading score is not a word)
        # add to sentiment dictionary if necessary, update if exists already
        for word in words[1:]:
            if word not in sentiment:
                sentiment[word] = [1, score]
            else:
                sentiment[word][0] += 1
                sentiment[word][1] += score

end_time = time.time()

# display stats (do not reuse the name 'time', which would shadow the module)
elapsed = format(end_time - begin_time, '.3f')
print('Sentiment database initialization complete.')
print('Read', lines_read, 'lines.')
print('Total unique words analyzed:', len(sentiment))
print('Analysis took', elapsed, 'seconds to complete.')

# keep prompting until the user types "quit"
while True:
    print('')
    # convert to lowercase
    phrase = input('Enter a phrase to test: ').lower()
    if phrase.strip() == 'quit':
        print('Quitting.')
        break

    # ignore punctuation except apostrophes and hyphens
    phrase = ''.join(ch for ch in phrase if ch.isalnum() or ch.isspace() or ch in "'-")
    phrase_split = phrase.split()

    total_avg = 0
    amount = 0

    # count values to figure out the average score for the phrase
    for word in phrase_split:
        if word in sentiment:
            avg_score = sentiment[word][1] / sentiment[word][0]
            print("* '", word, "' appears ", sentiment[word][0], ' times with an average score of ', avg_score, sep='')
            total_avg += avg_score
            amount += 1
        else:
            print("* '", word, "' does not appear in any movie reviews.", sep='')

    # if no words appear in reviews
    if amount == 0:
        print('Not enough words to determine sentiment.')
    # else display the average and if > 2 display as a positive statement.
    # if not, display as negative
    else:
        print('Average score for this phrase is:', total_avg / amount)
        if (total_avg / amount) > 2:
            print('This is a POSITIVE phrase.')
        else:
            print('This is a NEGATIVE phrase.')
```

Answer 2

To solve this problem, we need to follow the steps mentioned in the assignment. We'll first compute the average score for all words in the 'movie_reviews.txt' file and store the data in a dictionary. Then, we'll prompt the user to enter phrases and analyze them using the dictionary to compute sentiment scores. Finally, we'll check whether the overall phrase is positive or negative based on the average scores of the words in the phrase.

Here's the Python program to accomplish this:

```
import time


def read_movie_reviews(file_name):
    # Maps each word to [total score, number of times seen].
    words = {}
    total_lines = 0
    with open(file_name, 'r') as file:
        for line in file:
            tokens = line.lower().split()
            # Skip empty lines so they are not counted.
            if not tokens:
                continue
            total_lines += 1
            score = int(tokens[0])
            # Count every occurrence of every word; punctuation is kept as-is.
            for word in tokens[1:]:
                if word in words:
                    words[word][0] += score
                    words[word][1] += 1
                else:
                    words[word] = [score, 1]
    return words, total_lines


def main():
    print("Initializing sentiment database.")
    start_time = time.time()
    words, total_lines = read_movie_reviews('movie_reviews.txt')
    end_time = time.time()
    print("Sentiment database initialization complete.")
    print(f"Read {total_lines} lines.")
    print(f"Total unique words analyzed: {len(words)}")
    print(f"Analysis took {end_time - start_time:.3f} seconds to complete.")

    while True:
        print()
        user_input = input("Enter a phrase to test: ")
        if user_input.strip().lower() == 'quit':
            print("Quitting.")
            break

        # Drop punctuation except apostrophes and hyphens, then split on whitespace.
        cleaned = ''.join(ch for ch in user_input.lower()
                          if ch.isalnum() or ch.isspace() or ch in "'-")
        words_in_phrase = cleaned.split()
        total_score = 0
        known_words = 0

        for word in words_in_phrase:
            if word in words:
                total, count = words[word]
                average = total / count
                print(f"* '{word}' appears {count} times with an average score of {average}")
                total_score += average
                known_words += 1
            else:
                print(f"* '{word}' does not appear in any movie reviews.")

        # Only words found in the dictionary count toward the phrase average.
        if known_words == 0:
            print("Not enough words to determine sentiment.")
        else:
            average_score = total_score / known_words
            print(f"Average score for this phrase is: {average_score}")
            # The assignment leaves a score of exactly 2 undefined;
            # it is reported as negative here.
            if average_score > 2:
                print("This is a POSITIVE phrase.")
            else:
                print("This is a NEGATIVE phrase.")


if __name__ == "__main__":
    main()
```

This Python program reads the 'movie_reviews.txt' file, analyzes it, and stores the word scores in a dictionary. It then prompts the user to enter phrases and computes the sentiment scores based on the average word scores. The program continues to prompt for phrases until the user enters "quit" to exit.

Please make sure to have the 'movie_reviews.txt' file in the same directory as the Python program before running it. The output should follow the format of the examples provided in the assignment, although the reported analysis time will vary from machine to machine.

answered by: Hydra Master
