Question

Python program

This assignment requires you to write a single large program. I have broken it into two parts below as a suggestion for how to approach writing the code. Please turn in one program file.

Sentiment Analysis is a Big Data problem which seeks to determine the general attitude of a writer given some text they have written. For instance, we would like to have a program that could look at the text "The film was a breath of fresh air" and realize that it was a positive statement while "It made me want to poke out my eye balls" is negative.

One algorithm that we can use for this is to assign a numeric value to any given word based on how positive or negative that word is and then score the statement based on the values of the words. But, how do we come up with our word scores in the first place?

That's the problem that we’ll solve in this assignment. You are going to search through a file containing movie reviews from the Rotten Tomatoes website which have both a numeric score as well as text. You’ll use this to learn which words are positive and which are negative.

The data file is movie_reviews.txt, and looks like this:

4 This quiet , introspective and entertaining independent is worth seeking .    
1 Aggressive self-glorification and a manipulative whitewash .    
4 Best indie of the year , so far .    
2 Nothing more than a run-of-the-mill action flick .    
2 Reeks of rot and hack work from start to finish .    

Note that each review starts with a number 0 through 4 with the following meaning:

  • 0 : negative
  • 1 : somewhat negative
  • 2 : neutral
  • 3 : somewhat positive
  • 4 : positive

You are going to write a program that prompts the user to enter a phrase and then indicates whether that phrase is generally "positive" or "negative", by using the sentiment data contained in the data file.

To begin, your program has to compute the average score for all words in the movie_reviews.txt file. You should do this by writing code to do the following:

  • Set up a new dictionary variable called 'words'
  • Iterate over every review in the text file.
  • Examine every word in every review. If this is the first time you have seen this word (i.e. it is not in your dictionary yet) you should add a new entry into your dictionary for that word (i.e. the word becomes a new key in the dictionary). The value to store at this key should be a list that contains two elements: the review's score and the number 1 (indicating that you've seen this word 1 time).
  • If you have seen the word before (i.e. it is already in your dictionary) then you should add the new score to the score already stored in your list and increase the number of times that you have seen this word. For example:
    4 I loved it
    1 I hated it
    

    ... might look like this as a dictionary:

    words['i']     = [5,2]
    words['loved'] = [4,1]
    words['it']    = [5,2]
    words['hated'] = [1,1]
    
  • Report to the user that the analysis of the 'movie_reviews.txt' file has been completed. Report how many lines were processed, and how many unique words were recorded. Also give them a summary of how long this took (hint: import the time module and use time.time() to compute the current time before and after your analysis algorithm and then compute the difference). For example:
    Initializing sentiment database.
    Sentiment database initialization complete.
    Read 8529 lines.
    Total unique words analyzed: 16442
    Analysis took 0.142 seconds to complete.
    
  • What is a word? When designing a program like this, you need to make sure that you and the program's end users agree on what counts as a unique word. For this assignment:
    • Ignore capitalization: "A" and "a" should be counted as the same word.
    • Do not worry about punctuation symbols or numbers. Just put everything you find into your dictionary. You do not need to strip out punctuation. (So your dictionary will contain entries that are just symbols, like "." or ",", and also entries that are just numbers.)
    • Be sure to strip out all whitespace. For example, you should not have words in your dictionary that contain a space or tab character ("\t").
  • Also: make sure you are not counting empty lines!
  • Also note: your analysis time may vary depending on your computer, but you should get the same number of lines and words as shown above.
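
The dictionary-building steps above can be sketched as follows. This is a minimal sketch, not the instructor's reference solution: the helper name `build_word_table` is hypothetical, and it takes any iterable of lines so it can be tried on the two-review example without the data file. It assumes the leading score is not itself recorded as a word, as the `words['i'] = [5,2]` example suggests.

```python
import time


def build_word_table(lines):
    """Return (words, line_count); words maps word -> [total score, times seen].

    Hypothetical helper: `lines` is any iterable of review lines.
    """
    words = {}
    line_count = 0
    for line in lines:
        # Lowercase, then split on any whitespace (spaces, tabs, newlines).
        tokens = line.lower().split()
        if not tokens:
            continue  # do not count empty lines
        line_count += 1
        score = int(tokens[0])  # leading 0-4 rating
        for word in tokens[1:]:
            if word in words:
                words[word][0] += score
                words[word][1] += 1
            else:
                words[word] = [score, 1]
    return words, line_count


start = time.time()
words, line_count = build_word_table(["4 I loved it", "", "1 I hated it"])
elapsed = time.time() - start

print(words)
print(f"Read {line_count} lines.")
print(f"Total unique words analyzed: {len(words)}")
print(f"Analysis took {elapsed:.3f} seconds to complete.")
```

For the real data set, the same function could be given an open file object (for instance inside `with open('movie_reviews.txt') as f:`), so the file is read only once.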

Now, your program should:

  • Repeatedly ask the user for a phrase to analyze.
  • Convert all words to lowercase for analysis. Also, ignore punctuation in the entered phrase except for apostrophes "'" and hyphens "-" .
  • Analyze each word in this phrase and use your dictionary to compute the average score for each word, and report this to the user.
  • Compute whether the overall phrase is positive or negative by averaging together the scores for each word that is contained within the phrase. Anything less than 2 should be considered negative, and anything greater than 2 is positive. Note: any words that are not in the dictionary should not be counted when computing the score for the phrase.
  • Continue to prompt for phrases until the user types "quit", at which point your program should end.
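
The phrase-analysis steps above could look roughly like this. The function name `score_phrase` and the tiny hand-written `words` table are illustrative assumptions; the table uses the `[total score, times seen]` layout from the dictionary example. The assignment only defines the cases below and above 2, so treating an average of exactly 2 as negative here is also an assumption.

```python
import string

# Punctuation to drop from a phrase: everything except apostrophes and hyphens.
_IGNORED = str.maketrans("", "", string.punctuation.replace("'", "").replace("-", ""))


def score_phrase(phrase, words):
    """Return (per_word, average) for a phrase.

    per_word is a list of (word, count, average) tuples; count and average are
    None for words missing from the table. average is None if no word is known.
    """
    per_word = []
    known = []
    for word in phrase.lower().translate(_IGNORED).split():
        if word in words:
            total, count = words[word]
            avg = total / count
            per_word.append((word, count, avg))
            known.append(avg)
        else:
            per_word.append((word, None, None))
    # Unknown words are left out of the phrase average.
    average = sum(known) / len(known) if known else None
    return per_word, average


# Tiny made-up table in [total score, times seen] form.
words = {"i": [5, 2], "loved": [4, 1], "it": [5, 2], "hated": [1, 1]}

per_word, average = score_phrase("I loved it, Pikachu!", words)
for word, count, avg in per_word:
    if count is None:
        print(f"* '{word}' does not appear in any movie reviews.")
    else:
        print(f"* '{word}' appears {count} times with an average score of {avg}")

if average is None:
    print("Not enough words to determine sentiment.")
else:
    print(f"Average score for this phrase is: {average}")
    # Assumption: an average of exactly 2 falls into the negative branch.
    print("This is a POSITIVE phrase." if average > 2 else "This is a NEGATIVE phrase.")
```

Returning the per-word results separately from the printing keeps the scoring logic easy to check by hand before wiring it into the input loop.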

Here is an example session:

Initializing sentiment database.
Sentiment database initialization complete.
Read 8529 lines.
Total unique words analyzed: 16442
Analysis took 0.130 seconds to complete.

Enter a phrase to test: i loved it
* 'i' appears 383 times with an average score of 1.8302872062663185
* 'loved' appears 9 times with an average score of 2.6666666666666665
* 'it' appears 2405 times with an average score of 1.99002079002079
Average score for this phrase is: 2.1623248876512586
This is a POSITIVE phrase.

Enter a phrase to test: this movie was awful
* 'this' appears 994 times with an average score of 1.9657947686116701
* 'movie' appears 969 times with an average score of 1.8286893704850362
* 'was' appears 169 times with an average score of 1.621301775147929
* 'awful' appears 23 times with an average score of 1.0869565217391304
Average score for this phrase is: 1.6256856089959415
This is a NEGATIVE phrase.

Enter a phrase to test: pikachu is watching you
* 'pikachu' does not appear in any movie reviews.
* 'is' appears 2409 times with an average score of 2.0568700705687006
* 'watching' appears 80 times with an average score of 1.875
* 'you' appears 850 times with an average score of 2.050588235294118
Average score for this phrase is: 1.9941527686209397
This is a NEGATIVE phrase.

Enter a phrase to test: pikachu charmander
* 'pikachu' does not appear in any movie reviews.
* 'charmander' does not appear in any movie reviews.
Not enough words to determine sentiment.

Enter a phrase to test: happy birthday sad kitten
* 'happy' appears 17 times with an average score of 2.588235294117647
* 'birthday' appears 9 times with an average score of 2.7777777777777777
* 'sad' appears 33 times with an average score of 2.212121212121212
* 'kitten' appears 1 times with an average score of 2.0
Average score for this phrase is: 2.3945335710041595
This is a POSITIVE phrase.

Enter a phrase to test: it made me want to poke out my eyeballs
* 'it' appears 2405 times with an average score of 1.99002079002079
* 'made' appears 148 times with an average score of 1.945945945945946
* 'me' appears 81 times with an average score of 1.5802469135802468
* 'want' appears 67 times with an average score of 1.8208955223880596
* 'to' appears 2996 times with an average score of 1.9589452603471296
* 'poke' does not appear in any movie reviews.
* 'out' appears 298 times with an average score of 1.8187919463087248
* 'my' appears 83 times with an average score of 2.036144578313253
* 'eyeballs' appears 1 times with an average score of 1.0
Average score for this phrase is: 1.7688738696130188
This is a NEGATIVE phrase.

Enter a phrase to test: I would not, could not, Sam I Am
* 'i' appears 383 times with an average score of 1.8302872062663185
* 'would' appears 213 times with an average score of 1.6431924882629108
* 'not' appears 596 times with an average score of 1.919463087248322
* 'could' appears 155 times with an average score of 1.8838709677419354
* 'not' appears 596 times with an average score of 1.919463087248322
* 'sam' appears 2 times with an average score of 1.5
* 'i' appears 383 times with an average score of 1.8302872062663185
* 'am' appears 7 times with an average score of 2.7142857142857144
Average score for this phrase is: 1.90510621966498
This is a NEGATIVE phrase.

Enter a phrase to test: quit
Quitting.
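
As a check on how the phrase score in the session is formed: it is the plain mean of the per-word averages printed for that phrase, with unknown words left out, not a mean weighted by how often each word appears. The snippet below simply redoes that arithmetic with values copied from the session; the variable names are made up for illustration.

```python
import math

# Per-word averages copied from the example session above.
i_loved_it = [1.8302872062663185, 2.6666666666666665, 1.99002079002079]
# 'pikachu' is unknown, so only 'is', 'watching' and 'you' contribute.
pikachu_is_watching_you = [2.0568700705687006, 1.875, 2.050588235294118]

mean_loved = sum(i_loved_it) / len(i_loved_it)
mean_pikachu = sum(pikachu_is_watching_you) / len(pikachu_is_watching_you)

print(mean_loved)    # close to the session's 2.1623248876512586
print(mean_pikachu)  # close to the session's 1.9941527686209397

# Both agree with the session output to well beyond two decimal places.
print(math.isclose(mean_loved, 2.1623248876512586, rel_tol=1e-9))
print(math.isclose(mean_pikachu, 1.9941527686209397, rel_tol=1e-9))
```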

Some notes:

  • You must use a dictionary to solve this problem, and you may only analyze the 'movie_reviews.txt' file ONE TIME. You CANNOT re-analyze the file over and over again (i.e. for the phrase 'happy birthday' you can't iterate over every movie review to find all occurrences of 'happy' and then repeat this process to find all occurrences of 'birthday'). You will lose points for inefficient code.
  • Important: this program will be tested automatically, so your output should match the examples I give in all formatting, and your sentiment scores should match the values I've computed to at least two decimal places.
  • We will also examine your code, so remember to put clear comments to explain what you're doing.
Answer #1


# Sentiment Analysis
# Each line of the data file is a movie rating followed by the critic's review.
# The program takes into account the rating and every individual word in the review.

import string
import time

print('Initializing sentiment database.')
begin_time = time.time()

# set up empty dictionary to hold words: word -> [times seen, total score]
sentiment = {}
line_count = 0

# open reviews; the with block closes the file when the loop is done
with open('movie_reviews.txt', 'r') as file_object:
    # examine every review in the database, reading the file only once
    for review in file_object:
        # split() with no argument discards all whitespace, including tabs
        words = review.lower().split()
        # skip empty lines so they are not counted
        if not words:
            continue
        line_count += 1
        score = int(words[0])

        # examine every word in this review (everything after the score)
        # add to sentiment dictionary if necessary, update if it exists already
        for word in words[1:]:
            if word not in sentiment:
                sentiment[word] = [1, score]
            else:
                sentiment[word][0] += 1
                sentiment[word][1] += score

end_time = time.time()

# display stats (named elapsed so the time module is not shadowed)
elapsed = format(end_time - begin_time, '.3f')
print('Sentiment database initialization complete.')
print('Read', line_count, 'lines.')
print('Total unique words analyzed:', len(sentiment))
print('Analysis took', elapsed, 'seconds to complete.')

# punctuation to ignore in a phrase: everything except apostrophes and hyphens
ignored = str.maketrans('', '', string.punctuation.replace("'", '').replace('-', ''))

# keep prompting until the user types "quit"
while True:
    print('')
    # convert to lowercase
    phrase = input('Enter a phrase to test: ').lower()
    if phrase.strip() == 'quit':
        print('Quitting.')
        break
    phrase_split = phrase.translate(ignored).split()

    total_avg = 0
    amount = 0

    # add up per-word averages to figure out the average score for the phrase
    for word in phrase_split:
        if word in sentiment:
            avg_score = sentiment[word][1] / sentiment[word][0]
            print("* '", word, "' appears ", sentiment[word][0], ' times with an average score of ', avg_score, sep='')
            total_avg += avg_score
            amount += 1
        else:
            print("* '", word, "' does not appear in any movie reviews.", sep='')

    # if no words appear in reviews
    if amount == 0:
        print('Not enough words to determine sentiment.')
    # else display the average; if > 2 display as a positive statement,
    # otherwise display as negative
    else:
        print('Average score for this phrase is:', total_avg / amount)
        if (total_avg / amount) > 2:
            print('This is a POSITIVE phrase.')
        else:
            print('This is a NEGATIVE phrase.')

  

Answer #2

To solve this problem, we need to follow the steps mentioned in the assignment. We'll first compute the average score for all words in the 'movie_reviews.txt' file and store the data in a dictionary. Then, we'll prompt the user to enter phrases and analyze them using the dictionary to compute sentiment scores. Finally, we'll check whether the overall phrase is positive or negative based on the average scores of the words in the phrase.

Here's the Python program to accomplish this:

import string
import time

def read_movie_reviews(file_name):
    words = {}  # word -> [total score, times seen]
    total_lines = 0
    with open(file_name, 'r') as file:
        for line in file:
            tokens = line.lower().split()
            if not tokens:  # skip empty lines
                continue
            total_lines += 1
            score = int(tokens[0])
            # Count every occurrence; punctuation in reviews is kept as-is
            for word in tokens[1:]:
                if word in words:
                    words[word][0] += score
                    words[word][1] += 1
                else:
                    words[word] = [score, 1]
    return words, total_lines

def main():
    print("Initializing sentiment database.")
    start_time = time.time()
    words, total_lines = read_movie_reviews('movie_reviews.txt')
    end_time = time.time()
    print("Sentiment database initialization complete.")
    print(f"Read {total_lines} lines.")
    print(f"Total unique words analyzed: {len(words)}")
    print(f"Analysis took {end_time - start_time:.3f} seconds to complete.")

    # Ignore punctuation in phrases, except apostrophes and hyphens
    ignored = str.maketrans('', '', string.punctuation.replace("'", "").replace("-", ""))
    while True:
        user_input = input("\nEnter a phrase to test: ")
        if user_input.strip().lower() == 'quit':
            print("Quitting.")
            break

        total_score = 0
        known_words = 0  # only words found in the dictionary count
        for word in user_input.lower().translate(ignored).split():
            if word in words:
                average = words[word][0] / words[word][1]
                print(f"* '{word}' appears {words[word][1]} times with an average score of {average}")
                total_score += average
                known_words += 1
            else:
                print(f"* '{word}' does not appear in any movie reviews.")

        if known_words == 0:
            print("Not enough words to determine sentiment.")
        else:
            average_score = total_score / known_words
            print(f"Average score for this phrase is: {average_score}")
            if average_score > 2:
                print("This is a POSITIVE phrase.")
            else:
                print("This is a NEGATIVE phrase.")

if __name__ == "__main__":
    main()

This Python program reads the 'movie_reviews.txt' file, analyzes it, and stores the word scores in a dictionary. It then prompts the user to enter phrases and computes the sentiment scores based on the average word scores. The program continues to prompt for phrases until the user enters "quit" to exit.

Please make sure the 'movie_reviews.txt' file is in the same directory as the Python program before running it. The output is meant to follow the format of the examples provided in the assignment, so compare it against them before submitting.

answered by: Hydra Master