Question

Please explain/demonstrate how to use NLTK to test unigram, bigram, and trigram character models on guessing the language of new, unseen words.

Answer #1

Unigrams
• Each individual word (and each instance of punctuation, etc.) is a token
• There are 16 tokens in this sentence, including the period
– A fact about the unicorn is the same as an alternative fact about the unicorn.
• The counts of these words in the Brown Corpus using NLTK
– a 23195 fact 447 about 1815 the 69971 unicorn 0 is 10109 the 69971 same 686 as 7253
– an 3740 alternative 34 fact 447 about 1815 the 69971 unicorn 0 . 49346
• Probability of each token chosen randomly (and independently of other tokens)
– This is called the unigram probability.
– a 0.02 fact 0.000385 about 0.00156 the 0.0603 unicorn 0.0 is 0.00871 the 0.0603
– same 0.000591 as 0.00625 an 0.00322 alternative 2.93e-05 fact 0.000385 about 0.00156
– the 0.0603 unicorn 0.0 . 0.0425
• Converting counts to unigram probabilities
– count/total_words ≈ probability
– Assumes that (Brown) corpus is representative of future occurrences
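The count-to-probability conversion above can be sketched in a few lines. As a minimal stand-in, the example sentence itself serves as the "corpus"; in practice the counts would come from nltk.corpus.brown.words(), which supplies the Brown figures listed above.

```python
# Minimal sketch of unigram probabilities, using the example
# sentence itself as a stand-in corpus; in practice the counts
# would come from nltk.corpus.brown.words().
from collections import Counter

tokens = ("a fact about the unicorn is the same as "
          "an alternative fact about the unicorn .").split()
counts = Counter(tokens)
total = len(tokens)  # 16 tokens, as noted above

def unigram_prob(word):
    # count / total_words, per the formula above
    return counts[word] / total

print(total, unigram_prob("the"))
```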

A Unigram Model of a Sentence
• Unigram probability of sentence = product of probabilities of individual words.
• If one word has a probability of 0, then the probability of the sentence is 0, unless we model Out-of-Vocabulary (OOV) items.
• One OOV model: assume words occurring once are OOV and recalculate the counts, e.g., unicorn now has a non-zero probability
• New Unigram Probabilities:
– a 0.02 fact 0.000385 about 0.00156 the 0.0603
– unicorn 0.0135 is 0.00871 the 0.0603 same 0.000591
– as 0.00625 an 0.00322 alternative 2.93e-05
– fact 0.000385 about 0.00156 the 0.0603   
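This OOV scheme can be sketched as follows, assuming hapax legomena (words seen only once) are remapped to an *oov* token before probabilities are computed. The toy corpus and its figures are illustrative, not the Brown numbers:

```python
# Sketch: unigram sentence probability with a simple OOV model.
# Words seen only once in the training corpus are remapped to
# *oov*, so unseen words (like "unicorn") get non-zero probability.
from collections import Counter

corpus = "the cat sat on the mat the dog ran".split()
raw = Counter(corpus)
remapped = [w if raw[w] > 1 else "*oov*" for w in corpus]
counts = Counter(remapped)
total = len(remapped)

def unigram_prob(word):
    return counts[word if word in counts else "*oov*"] / total

def sentence_prob(sentence):
    p = 1.0
    for w in sentence.split():
        p *= unigram_prob(w)  # product of word probabilities
    return p

print(sentence_prob("the unicorn sat"))
```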

Bigrams
• Bigram = probability of wordN, given wordN-1
– bigram(the,same) = count(the,same)/count(the)
– count(the,same) = 628
– count(the) = 69,971
– bigram_probability = 628/69971 = 0.00898
• Additional steps
– Include the probability that a word occurs at the beginning of a sentence, e.g., bigram(START, the)
– Include the probability that a token occurs at the end of a sentence, e.g., bigram(., END)
– Include non-zero probability for case when an unknown word follows a known one.
Backoff Model
– If a bigram has a zero count, “back off” to the unigram of the word
• replace bigram(previous_word, current_word) with unigram(current_word)
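The backoff rule can be sketched directly from counts. The corpus below is a toy stand-in, so the figures are illustrative rather than the Brown values:

```python
# Sketch: bigram probability with backoff to the unigram when the
# bigram count is zero. Toy corpus; not the Brown figures.
from collections import Counter

tokens = "the same as the same day the other day".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
total = len(tokens)

def bigram_prob(prev, word):
    # count(prev, word) / count(prev)
    if bigrams[(prev, word)]:
        return bigrams[(prev, word)] / unigrams[prev]
    # zero bigram count: back off to unigram(word)
    return unigrams[word] / total

print(bigram_prob("the", "same"))  # seen bigram
print(bigram_prob("the", "day"))   # unseen bigram: backs off
```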

NLTK bigram probability of sample sentence
• *start_end* a 0.0182 a fact 0.000388 fact about 0.00447
• about the 0.182 the *oov* 0.0293 *oov* is 0.00485
• is the 0.0786 the same 0.00898 same as 0.035 as an 0.029
• an alternative 0.00241
• alternative fact 0.000385 (Backing off to unigram probability for fact)
• fact about 0.00447 about the 0.182 the *oov* 0.0293
• *oov* . 0.0865 . *start_end* 1.0
• Total = product of the above probabilities = 1.12e-30
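Multiplying the listed probabilities reproduces the total. A short sketch (the figures are copied from the list above, which mixes bigram and backed-off unigram estimates):

```python
# Sketch: sentence score as the product of the bigram (and one
# backed-off unigram) probabilities listed above.
import math

probs = [0.0182, 0.000388, 0.00447, 0.182, 0.0293, 0.00485,
         0.0786, 0.00898, 0.035, 0.029, 0.00241, 0.000385,
         0.00447, 0.182, 0.0293, 0.0865, 1.0]
sentence_prob = math.prod(probs)
print(sentence_prob)  # ≈ 1.12e-30

# Products this small underflow easily; in practice, sum log
# probabilities instead of multiplying raw probabilities:
log_prob = sum(math.log(p) for p in probs)
```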

Trigrams, 4-grams, N-grams
• Trigram Probability
– Prob(third token | preceding 2 tokens)
– count(w−2,w−1,w)/count (w−2,w−1)
– count(the, same, as)/count(the, same)
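The trigram estimate follows the same count ratio; a minimal sketch on a toy token list:

```python
# Sketch: trigram probability as count(w-2, w-1, w) / count(w-2, w-1).
from collections import Counter

tokens = "the same as the same as usual the same day".split()
bigrams = Counter(zip(tokens, tokens[1:]))
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))

def trigram_prob(w2, w1, w):
    return trigrams[(w2, w1, w)] / bigrams[(w2, w1)]

print(trigram_prob("the", "same", "as"))
```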

In this way, we can use NLTK to test unigram, bigram, and trigram character models with these probability functions. The Markov assumption (each token depends only on a fixed number of preceding tokens) is what lets each model assign a probability from these local counts.
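To address the original question directly, the same machinery applies at the character level: train one character n-gram model per language, then guess the language of an unseen word as the one whose model assigns it the highest probability. A minimal sketch with character bigrams and add-alpha smoothing; the one-line training strings are tiny stand-ins for real corpora (e.g., samples from nltk.corpus.udhr):

```python
# Sketch: guessing the language of an unseen word with per-language
# character bigram models. Training strings are toy stand-ins; real
# use would train on corpora such as nltk.corpus.udhr samples.
import math
from collections import Counter

def char_bigram_model(text, alpha=1.0):
    text = f"#{text}#"  # '#' marks the boundaries
    bi = Counter(zip(text, text[1:]))
    uni = Counter(text)
    vocab = len(set(text)) + 1
    def prob(prev, ch):
        # add-alpha smoothing so unseen character pairs get
        # non-zero probability
        return (bi[(prev, ch)] + alpha) / (uni[prev] + alpha * vocab)
    return prob

def score(word, model):
    # log probability of the word's character bigrams
    word = f"#{word}#"
    return sum(math.log(model(a, b)) for a, b in zip(word, word[1:]))

models = {
    "english": char_bigram_model("the quick brown fox and the lazy dog"),
    "spanish": char_bigram_model("el rapido zorro marron y el perro perezoso"),
}

def guess(word):
    # pick the language whose model scores the word highest
    return max(models, key=lambda lang: score(word, models[lang]))

print(guess("perro"))
```

The same pattern extends to character unigram and trigram models by changing what is counted; comparing the three on held-out words shows how much the extra context helps.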

