Question

1) If there are N words after the tokenization process, how many bi-grams and tri-grams can be generated?

a) N-1, N-2

b) N-2, N-1

c) N, N-1

d) N-2, N-3

------------------------------------------------------------------------

2) Regarding the Document Term Matrix (DTM), which of the following is true?

a) Each value (typically) contains the number of appearances of that term in that document

b) Each row represents one term

c) Each column represents one document

------------------------------------------------------------------------

3) The "unnest_tokens" function is used to reduce words to their base or root form.

True or False

------------------------------------------------------------------------

4) Which of the following lexicons categorizes words into categories such as positive, negative, anger, anticipation, and disgust?

a) nrc

b) AFINN

c) bing

d) LDA

Answer 1

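1) A = N-1, N-2. A sequence of N words contains N-1 adjacent pairs (bi-grams) and N-2 adjacent triples (tri-grams).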

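2) A = Each value (typically) contains the number of appearances of that term in that document. In a DTM each row represents one document and each column represents one term, so options b and c have rows and columns reversed.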

3) False

● unnest_tokens is used to split a column into tokens. Reducing words to their base or root form is stemming.

4) A = nrc
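
The n-gram counts in answer 1 and the DTM layout in answer 2 can be checked with a small sketch. This is plain Python rather than the R function named in question 3; the helper names `ngrams` and `document_term_matrix` and the sample sentences are made up for illustration, and tokenization is assumed to be simple whitespace splitting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram (as a tuple) in a list of tokens."""
    # A window of width n can start at positions 0 .. len(tokens) - n,
    # which gives len(tokens) - n + 1 windows (none if the list is too short).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs):
    """Build a DTM: one row per document, one column per term, cells are counts."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed whitespace tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


tokens = "the quick brown fox jumps".split()  # N = 5
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(len(tokens), len(bigrams), len(trigrams))  # 5 4 3

docs = ["the cat sat", "the cat saw the dog"]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
for row in dtm:
    print(row)  # [1, 0, 1, 0, 1] then [1, 1, 0, 1, 2]
```

With N = 5 tokens the sketch yields 4 bi-grams and 3 tri-grams, and the second row of the matrix shows a count of 2 for "the", which appears twice in the second document.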

Answer 2