3. Explain the following terms in detail : i) Tokens ii) Pattern iii) Lexemes iv) Sentinels...

Question

3. Explain the following terms in detail :

i) Tokens

ii) Pattern

iii) Lexemes

iv) Sentinels

v) Sentential form

engineering Computer-Science

Answer 1

i)
In Natural Language Processing (NLP), text must be broken down into its smallest meaningful units before it can be processed. These units, called tokens, can be words, numbers, punctuation marks, etc. The process of dividing text into tokens is called tokenization, and the resulting tokens are the input to later text-processing steps. Tokens are identified by finding word boundaries, that is, where a meaningful unit starts and ends. This can be done by splitting on the spaces in a sentence or on other delimiters, depending on the document.
For example, take the sentence This is life. Dividing it into tokens gives 3 meaningful tokens: This | is | life.
As another example involving digits, take There are 7 sisters. The tokens are There | are | 7 | sisters. Here the digit is also treated as a token.

In Python, tokenization is commonly done with libraries such as NLTK and spaCy.
In a programming-language context, tokens can also be identifiers, keywords, constants, etc.
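The two example sentences above can be tokenized with a short sketch. This version uses Python's standard re module instead of NLTK or spaCy so that it runs without extra installs; the function name tokenize is an assumption made for illustration, and punctuation is dropped to match the token lists shown in the answer.

```python
import re

def tokenize(text):
    # \w+ matches a run of letters, digits, or underscores, so each word
    # or number becomes one token. Punctuation is skipped in this sketch.
    return re.findall(r"\w+", text)

print(tokenize("This is life."))         # ['This', 'is', 'life']
print(tokenize("There are 7 sisters."))  # ['There', 'are', '7', 'sisters']
```

A library tokenizer would normally keep the final period as a token of its own; it is left out here only to mirror the examples.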

ii)
A pattern is a rule, usually written as a regular expression, that describes the set of strings to be identified in a text document. A regular expression, also known as a regex, is used for a variety of purposes such as string replacement, string search, string extraction, etc. A text document is not organized like a database, where useful values can be extracted from a column by applying the correct filters. In text documents, finding a particular pattern by hand can be an exhausting manual activity, and it becomes harder as the text grows. In these cases, we use string-matching operations based on pattern matching.
For this process, we first need to identify the appropriate pattern. Say we need to identify the money values in a document. We first need to know what currency we are searching for. If it is a dollar value, we will look for a $ sign. Next, we have to specify what follows the $ sign: is the money value written in letters or in numbers? If it is written in numbers, we use the expression [0-9] to match a digit. Because $ on its own is a special character in regular expressions (it marks the end of a line), it has to be escaped with a backslash. So the pattern for dollar values in the text document is \$[0-9]+, where + allows one or more digits.
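As a small sketch of the dollar-value search described above, using Python's re module: the $ is escaped as \$ because an unescaped $ is an end-of-line anchor in a regex, and + is added so that amounts with more than one digit match. The names DOLLAR_PATTERN and find_dollar_values, and the sample sentence, are assumptions made for illustration.

```python
import re

# \$ matches a literal dollar sign; [0-9]+ matches one or more digits.
DOLLAR_PATTERN = re.compile(r"\$[0-9]+")

def find_dollar_values(text):
    # Return every substring of text that matches the dollar pattern.
    return DOLLAR_PATTERN.findall(text)

print(find_dollar_values("The book costs $15 and the pen costs $2."))
# ['$15', '$2']
```

This sketch only matches whole-dollar amounts; cents or thousands separators would need a longer pattern.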

iii)

Lexemes are the most basic unit in language processing. A lexeme is a sequence of characters in the source text that matches the pattern for a token, and that match is what identifies the token.

Lexemes ----> Pattern Matching -----> Tokens

Lexemes can contain letters, digits, and symbols. Once a lexeme has been identified, pattern matching assigns it to a token type.

For example, take the line int num1=5;

First, we identify all the different lexemes in this line. A space separates int from num1, and the remaining lexemes are found by pattern matching, since num1=5; contains no spaces. The lexemes are int | num1 | = | 5 | ;

Here int matches the existing pattern for keywords and is classified as a keyword.
The word num1 does not match any keyword pattern, so this lexeme is classified as an identifier (a variable name).
= is classified as an operator, ; as a punctuator (separator), and 5 as a constant.
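The lexeme-to-token mapping for int num1=5; can be sketched as a tiny regex-based lexer. The token names, the short keyword list, and the function name lex are assumptions made for illustration, not a full lexer for any real language; in particular, ; is labeled a punctuator here, which is one common convention.

```python
import re

# Each entry pairs a token type with the pattern its lexemes must match.
# KEYWORD comes before IDENTIFIER so that "int" is not read as a name.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:int|float|char|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CONSTANT", r"[0-9]+"),
    ("OPERATOR", r"="),
    ("PUNCTUATOR", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source):
    tokens = []
    # finditer silently skips characters that match no pattern;
    # a real lexer would report them as errors.
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens

for kind, lexeme in lex("int num1=5;"):
    print(kind, lexeme)
```

Each printed pair shows a token type next to the lexeme that matched its pattern.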

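iv) A sentinel is a special character that cannot be part of the source program, typically an end-of-file (eof) marker, placed at the end of the input buffer. With a sentinel in place, the lexical analyzer needs only one test per character to detect both the character it is examining and the end of the buffer, instead of checking the buffer boundary separately each time it advances.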
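A minimal sketch of the sentinel idea as it is usually described for lexical analyzers: an end marker appended to the input buffer lets the scanning loop stop without a separate bounds check. The marker "\0", the name scan_identifier, and the sample buffer are assumptions made for illustration.

```python
SENTINEL = "\0"  # assumed end marker; must never occur in the real input

def scan_identifier(buffer, start):
    """Return the index just past the run of alphanumerics beginning at start."""
    forward = start
    # There is no 'forward < len(buffer)' test: the sentinel is not
    # alphanumeric, so it ends the loop by itself.
    while buffer[forward].isalnum():
        forward += 1
    return forward

buffer = "num1" + SENTINEL
end = scan_identifier(buffer, 0)
print(buffer[:end])  # num1
```

Without the sentinel, the loop would need a second comparison on every step to avoid running past the end of the buffer.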

v) A sentential form is any string that can be derived from the start symbol of a grammar. It may consist of both terminals and non-terminals. The difference between a sentential form and a sentence is that a sentence contains no non-terminals.
In other words, a sentence consists of only terminal symbols. There are left and right sentential forms, depending on whether a leftmost or a rightmost derivation was used.

For example, consider the grammar

S → aSa | bSb | ϵ

Here sentential forms are obtained through the derivation S ⇒ aSa ⇒ abSba ⇒ abbSbba ⇒ abbbba. abbSbba is a sentential form, while abbbba is a sentence because it doesn't have any non-terminals.
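The derivation that leads from S to abbbba can be traced step by step with a short sketch. The function names derive and is_sentence are assumptions made for illustration, and the code relies on this grammar having a single non-terminal, S.

```python
def derive(bodies):
    """Replace the single S in the current form with each body in turn."""
    form = "S"
    forms = [form]
    for body in bodies:
        form = form.replace("S", body, 1)
        forms.append(form)
    return forms

def is_sentence(form):
    # A sentence contains no non-terminals; S is the only one here.
    return "S" not in form

# Apply S -> aSa, S -> bSb, S -> bSb, then S -> ϵ (the empty string).
for form in derive(["aSa", "bSb", "bSb", ""]):
    print(form, "sentence" if is_sentence(form) else "sentential form")
```

Every string printed is a sentential form; only the last one, abbbba, is also a sentence.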
