3. Explain the following terms in detail :
i) Tokens
ii) Pattern
iii) Lexemes
iv) Sentinels
v) Sentential form
i)
In Natural Language Processing (NLP), before the text data can be
processed, it must be broken down into its smallest meaningful
units, called tokens. A token can be made of letters, digits,
punctuation marks, etc. The process of dividing the text data into
tokens is called tokenization, and once it is done the tokens can
be used for more advanced text processing. Tokens are identified by
finding the word boundaries, that is, where a meaningful token
starts and ends. This can be done by splitting the sentence at
spaces or at other delimiters, depending upon the document.
For example, take the sentence This is life. If we divide this
sentence into tokens, we get 3 meaningful tokens:
This | is | life.
Another example, involving digits, is There are 7 sisters. The
tokens are There | are | 7 | sisters. Here even the digit is
classified as a token.
Tokenization is commonly done using libraries such as NLTK and
spaCy in Python.
In a programming language, tokens can also be identifiers,
keywords, constants, etc.
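The two example sentences can be tokenized with a few lines of Python. This is only a minimal sketch that uses the standard re module rather than NLTK or spaCy. The function name simple_tokenize is made up for illustration, and it drops punctuation, whereas NLTK and spaCy typically keep punctuation marks as separate tokens.

```python
import re

def simple_tokenize(text):
    """Split text into word and number tokens (hypothetical helper).

    \\w+ matches runs of letters, digits, and underscores, so
    punctuation and whitespace act as delimiters and are dropped.
    """
    return re.findall(r"\w+", text)

print(simple_tokenize("This is life."))         # ['This', 'is', 'life']
print(simple_tokenize("There are 7 sisters."))  # ['There', 'are', '7', 'sisters']
```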
ii)
A pattern is a rule, usually written as a regular expression, that
describes the strings we want to identify in a text document. A
regular expression is also known as a regex and is used for a
variety of purposes such as string replacement, string search,
string extraction, etc. A text document is not as organized as a
database, where useful values can be extracted from a column by
applying the correct filters. In text documents, finding a
particular pattern by hand can be a very exhausting, manual
activity, and it becomes really hard when the text is large. In
these cases, we use string matching operations based on pattern
matching.
For this process, we first need to identify the appropriate
pattern. Say we need to identify the money values in a document. We
first need to know which currency we are searching for. If it is a
dollar value, we look for the $ sign. Because $ is a special
character in most regex syntaxes (it marks the end of the string or
line), it has to be escaped as \$. Next, we have to specify what
follows the $ sign: is the money value written in letters or in
numbers? If it is written in numbers, we use the expression [0-9]+
to match one or more digits. So our regular expression, or pattern,
for identifying dollar values in the text document is \$[0-9]+.
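A small sketch of the dollar-value search, using Python's re module. The sample sentence and the name DOLLAR_PATTERN are assumptions made for illustration. In Python's regex syntax the $ sign has to be escaped as \$ to match a literal dollar sign, and [0-9]+ is used so that every digit of the amount is matched, not only the first one.

```python
import re

# \$ matches a literal dollar sign; [0-9]+ matches one or more digits.
DOLLAR_PATTERN = re.compile(r"\$[0-9]+")

# Hypothetical sample text.
text = "The book costs $15 and the pen costs $2."

matches = DOLLAR_PATTERN.findall(text)
print(matches)  # ['$15', '$2']
```

This pattern only covers whole-dollar amounts; values with commas or cents, such as $1,200.50, would need a longer pattern.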
iii)
Lexemes are the most basic unit in language processing. A lexeme is a sequence of characters in the input that matches a pattern, and the pattern in turn identifies which token the lexeme belongs to.
Lexemes ----> Pattern Matching -----> Tokens
Lexemes can contain letters, digits, and symbols. Once the lexemes are identified, pattern matching is used to classify them into different types of tokens.
For example, take the line int num1=5;
First, we identify all the different lexemes in this line. Splitting at spaces and at the operator and punctuation characters, the lexemes are int | num1 | = | 5 | ;
Here int matches the existing pattern for keywords and is
classified as a keyword.
The word num1 does not match any keyword pattern, so this lexeme
is classified as an identifier (a variable name).
= is classified as an operator, ; as a separator (punctuation), and
5 as a constant.
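The Lexemes ----> Pattern Matching -----> Tokens flow can be sketched for the line int num1=5; as follows. This is an illustrative toy lexer, not how any particular compiler does it. The names TOKEN_SPEC, MASTER, and lex, and the short keyword list, are assumptions. The ; is labeled a separator here, which is a common convention, rather than an operator.

```python
import re

# (token name, pattern) pairs; order matters, so keywords are tried
# before the more general identifier pattern.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|char|return)\b"),  # assumed keyword list
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CONSTANT",   r"[0-9]+"),
    ("OPERATOR",   r"="),
    ("SEPARATOR",  r";"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source):
    """Return (token, lexeme) pairs; whitespace is skipped.

    Characters that match no pattern are silently ignored in this sketch.
    """
    result = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            result.append((m.lastgroup, m.group()))
    return result

for token, lexeme in lex("int num1=5;"):
    print(f"{lexeme!r:8} -> {token}")
```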
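iv) A sentinel is a special character that cannot be part of the source program, usually eof, placed at the end of each input buffer. With a sentinel in place, the lexical analyzer needs only one test per character read (is this the sentinel?) instead of two (have we reached the end of the buffer, and which character is this?).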
v) A sentential form is any string that can be derived from the
start symbol of a grammar. It is a string that may consist of both
terminals and non-terminals. A sentence is a sentential form that
contains no non-terminals, that is, it consists of only terminal
symbols. A sentential form is called a left or right sentential
form depending on whether it is produced by a leftmost or a
rightmost derivation.
For example, take the grammar
S → aSa | bSb | ϵ
Here sentential forms are obtained through the derivation process: S ⇒ aSa ⇒ abSba ⇒ abbSbba ⇒ abbbba. The string abbSbba is a sentential form, while abbbba is a sentence, because it doesn't contain any non-terminals.
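The derivation for this grammar can be traced with a short Python sketch. The function names derive and is_sentence are made up for illustration, the empty string stands for ϵ, and the code assumes S is the only non-terminal, as in the grammar above.

```python
def derive(steps, start="S"):
    """Apply each chosen right-hand side to the leftmost S and
    return every sentential form produced along the way."""
    forms = [start]
    current = start
    for rhs in steps:
        # Replace only the leftmost occurrence of the non-terminal S.
        current = current.replace("S", rhs, 1)
        forms.append(current)
    return forms

def is_sentence(form):
    # A sentence is a sentential form with no non-terminals left.
    return "S" not in form

# S -> aSa, then S -> bSb twice, then S -> epsilon (empty string).
forms = derive(["aSa", "bSb", "bSb", ""])
print(" => ".join(forms))       # S => aSa => abSba => abbSbba => abbbba
print(is_sentence("abbSbba"))   # False
print(is_sentence("abbbba"))    # True
```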