consider following text in english: I've to say, iGuess, Apple has by far the best customer...

Question

Consider the following text in English:

I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode.

Identify the BASIC TEXT PREPROCESSING required for this text. Give EXAMPLES of each step from the text given ABOVE.

Answer 1

I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode.

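For the text above, we start with text normalization. Text normalization includes:

converting all letters to lower or upper case (@Apple, @APPLE and @apple become the same token)
removing punctuation and accent marks (the runs of "!" after "beautiful" and "5S")
removing extra whitespace
converting numbers into words or removing numbers (the "7" in "iOS 7")
expanding abbreviations ("Luv U" for "love you", "Omg")
removing stop words, sparse terms, and particular words ("to", "the", "is", the pic.twitter.com link)
text canonicalization ("cooool" for "cool", "S M O O T H" for "smooth")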
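Several of these steps are specific to tweets (links, @mentions, hashtags, stretched words). The sketch below shows one way they might be combined using only the standard library. The function name normalize_tweet, the SLANG lookup table and the regular expressions are assumptions made for illustration; they are tuned to this one tweet and are not a general-purpose cleaner.

```python
import re

# Hypothetical lookup table for informal spellings seen in the tweet.
SLANG = {"luv": "love", "u": "you", "omg": "oh my god"}

def normalize_tweet(text: str) -> str:
    text = re.sub(r"\S+\.\S+/\S+", " ", text)   # drop links such as pic.twitter.com/...
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    text = re.sub(r"#(\w+)", r"\1", text)       # keep the hashtag text, drop '#'
    # Join spaced-out single letters, e.g. "S M O O T H" -> "SMOOTH".
    text = re.sub(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b",
                  lambda m: m.group(0).replace(" ", ""), text)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # squeeze runs of 3+ equal chars to 2
    text = text.lower()
    # Keep only word-like tokens and expand known informal spellings.
    words = [SLANG.get(w, w) for w in re.findall(r"[a-z0-9']+", text)]
    return " ".join(words)

sample = ("iOS 7 is so S M O O T H & beautiful!! @Apple Luv U "
          "#ThanxApple so cooool pic.twitter.com/XmHJCU4pcb")
print(normalize_tweet(sample))
# ios 7 is so smooth beautiful love you thanxapple so cool
```

Squeezing repeated characters down to two happens to turn "cooool" into "cool", but it would not repair every stretched word, so a dictionary check is usually needed as well.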

Here are the normalization steps, one at a time:
Example 1. Convert text to lowercase

Python code:

input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode."
input_str = input_str.lower()
print(input_str)

Output:

i've to say, iguess, apple has by far the best customer care service i have ever received! @apple @appstore ios 7 is so s m o o t h & beautiful!! #thanxapple @apple luv u @apple thank you @apple, loving my new iphone 5s!!!!! #apple #iphone5s pic.twitter.com/xmhjcu4pcb @apple omg the iphone 5s is so cooool it can read your finger print to unlock you iphone 5s and to make purchases without a passcode.

Remove numbers

Remove numbers if they are not relevant to your analysis. Regular expressions are the usual tool for this.

Example 2. Number removal

Python code:

import re
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode."
result = re.sub(r"\d+", "", input_str)  # raw string so that \d reaches the regex engine unchanged
print(result)

Output:

I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode.
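In this tweet some digits are part of a product name ("iPhone 5S", "#iPhone5S"), so deleting every digit also damages those tokens. One possible alternative, shown here as a sketch on a shortened version of the text, is to delete only digit runs that stand alone as a token by adding word boundaries to the pattern:

```python
import re

text = "iOS 7 is so smooth, loving my new iPhone 5S #iPhone5S pic.twitter.com/XmHJCU4pcb"

# \b\d+\b only matches digits that form a whole token,
# so the standalone "7" is removed while "5S" and "iPhone5S" are kept.
no_standalone = re.sub(r"\b\d+\b", "", text)
print(no_standalone)
# iOS  is so smooth, loving my new iPhone 5S #iPhone5S pic.twitter.com/XmHJCU4pcb
```

Whether "7" should be dropped at all depends on the analysis; for a product review the OS version may well be relevant.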

Remove punctuation

The following code removes every character in string.punctuation, that is, the set !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~:

Example 3. Punctuation removal

Python code:

import string
# string.punctuation is '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode."
# strip() would only trim punctuation from the two ends of the string;
# translate() with a deletion table removes it everywhere.
result = input_str.translate(str.maketrans("", "", string.punctuation))
print(result)

Output:

Ive to say iGuess Apple has by far the best customer care service I have ever received Apple Appstore iOS  is so S M O O T H  beautiful ThanxApple Apple Luv U APPLE Thank you apple loving my new iPhone S apple iPhoneS pictwittercomXmHJCUpcb apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode
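Removing punctuation also removes the "@" and "#" markers, so mentions and hashtags can no longer be told apart from ordinary words. If they matter for the analysis, they can be pulled out first. A minimal sketch on a shortened version of the tweet (the variable names are illustrative):

```python
import re

text = ("@Apple @Appstore iOS 7 is so beautiful!! #ThanxApple "
        "@APPLE Thank you @apple #apple #iPhone5S")

# Capture the word after each marker and lowercase it so that
# @Apple, @APPLE and @apple count as the same account.
mentions = [m.lower() for m in re.findall(r"@(\w+)", text)]
hashtags = [h.lower() for h in re.findall(r"#(\w+)", text)]

print(mentions)  # ['apple', 'appstore', 'apple', 'apple']
print(hashtags)  # ['thanxapple', 'apple', 'iphone5s']
```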

Remove whitespace

To remove leading and trailing whitespace, you can use the strip() function:

Example 4. Whitespace removal

Python code:

input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
input_str = input_str.strip()
print(input_str)

Output:

I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode
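strip() leaves whitespace inside the string untouched, for example the double space left behind in "iOS  is" after the number was removed. One common way to collapse such runs, sketched here on a short fragment, is to split on whitespace and join with single spaces:

```python
text = "  @Appstore iOS  is so S M O O T H \t& beautiful!!\n"

# split() with no argument splits on any run of whitespace and
# discards empty strings, so joining gives single spaces throughout.
collapsed = " ".join(text.split())
print(collapsed)
# @Appstore iOS is so S M O O T H & beautiful!!
```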

Remove stop words

“Stop words” are the most common words in a language, such as “the”, “a”, “on”, “is”, “all”. These words carry little meaning on their own and are usually removed from texts. Stop words can be removed with the Natural Language Toolkit (NLTK), a suite of libraries and programs for symbolic and statistical natural language processing.

Tokenization

Tokenization is the process of splitting the given text into smaller pieces called tokens. Words, numbers, and punctuation marks can all be tokens.
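As an illustration, a small regular-expression tokenizer can keep @mentions, #hashtags and contractions such as "I've" together while splitting off punctuation marks. This is only a sketch with an assumed pattern (TOKEN_RE and tokenize are names made up for this example); it is not how NLTK's word_tokenize works.

```python
import re

# A token is either an optional @/# followed by word characters
# (with an optional 'suffix such as 've), or a single punctuation mark.
TOKEN_RE = re.compile(r"[@#]?\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    return TOKEN_RE.findall(text)

tokens = tokenize("I've to say, @Apple iOS 7 is beautiful!! #ThanxApple")
print(tokens)
# ["I've", 'to', 'say', ',', '@Apple', 'iOS', '7', 'is', 'beautiful', '!', '!', '#ThanxApple']
```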

Example 6. Stop words removal

Code:

input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"

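from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Needs the NLTK 'stopwords' corpus and the 'punkt' tokenizer data
# ('punkt_tab' on recent NLTK releases), installed via nltk.download(...).
stop_words = set(stopwords.words('english'))
tokens = word_tokenize(input_str)
# Keep every token that is not in the stop-word list
result = [i for i in tokens if i not in stop_words]
print(result)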

Output:

The exact list depends on the NLTK version. It keeps the tokens in their original order, including content words such as 'Apple', 'best', 'customer', 'care', 'service', 'received', 'loving', 'iPhone', 'unlock', 'purchases' and 'passcode', and drops lowercase stop words such as 'to', 'has', 'by', 'the', 'have', 'is', 'so', 'you', 'my', 'it', 'can', 'your', 'and' and 'a'. The comparison is case-sensitive and the stop-word list is lowercase, so 'I' and the single capital letters 'S', 'M', 'O', 'T', 'H' are kept. Punctuation tokens are kept as well, because only stop words are filtered.
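Stop-word lists are normally lowercase, so a capitalised token such as "I" or "The" only matches if it is lowercased before the lookup. A minimal sketch of a case-insensitive filter follows; the STOP_WORDS set is a small hand-picked list used as an assumption for illustration, not NLTK's list.

```python
# Hypothetical mini stop list, only for this illustration.
STOP_WORDS = {"i", "to", "has", "by", "the", "have", "is", "so",
              "you", "my", "it", "can", "your", "and", "a"}

tokens = "I have ever received the best customer care service".split()

# Lowercase each token for the lookup, but keep its original form in the result.
kept = [t for t in tokens if t.lower() not in STOP_WORDS]
print(kept)
# ['ever', 'received', 'best', 'customer', 'care', 'service']
```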

Remove sparse terms and particular words

In some cases, it’s necessary to remove sparse terms or particular words from texts. This task can be done using stop words removal techniques considering that any group of words can be chosen as the stop words.
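Sparse terms are usually found by counting in how many documents each term occurs and dropping those below a threshold. The sketch below assumes three tiny, already normalized documents and a threshold called min_df; both are made up for illustration.

```python
from collections import Counter

docs = [
    "apple iphone love",
    "apple ios smooth",
    "apple iphone fingerprint",
]

# Document frequency: the number of documents each term appears in.
df = Counter(term for doc in docs for term in set(doc.split()))

min_df = 2  # keep terms that occur in at least two documents
filtered = [[t for t in doc.split() if df[t] >= min_df] for doc in docs]
print(filtered)
# [['apple', 'iphone'], ['apple'], ['apple', 'iphone']]
```

The same list comprehension removes "particular words" if the frequency test is replaced by a membership test against a custom word set.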

Stemming

Stemming is the process of reducing words to their stem, base or root form (for example, cars → car, booked → book). The two main algorithms are the Porter stemming algorithm (removes common inflexional endings from words) and the Lancaster stemming algorithm (a more aggressive stemmer).

Code:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
words = word_tokenize(input_str)

# Print each token next to its stem
for w in words:
    print(w, " : ", ps.stem(w))

Output (selected lines; the loop prints one token and its stem per line, and current NLTK releases lowercase the token before stemming):

Apple  :  appl
has  :  ha
customer  :  custom
service  :  servic
received  :  receiv
beautiful  :  beauti
loving  :  love
purchases  :  purchas
passcode  :  passcod

Chunking (shallow parsing)

Chunking is a natural language process that identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and links them to higher order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.) [23]. Chunking tools: NLTK, TreeTagger chunker, Apache OpenNLP, General Architecture for Text Engineering (GATE), FreeLing.

Example 7. Chunking:

The first step is to determine the part of speech for each word, here with TextBlob:

Code:

input_str="I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
from textblob import TextBlob
result = TextBlob(input_str)
print(result.tags)

Output:

[I/PRP '/VBP ve/JJ to/TO say/VB ,/, iGuess/NNP ,/, Apple/NNP has/VBZ by/IN far/RB the/DT best/JJS customer/NN care/NN service/NN I/PRP have/VBP ever/RB received/VBN !/.. @/IN Apple/NNP @/NNP Appstore/NNP iOS/NNP is/VBZ so/RB S/NNP M/NNP O/NNP O/NNP T/NNP H/NNP &/CC beautiful/JJ !!/NN. #/# ThanxApple/NNP @/NNP Apple/NNP Luv/NNP U/NNP @/NNP APPLE/NNP Thank/NNP you/PRP @/VBP apple/JJ ,/, loving/VBG my/PRP$ new/JJ iPhone/NNP S/NNP !!!!!/NNP. #/# apple/NN #/# iPhoneS/NNP pic/JJ ./. twitter/NN ./. com/NN //: XmHJCUpcb/NNP @/: apple/NN Omg/NNP the/DT iPhone/NNP S/NNP is/VBZ so/RB cooool/NN it/PRP can/MD read/VB your/PRP$ finger/NN print/NN to/TO unlock/VB you/PRP iPhone/NNP S/NNP and/CC to/TO make/VB purchases/NNS without/IN a/DT passcode/NN]
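The second step groups the tagged words into chunks. The sketch below is a deliberately simplified noun-phrase chunker written in plain Python, not the NLTK or TextBlob API: it assumes a chunk is an optional determiner (DT), any number of adjectives (JJ, JJS, ...) and one or more nouns (NN, NNS, NNP, ...). The function name noun_phrases is made up for this example, and the (word, tag) pairs are taken from the tagger output above.

```python
def noun_phrases(tagged):
    """Group runs of: optional DT, any JJ* tags, one or more NN* tags."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                        # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):  # adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:                                       # at least one noun was found
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

tagged = [("the", "DT"), ("best", "JJS"), ("customer", "NN"), ("care", "NN"),
          ("service", "NN"), ("I", "PRP"), ("have", "VBP"), ("ever", "RB"),
          ("received", "VBN")]
print(noun_phrases(tagged))
# ['the best customer care service']
```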

After the text preprocessing is done, the result may be used for more complicated NLP tasks, for example, machine translation or natural language generation.
