Question

Consider the following text in English:

I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode.

Identify the basic text preprocessing required for this text. Give examples from the text above for each step.

Answer #1

Starting from the text above, the first step is text normalization. Text normalization includes:

  1. converting all letters to lower or upper case
  2. removing punctuation and accent marks
  3. removing extra whitespace
  4. converting numbers into words or removing numbers
  5. expanding abbreviations
  6. removing stop words, sparse terms, and particular words
  7. text canonicalization
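
Abbreviation expansion and canonicalization (items 5 and 7) apply directly to tokens such as "Luv", "U", "Omg" and "cooool" in the text. The following is a minimal sketch, not a standard library routine: the SLANG table and both function names are hypothetical and chosen only for illustration, and a real project would need a much larger lexicon.

```python
import re

# Hypothetical slang table for illustration only.
SLANG = {"luv": "love", "u": "you", "omg": "oh my god", "thanx": "thanks"}

def squeeze_elongation(word):
    """Shorten any character repeated three or more times to two, e.g. 'cooool' -> 'cool'."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)

def canonicalize(text):
    """Squeeze elongated words, then expand whole tokens found in the SLANG table."""
    words = []
    for word in text.split():
        word = squeeze_elongation(word)
        words.append(SLANG.get(word.lower(), word))
    return " ".join(words)

print(canonicalize("Luv U Omg so cooool"))  # love you oh my god so cool
```

Squeezing to two characters rather than one keeps legitimate double letters ("cool") intact.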

The individual normalization steps, with examples from the text, follow.
Example 1. Convert text to lowercase

Python code:

input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode."
input_str = input_str.lower()
print(input_str)

Output:

i've to say, iguess, apple has by far the best customer care service i have ever received! @apple @appstore ios 7 is so s m o o t h & beautiful!! #thanxapple @apple luv u @apple thank you @apple, loving my new iphone 5s!!!!! #apple #iphone5s pic.twitter.com/xmhjcu4pcb @apple omg the iphone 5s is so cooool it can read your finger print to unlock you iphone 5s and to make purchases without a passcode.

Remove numbers

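Remove numbers if they are not relevant to your analysis. Regular expressions are the usual tool. Note that in this text the digits carry meaning: removing them turns "iOS 7" into "iOS" and "iPhone 5S" into "iPhone S", and it alters the image link.

Example 2. Number removal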

Python code:

import re
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode."
result = re.sub(r"\d+", "", input_str)
print(result)

Output:

I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode.
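
If model names such as "5S" should survive, one option is to remove only digits that stand alone as a token. This is a sketch under that assumption; the function name is illustrative and the sample is a shortened piece of the text.

```python
import re

def remove_standalone_numbers(text):
    """Remove digit runs that form a whole token, keeping digits inside tokens like '5S'."""
    return re.sub(r"\b\d+\b", "", text)

sample = "iOS 7 is so smooth, loving my new iPhone 5S #iPhone5S"
print(remove_standalone_numbers(sample))
```

The word boundary `\b` fails between "5" and "S", so "5S" and "iPhone5S" are left untouched while the lone "7" is removed.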

Remove punctuation

The following code removes every character in this set: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Example 3. Punctuation removal

Python code:

import string
# string.punctuation is '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
# str.strip() would only trim punctuation at the two ends of the string,
# so str.translate() is used to delete it everywhere.
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode."
result = input_str.translate(str.maketrans("", "", string.punctuation))
print(result)

Output:

Ive to say iGuess Apple has by far the best customer care service I have ever received Apple Appstore iOS  is so S M O O T H  beautiful ThanxApple Apple Luv U APPLE Thank you apple loving my new iPhone S apple iPhoneS pictwittercomXmHJCUpcb apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode
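
Removing every punctuation character also strips the "@", "#", "." and "/" that mark mentions, hashtags and links, so tweet-specific markup is often handled before this step. A minimal sketch, assuming mentions and links should be dropped and hashtag words kept; the pattern and function names are illustrative, and the sample is a shortened piece of the text.

```python
import re

URL_RE = re.compile(r"(?:https?://|pic\.twitter\.com/)\S+")
MENTION_RE = re.compile(r"@\w+")

def strip_tweet_markup(text):
    text = URL_RE.sub("", text)      # drop links
    text = MENTION_RE.sub("", text)  # drop @mentions
    text = text.replace("#", "")     # keep the hashtag word, drop the '#'
    return " ".join(text.split())    # tidy the gaps left behind

sample = "#ThanxApple @Apple Luv U @APPLE loving my new iPhone 5S #iPhone5S pic.twitter.com/XmHJCU4pcb"
print(strip_tweet_markup(sample))  # ThanxApple Luv U loving my new iPhone 5S iPhone5S
```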

Remove whitespace

To remove leading and trailing whitespace, use the strip() method. It leaves spaces inside the string, such as the double space after "iOS", untouched:

Example 4. Whitespace removal

Python code:

input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
input_str = input_str.strip()
print(input_str)

Output:

I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode
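
Whitespace inside the string needs a different treatment. The sketch below collapses runs of whitespace and, as an extra assumption not covered by strip(), joins letters that were spaced out for emphasis ("S M O O T H"). The function names and the rule of three or more single letters are illustrative choices.

```python
import re

# Three or more single letters separated by single spaces.
SPACED_LETTERS_RE = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")

def join_spaced_letters(text):
    """Join runs of spaced-out single letters, e.g. 'S M O O T H' -> 'SMOOTH'."""
    return SPACED_LETTERS_RE.sub(lambda m: m.group(0).replace(" ", ""), text)

def collapse_whitespace(text):
    """Replace every run of whitespace with one space and trim both ends."""
    return " ".join(text.split())

sample = "  iOS  is so S M O O T H & beautiful  "
print(collapse_whitespace(join_spaced_letters(sample)))  # iOS is so SMOOTH & beautiful
```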

Remove stop words

“Stop words” are the most common words in a language, such as “the”, “a”, “on”, “is”, and “all”. They carry little meaning on their own and are usually removed from texts. Stop words can be removed with the Natural Language Toolkit (NLTK), a suite of libraries and programs for symbolic and statistical natural language processing.

Tokenization

Tokenization is the process of splitting the given text into smaller pieces called tokens. Words, punctuation marks, and numbers can all be treated as tokens.
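
For tweets, it can help to keep mentions, hashtags and links as single tokens. The following is a rough sketch using only the standard library; the pattern is an assumption made for this text and is not how NLTK's word_tokenize works.

```python
import re

# One alternative per token type, tried left to right.
TOKEN_RE = re.compile(
    r"""
    (?:https?://|pic\.twitter\.com/)\S+   # links
    | [@#]\w+                             # @mentions and #hashtags
    | \w+(?:'\w+)*                        # words, including contractions like I've
    | [^\w\s]+                            # runs of punctuation such as !!!!!
    """,
    re.VERBOSE,
)

def tokenize(text):
    """Return the tokens of text in order of appearance."""
    return TOKEN_RE.findall(text)

sample = "I've to say, @Apple iOS 7 is so cooool!!!!! #iPhone5S pic.twitter.com/XmHJCU4pcb"
print(tokenize(sample))
```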

Example 6. Stop words removal

Code:

input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"

# One-time setup: download the 'stopwords' corpus and the Punkt tokenizer data with nltk.download()
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words('english'))
tokens = word_tokenize(input_str)
# The comparison is case-sensitive, so capitalized tokens such as "I" are kept.
result = [i for i in tokens if i not in stop_words]
print(result)

Output:

A list of the tokens in their original order without the lowercase stop words: 'to', 'by', 'the', 'is', 'so', 'you', 'my', 'it', 'can', 'your', 'and' and 'a' are dropped, while content tokens such as 'Apple', 'best', 'customer', 'care', 'service', 'loving', 'iPhone' and 'passcode' are kept.

Remove sparse terms and particular words

In some cases it is necessary to remove sparse terms or particular words from a text. The stop word removal technique handles this too, since any set of words can serve as the stop list.
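
As a sketch of that idea, the stop list below mixes a few common words with domain words ("apple", "iphone") that appear in almost every tweet of this kind. The list and the function name are hypothetical and hand-picked for illustration; NLTK's English list is much longer.

```python
import string

# Hypothetical, hand-picked stop list for illustration.
CUSTOM_STOP_WORDS = {"i", "to", "by", "the", "is", "so", "a", "and", "apple", "iphone"}

def remove_words(tokens, stop_words):
    """Drop tokens whose lowercase form, minus surrounding punctuation, is in stop_words."""
    kept = []
    for token in tokens:
        key = token.strip(string.punctuation).lower()
        if key not in stop_words:
            kept.append(token)
    return kept

tokens = "Apple has by far the best customer care service".split()
print(remove_words(tokens, CUSTOM_STOP_WORDS))
# ['has', 'far', 'best', 'customer', 'care', 'service']
```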

Stemming

Stemming is the process of reducing words to their stem, base, or root form (for example, "cars" becomes "car" and "booked" becomes "book"). The two main algorithms are the Porter stemming algorithm, which removes common inflexional endings from words, and the Lancaster stemming algorithm, which stems more aggressively.

Code:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"

words = word_tokenize(input_str)
for w in words:
    print(w, " : ", ps.stem(w))

Output (selected lines; recent NLTK versions lowercase each token before stemming):

Apple  :  appl
customer  :  custom
service  :  servic
received  :  receiv
beautiful  :  beauti
loving  :  love
purchases  :  purchas
passcode  :  passcod

Chunking (shallow parsing)

Chunking is a natural language process that identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and links them to higher order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.) [23]. Chunking tools: NLTK, TreeTagger chunker, Apache OpenNLP, General Architecture for Text Engineering (GATE), FreeLing.

Example 7. Part-of-speech tagging with TextBlob, the first step of chunking:

The first step is to determine the part of speech for each word:

Code:

input_str="I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS  is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
from textblob import TextBlob
result = TextBlob(input_str)
print(result.tags)

Output:

[I/PRP '/VBP ve/JJ to/TO say/VB ,/, iGuess/NNP ,/, Apple/NNP has/VBZ by/IN far/RB the/DT best/JJS customer/NN care/NN service/NN I/PRP have/VBP ever/RB received/VBN !/.. @/IN Apple/NNP @/NNP Appstore/NNP iOS/NNP is/VBZ so/RB S/NNP M/NNP O/NNP O/NNP T/NNP H/NNP &/CC beautiful/JJ !!/NN. #/# ThanxApple/NNP @/NNP Apple/NNP Luv/NNP U/NNP @/NNP APPLE/NNP Thank/NNP you/PRP @/VBP apple/JJ ,/, loving/VBG my/PRP$ new/JJ iPhone/NNP S/NNP !!!!!/NNP. #/# apple/NN #/# iPhoneS/NNP pic/JJ ./. twitter/NN ./. com/NN //: XmHJCUpcb/NNP @/: apple/NN Omg/NNP the/DT iPhone/NNP S/NNP is/VBZ so/RB cooool/NN it/PRP can/MD read/VB your/PRP$ finger/NN print/NN to/TO unlock/VB you/PRP iPhone/NNP S/NNP and/CC to/TO make/VB purchases/NNS without/IN a/DT passcode/NN]
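
The second step groups the tagged words into chunks. The sketch below is a toy noun-phrase chunker written for illustration, not NLTK's or TextBlob's chunker: it collects a run of determiners and adjectives followed by one or more nouns. The function name is hypothetical, and the (word, tag) pairs are copied by hand from the start of the tagging output.

```python
def np_chunks(tagged):
    """Group tokens into noun-phrase chunks: determiners/adjectives followed by nouns."""
    chunks, current, has_noun = [], [], False
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag in ("DT", "PRP$") or tag.startswith("JJ"):
            if has_noun:  # a determiner or adjective after a noun starts a new chunk
                chunks.append(" ".join(current))
                current, has_noun = [], False
            current.append(word)
        else:
            if has_noun:  # any other tag closes the chunk in progress
                chunks.append(" ".join(current))
            current, has_noun = [], False
    if has_noun:
        chunks.append(" ".join(current))
    return chunks

tagged = [("Apple", "NNP"), ("has", "VBZ"), ("by", "IN"), ("far", "RB"),
          ("the", "DT"), ("best", "JJS"), ("customer", "NN"), ("care", "NN"),
          ("service", "NN"), ("I", "PRP"), ("have", "VBP"), ("ever", "RB"),
          ("received", "VBN")]
print(np_chunks(tagged))  # ['Apple', 'the best customer care service']
```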

After the text preprocessing is done, the result may be used for more complicated NLP tasks, for example, machine translation or natural language generation.
