Consider the following text in English:
I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode.
Identify the basic text preprocessing required for this text. Give examples from the text above.
For the text above, we start with text normalization. The normalization steps are described below.
Example 1. Convert text to lowercase
Python code:
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode."
input_str = input_str.lower()
print(input_str)
Output:
i've to say, iguess, apple has by far the best customer care service i have ever received! @apple @appstore ios 7 is so s m o o t h & beautiful!! #thanxapple @apple luv u @apple thank you @apple, loving my new iphone 5s!!!!! #apple #iphone5s pic.twitter.com/xmhjcu4pcb @apple omg the iphone 5s is so cooool it can read your finger print to unlock you iphone 5s and to make purchases without a passcode.
Remove numbers
Remove numbers if they are not relevant to your analyses. Usually, regular expressions are used to remove numbers.
Example 2. Number removal
Python code:
import re
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS 7 is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone 5S!!!!! #apple #iPhone5S pic.twitter.com/XmHJCU4pcb @apple Omg the iPhone 5S is so cooool it can read your finger print to unlock you iPhone 5S and to make purchases without a passcode."
# Raw string so that \d reaches the regex engine unchanged
result = re.sub(r"\d+", "", input_str)
print(result)
Output:
I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode.
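Removing every digit also turns "iPhone 5S" into "iPhone S", as the output above shows. If that is not wanted, one option is to remove only standalone numbers. The following is a minimal sketch of that idea; the shortened sample string and the pattern are illustrative assumptions, not part of the original answer.

```python
import re

# Shortened, illustrative excerpt of the tweet text
sample = "iOS 7 is so smooth, loving my new iPhone 5S"

# \b\d+\b matches digit runs that form a whole token, so "7" is removed
# while the "5" in "5S" is kept (no word boundary between "5" and "S").
standalone_removed = re.sub(r"\b\d+\b", "", sample)
print(standalone_removed)
```

Note that removing "7" leaves two consecutive spaces behind, which a later whitespace step can clean up.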
Remove punctuation
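The string.punctuation constant contains this set of symbols: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~. Note that the strip() call in the code below removes these symbols only from the beginning and end of the string, so in this text only the final period is removed. To remove punctuation everywhere, str.translate() with str.maketrans('', '', string.punctuation) can be used instead.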
Example 3. Punctuation removal
Python code:
import string
# string.punctuation is '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode."
# strip() trims the given characters from both ends of the string only
result = input_str.strip(string.punctuation)
print(result)
Output:
I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode
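Since strip() only trims the ends of the string, punctuation inside the text survives in the output above. A minimal sketch of removing punctuation throughout, using str.translate on a shortened excerpt (the sample string is an illustrative assumption):

```python
import string

# Shortened, illustrative excerpt of the tweet text
sample = "Thank you @apple, loving my new iPhone S!!!!! #apple"

# Build a translation table that deletes every character in string.punctuation
table = str.maketrans("", "", string.punctuation)
no_punct = sample.translate(table)
print(no_punct)
```

For tweets this also strips the @ and # markers, so it may be preferable to extract mentions and hashtags before this step if they matter for the analysis.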
Remove whitespace
To remove leading and trailing spaces, you can use the strip() function:
Example 4. Whitespace removal
Python code:
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
input_str = input_str.strip()
print(input_str)
Output:
I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode
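strip() does not touch whitespace inside the string, and earlier steps such as number removal can leave repeated spaces behind. A small sketch of collapsing them, assuming a made-up sample string:

```python
# Illustrative string with leading, trailing, and repeated inner whitespace
sample = "  iOS  is   so smooth \n"

# split() with no arguments splits on any run of whitespace and ignores
# leading/trailing whitespace; joining with one space normalises the rest.
collapsed = " ".join(sample.split())
print(collapsed)
```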
Remove stop words
“Stop words” are the most common words in a language, such as “the”, “a”, “on”, “is”, and “all”. They carry little meaning on their own and are usually removed from texts. Stop words can be removed with the Natural Language Toolkit (NLTK), a suite of libraries and programs for symbolic and statistical natural language processing.
Tokenization
Tokenization is the process of splitting the given text into smaller pieces called tokens. Words, numbers, and punctuation marks can all be treated as tokens.
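As a rough illustration of tokenization on tweet-like text, the sketch below uses a regular expression from the standard library rather than NLTK. The pattern and the name TOKEN_PATTERN are assumptions made for this example; it keeps @mentions, #hashtags, and the pic.twitter.com link as single tokens.

```python
import re

# Shortened excerpt of the tweet text
sample = "Thank you @apple, loving my new iPhone 5S!!!!! #apple pic.twitter.com/XmHJCU4pcb"

# Alternatives are tried left to right:
#   1. links of the form pic.twitter.com/...
#   2. @mentions and #hashtags
#   3. words, optionally with an apostrophe part (e.g. I've)
#   4. any other single non-space character (punctuation)
TOKEN_PATTERN = re.compile(r"pic\.twitter\.com/\w+|[@#]\w+|\w+(?:'\w+)?|\S")

tokens = TOKEN_PATTERN.findall(sample)
print(tokens)
```

Here each exclamation mark becomes its own token, while "@apple" and "#apple" stay intact.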
Example 6. Stop words removal
Code:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Requires the NLTK stopwords corpus and Punkt tokenizer data (install once with nltk.download)
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
stop_words = set(stopwords.words('english'))
tokens = word_tokenize(input_str)
# The NLTK stop-word list is lowercase, so compare lowercased tokens
result = [t for t in tokens if t.lower() not in stop_words]
print(result)
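Output (note: this list still contains stop words such as 'the', 'is', and 'a', and its tokens are split on whitespace rather than by word_tokenize, so it should not be read as the exact result of the code above):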
['make', 'H', 'iGuess,', 'has', 'ever', '@Appstore', 'received!', 'passcode', 'purchases', 'can', 'you', 'finger', 'Apple', 'best', 'I', '@APPLE', 'read', 'customer', 'your', 'Thank', 'far', 'care', 'T', 'S', 'and', 'to', 'so', '@Apple', 'service', 'O', '#ThanxApple', '@apple,', 'cooool', 'Luv', 'unlock', 'have', 'is', 'beautiful!!', 'pic.twitter.com/XmHJCUpcb', 'S!!!!!', "I've", 'by', 'my', 'U', 'it', 'a', '@apple', 'say,', 'Omg', 'iOS', 'loving', 'without', '#apple', 'new', '&', 'the', '#iPhoneS', 'M', 'print', 'iPhone']
Remove sparse terms and particular words
In some cases, it’s necessary to remove sparse terms or particular words from texts. This task can be done using stop words removal techniques considering that any group of words can be chosen as the stop words.
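A minimal sketch of both ideas, using only the standard library: terms that appear in too few documents are treated as sparse, and a hand-picked word set is removed in the same pass. The documents, the threshold, and the custom word list are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical documents, already tokenized and lowercased
docs = [
    ["apple", "iphone", "is", "so", "smooth"],
    ["loving", "my", "new", "iphone"],
    ["thank", "you", "apple", "omg"],
]

custom_stop_words = {"omg"}  # particular words chosen for removal (assumption)
min_doc_count = 2            # keep terms found in at least 2 documents (assumption)

# Document frequency: in how many documents each term appears
doc_freq = Counter(term for doc in docs for term in set(doc))

filtered = [
    [t for t in doc if doc_freq[t] >= min_doc_count and t not in custom_stop_words]
    for doc in docs
]
print(filtered)
```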
Stemming
Stemming is the process of reducing words to their stem, base, or root form (for example, cars → car, booked → book). The two main algorithms are the Porter stemming algorithm, which removes common inflexional endings from words, and the Lancaster stemming algorithm, which stems more aggressively.
Code:
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
ps = PorterStemmer()
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
words = word_tokenize(input_str)
for w in words:
    # Print each token next to its stem
    print(w, " : ", ps.stem(w))
Output (stems only, joined on one line):
I ' ve to say , iGuess , Appl ha by far the best custom care servic I have ever receiv ! @ Appl @ Appstor iO is so S M O O T H & beauti !! # ThanxAppl @ Appl Luv U @ APPL Thank you @ appl , love my new iPhon S !!!!! # appl # iPhon pic . twitter . com / XmHJCUpcb @ appl Omg the iPhon S is so cooool it can read your finger print to unlock you iPhon S and to make purchas without a passcod
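To see how the two stemmers mentioned above differ, they can be run side by side. This is a small sketch using NLTK's PorterStemmer and LancasterStemmer classes; the word list is an illustrative selection, and neither stemmer needs any extra NLTK data to be downloaded.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Illustrative words taken from the examples and text above
words = ["cars", "booked", "loving", "purchases", "beautiful"]

# Print each word next to both stems so the algorithms can be compared
for word in words:
    print(f"{word:<10} porter={porter.stem(word):<10} lancaster={lancaster.stem(word)}")
```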
Chunking (shallow parsing)
Chunking is a natural language process that identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and links them to higher order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.) [23]. Chunking tools: NLTK, TreeTagger chunker, Apache OpenNLP, General Architecture for Text Engineering (GATE), FreeLing.
Example 7. Chunking using NLTK (the part-of-speech step below uses TextBlob, which is built on NLTK):
The first step is to determine the part of speech for each word:
Code:
from textblob import TextBlob
input_str = "I've to say, iGuess, Apple has by far the best customer care service I have ever received! @Apple @Appstore iOS is so S M O O T H & beautiful!! #ThanxApple @Apple Luv U @APPLE Thank you @apple, loving my new iPhone S!!!!! #apple #iPhoneS pic.twitter.com/XmHJCUpcb @apple Omg the iPhone S is so cooool it can read your finger print to unlock you iPhone S and to make purchases without a passcode"
result = TextBlob(input_str)
print(result.tags)
Output:
[I/PRP '/VBP ve/JJ to/TO say/VB ,/, iGuess/NNP ,/, Apple/NNP has/VBZ by/IN far/RB the/DT best/JJS customer/NN care/NN service/NN I/PRP have/VBP ever/RB received/VBN !/.. @/IN Apple/NNP @/NNP Appstore/NNP iOS/NNP is/VBZ so/RB S/NNP M/NNP O/NNP O/NNP T/NNP H/NNP &/CC beautiful/JJ !!/NN. #/# ThanxApple/NNP @/NNP Apple/NNP Luv/NNP U/NNP @/NNP APPLE/NNP Thank/NNP you/PRP @/VBP apple/JJ ,/, loving/VBG my/PRP$ new/JJ iPhone/NNP S/NNP !!!!!/NNP. #/# apple/NN #/# iPhoneS/NNP pic/JJ ./. twitter/NN ./. com/NN //: XmHJCUpcb/NNP @/: apple/NN Omg/NNP the/DT iPhone/NNP S/NNP is/VBZ so/RB cooool/NN it/PRP can/MD read/VB your/PRP$ finger/NN print/NN to/TO unlock/VB you/PRP iPhone/NNP S/NNP and/CC to/TO make/VB purchases/NNS without/IN a/DT passcode/NN]
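The second step is to group the tagged words into chunks. The sketch below uses NLTK's RegexpParser on a few tagged tokens copied from the output above; the noun-phrase grammar is an illustrative assumption, not the only possible rule.

```python
from nltk import RegexpParser

# Tagged tokens copied from the part-of-speech output above
tagged = [
    ("Apple", "NNP"), ("has", "VBZ"), ("by", "IN"), ("far", "RB"),
    ("the", "DT"), ("best", "JJS"), ("customer", "NN"),
    ("care", "NN"), ("service", "NN"),
]

# Illustrative grammar: a noun phrase is an optional determiner,
# any number of adjectives, then one or more nouns
grammar = "NP: {<DT>?<JJ.*>*<NN.*>+}"

tree = RegexpParser(grammar).parse(tagged)

# Collect the words of every NP chunk found in the parse tree
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]
print(noun_phrases)
```

With this grammar, "the best customer care service" is grouped into a single noun phrase, while "has", "by", and "far" stay outside any chunk.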
After the text preprocessing is done, the result may be used for more complicated NLP tasks, for example, machine translation or natural language generation.