Question


Using RStudio, find a text file (.txt file) of your own choosing and create a word cloud from it. Submit the file, your code, and the result.

Answer #1

Choose the text file you want to build a word cloud from. As an example, I will create a word cloud from this passage about the Mr. Robot series: "Welcome back, my tenderfoot hackers! Well, the first season of Mr. Robot just ended and Elliot and fsociety successfully took down Evil Corp! They have effectively destroyed over 70% of the world's consumer and student debt! Free at last! Free at last! Of course, global financial markets crashed as well, but that's another story."

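I saved the passage as hacks.txt at C:\Desktop\Word_Cloud\MrRobot\project\hacks.txt. Inside an R string, each backslash must be doubled (C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt), or you can use forward slashes instead.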
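If you would rather create the sample file from R than by hand, here is a minimal sketch. It writes the passage with writeLines() to a temporary folder from tempdir(); that location (and the name hacks_path) is an assumption made so the snippet runs on any machine, so point it at your own project folder to reproduce the setup above.

```r
# Sample passage from the answer, stored as a single line of text
passage <- paste(
  "Welcome back, my tenderfoot hackers! Well, the first season of Mr. Robot",
  "just ended and Elliot and fsociety successfully took down Evil Corp!",
  "They have effectively destroyed over 70% of the world's consumer and",
  "student debt! Free at last! Free at last! Of course, global financial",
  "markets crashed as well, but that's another story."
)

# Hypothetical location: a temporary folder, so the example runs anywhere
hacks_path <- file.path(tempdir(), "hacks.txt")
writeLines(passage, hacks_path)

# The file now exists and contains one line
file.exists(hacks_path)
length(readLines(hacks_path))
```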

Installing Packages

Open RStudio. You will need to install the packages "tm" (text mining), "wordcloud", and "RColorBrewer" (color palettes), and then load them in R.

Run the following commands in RStudio.

# Installing packages (only needed once)

install.packages("tm")

install.packages("wordcloud")

install.packages("RColorBrewer")

#Loading Packages

library(tm)

library(wordcloud)

library(RColorBrewer)
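Running install.packages() every time is unnecessary. A minimal sketch that installs only the packages that are missing and then loads all three is below; the CRAN mirror URL passed as repos is an assumption, so use whichever mirror you normally use.

```r
# Packages used in this answer
pkgs <- c("tm", "wordcloud", "RColorBrewer")

# Keep only the packages whose namespace cannot be loaded yet
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  # Assumed mirror; any CRAN mirror works
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach all of them; character.only = TRUE lets library() accept a string
invisible(lapply(pkgs, library, character.only = TRUE))
```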

Here is the complete script; each step is explained in the sections that follow.

library(tm)
library(wordcloud)
library(RColorBrewer)

speech = "C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt"
hack_txt = readLines(speech)
hack <- Corpus(VectorSource(hack_txt))
inspect(hack[seq_len(min(10, length(hack)))])   # show up to the first 10 documents

hack_data <- tm_map(hack, stripWhitespace)
hack_data <- tm_map(hack_data, content_transformer(tolower))
hack_data <- tm_map(hack_data, removeNumbers)
hack_data <- tm_map(hack_data, removePunctuation)
hack_data <- tm_map(hack_data, removeWords, stopwords("english"))
hack_data <- tm_map(hack_data, removeWords, c("and", "the", "our", "that", "for", "are", "also", "more", "has", "must", "have", "should", "this", "with"))

tdm_hack <- TermDocumentMatrix(hack_data)
TDM1 <- as.matrix(tdm_hack)                   # Convert this into a matrix format
v = sort(rowSums(TDM1), decreasing = TRUE)    # Gives you the frequencies for every word
summary(v)

# min.freq = 1 keeps words that occur once (the default of 3 drops them);
# max.words = 1 would draw only a single word, so allow up to 100
wordcloud(words = names(v), freq = v, scale = c(5, 0.5), min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.35,
          use.r.layout = FALSE, colors = brewer.pal(8, "Dark2"))

Reading the File

The following commands read the text file into R; readLines() returns a character vector with one element per line of the file:

speech = "C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt"

hack_txt = readLines(speech)
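readLines() warns about an "incomplete final line" when a file does not end with a newline, and blank lines in the file become empty documents later on. A small sketch of a more forgiving reader follows; read_text_file() is a hypothetical helper, and the demo uses a temporary file in place of hacks.txt.

```r
# Hypothetical helper: read a .txt file into a character vector, one element per line
read_text_file <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # warn = FALSE silences the "incomplete final line" warning
  txt <- readLines(path, warn = FALSE, encoding = "UTF-8")
  # Drop blank lines so they do not turn into empty documents in the corpus
  txt[nzchar(trimws(txt))]
}

# Demo with a temporary file that contains a blank line
demo_path <- tempfile(fileext = ".txt")
writeLines(c("Free at last!", "", "Of course, global financial markets crashed."), demo_path)
hack_txt <- read_text_file(demo_path)
hack_txt
```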

Converting the text file into a Corpus

To process or clean the text with the tm package, you first need to convert the plain text into a corpus, the format that tm works with. A corpus is a collection of documents. VectorSource() turns each element of hack_txt (one per line of the file) into a separate document, so a single-paragraph file like ours gives just one document. The following command converts the text into a corpus:

hack<-Corpus(VectorSource(hack_txt))

To look at the first few documents (up to 10) in the corpus, run: inspect(hack[seq_len(min(10, length(hack)))])
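To see how VectorSource() maps lines to documents, here is a short sketch on two made-up lines. If you prefer one document for the whole file, collapse the lines first; for a word cloud the totals from rowSums() come out the same either way.

```r
library(tm)

# Each element of the character vector becomes one document
txt_lines <- c("Free at last! Free at last!", "Of course, global financial markets crashed.")
docs <- Corpus(VectorSource(txt_lines))
length(docs)    # one document per element

# Collapsing first gives a single document for the whole text
single <- Corpus(VectorSource(paste(txt_lines, collapse = " ")))
length(single)
```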

Data Cleaning

Execute the following commands in RStudio:

hack_data<-tm_map(hack,stripWhitespace)

hack_data<-tm_map(hack_data,content_transformer(tolower))

hack_data<-tm_map(hack_data,removeNumbers)

hack_data<-tm_map(hack_data,removePunctuation)

hack_data<-tm_map(hack_data,removeWords, stopwords("english"))

The commands above use tm_map() from the tm package to apply a transformation to every document in the corpus. In order, they strip extra white space, convert everything to lower case (word matching in tm is case-sensitive, so "Free" and "free" would otherwise be counted separately), remove numbers, remove punctuation, and remove common English words such as 'the' (so-called stop words). Because tolower is a base R function rather than a tm transformation, it should be wrapped in content_transformer(); with a VCorpus, passing tolower directly turns the documents into plain character vectors and TermDocumentMatrix() then fails.
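The cleaning pipeline can be tried on a one-sentence sample before running it on the whole file. This sketch uses VCorpus() explicitly (an assumption, so it behaves the same whichever corpus class Corpus() returns in your tm version), and the final stripWhitespace pass is an extra step I added to tidy the gaps left by removed words.

```r
library(tm)

# Small sample in the style of hacks.txt
docs <- VCorpus(VectorSource("Free at last! Free at last! Over 70% of the world's debt."))

cleaned <- tm_map(docs, stripWhitespace)
cleaned <- tm_map(cleaned, content_transformer(tolower))  # base R function, so wrap it
cleaned <- tm_map(cleaned, removeNumbers)
cleaned <- tm_map(cleaned, removePunctuation)
cleaned <- tm_map(cleaned, removeWords, stopwords("english"))
cleaned <- tm_map(cleaned, stripWhitespace)  # extra pass: collapse gaps left by removed words

# Extract the cleaned text of the first (only) document
txt <- content(cleaned[[1]])
txt
```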

After looking at the text, I also noticed the following additional stop words, which I wanted to remove:

hack_data<-tm_map(hack_data, removeWords, c("and","the","our","that","for","are","also","more","has","must","have","should","this","with"))
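Many of these custom words may already be in tm's English stop word list. The sketch below checks which ones are not, then merges both lists so a single removeWords() call handles everything; the test sentence is made up for illustration.

```r
library(tm)

# Extra words removed in the second pass above
custom <- c("and", "the", "our", "that", "for", "are", "also", "more",
            "has", "must", "have", "should", "this", "with")

# Words that are not already covered by stopwords("english")
setdiff(custom, stopwords("english"))

# One combined list lets a single removeWords() call do both passes
all_stops <- unique(c(stopwords("english"), custom))
result <- removeWords("they also must pay more", all_stops)
result
```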

Create a Term Document Matrix

A term-document matrix (TDM) is a matrix that records how often each term occurs in each document of a collection. In a term-document matrix, rows correspond to terms (words) and columns correspond to documents; a document-term matrix is the transpose.

We could create a word cloud without building a TDM ourselves (wordcloud() builds one internally when given a corpus), but building it here lets us look at the word frequencies first.

tdm_hack<-TermDocumentMatrix(hack_data) #Creates a TDM

TDM1<-as.matrix(tdm_hack) #Convert this into a matrix format

v = sort(rowSums(TDM1), decreasing = TRUE) #Gives you the frequencies for every word

summary(v)

summary(v) shows the distribution of word frequencies (minimum, quartiles, mean and maximum), so we can see how often the rarest and the most common words occur. This helps us choose the max.words and min.freq parameters in the next step.
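The TDM and frequency steps are easier to follow on a tiny corpus whose counts can be checked by hand. This sketch builds one from two made-up documents; the data frame and findFreqTerms() call are optional extras for inspecting the counts.

```r
library(tm)

# Two tiny documents so the counts are easy to verify
docs <- VCorpus(VectorSource(c("free markets free debt", "markets crashed")))
tdm <- TermDocumentMatrix(docs)

m <- as.matrix(tdm)                       # rows = terms, columns = documents
v <- sort(rowSums(m), decreasing = TRUE)  # total count of each term

# Same counts as a data frame, handy for printing or plotting
freq_df <- data.frame(word = names(v), freq = unname(v))
head(freq_df, 10)
summary(v)

# Terms that occur at least twice across the corpus
findFreqTerms(tdm, lowfreq = 2)
```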

Create your first word cloud!

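In wordcloud(), scale sets the largest and smallest font sizes; max.words limits the number of words in the cloud (if you omit it, R will try to squeeze every unique word into the diagram); min.freq drops words that occur fewer times than its value (3 by default, so use 1 for a short text like this one); random.order = FALSE draws the most frequent words first, in the center; rot.per is the proportion of words drawn rotated 90 degrees; and colors sets the palette, here eight colors from RColorBrewer's "Dark2" via brewer.pal(8, "Dark2").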
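Here is a sketch of the plotting step with each parameter commented. The counts in v are illustrative stand-ins for the vector computed from hacks.txt, and writing to a PNG file (out_file) plus set.seed() are my additions so the example runs outside RStudio and gives a repeatable layout.

```r
library(wordcloud)
library(RColorBrewer)

# Illustrative counts standing in for the vector v computed from the TDM
v <- c(free = 2, last = 2, hackers = 1, elliot = 1, fsociety = 1,
       markets = 1, consumer = 1, debt = 1, robot = 1, corp = 1)

# Draw into a PNG file so the example also works outside RStudio's plot pane
out_file <- file.path(tempdir(), "hacks_wordcloud.png")
png(out_file, width = 800, height = 800)
set.seed(42)  # the layout is randomized; fixing the seed makes it repeatable
wordcloud(words = names(v), freq = v,
          scale = c(5, 0.5),      # largest and smallest font size
          min.freq = 1,           # keep words that occur only once (default is 3)
          max.words = 100,        # upper limit on the number of words drawn
          random.order = FALSE,   # most frequent words first, in the center
          rot.per = 0.35,         # proportion of words rotated 90 degrees
          colors = brewer.pal(8, "Dark2"))
invisible(dev.off())
```

In RStudio you can drop the png() and dev.off() lines to see the cloud in the Plots pane.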

(Result: word cloud image, not reproduced here. Words visible in it include "markets", "consumer", "took", "successfully", "hackers" and "Elliot".)

I hope this answers your question.
