Using RStudio, Find a text file (.txt file) on your own. Create a word cloud. Submit...

Question

Using RStudio, Find a text file (.txt file) on your own. Create a word cloud. Submit the file, your code and the result.

Answer 1

Choose the text file for which you want to create a word cloud. For this example, I will create a word cloud from the following passage about the Mr. Robot series: "Welcome back, my tenderfoot hackers! Well, the first season of Mr. Robot just ended and Elliot and fsociety successfully took down Evil Corp! They have effectively destroyed over 70% of the world's consumer and student debt! Free at last! Free at last! Of course, global financial markets crashed as well, but that's another story."

I saved this passage as hacks.txt. Its path, written with the doubled backslashes that R strings require, is C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt

Installing Packages

Open RStudio. You will need to install the packages "tm", "wordcloud" and "RColorBrewer", and then load them into your R session.

Run the following commands in RStudio.

#Installing Packages

install.packages("tm")

install.packages("wordcloud")

install.packages("RColorBrewer")

#Loading Packages

library(tm)

library(wordcloud)

library(RColorBrewer)
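Running install.packages() every time the script runs re-downloads packages that are already present. A minimal sketch of one way to install only what is missing is shown below; the helper names missing_packages and install_missing are hypothetical and not part of any package.

```r
# Packages this walkthrough relies on
needed <- c("tm", "wordcloud", "RColorBrewer")

# Hypothetical helper: returns the subset of `pkgs` that is not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: installs only the packages that are missing
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling install_missing(needed) once before the library() calls should be enough; on later runs it does nothing if all three packages are already installed.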

The complete script is shown here; the sections that follow explain each step.

library(tm)
library(wordcloud)
library(RColorBrewer)

speech <- "C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt"
hack_txt <- readLines(speech)
hack <- Corpus(VectorSource(hack_txt))
inspect(hack)

hack_data <- tm_map(hack, stripWhitespace)
hack_data <- tm_map(hack_data, content_transformer(tolower))
hack_data <- tm_map(hack_data, removeNumbers)
hack_data <- tm_map(hack_data, removePunctuation)
hack_data <- tm_map(hack_data, removeWords, stopwords("english"))
hack_data <- tm_map(hack_data, removeWords, c("and","the","our","that","for","are","also","more","has","must","have","should","this","with"))

tdm_hack <- TermDocumentMatrix(hack_data)
TDM1 <- as.matrix(tdm_hack)                    #Convert this into a matrix format
v <- sort(rowSums(TDM1), decreasing = TRUE)    #Gives you the frequencies for every word
summary(v)

#max.words caps the number of words drawn; 100 is an assumed value, choose yours from summary(v)
wordcloud(hack_data, scale=c(5,0.5), max.words=100, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))

Reading the File

The following commands read a text file into R. readLines() returns a character vector with one element per line of the file:

speech <- "C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt"

hack_txt <- readLines(speech)
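If the path is mistyped, readLines() stops with an error that is easy to misread. A minimal sketch of a more defensive read is shown below; the helper name read_text_file is hypothetical, and the temporary file only stands in for hacks.txt so the example runs anywhere.

```r
# Hypothetical helper: read a text file defensively
read_text_file <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # warn = FALSE silences the warning about a missing final newline
  lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
  # Drop lines that are empty or contain only white space
  lines[nzchar(trimws(lines))]
}

# Demonstration on a temporary file (a stand-in for hacks.txt)
demo_path <- tempfile(fileext = ".txt")
writeLines(c("Welcome back, my tenderfoot hackers!",
             "",
             "Free at last! Free at last!"),
           demo_path)
demo_txt <- read_text_file(demo_path)
length(demo_txt)
```

On Windows, R also accepts forward slashes in paths, so "C:/Desktop/Word_Cloud/MrRobot/project/hacks.txt" refers to the same file without any doubled backslashes.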

Converting the text file into a Corpus

To process or clean the text with the tm package, you first need to convert the plain text into a format called a corpus. A corpus is a collection of documents. With VectorSource(), each element of the character vector (that is, each line of the file) becomes one document, so a passage stored on a single line gives a corpus with only one document. The following command converts the text read from the .txt file into a corpus.

hack<-Corpus(VectorSource(hack_txt))

To see the documents in the corpus, type the R command inspect(hack). For a longer file with at least ten lines, inspect(hack[1:10]) shows only the first ten documents.

Data Cleaning

Execute the following commands in RStudio:

hack_data<-tm_map(hack,stripWhitespace)

hack_data<-tm_map(hack_data,content_transformer(tolower)) #wrap base R functions so the corpus structure is kept

hack_data<-tm_map(hack_data,removeNumbers)

hack_data<-tm_map(hack_data,removePunctuation)

hack_data<-tm_map(hack_data,removeWords, stopwords("english"))

The commands above use tm_map() from the tm package to transform the text. In order, they strip unnecessary white space, convert everything to lower case, remove numbers and punctuation with the removeNumbers and removePunctuation transformations, and remove common English words such as 'the' (so-called stop words). Lower-casing has to come before stop-word removal because removeWords matches case-sensitively and the words in stopwords("english") are all lower case, so "The" would otherwise survive.
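To make the effect of each cleaning step visible, here is a rough base R equivalent applied to a single sentence. It is only an illustration of the idea, not how tm implements these transformations; the helper clean_text and the short stop-word list are hypothetical.

```r
# Small stop-word list for illustration only; stopwords("english") in tm is much longer
stop_words <- c("the", "of", "and", "at", "my", "just")

# Hypothetical helper that mirrors the cleaning steps in base R
clean_text <- function(x, stop_words) {
  x <- tolower(x)                               # lower case
  x <- gsub("[[:digit:]]+", "", x)              # remove numbers
  x <- gsub("[[:punct:]]+", "", x)              # remove punctuation
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)        # remove whole-word stop words
  x <- gsub("\\s+", " ", x)                     # collapse runs of white space
  trimws(x)
}

cleaned <- clean_text("Free at last! They destroyed over 70% of the debt.", stop_words)
cleaned
```

The word-boundary markers (\\b) matter here: without them, removing "the" would also damage "they".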

After looking at the text, I also chose to remove the following additional stop words:

hack_data<-tm_map(hack_data, removeWords, c("and","the","our","that","for","are","also","more","has","must","have","should","this","with"))

Create a Term Document Matrix

A term-document matrix (TDM) is a matrix that records how often each term occurs in each document of a collection. In a term-document matrix, rows correspond to terms and columns correspond to documents; a document-term matrix is its transpose.

You can create a word cloud without a TDM, but building one lets you inspect the word frequencies first.
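The layout of a term-document matrix is easier to see on a toy example. The sketch below builds one in base R from three made-up, already cleaned documents; it is only meant to show the rows-are-terms, columns-are-documents layout, not to replace TermDocumentMatrix().

```r
# Three tiny "documents", already cleaned (hypothetical sample data)
docs <- c(doc1 = "free last free last",
          doc2 = "hackers took down evil corp",
          doc3 = "markets crashed hackers free")

# Split each document into words
tokens <- strsplit(docs, " ", fixed = TRUE)

# Vocabulary: every distinct term
terms <- sort(unique(unlist(tokens)))

# Count each term in each document: rows are terms, columns are documents
tdm <- vapply(tokens,
              function(words) as.vector(table(factor(words, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms

# Row sums give the overall frequency of each term, most frequent first
freq <- sort(rowSums(tdm), decreasing = TRUE)

tdm
freq
```

This freq vector plays the same role as v in the tm-based code.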

tdm_hack<-TermDocumentMatrix(hack_data) #Creates a TDM

TDM1<-as.matrix(tdm_hack) #Convert this into a matrix format

v = sort(rowSums(TDM1), decreasing = TRUE) #Gives you the frequencies for every word

summary(v)

summary(v) gives the distribution of the word frequencies, so we can see the smallest and largest number of times a word occurs. This helps us set the “max.words” parameter in the next step.
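As a small illustration of how the frequencies can guide that choice, the sketch below uses a made-up frequency vector in place of v. The rule "keep words that occur at least twice" is only one possible rule of thumb, not something wordcloud requires.

```r
# Hypothetical frequency vector standing in for v (already sorted, most frequent first)
v <- c(free = 3, hackers = 2, last = 2, markets = 1, debt = 1, elliot = 1)

summary(v)    # minimum, quartiles, mean and maximum frequency
head(v, 3)    # the three most frequent words

# One possible rule of thumb (an assumption): show only words that occur at least twice
max_words <- sum(v >= 2)
max_words
```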

Create your first word cloud!

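wordcloud(hack_data, scale=c(5,0.5), max.words=100, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2")) #max.words=100 is an assumed value

scale sets the range of font sizes, from the largest to the smallest word. max.words limits the number of words in the cloud (if you omit it, R will try to squeeze every unique word into the diagram). random.order=FALSE plots the words in decreasing order of frequency, rot.per is the proportion of words drawn vertically, and colors sets the colors of the words, here the "Dark2" palette from RColorBrewer.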
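Since the assignment asks you to submit the result, it can help to write the cloud to an image file. The sketch below is one way to do that, assuming the wordcloud and RColorBrewer packages are installed. It passes explicit words and frequencies instead of the corpus; the helper name save_wordcloud, the sample frequencies, and the output file name are all hypothetical.

```r
# Hypothetical frequencies standing in for the vector v built from the TDM
v <- c(free = 3, hackers = 2, last = 2, markets = 1, debt = 1, elliot = 1)

# Hypothetical helper: draw the cloud from explicit words and frequencies
# and write it to a PNG file
save_wordcloud <- function(freq, out_file, max_words = 100) {
  png(out_file, width = 800, height = 800)
  on.exit(dev.off())
  set.seed(1234)  # fixes the random placement so the image is reproducible
  wordcloud::wordcloud(words = names(freq), freq = freq,
                       scale = c(5, 0.5), min.freq = 1,
                       max.words = max_words, random.order = FALSE,
                       rot.per = 0.35,
                       colors = RColorBrewer::brewer.pal(8, "Dark2"))
  invisible(out_file)
}

# Only draw when both packages are available
have_pkgs <- requireNamespace("wordcloud", quietly = TRUE) &&
  requireNamespace("RColorBrewer", quietly = TRUE)
out_file <- file.path(tempdir(), "hacks_wordcloud.png")
if (have_pkgs) {
  save_wordcloud(v, out_file)
}
```

min.freq = 1 is set explicitly because the default in wordcloud() is 3, which would drop most words of a short passage like this one.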

[Image: resulting word cloud. Legible words include: markets, consumer, took, successfully, hackers, Elliot]

I hope this answers your question.
