Question

I can't attach the data because the file is very large. I can email it to you so that I can get your help with it.

# Assignment 1
# R Programming Language
# ---- Why do Exploratory Data Analysis (EDA)? ----
# We will be looking at
## identifying outliers
## null values
## generating plots
## examining correlations
# --------------------------------------------------------------
# In this video we will cover:
## univariate plots for continuous variables (boxplots, histograms)
## bivariate plots
## calculating correlation
## subsetting your data frame
## creating new variables (or features)

# ---- Load packages from local library ----
library(ggplot2)
library(reshape2)
library(dplyr)
library(corrplot)
library(utils)
library(forcats)

# ---- Import data ----
# Remember that any back slashes need to be changed to forward slashes
df <- read.csv('AppleStore.csv')

# ---- Explore data ----
# Q1. Print the first 20 rows of the data frame


# Q2. Print the dimensions of the data frame as rows and columns

# Q3. Print information about the structure of the data frame, including
# the class of data found in each column

# Q4. Print summary statistics of each column

# ---- Subset data ----
# --------------------------------------------------------------
#
# There are columns that we're not interested in so let's
# create a subset of the data that only contains
# specific columns
# There are many ways to do this. For example:
# Option 1: Create a vector of the names of the columns that we want to
# use to build a subset of the data, and use these to index the data frame object
#
# Q5. Create new Data Frame, new_set with the following
# "size_bytes", "rating_count_tot", "rating_count_ver", "user_rating",
# "price", "sup_devices.num", "cont_rating", "prime_genre"
# You can use a variable to save the column names.

# Q6. Option 2: Use the select() function from dplyr to select which
# columns to use


# Q7. What type of object have we just created?


# Q8. Print summary statistics for each column

# ---- Check for null values ----
# Are there any missing (NA) values?
# Q9. If this is > 0 it tells us how many missing values the data frame contains

# ---- Manipulating variables: Convert bytes to MB ----
# --------------------------------------------------------------
#
# Task: Convert bytes to MB and store these values in a new column called size_mb
## 1,048,576 bytes = 1 MB
# New columns can be created in a single step
# Q10. new_set$size_mb means that the new column called "size_mb" will be found in the data frame called "new_set"


# Q11 We can also give our variables "friendly names"
# Be careful - here we are simply extracting the values from the column and placing them
# into an object called size_mb; if the values of the column new_set$size_mb change,
# this change will NOT be reflected in the object called size_mb


# Q11 Print the column names of the data frame new_set

# Q12 Print the first 20 rows of the data frame


# ---- Manipulating variables: Is the app free? ----
# --------------------------------------------------------------

# Q13 Task: Create a column called is_free that contains TRUE if the price is 0 and FALSE if the price is not 0
## The comparison operator "==" here asks the question "is each value of price equal to 0?"
## It will return a vector of TRUE and FALSE which is stored in a new column called is_free

# Q14. Task: Instead of TRUE and FALSE, update the values to be "Free" and "Paid"
## Here, the ifelse() function will check each value of is_free. If the
## value is equal to TRUE, it will be replaced with "Free", otherwise it
## will be replaced with "Paid"


# Q15. Task: At the moment the is_free column is stored as a character vector.
# You need to convert it to a factor with identifiable levels.

# ---- Univariate analysis: Is the app free? ----
# Q16 Plot data from is_free column

# Calculate the % of free apps
# Q17 Print a summary of the column is_free

# Q18 Calculate the proportion of "Free" values in is_free

# Q19 Plot the frequency of each app genre sorted by count

# For this section, the code is already written.
# Plotting the column directly won't sort by count, so we need to try something else
## There are many ways to do this!
# Here, we'll use ggplot2::qplot() and forcats::fct_infreq()

qplot(x = fct_infreq(new_set$prime_genre), data = new_set) +
  labs(title = "Frequency of Apps based on the category",
       x = "Categories",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))


# ---- Manipulating variables: Is the app a game? ----
# --------------------------------------------------------------
#
# Q20 Task: Create a new column called is_game that has TRUE and FALSE values indicating if it is a game


# Q21 Task: Instead of TRUE and FALSE, update the values to be "Game" and "General App"

# Q22 Print summary statistics from each column of new_set


# Q23 Print the first ten values of the column is_game


# Q24 Convert is_game to a factor

# ---- Univariate analysis: Is the app a game? ----


# Q25 Plot data from is_game column


# ---- Manipulating variables: Convert to boolean ----

# Q26 Task: Create a column called is_game_bool that contains a boolean value (0 or 1) derived from is_game
# This will be helpful for modeling purposes later
## Game = 1, anything else = 0


# Q27 Print the first ten values of the column is_game_bool


# ---- Univariate analysis: What is the content rating? ----
# Apple's rating system for the App Store follows the following rubric
## 4+: Contains no objectionable material.
## 9+: May contain content unsuitable for children under the age of 9.
## 12+: May contain content unsuitable for children under the age of 12.
## 17+: May contain content unsuitable for children under the age of 17.
# Plot the frequency of each content type
qplot(x = cont_rating, data = new_set) +
  labs(title = "Frequency of Apps based on the content type",
       x = "Content Type",
       y = "Count")


# ---- EDA demo ----
## Univariate plots for continuous variables (boxplots, histograms)
## Bivariate plots
## Correlation

# ---- Exploring price ----
# Q28 Print summary information about the price variable


# Q29 Generate a boxplot of price with the y axis ranging from 0 to 25


# Q30 To display the help file for the function boxplot()


# Q31 Generate a histogram of price with the x axis ranging from 0 to 25


# Q32 Generate a density plot of price with the title Price Density Spread

# Q33 Generate a density plot of price with the title Price Density Spread and
# the x axis ranging from 0 to 50

Answer #1

1. print(df[1:20, ])
This prints the first 20 rows of the data frame. head(df, 20) gives the same result when the data frame has at least 20 rows.

2. print(dim(df))
This prints the dimensions of the data frame as rows, then columns.

3. str(df)
This prints the structure of the data frame, including the class of the data in each column.

4. summary(df)
This prints summary statistics for each column of the data frame.
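
The subsetting and missing-value parts (Q5 to Q9) could be approached along the following lines. This is only a sketch: because AppleStore.csv is not attached, `df` here is a small made-up stand-in that merely reuses the column names listed in Q5, and `keep_cols`, `new_set_dplyr`, and `n_missing` are names chosen for illustration.

```r
# Hypothetical stand-in for AppleStore.csv; all values are made up.
df <- data.frame(
  id               = 1:4,
  track_name       = c("A", "B", "C", "D"),
  size_bytes       = c(1048576, 52428800, 209715200, 10485760),
  rating_count_tot = c(10, 2500, 0, 431),
  rating_count_ver = c(1, 20, 0, 12),
  user_rating      = c(4.5, 4.0, 0.0, 3.5),
  price            = c(0, 0.99, 0, 4.99),
  sup_devices.num  = c(38, 37, 40, 38),
  cont_rating      = c("4+", "12+", "4+", "17+"),
  prime_genre      = c("Games", "Games", "Utilities", "Music"),
  stringsAsFactors = FALSE
)

# Q5 (option 1): store the wanted column names in a vector and index with it
keep_cols <- c("size_bytes", "rating_count_tot", "rating_count_ver",
               "user_rating", "price", "sup_devices.num",
               "cont_rating", "prime_genre")
new_set <- df[, keep_cols]

# Q6 (option 2): the same subset with dplyr::select(), if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  new_set_dplyr <- dplyr::select(df, dplyr::all_of(keep_cols))
}

# Q7: what type of object is the subset?
print(class(new_set))

# Q8: summary statistics for each column of the subset
print(summary(new_set))

# Q9: total number of missing (NA) values in the subset
n_missing <- sum(is.na(new_set))
print(n_missing)
```

With the real file, replacing the stand-in with the `read.csv('AppleStore.csv')` call from the question should leave the remaining lines unchanged, assuming the CSV uses the column names given in Q5.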


Being a Chegg expert, I am obliged to do only four parts at a time.
I hope you understand.
Thanks
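
For readers working on the later parts about derived columns (Q10 to Q18 and Q20 to Q26), one possible base R sketch follows. It is not a full solution: `new_set` is again a tiny made-up stand-in with only the columns needed here, and `free_ratio` is an illustrative name.

```r
# Hypothetical stand-in for the new_set data frame; all values are made up.
new_set <- data.frame(
  size_bytes  = c(1048576, 52428800, 209715200, 10485760),
  price       = c(0, 0.99, 0, 4.99),
  prime_genre = c("Games", "Games", "Utilities", "Music"),
  stringsAsFactors = FALSE
)

# Q10: convert bytes to MB (1,048,576 bytes = 1 MB) in a new column
new_set$size_mb <- new_set$size_bytes / 1048576

# Q11: a standalone copy; later changes to new_set$size_mb do not update it
size_mb <- new_set$size_mb

# Q13: TRUE where the price is 0, FALSE otherwise
new_set$is_free <- new_set$price == 0

# Q14: replace TRUE/FALSE with "Free"/"Paid"
new_set$is_free <- ifelse(new_set$is_free, "Free", "Paid")

# Q15: convert the character vector to a factor with explicit levels
new_set$is_free <- factor(new_set$is_free, levels = c("Free", "Paid"))

# Q17: counts per level
print(summary(new_set$is_free))

# Q18: proportion of free apps
free_ratio <- mean(new_set$is_free == "Free")
print(free_ratio)

# Q20, Q21, Q24: flag games, relabel, and convert to a factor
new_set$is_game <- new_set$prime_genre == "Games"
new_set$is_game <- ifelse(new_set$is_game, "Game", "General App")
new_set$is_game <- factor(new_set$is_game, levels = c("Game", "General App"))

# Q26: Game = 1, anything else = 0
new_set$is_game_bool <- as.integer(new_set$is_game == "Game")
print(head(new_set$is_game_bool, 10))
```

Taking the mean of a logical vector works for Q18 because R treats TRUE as 1 and FALSE as 0, so the mean is the share of TRUE values.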
