Question

Data manipulation using R

Download and load Most_popular_baby_name.csv to R using the following R code:

library('tidyverse')
baby_names <- read_csv("http://personal.stevens.edu/~fmai/data/Most_Popular_Baby_Names.csv")

The file contains the counts of baby names by sex and mother's ethnicity in NYC in 2011-2014. For example, the first record indicates that in 2011, when the mother is Hispanic and the baby is female, 13 were named GERALDINE.
BRTH_YR Gender ETHCTY Name Count
2011 FEMALE HISPANIC GERALDINE 13

Only analyze the dataset for the years 2012 - 2014, so filter/subset the dataset accordingly:

baby_names <- baby_names %>% filter(BRTH_YR >= 2012)

Note that names are recorded in lower case in some years and in upper case in others. Find a way to standardize the names.

For the 2012-2014 data, answer the following questions. You may use base R, dplyr package or sqldf package:

a. What is the total number of UNIQUE names in the dataset?

Hint: for base R, consider using the unique() and length() functions. For dplyr, you can chain distinct(Name) and nrow() together. The answer is between 1500 and 1600.
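A minimal sketch of the dplyr route from the hint, run on made-up data rather than the real CSV. The `toy` data frame and its values are hypothetical; the point is only that the count of distinct names changes once the case is standardized.

```r
library(dplyr)

# Hypothetical toy data (not taken from the real file):
# the same two names appear in mixed case across years.
toy <- data.frame(
  BRTH_YR = c(2012, 2013, 2013, 2014),
  Name    = c("Ethan", "ETHAN", "Mia", "MIA"),
  stringsAsFactors = FALSE
)

# Without standardizing case, "Ethan" and "ETHAN" count as different names.
n_raw <- toy %>%
  distinct(Name) %>%
  nrow()

# Lower-case the names first, then count the distinct ones.
n_unique <- toy %>%
  mutate(Name = tolower(Name)) %>%
  distinct(Name) %>%
  nrow()

n_raw     # 4
n_unique  # 2
```

The base R equivalent of the second count is length(unique(tolower(toy$Name))).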

b. Assuming that the ethnicity is non-overlapping, for each year, calculate the total number of babies born for each ethnicity in the dataset. The 2013 statistics should look like this:
1 2013 ASIAN AND PACIFIC ISLANDER 9293
2 2013 BLACK NON HISPANIC ????
3 2013 HISPANIC ????
4 2013 WHITE NON HISPANIC ????

Hint: for dplyr, consider chaining group_by(BRTH_YR, ETHCTY) and summarise(sum(Count))
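The group_by() and summarise() chain from the hint can be sketched on a small made-up table. The `toy` values below are hypothetical and the column name `Total` is an illustrative choice, not something the assignment prescribes.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the real file.
toy <- data.frame(
  BRTH_YR = c(2013, 2013, 2013, 2014),
  ETHCTY  = c("HISPANIC", "HISPANIC", "ASIAN AND PACIFIC ISLANDER", "HISPANIC"),
  Count   = c(10, 15, 20, 30),
  stringsAsFactors = FALSE
)

# One row per year-ethnicity pair, with the counts summed within each pair.
births <- toy %>%
  group_by(BRTH_YR, ETHCTY) %>%
  summarise(Total = sum(Count), .groups = "drop")

births
```

Naming the summary column makes the result easier to read than the default `sum(Count)` header, and `.groups = "drop"` returns an ungrouped table.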

c. During 2012-2014, what are the top 3 most popular baby names in each year? For example, the 3 most popular names in 2012 are:

BRTH_YR Name Total

2012 ethan 723
2012 jacob 641
2012 jayden 752

Hint: For each year-name combination, you need to calculate the total counts across gender and ethnicity.
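As a rough illustration of this hint, the sketch below sums counts per year-name pair across gender and ethnicity, then keeps the three largest totals per year. The `toy` data is made up, and using slice_max() (available in dplyr 1.0.0 and later) is one possible approach, not necessarily the one the assignment intends.

```r
library(dplyr)

# Hypothetical toy data: "ethan" appears under two ethnicities,
# so its counts must be added together before ranking.
toy <- data.frame(
  BRTH_YR = c(2012, 2012, 2012, 2012, 2012, 2012),
  Gender  = c("MALE", "MALE", "MALE", "MALE", "MALE", "FEMALE"),
  ETHCTY  = c("HISPANIC", "WHITE NON HISPANIC", "HISPANIC",
              "HISPANIC", "HISPANIC", "HISPANIC"),
  Name    = c("ethan", "ethan", "jacob", "jayden", "liam", "mia"),
  Count   = c(30, 20, 40, 45, 10, 5),
  stringsAsFactors = FALSE
)

top3 <- toy %>%
  group_by(BRTH_YR, Name) %>%
  # Total per year-name pair; the result stays grouped by BRTH_YR only.
  summarise(Total = sum(Count), .groups = "drop_last") %>%
  # Keep the 3 largest totals within each year, in descending order.
  slice_max(Total, n = 3) %>%
  ungroup()

top3
```

By default slice_max() keeps ties, so a year can return more than three rows if several names share the third-highest total.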

Answer #1

Let me know if you have any doubts.

library('tidyverse')
baby_names <- read_csv("http://personal.stevens.edu/~fmai/data/Most_Popular_Baby_Names.csv")

baby_names <- baby_names %>% filter(BRTH_YR >= 2012)

# Standardize the names: convert every name to lower case
baby_names$Name <- tolower(baby_names$Name)

#a. What is the total number of UNIQUE names in the dataset?

length(unique(baby_names$Name))

# b. Total number of babies per year and ethnicity
baby_names %>%
  group_by(BRTH_YR, ETHCTY) %>%
  summarise(Total = sum(Count), .groups = "drop")


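# c. Top 3 names per year
# Note: filter(Total == max(Total)) would keep only the single most popular
# name per year, so slice_max() is used to keep the top 3 instead.
baby_names %>%
  group_by(BRTH_YR, Name) %>%
  summarise(Total = sum(Count), .groups = "drop_last") %>%  # still grouped by BRTH_YR
  slice_max(Total, n = 3) %>%
  ungroup()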

