This is an R programming question:
I am trying to improve line 59 of my code:
date_movie<-stringi::stri_extract(edx$title,regex = "\\d{4}",
comments=TRUE)%>%as.numeric()
It is supposed to extract the date from the title, but because some
titles contain a number before the date, I currently need to fix
some of the dates "manually" in lines 74 to 87.
My request is to help me modify the code so I can extract the date
more systematically (maybe through a logical expression) when a
number appears first in the title.
Thank you :)
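
One possible approach to the request above, offered as a minimal sketch rather than a definitive fix: instead of taking the first four-digit run, take the four digits that sit directly before the closing parenthesis at the end of the title. This assumes every title ends with the year in the form "(YYYY)", as the titles in the console output do; the `titles` vector is a hypothetical stand-in for `edx$title`, and titles that do not follow the pattern would come back as `NA`.

```r
library(stringi)

# Hypothetical sample standing in for edx$title (titles copied from the question)
titles <- c(
  "Mystery Science Theater 3000: The Movie (1996)",
  "Murder at 1600 (1997)",
  "1776 (1972)",
  "2046 (2004)",
  "Transylvania 6-5000 (1985)"
)

# Match four digits only when they are followed by ")" and the end of the string.
# The lookahead (?=...) checks the context without including it in the match.
# Assumption: the year is always the last "(YYYY)" group in the title.
year_txt <- stri_extract_last_regex(titles, "\\d{4}(?=\\)\\s*$)")
date_movie <- as.numeric(year_txt)

# Same sanity check as in the question: nothing outside 1896 to 2019, nothing missing
dates_index <- is.na(date_movie) | date_movie < 1896 | date_movie > 2019
sum(dates_index)
```

If the assumption holds for the whole data set, applying the same call to `edx$title` in place of line 59 should make the manual corrections in lines 74 to 87 unnecessary; any `NA` left over would point to a title that needs a closer look.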
![RStudio screenshot: script lines 53 to 87 and the console output listing the titles with wrongly extracted dates](//img.homeworklib.com/images/30a17a51-e050-4f0c-a995-65f5be6939bc.png?x-oss-process=image/resize,w_560)
Transcription of the screenshot (script line numbers kept because the question refers to them; line 70 is cut off in the screenshot):

    53 # STEP 4 - LET'S TIDY THE DATA BASE TO INCORPORATE MOVIE AGE AND OTHER DATE TIME FACTORS
    55 # Movies' dates are encapsulated into the title
    56 head(edx)
    58 # Extracting the date of the movie
    59 date_movie<-stringi::stri_extract(edx$title,regex = "\\d{4}", comments=TRUE)%>%as.numeric()
    61 # Checking if the dates are correct. 1896 being the first year a movie was known to have been ever made and 2019 our current date
    62 dates_index<-date_movie<1896 | date_movie>2019
    63 dates_wrong<-date_movie[dates_index]
    64 length(dates_wrong)
    67 # creating a working data-frame "edx3" that groups by rating and has the year difference between the premier of the movie and the rating of the movie
    70 edx3<-edx%>%mutate(date_movie=date_movie, date_of_movie_rating=as_datetime(timestamp), year_of_rating=year(as_datetime(timestamp)), wkday=wday(as_datetime(times…
    71 head(edx3)
    72 edx3%>%group_by(movieId,title,date_movie)%>%filter(date_movie<1896 | date_movie>2019)%>%summarize(n=n())
    74 edx3[edx3$movieId=="671", "date_movie"] <- 1996
    75 edx3[edx3$movieId=="1422", "date_movie"] <- 1997
    76 edx3[edx3$movieId=="2308", "date_movie"] <- 1973
    77 edx3[edx3$movieId=="4159", "date_movie"] <- 2001
    78 edx3[edx3$movieId=="4311", "date_movie"] <- 1998
    79 edx3[edx3$movieId=="5310", "date_movie"] <- 1985
    80 edx3[edx3$movieId=="5472", "date_movie"] <- 1972
    81 edx3[edx3$movieId=="6290", "date_movie"] <- 2003
    82 edx3[edx3$movieId=="6645", "date_movie"] <- 1971
    83 edx3[edx3$movieId=="8198", "date_movie"] <- 1960
    84 edx3[edx3$movieId=="8864", "date_movie"] <- 2004
    85 edx3[edx3$movieId=="8905", "date_movie"] <- 1992
    86 edx3[edx3$movieId=="27266", "date_movie"] <- 2004
    87 edx3[edx3$movieId=="53953", "date_movie"] <- 2007

Console output (movieId, title, extracted date_movie, n):

    movieId  title                                                                      date_movie   n
    671      Mystery Science Theater 3000: The Movie (1996)                             (illegible)  (illegible)
    1422     Murder at 1600 (1997)                                                      1600         1566
    2308     Detroit 9000 (1973)                                                        9000         22
    4159     3000 Miles to Graceland (2001)                                             3000         714
    4311     Bloody Angels (1732 Høtten: Marerittet Har et Postnummer) (1998)           1732         (illegible)
    5310     Transylvania 6-5000 (1985)                                                 5000         195
    5472     1776 (1972)                                                                1776         185
    6290     House of 1000 Corpses (2003)                                               1000         367
    6645     THX 1138 (1971)                                                            1138         464
    8198     1000 Eyes of Dr. Mabuse, The (Tausend Augen des Dr. Mabuse, Die) (1960)    1000         24
    8864     Mr. 3000 (2004)                                                            3000         146
    8905     1492: Conquest of Paradise (1992)                                          1492         134
    27266    2046 (2004)                                                                2046         426
    53953    1408 (2007)                                                                1408         466