Question

Consider the R built-in dataset cars: data(mtcars)

  • Divide the data into training and test data such that 80% of the data is randomly assigned to the training data and the remaining 20% is assigned to the test data. Use set.seed(100) in your code before performing the split to maintain reproducibility of results. (Hint: use the R function sample.)

  • Fit dist vs. speed (with speed as the independent variable) using a linear model on the training data and print a summary of this fit.

  • Display the residual plots from the fit.

  • Obtain the mean squared error of prediction using the test data.

Answer #1

The complete R code for all parts is given below.

library(Metrics) # provides the mse() function

# mtcars ships with base R (package datasets), so no extra package is needed to load it
data1=mtcars
set.seed(100)

# mtcars has no speed or dist columns; this answer uses mpg and disp in their place
speed=data1$mpg
distance=data1$disp

n=length(speed) # total number of observations = 32
training_size=as.integer(0.8*n) # training sample size = 25
training_index=sample(1:n,size=training_size) # index vector for the training sample
tr_speed=speed[training_index] # training sample speed
tr_distance=distance[training_index] # training sample distance
tr_df=data.frame(tr_speed,tr_distance) # training dataset

# the remaining 7 observations form the test dataset
test_speed=speed[-training_index]
test_distance=distance[-training_index]
test_df=data.frame(test_speed,test_distance)

fit=lm(tr_distance~tr_speed,data=tr_df) # linear model fitted on the training data
summary(fit)

plot(tr_speed,fit$residuals) # residual plot: residuals against the predictor

# predictions on the test data; newdata must use the predictor name from the fit
pred_distance=predict(fit,newdata=data.frame(tr_speed=test_speed))

# mean squared error of prediction on the test data
pmse=mse(test_distance,pred_distance)
pmse
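The mse() call computes the average of the squared differences between the observed and predicted values. A minimal base R sketch of the same quantity follows; the helper name mse_manual and the toy vectors are made up for illustration and do not appear in the original answer.

```r
# Hypothetical helper (not part of the Metrics package): mean squared error
mse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean((actual - predicted)^2)
}

# Toy values chosen only for illustration
actual <- c(1, 2, 3)
predicted <- c(1, 2, 5)
mse_manual(actual, predicted)  # squared errors are 0, 0, 4, so the mean is 4/3
```

Writing the formula out this way removes the dependency on an extra package when only this one metric is needed.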

------------------------------------------------------------------------------------------------------------------------------------------------------

The indices obtained for the training sample are (10 8 17 2 14 28 22 32 27 4 24 19 6 31 26 12 23 20 15 9 7 21 18 30 16)
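Whether these exact indices come out again may depend on the R version, because R 3.6.0 changed the default algorithm behind sample(). Within a single R installation, though, the same seed should always give the same draw. A small sketch of that check, with the variable names idx_a and idx_b introduced here only for illustration:

```r
# Draw the training indices twice with the same seed
set.seed(100)
idx_a <- sample(1:32, size = 25)

set.seed(100)
idx_b <- sample(1:32, size = 25)

# The two draws match because the random number generator was reset to the same state
identical(idx_a, idx_b)  # TRUE
```

If the indices differ from those listed, one possible cause is the sampling method; on R 3.6.0 or later, RNGkind(sample.kind = "Rounding") switches back to the older behaviour.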

The summary of fit is

summary(fit)

Call:
lm(formula = tr_distance ~ tr_speed, data = data.frame(tr_speed,
    tr_distance))

Residuals:
    Min      1Q  Median      3Q     Max
-89.503 -36.637  -7.361  42.058 120.703

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  568.594     37.888  15.007 2.26e-13 ***
tr_speed     -16.959      1.767  -9.599 1.64e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 58.09 on 23 degrees of freedom
Multiple R-squared:  0.8002,    Adjusted R-squared:  0.7916
F-statistic: 92.14 on 1 and 23 DF,  p-value: 1.645e-09

The residual plot is: [figure: residuals plotted against tr_speed; image not reproduced]
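Since the task asks for residual plots in the plural, two common diagnostics can be sketched as follows. To stay self-contained, this example fits disp on mpg over all 32 rows of mtcars, so its numbers will not match the training fit; fit_demo is a name introduced here only for illustration.

```r
# Illustrative fit on the full mtcars data (not the training subset used in the answer)
fit_demo <- lm(disp ~ mpg, data = mtcars)

# Residuals against fitted values; a patternless band around zero supports linearity
plot(fitted(fit_demo), resid(fit_demo),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)  # reference line at zero

# Normal Q-Q plot of the residuals; points near the line suggest roughly normal errors
qqnorm(resid(fit_demo))
qqline(resid(fit_demo))
```

Calling plot() directly on an lm object is another option; it produces R's standard set of diagnostic plots.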

The mean square error for prediction is 8144.912
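The columns dist and speed named in the question belong to the built-in cars data frame rather than to mtcars. If the question meant cars, the same four steps could be sketched as below. This is one reading of the task and not the answer's method; train_idx, train, test, fit_cars, pred and mse_test are names chosen here for illustration.

```r
data(cars)  # built-in data frame with 50 rows and columns speed and dist

# 80/20 random split of the rows, seeded for reproducibility
set.seed(100)
n <- nrow(cars)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- cars[train_idx, ]
test <- cars[-train_idx, ]

# Linear model of stopping distance on speed, fitted on the training rows
fit_cars <- lm(dist ~ speed, data = train)
print(summary(fit_cars))

# Standard diagnostic plots for the fit, arranged in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit_cars)
par(op)

# Mean squared error of prediction on the held-out test rows
pred <- predict(fit_cars, newdata = test)
mse_test <- mean((test$dist - pred)^2)
mse_test
```

Splitting by rows of the data frame keeps the original column names, so predict() can take the test rows directly without rebuilding a data frame.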
