The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

Question

Question

The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

The Book of R (Question 20.2) Please answer using R code.

Continue using the survey data frame from the package MASS for the next few exercises.

The survey data set has a variable named Exer , a factor with k = 3 levels describing the amount of physical exercise time each student gets: none, some, or frequent. Obtain a count of the number of students in each category and produce side-by-side boxplots of student height split by exercise.
Assuming independence of the observations and normality as usual, fit a linear regression model with height as the response variable and exercise as the explanatory variable (dummy coding). What’s the default reference level of the predictor? Produce a model summary.
Draw a conclusion based on the fitted model from (b)—does it appear that exercise frequency has any impact on mean height? What is the nature of the estimated effect?
Predict the mean heights of one individual in each of the three exercise categories, accompanied by 95 percent prediction intervals.
Do you arrive at the same result and interpretation for the height-by-exercise model if you construct an ANOVA table using aov ?
Is there any change to the outcome of (e) if you alter the model so that the reference level of the exercise variable is “none”? Would you expect there to be?

Now, turn back to the ready-to-use mtcars data set. One of the variables in this data frame is qsec , described as the time in seconds it takes to race a quarter mile; another is gear , the number of forward gears (cars in this data set have either 3, 4, or 5 gears).

Using the vectors straight from the data frame, fit a simple linear regression model with qsec as the response variable and gear as the explanatory variable and interpret the model summary.
Explicitly convert gear to a factor vector and refit the model. Compare the model summary with that from (g). What do you find?
Explain, with the aid of a relevant plot in the same style as the right image of Figure 20-6 why you think there is a difference between the two models (g) and (h).

math Statistics-And-Probability

Add a comment Improve this question Transcribed image text

Answer 1

Answer #1

> library(MASS)
> #a)
> library(plyr)
> count(survey,vars='Exer')
Exer freq
1 Freq 115
2 None 24
3 Some 98
> par(mfrow=c(1,3))
> attach(survey)

> s1=survey[which(Exer=='Freq'),]
> s2=survey[which(Exer=='None'),]
> s3=survey[which(Exer=='Some'),]
> boxplot(s1$Height,xlab="Freq")
> boxplot(s2$Height,xlab="None")
> boxplot(s3$Height,xlab="Some")

Some None Freg 06 k 08k 0L 09 --I 06 S8 08k 0LL S9 09 002 06 08 04W 09 0S

> #b)
> linmodel=lm(Height~Exer)
> summary(linmodel)

Call:
lm(formula = Height ~ Exer)

Residuals:
Min 1Q Median 3Q Max
-24.607 -6.397 -1.607 6.103 25.393

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 174.6067 0.9396 185.836 < 2e-16 ***
ExerNone -5.5787 2.3489 -2.375 0.01847 *
ExerSome -4.2098 1.4094 -2.987 0.00316 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.628 on 206 degrees of freedom
(28 observations deleted due to missingness)
Multiple R-squared: 0.05333, Adjusted R-squared: 0.04414
F-statistic: 5.802 on 2 and 206 DF, p-value: 0.003536

>
> #c)
> #Based on the p-value for the model (=0.003536), the exercise frequency does have an impact on the height since the null corresponding to equality of means in rejected at 5% level.

>
> #d) lwr and upr give the 95% prediction intervals and fit gives the mean
> predict(linmodel, s1, interval="predict")
fit lwr upr
7 174.6067 155.5349 193.6784
> predict(linmodel, s2, interval="predict")
fit lwr upr
2 169.028 149.5777 188.4783
> predict(linmodel, s3, interval="predict")
fit lwr upr
1 170.3969 151.3027 189.4911

> #e)
> aov(Height~Exer)
Call:
aov(formula = Height ~ Exer)

Terms:
Exer Residuals
Sum of Squares 1075.657 19094.894
Deg. of Freedom 2 206

Residual standard error: 9.627755
Estimated effects may be unbalanced
28 observations deleted due to missingness
> summary(aov(Height~Exer))
Df Sum Sq Mean Sq F value Pr(>F)
Exer 2 1076 537.8 5.802 0.00354 **
Residuals 206 19095 92.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
28 observations deleted due to missingness
> #No change from the result obtained in the linear model. p-value is also the same.

Question 2

> attach(mtcars)

#a)

> linmod2=lm(qsec~gear)
> linmod2

Call:
lm(formula = qsec ~ gear)

Coefficients:
(Intercept) gear
19.452 -0.449

> summary(linmod2)

Call:
lm(formula = qsec ~ gear)

Residuals:
Min 1Q Median 3Q Max
-2.7074 -1.0629 -0.2164 1.2436 2.3436

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.4523 1.5131 12.856 8.96e-13 ***
gear -0.4490 0.4039 -1.112 0.276
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.517 on 26 degrees of freedom
Multiple R-squared: 0.04537, Adjusted R-squared: 0.008657
F-statistic: 1.236 on 1 and 26 DF, p-value: 0.2765

#From the p-value, we see that the null is accepted and hence gear is not a significant variable when qsec is the response.

> #b)
> linmod3=lm(qsec~factor(gear))
> summary(linmod3)

Call:
lm(formula = qsec ~ factor(gear))

Residuals:
Min 1Q Median 3Q Max
-2.29308 -0.66058 -0.04727 0.81568 2.51692

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.7031 0.3549 49.880 <2e-16 ***
factor(gear)4 0.9042 0.5242 1.725 0.0969 .
factor(gear)5 -1.8031 0.7317 -2.464 0.0210 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.28 on 25 degrees of freedom
Multiple R-squared: 0.3468, Adjusted R-squared: 0.2945
F-statistic: 6.635 on 2 and 25 DF, p-value: 0.004881

#The difference arises because in the first case the gear column is treated as a continuous variable and in the second case it is a categorical variable with 3 levels.

Add a comment

Answer 2

The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

Homework Answers

Add Answer to:
The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

Post as a guest

Earn Coins

The data set "mtcars" in R has 11 variables with 32 observations. A data frame with...

Exercise 2. Consider the iris data set. (a) Fit a linear regression model for Sepal.Width using S...

Please provide with R codes! thank you!! Data: Question: Data: 179 161 162 605557 155 60...

Exercise 1. For this exercise use the bdims data set from the openintro package. Type ?bdims to r...

Please help me with these questions with R codes.. thank you!! Here’s the data I have...

Consider the R builtin dataset cars: data(mtcars) – Divide the data into training and test data...

PLEASE ANSWER ALL parts . IF YOU CANT ANSWER ALL, KINDLY ANSWER PART (E) AND PART(F)...

The Motor Trend Car Road Tests dataset mtcars, in faraway R package, was extracted from the...

USE R STUDIO The stackloss data frame available in R contains 21 observations on four variables...

USING R: x variable = income, y variable = sales; data set = Carseats how would...

The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

Homework Answers

Add Answer to: The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

Post as a guest

Earn Coins

Add Answer to:
The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...