Question

The Book of R (Question 20.2) Please answer using R code.

Continue using the survey data frame from the package MASS for the next few exercises.

  a. The survey data set has a variable named Exer, a factor with k = 3 levels describing the amount of physical exercise time each student gets: none, some, or frequent. Obtain a count of the number of students in each category and produce side-by-side boxplots of student height split by exercise.
  b. Assuming independence of the observations and normality as usual, fit a linear regression model with height as the response variable and exercise as the explanatory variable (dummy coding). What’s the default reference level of the predictor? Produce a model summary.
  c. Draw a conclusion based on the fitted model from (b): does it appear that exercise frequency has any impact on mean height? What is the nature of the estimated effect?
  d. Predict the mean heights of one individual in each of the three exercise categories, accompanied by 95 percent prediction intervals.
  e. Do you arrive at the same result and interpretation for the height-by-exercise model if you construct an ANOVA table using aov?
  f. Is there any change to the outcome of (e) if you alter the model so that the reference level of the exercise variable is “none”? Would you expect there to be?

Now, turn back to the ready-to-use mtcars data set. One of the variables in this data frame is qsec , described as the time in seconds it takes to race a quarter mile; another is gear , the number of forward gears (cars in this data set have either 3, 4, or 5 gears).

  g. Using the vectors straight from the data frame, fit a simple linear regression model with qsec as the response variable and gear as the explanatory variable and interpret the model summary.
  h. Explicitly convert gear to a factor vector and refit the model. Compare the model summary with that from (g). What do you find?
  i. Explain, with the aid of a relevant plot in the same style as the right image of Figure 20-6, why you think there is a difference between the two models (g) and (h).
Answer #1

> library(MASS)
> #a)
> library(plyr)
> count(survey,vars='Exer')
  Exer freq
1 Freq  115
2 None   24
3 Some   98
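
The same counts can be obtained without the plyr dependency. The following is a minimal sketch using base R's table(); the name exer_counts is just an illustrative choice, not something from the original answer.

```r
library(MASS)  # provides the survey data frame

# Tabulate the number of students at each level of the Exer factor
exer_counts <- table(survey$Exer)
exer_counts
```

Because table() returns a named vector of counts, individual categories can be pulled out by name, for example exer_counts["None"].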

> par(mfrow=c(1,3))
> attach(survey)

> s1=survey[which(Exer=='Freq'),]
> s2=survey[which(Exer=='None'),]
> s3=survey[which(Exer=='Some'),]
> boxplot(s1$Height,xlab="Freq")
> boxplot(s2$Height,xlab="None")
> boxplot(s3$Height,xlab="Some")

[Figure: three boxplots of Height, one panel each for the Freq, None and Some groups]
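
Drawing three separate panels gives each box its own vertical axis, which makes the groups harder to compare. As an alternative sketch (not the original poster's approach), the formula interface of boxplot() places all three groups side by side on one shared Height axis; the axis labels here are my own wording.

```r
library(MASS)  # provides the survey data frame

# One panel, one box per level of Exer, all on a common Height axis.
# Rows with missing Height are dropped automatically.
boxplot(Height ~ Exer, data = survey,
        xlab = "Exercise frequency", ylab = "Height (cm)")
```

Passing data = survey also avoids the need for attach() and for subsetting the data frame by hand.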


> #b)
> linmodel=lm(Height~Exer)
> summary(linmodel)

Call:
lm(formula = Height ~ Exer)

Residuals:
    Min      1Q  Median      3Q     Max
-24.607  -6.397  -1.607   6.103  25.393

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 174.6067     0.9396 185.836  < 2e-16 ***
ExerNone     -5.5787     2.3489  -2.375  0.01847 *
ExerSome     -4.2098     1.4094  -2.987  0.00316 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.628 on 206 degrees of freedom
(28 observations deleted due to missingness)
Multiple R-squared: 0.05333, Adjusted R-squared: 0.04414
F-statistic: 5.802 on 2 and 206 DF, p-value: 0.003536
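
Part (b) also asks for the default reference level. R orders factor levels alphabetically unless told otherwise, and under the default treatment (dummy) coding the first level is the baseline, so here it is Freq, which is why only ExerNone and ExerSome appear in the coefficient table. The sketch below checks this and shows that the coefficients are just differences between group means; fit and grp_means are illustrative names.

```r
library(MASS)  # provides the survey data frame

# The first level listed is the reference (baseline) level
levels(survey$Exer)

fit <- lm(Height ~ Exer, data = survey)

# Mean height within each exercise group, ignoring missing heights
grp_means <- tapply(survey$Height, survey$Exer, mean, na.rm = TRUE)

# Intercept = mean of the reference group;
# remaining coefficients = each group's mean minus the reference mean
coef(fit)
grp_means - grp_means["Freq"]
```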

>
> #c)
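> #The overall F-test p-value (0.003536) is below 0.05, so the null hypothesis of equal mean heights across the three exercise groups is rejected at the 5% level: exercise frequency is associated with mean height. Relative to the reference group (Freq), mean height is estimated to be 5.58 cm lower for None and 4.21 cm lower for Some, so students who exercise frequently are the tallest on average.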

>
> #d) fit is the predicted mean height for the group; lwr and upr are the limits of the 95% prediction interval. Every student in a group gets the same prediction, so only the first row of each output is shown.
> predict(linmodel, s1, interval="predict")
       fit      lwr      upr
7 174.6067 155.5349 193.6784
> predict(linmodel, s2, interval="predict")
      fit      lwr      upr
2 169.028 149.5777 188.4783
> predict(linmodel, s3, interval="predict")
       fit      lwr      upr
1 170.3969 151.3027 189.4911
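
Passing the whole subsets s1, s2 and s3 to predict() returns one row per student in each subset. A more direct sketch, assuming the same model, is to build a small new data frame with exactly one row per exercise level; new_students is a hypothetical name introduced for this example.

```r
library(MASS)  # provides the survey data frame

fit <- lm(Height ~ Exer, data = survey)

# One hypothetical new individual in each exercise category
new_students <- data.frame(
  Exer = factor(c("Freq", "None", "Some"), levels = levels(survey$Exer))
)

# Point prediction plus 95% prediction interval for each individual
pred <- predict(fit, newdata = new_students,
                interval = "prediction", level = 0.95)
cbind(new_students, pred)
```

Spelling out interval = "prediction" avoids relying on partial argument matching, which is what makes the shorter "predict" work in the transcript above.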

> #e)
> aov(Height~Exer)
Call:
aov(formula = Height ~ Exer)

Terms:
                     Exer Residuals
Sum of Squares   1075.657 19094.894
Deg. of Freedom         2       206

Residual standard error: 9.627755
Estimated effects may be unbalanced
28 observations deleted due to missingness
> summary(aov(Height~Exer))
             Df Sum Sq Mean Sq F value  Pr(>F)
Exer          2   1076   537.8   5.802 0.00354 **
Residuals   206  19095    92.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
28 observations deleted due to missingness
> #Same result: the ANOVA F statistic (5.802 on 2 and 206 degrees of freedom) and p-value (0.00354) match the overall F-test in the lm summary, so the interpretation is unchanged.
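
Part (f) is not worked through above. Changing the reference level only changes which group the dummy coefficients are compared against; the fitted group means, and therefore the overall F-test, should stay the same. A minimal sketch using relevel() follows, where survey2, fit_freq and fit_none are names made up for this example.

```r
library(MASS)  # provides the survey data frame

# Work on a copy so the original survey data frame is left untouched
survey2 <- survey
survey2$Exer <- relevel(survey2$Exer, ref = "None")

fit_freq <- lm(Height ~ Exer, data = survey)   # reference level: Freq
fit_none <- lm(Height ~ Exer, data = survey2)  # reference level: None

# The coefficients are now contrasts against "None" ...
coef(fit_none)

# ... but the ANOVA table is identical for both codings
summary(aov(Height ~ Exer, data = survey))
summary(aov(Height ~ Exer, data = survey2))
```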

Question 2

> attach(mtcars)

#g)

> linmod2=lm(qsec~gear)
> linmod2

Call:
lm(formula = qsec ~ gear)

Coefficients:
(Intercept)         gear
     19.452       -0.449

> summary(linmod2)

Call:
lm(formula = qsec ~ gear)

Residuals:
    Min      1Q  Median      3Q     Max
-2.7074 -1.0629 -0.2164  1.2436  2.3436

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  19.4523     1.5131  12.856 8.96e-13 ***
gear         -0.4490     0.4039  -1.112    0.276
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.517 on 26 degrees of freedom
Multiple R-squared: 0.04537, Adjusted R-squared: 0.008657
F-statistic: 1.236 on 1 and 26 DF, p-value: 0.2765

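#From the p-value (0.276), we fail to reject the null hypothesis of a zero slope: treated as a numeric variable, gear has no statistically significant linear effect on qsec. The estimated slope is -0.449 seconds per additional gear, and the model explains under 5% of the variation in qsec (R-squared 0.045).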

> #h)
> linmod3=lm(qsec~factor(gear))
> summary(linmod3)

Call:
lm(formula = qsec ~ factor(gear))

Residuals:
     Min       1Q   Median       3Q      Max
-2.29308 -0.66058 -0.04727  0.81568  2.51692

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    17.7031     0.3549  49.880   <2e-16 ***
factor(gear)4   0.9042     0.5242   1.725   0.0969 .
factor(gear)5  -1.8031     0.7317  -2.464   0.0210 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.28 on 25 degrees of freedom
Multiple R-squared: 0.3468, Adjusted R-squared: 0.2945
F-statistic: 6.635 on 2 and 25 DF, p-value: 0.004881

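#The models differ because (g) treats gear as a continuous variable and fits a single slope, which forces mean qsec to change by the same amount from 3 to 4 gears as from 4 to 5. Model (h) treats gear as a categorical variable with 3 levels and estimates a separate mean for each. The group means are not monotone in gear (relative to 3 gears, 4-gear cars are 0.90 s slower and 5-gear cars are 1.80 s faster), so a straight line fits poorly. The factor model is significant overall (p-value 0.004881) with R-squared 0.35, against 0.045 for the numeric model.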
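
Part (i) asks for a plot to support this explanation. The following is a sketch of one way to draw it: the raw data, the straight line from the numeric model, and the three group means that the factor model fits. The plotting choices (symbols, labels) and the names fit_num, fit_fac and grp_means are my own, and the figure is only meant to be in the spirit of the one the exercise refers to.

```r
# Numeric-gear model (single slope) and factor-gear model (one mean per level)
fit_num <- lm(qsec ~ gear, data = mtcars)
fit_fac <- lm(qsec ~ factor(gear), data = mtcars)

# Raw data: quarter-mile time against number of forward gears
plot(mtcars$gear, mtcars$qsec, xaxt = "n",
     xlab = "Number of forward gears", ylab = "Quarter-mile time (s)")
axis(1, at = 3:5)

# Straight line imposed by treating gear as numeric
abline(fit_num, lwd = 2)

# Group means, which are what the factor model estimates
grp_means <- tapply(mtcars$qsec, mtcars$gear, mean)
points(3:5, grp_means, pch = 4, cex = 2, lwd = 2)
```

If the crosses do not fall close to the line, the linear-in-gear assumption is a poor description of the data, which is the reason the two summaries disagree.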
