Question
Please answer the following using the R code provided.

The data set below contains information about the gasoline mileage performance for 32 automobiles. We are interested in developing a model to predict the miles per gallon (y) using related predictor variables. The variables in the study are:

Dependent variable: miles per gallon (y)
Independent variables (some subscripts are illegible in the transcription): horsepower (ft-lb), torque (ft-lb), horsepower + torque (ft-lb), carburetor (barrels)

(a) We first start by fitting a model using y and x1, x2, x3, x4, x5, x6. However, the regression fails to fit the model due to perfect multicollinearity, since x4 is a linear combination of x2 and [illegible]. What is a possible remedy for multicollinearity in this example?

(b) Dropping x4 from the model, we fit a model using y and x1, x2, x3, x5 and x6; that is, we fit the model

    y = b0 + b1*x1 + b2*x2 + b3*x3 + b5*x5 + b6*x6 + error

where the description of the response and the predictors is given above. Below is some R output:

    > model1 <- lm(y ~ x1 + x2 + x3 + x5 + x6)
    > summary(model1)

    Call:
    lm(formula = y ~ x1 + x2 + x3 + x5 + x6)

    Residuals:
       Min     1Q Median     3Q    Max
    -6.780 -1.429 -0.332  1.586  6.296

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) 33.5412568  3.2028581  10.472 8.04e-11 ***
    x1          -0.0876880  0.0424834  -2.064   0.0491 *
    x2          -0.0553033  0.0740766  -0.747   0.4620
    x3           0.0758799  0.0737334   1.029   0.3129
    x5           1.3299840  1.1131248   1.195   0.2429
    x6          -0.0001946  0.0017454  -0.111   0.9121
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 3.122 on 26 degrees of freedom
    Multiple R-squared: 0.7952, Adjusted R-squared: 0.7558
    F-statistic: 20.19 on 5 and 26 DF, p-value: 3.29e-08

    > anova(model1)
    Analysis of Variance Table

    Response: y
              Df Sum Sq Mean Sq F value    Pr(>F)
    x1         1 955.34  955.34 97.9968 2.616e-10
    x2         1   6.37    6.37  0.6531    0.4263
    x3         1   2.39    2.39  0.2450    0.6248
    x5         1  19.86   19.86  2.0373    0.1654
    x6         1   0.12    0.12  0.0124    0.9121
    Residuals 26 253.47    9.75
i. Give the prediction equation.

ii. Construct the ANOVA table for this model.

    Source       Sum of squares   df   Mean sum of squares   F
    Regression
    Error
    Total

iii. Test for the significance of the regression using alpha = 0.05.

(c) Using the following output and plots, comment about the model assumptions.

    > shapiro.test(res)

        Shapiro-Wilk normality test

    data:  res
    W = 0.98429, p-value = 0.9101

    [Plots: normal Q-Q plot; residuals against fitted values]

(d) Using the following output and plot, what possible transformation will you suggest? Give the equation of the transformed model.

    > result <- boxcox(y ~ x1 + x2 + x3 + x5 + x6, lambda = seq(-1.5, 1, by = 0.01))
    > result$x[result$y == max(result$y)]
    [1] -0.41
(e) Applying the above transformation, we observe that the residuals are more randomly scattered, and we choose to use the transformed model. However, we can observe some outliers in the plots of residuals against fitted values. We would like to further investigate those outliers. Using the plot of residuals against leverage, we observe that observations 2 and 17 have leverage values greater than the cut-off point. We further use the other measurements (DFFITS, DFBETAS, Cook's distance, COVRATIO) and conclude that we should investigate those two observations more closely. We refit the model excluding those observations and compare some statistics. Below is a table with the comparisons.

    Model                   PRESS         R2
    Full model              0.148         0.820
    Without no. 2           (illegible)   0.824
    Without no. 17          0.146         0.816
    Without no. 2 and 17    0.140         0.820

    (The remaining columns of the table, including MS, are illegible in the transcription.)

Comment on whether those two points are influential or not.
Answer #1

(i)

ŷ = 33.5413 - 0.0877(x1) - 0.0553(x2) + 0.0759(x3) + 1.3300(x5) - 0.0002(x6)
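
As an illustration of how the prediction equation is used, the following is a minimal R sketch that plugs the unrounded estimates from the summary(model1) output into the equation. The function name predict_mpg and the predictor values in new_car are hypothetical and serve only to show the call; they are not taken from the data set.

```r
# Coefficient estimates copied from the summary(model1) output in the question.
coefs <- c("(Intercept)" = 33.5412568, x1 = -0.0876880, x2 = -0.0553033,
           x3 = 0.0758799, x5 = 1.3299840, x6 = -0.0001946)

# Predicted miles per gallon for one set of predictor values.
# `x` is a named numeric vector with entries x1, x2, x3, x5 and x6.
predict_mpg <- function(x) {
  vars <- c("x1", "x2", "x3", "x5", "x6")
  stopifnot(all(vars %in% names(x)))
  unname(coefs["(Intercept)"] + sum(coefs[vars] * x[vars]))
}

# Hypothetical predictor values (an assumption, not a row of the real data).
new_car <- c(x1 = 200, x2 = 150, x3 = 180, x5 = 3, x6 = 2)
predict_mpg(new_car)
```

With the original data frame available, predict(model1, newdata = ...) would give the same number directly from the fitted model.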

(ii)

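ANOVA

    Source of variation   DF   Sum of squares   Mean sum of squares   F value
    x1                     1           955.34                955.34   97.9968
    x2                     1             6.37                  6.37    0.6531
    x3                     1             2.39                  2.39    0.2450
    x5                     1            19.86                 19.86    2.0373
    x6                     1             0.12                  0.12    0.0124
    Regression             5           984.08                196.82   20.19
    Error                 26           253.47                  9.75
    Total                 31          1237.55

The Regression row is the sum of the five sequential rows from anova(model1): 955.34 + 6.37 + 2.39 + 19.86 + 0.12 = 984.08 on 5 degrees of freedom.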
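
The arithmetic behind the Regression, Error and Total rows can be checked with a short R sketch. It uses only the sums of squares and degrees of freedom printed by anova(model1) in the question; the variable names are my own.

```r
# Sequential sums of squares from anova(model1); each term has 1 df.
ss_seq   <- c(x1 = 955.34, x2 = 6.37, x3 = 2.39, x5 = 19.86, x6 = 0.12)
ss_error <- 253.47
df_error <- 26

# Collapse the sequential terms into a single regression line.
ss_reg   <- sum(ss_seq)
df_reg   <- length(ss_seq)
ss_total <- ss_reg + ss_error
df_total <- df_reg + df_error

# Mean squares and the overall F statistic.
ms_reg   <- ss_reg / df_reg
ms_error <- ss_error / df_error
f_stat   <- ms_reg / ms_error

anova_tab <- data.frame(
  Source = c("Regression", "Error", "Total"),
  SS     = c(ss_reg, ss_error, ss_total),
  df     = c(df_reg, df_error, df_total),
  MS     = c(ms_reg, ms_error, NA),
  F      = c(f_stat, NA, NA)
)
print(anova_tab, digits = 5)
```

The resulting F statistic agrees, after rounding, with the "F-statistic: 20.19 on 5 and 26 DF" line of the summary output.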

(iii)

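    Regression term   Significance (at 0.05)
    x1                Significant
    x2                Not significant
    x3                Not significant
    x5                Not significant
    x6                Not significant

These are the sequential F-tests from anova(model1), so each term is tested after the terms listed above it. For the regression as a whole, H0: b1 = b2 = b3 = b5 = b6 = 0 is tested with F = 20.19 on 5 and 26 degrees of freedom. The p-value is 3.29e-08, which is less than 0.05, so H0 is rejected and the regression is significant.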
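
The overall test can be reproduced from the F statistic and its degrees of freedom alone. The sketch below assumes nothing beyond the values printed in the summary output and uses the base R functions qf() and pf().

```r
# Overall F statistic and degrees of freedom from summary(model1).
f_stat <- 20.19
df1    <- 5     # regression df
df2    <- 26    # error df
alpha  <- 0.05

# Critical value of the F(5, 26) distribution at the 5% level (about 2.59).
f_crit <- qf(1 - alpha, df1, df2)

# Upper-tail p-value of the observed statistic.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Reject H0 (all slopes equal to zero) when F exceeds the critical value.
reject_h0 <- f_stat > f_crit
c(f_crit = f_crit, p_value = p_value, reject_h0 = reject_h0)
```

Comparing F with the critical value and comparing the p-value with alpha are equivalent decisions; both lead to rejecting H0 here.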

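d) The Box-Cox output in the question appears to put the maximum of the profile likelihood at lambda ≈ -0.41, which suggests a power transformation of y with a negative exponent: y^(-0.41), or the more convenient nearby value lambda = -0.5, that is 1/sqrt(y). The transformed model is y^lambda = b0 + b1(x1) + b2(x2) + b3(x3) + b5(x5) + b6(x6) + error. The log transformation (lambda = 0) is the most common choice in practice, but it is justified here only if 0 lies inside the confidence interval for lambda, which cannot be read from the transcribed output.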
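
A minimal sketch of how the Box-Cox estimate and an approximate 95% interval for lambda can be obtained in R. The gasoline data are not reproduced on this page, so the built-in mtcars data set and the predictors disp, hp and wt are used purely as a stand-in (an assumption); the lambda found here is not the one from the question. boxcox() comes from the MASS package, as in the question's output.

```r
library(MASS)  # provides boxcox()

# Stand-in data: mtcars is NOT the data set from the question.
fit <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Profile log-likelihood over the same lambda grid as in the question.
bc <- boxcox(fit, lambda = seq(-1.5, 1, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood is within
# qchisq(0.95, 1) / 2 of the maximum.
in_ci     <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

# Refit with the transformed response (log when lambda is zero).
fit_bc <- if (abs(lambda_hat) < 1e-8) {
  lm(log(mpg) ~ disp + hp + wt, data = mtcars)
} else {
  lm(I(mpg^lambda_hat) ~ disp + hp + wt, data = mtcars)
}

lambda_hat
lambda_ci
```

If lambda_ci contains 0, the log transformation is a defensible simplification; if it contains -0.5 but not 0, the reciprocal square root is the more natural rounded choice.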

e)

    Model                   R2
    Full model              0.820
    Without no. 2           0.824
    Without no. 17          0.816
    Without no. 2 and 17    0.820

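Without observation 17, R2 falls from 0.820 to 0.816, whereas without observation 2 it rises to 0.824; with both removed it returns to 0.820. Each change is only about 0.004, so on the evidence of R2 alone neither observation alters the fit much. Of the two, observation 17 is the one whose removal makes the fit worse, so we can say that observation 17 is more influential than observation 2, although the effect is slight.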
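
The comparison described in part (e) can be sketched in R with the base diagnostics hatvalues(), dffits(), cooks.distance() and covratio(). As the gasoline data are not available on this page, mtcars is used as a stand-in (an assumption), and rows 2 and 17 are dropped only to mirror the question's labels. The leverage cut-off 2p/n is a common rule of thumb; the question does not state which cut-off it used.

```r
# Stand-in data: mtcars is NOT the data set from the question.
fit <- lm(mpg ~ disp + hp + wt, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(fit))

# Leverage and the usual deletion diagnostics for every observation.
h <- hatvalues(fit)
diag_tab <- data.frame(
  leverage = h,
  dffits   = dffits(fit),
  cooks_d  = cooks.distance(fit),
  covratio = covratio(fit)
)
high_leverage <- which(h > 2 * p / n)  # assumed cut-off: 2p/n

# PRESS and R-squared for a fitted model.
fit_stats <- function(m) {
  c(PRESS = sum((residuals(m) / (1 - hatvalues(m)))^2),
    R2    = summary(m)$r.squared)
}

# Refit without the given rows and return the same statistics.
drop_rows <- function(rows) {
  fit_stats(lm(mpg ~ disp + hp + wt, data = mtcars[-rows, ]))
}

comparison <- rbind(
  full            = fit_stats(fit),
  without_2       = drop_rows(2),
  without_17      = drop_rows(17),
  without_2_and_17 = drop_rows(c(2, 17))
)
comparison
```

Large shifts in PRESS, R2 or the coefficients between rows of comparison would point to an influential observation; shifts as small as those in the question's table would not.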
