Question

Please set eval = FALSE in the code chunk of your RMarkdown, as the output for this question will be too lengthy. In R, use set.seed(35135) and then the rnorm() command to generate 80,000 standard normally distributed observations. Put those values into a matrix with 400 rows and 200 columns. Now fit a multiple linear regression where the tenth column of that matrix is the response variable, and the remainder of the columns are considered predictors. Note: the syntax lm(Y ~ ., data = dat.matrix) is a shortcut to tell R that the variable named Y in dat.matrix should be modelled using the remainder of the variables in dat.matrix. Examine the summary output for the model you just fit. Manually count the number of statistically significant (α = 0.05) predictor variables in this model. How many are there? What concept in class is this illustrating? How many statistically significant predictor variables would we expect to see for this simulation?

Answer #1

R-code:

> set.seed(35135)
> data.matrix = data.frame(matrix(rnorm(400*200), ncol=200))
> y = data.matrix[,10]
> data.matrix = data.matrix[,-10]
> reg = lm(y ~ ., data=data.matrix)
> summary(reg)

Call:
lm(formula = y ~ ., data = data.matrix)

Residuals:
     Min       1Q   Median       3Q      Max
-2.21526 -0.51996  0.01957  0.52171  1.87459

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0111566  0.0770659   0.145   0.8850
X1          -0.0016279  0.0806650  -0.020   0.9839
X2          -0.0848108  0.0703651  -1.205   0.2295
X3           0.0771280  0.0759899   1.015   0.3113
X4           0.1366954  0.0769509   1.776   0.0772 .
X5           0.0152359  0.0664623   0.229   0.8189
X6          -0.0128077  0.0751483  -0.170   0.8648
X7           0.0679734  0.0786377   0.864   0.3884
X8           0.0529031  0.0699834   0.756   0.4506
X9           0.1027706  0.0778118   1.321   0.1881
X11          0.1749166  0.0732211   2.389   0.0178 *
X12          0.0721969  0.0702182   1.028   0.3051
X13          0.0682777  0.0697036   0.980   0.3285
X14         -0.0034966  0.0702669  -0.050   0.9604
X15         -0.1072522  0.0759415  -1.412   0.1594
[rows X16 to X177 are not shown in the transcribed output]
X178         0.0696132  0.0765404   0.909   0.3642
X179         0.0116681  0.0710432   0.164   0.8697
X180         0.0166739  0.0718617   0.232   0.8168
X181         0.0468920  0.0790108   0.593   0.5535
X182        -0.1047574  0.0699616  -1.497   0.1359
X183        -0.0415234  0.0699174  -0.594   0.5533
X184        -0.0409633  0.0658620  -0.622   0.5347
X185         0.1225387  0.0761958   1.608   0.1094
X186        -0.0644171  0.0747036  -0.862   0.3896
X187         0.0950928  0.0732208   1.299   0.1955
X188        -0.0174549  0.0750740  -0.233   0.8164
X189        -0.0778076  0.0723866  -1.075   0.2837
X190        -0.0454622  0.0707489  -0.643   0.5212
X191        -0.0511916  0.0674848  -0.759   0.4490
X192         0.0616288  0.0721222   0.855   0.3938
X193         0.0406006  0.0711635   0.571   0.5690
X194        -0.0486792  0.0718579  -0.677   0.4989
X195         0.0054715  0.0664518   0.082   0.9345
X196        -0.1256069  0.0751697  -1.671   0.0963 .
X197        -0.0681435  0.0687096  -0.992   0.3225
X198        -0.0128784  0.0757990  -0.170   0.8653
X199         0.0002671  0.0673538   0.004   0.9968
X200        -0.0379809  0.0749534  -0.507   0.6129
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.017 on 200 degrees of freedom
Multiple R-squared:  0.4598,	Adjusted R-squared:  -0.07778
F-statistic: 0.8553 on 199 and 200 DF,  p-value: 0.8647

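As per the output, R-squared = 0.4598, so nominally 45.98% of the variation in the 10th column is explained by the remaining columns. The adjusted R-squared is -0.07778, however, which indicates that this apparent fit comes from using 199 predictors rather than from any real relationship.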
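The R-squared figure is easier to interpret next to the adjusted R-squared printed in the same summary. The short sketch below recomputes the adjusted value from the reported R-squared using the standard formula; the variable names are illustrative, and the sample size (400 rows) and predictor count (199 columns) are taken from the question.

```r
# Values taken from the summary output and the question statement
r_squared <- 0.4598   # Multiple R-squared as reported (rounded)
n <- 400              # number of observations (rows)
p <- 199              # number of predictors (all columns except the 10th)

# Standard adjusted R-squared formula: penalises the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(adj_r_squared)
```

The result is roughly -0.078, which agrees with the reported -0.07778 up to rounding of R-squared. A negative adjusted R-squared is a sign that the predictors explain no more variation than would be expected by chance.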

Test statistic: F = 0.8553 on 199 and 200 degrees of freedom.

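Here the p-value = 0.8647, which is greater than 0.05, so we fail to reject the null hypothesis that all slope coefficients are zero. The regression as a whole is not statistically significant, which is expected because the response and the predictors are independent random noise.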
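The question also asks how many individual predictors are significant and how many we would expect. Rather than counting by hand, the p-values can be pulled from the coefficient table. This is a minimal sketch following the question's dat.matrix naming; renaming column 10 to Y and the names p_values, n_significant and expected_false_positives are assumptions made for illustration.

```r
# Recreate the simulated data exactly as the question describes
set.seed(35135)
dat.matrix <- data.frame(matrix(rnorm(400 * 200), nrow = 400, ncol = 200))

# Name the tenth column Y so the lm(Y ~ ., ...) shortcut can be used
names(dat.matrix)[10] <- "Y"
fit <- lm(Y ~ ., data = dat.matrix)

# Coefficient table: one row per term, p-values in the "Pr(>|t|)" column
coefs <- summary(fit)$coefficients
p_values <- coefs[-1, "Pr(>|t|)"]   # drop the intercept row

# Observed number of predictors significant at alpha = 0.05
n_significant <- sum(p_values < 0.05)

# Expected number of false positives when every null hypothesis is true
expected_false_positives <- length(p_values) * 0.05

print(n_significant)
print(expected_false_positives)
```

Because every predictor is pure noise, each of the 199 t-tests has a 5% chance of a Type I error, so about 199 × 0.05 = 9.95, or roughly 10, predictors are expected to appear significant by chance alone. This is the multiple testing problem. Among the rows visible in the transcribed output, only X11 (p = 0.0178) falls below 0.05.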
