1. Choose a data set of your own:
- Response or dependent variable (Y)
- At least 3 or more independent variables (X1, X2, X3, ... etc.) that you believe has an influence on Y.
- At least 40 observations or data points
- If there are categorical variables, model them appropriately
2. Fit a multiple regression model.
- Interpret the model equation
- Are all the chosen variables significant? Discuss.
- Check for model assumptions and make appropriate comments.
- How good is the model? Comment on R², R, se, F-value etc and discuss.
3. Refine the model.
- Does the addition of interaction terms create a better model?
- Can any variable be eliminated?
- Does stepwise regression, forward selection, or backward elimination produce the same model? Comment and substantiate.
4. Forecast and predict using the chosen model.
- Find a 95% confidence interval for the mean value of Y for given value of X’s
- Find a 95% prediction interval for a particular value of Y for a given value of X’s
- Comment and discuss the two above intervals
- Are there any influential observations? Comment and discuss.
5. Conclusions
- Are you comfortable with the model?
- If you had to re-build the model, what changes would you consider?
6. Appendix
- Include copy of the data set
- Include figures, tables, outputs of analyses using software. Be sure to clearly label each so that you may refer to them in your write-up.
In this example we test whether several variables affect the price of a residential home. Data for 150 residential homes are shown below,
Home | Price | Home Size | Lot Size | Rooms | Bathrooms |
1 | 102000 | 600 | 0.5 | 3 | 1 |
2 | 146300 | 1050 | 0.43 | 5 | 1.5 |
3 | 182000 | 1800 | 0.68 | 7 | 1.5 |
4 | 110500 | 922 | 0.3 | 5 | 1 |
5 | 171900 | 1950 | 0.75 | 8 | 2.5 |
6 | 154000 | 1783 | 0.22 | 8 | 1.5 |
7 | 147000 | 1008 | 0.5 | 6 | 1 |
8 | 195900 | 1840 | 1.16 | 8 | 2 |
9 | 183500 | 3700 | 1.1 | 10 | 3 |
10 | 156500 | 1092 | 0.26 | 6 | 1 |
11 | 152000 | 1950 | 0.5 | 7 | 1.5 |
12 | 170000 | 1403 | 0.5 | 6 | 2 |
13 | 253000 | 1680 | 14.37 | 8 | 2 |
14 | 129500 | 1000 | 0.49 | 4 | 1 |
15 | 241900 | 2310 | 0.46 | 8 | 2.5 |
16 | 151900 | 1300 | 0.78 | 6 | 1 |
17 | 199000 | 1930 | 3 | 9 | 3 |
18 | 186000 | 3000 | 0.5 | 11 | 2.5 |
19 | 153500 | 1362 | 0.4 | 7 | 2 |
20 | 166000 | 1750 | 0.5 | 7 | 2 |
21 | 224900 | 2080 | 1 | 8 | 2.5 |
22 | 158500 | 1344 | 0.94 | 6 | 2 |
23 | 332000 | 2130 | 11.91 | 8 | 1.5 |
24 | 172000 | 1500 | 0.41 | 7 | 1 |
25 | 176000 | 2400 | 0.4 | 7 | 2.5 |
26 | 210000 | 2272 | 0.41 | 9 | 2.5 |
27 | 156500 | 1050 | 1 | 5 | 1 |
28 | 169500 | 1610 | 0.45 | 8 | 1.5 |
29 | 154900 | 1248 | 0.22 | 7 | 1 |
30 | 163000 | 2000 | 0.5 | 8 | 2 |
31 | 140000 | 1450 | 0.3 | 6 | 2 |
32 | 148500 | 1248 | 0.25 | 7 | 1 |
33 | 224500 | 2544 | 0.28 | 9 | 2.5 |
34 | 299900 | 2500 | 0.92 | 8 | 3 |
35 | 199900 | 2858 | 0.79 | 9 | 3 |
36 | 220000 | 1745 | 0.58 | 7 | 2.5 |
37 | 233000 | 2653 | 1.8 | 9 | 3 |
38 | 174900 | 1450 | 0.3 | 7 | 1 |
39 | 124000 | 850 | 0.11 | 4 | 1 |
40 | 169900 | 1839 | 2.6 | 7 | 1.5 |
41 | 213000 | 2016 | 0.78 | 8 | 2.5 |
42 | 165000 | 1625 | 0.36 | 7 | 1.5 |
43 | 162000 | 2000 | 0.11 | 8 | 2 |
44 | 211500 | 2250 | 0.33 | 9 | 2.5 |
45 | 166000 | 1300 | 0.3 | 7 | 1 |
46 | 194000 | 1956 | 0.5 | 8 | 2.5 |
47 | 192000 | 2496 | 0.75 | 9 | 2.5 |
48 | 171000 | 1575 | 0.25 | 7 | 1.5 |
49 | 226800 | 1960 | 1.33 | 8 | 2.5 |
50 | 155000 | 1200 | 0.33 | 5 | 1 |
51 | 157500 | 1296 | 0.5 | 9 | 1 |
52 | 297000 | 1950 | 18.7 | 7 | 2.5 |
53 | 315000 | 2516 | 8.1 | 7 | 2.5 |
54 | 161000 | 1066 | 0.33 | 5 | 1 |
55 | 193500 | 2276 | 1 | 8 | 2.5 |
56 | 163000 | 1908 | 0.46 | 7 | 2 |
57 | 180000 | 1122 | 3.09 | 5 | 2 |
58 | 171000 | 3500 | 1 | 10 | 2.5 |
59 | 163000 | 1100 | 0.33 | 6 | 1 |
60 | 220000 | 2300 | 5.63 | 7 | 2.5 |
61 | 155900 | 1118 | 0.56 | 7 | 1.5 |
62 | 219900 | 2464 | 0.43 | 8 | 2.5 |
63 | 185000 | 2100 | 0.58 | 8 | 1.5 |
64 | 172500 | 1552 | 0.46 | 6 | 1.5 |
65 | 167900 | 1856 | 0.33 | 7 | 1.5 |
66 | 160000 | 1800 | 0.3 | 7 | 1.5 |
67 | 147000 | 1248 | 0.3 | 6 | 1 |
68 | 210500 | 2000 | 0.6 | 9 | 2.5 |
69 | 192500 | 1848 | 0.5 | 7 | 2.5 |
70 | 138000 | 1036 | 0.95 | 6 | 1 |
71 | 200000 | 2277 | 0.8 | 8 | 3 |
72 | 186000 | 2300 | 0.65 | 7 | 3 |
73 | 217000 | 2080 | 1.23 | 8 | 2.5 |
74 | 180000 | 1600 | 1.84 | 7 | 2 |
75 | 195000 | 2680 | 0.5 | 9 | 3 |
76 | 149000 | 1200 | 0.25 | 7 | 1 |
77 | 165500 | 1526 | 0.3 | 7 | 1.5 |
78 | 175900 | 1680 | 0.5 | 6 | 1.5 |
79 | 156000 | 1232 | 0.31 | 6 | 2 |
80 | 235406 | 2465 | 1.55 | 8 | 2.5 |
81 | 215500 | 2800 | 1.68 | 9 | 1.5 |
82 | 225000 | 2265 | 0.85 | 8 | 2.5 |
83 | 155000 | 1300 | 0.65 | 5 | 1 |
84 | 190000 | 1900 | 1 | 8 | 2.5 |
85 | 126000 | 864 | 0.32 | 4 | 1 |
86 | 172000 | 2000 | 0.75 | 9 | 1.5 |
87 | 175000 | 1800 | 0.66 | 8 | 2.5 |
88 | 181500 | 1900 | 0.75 | 7 | 2 |
89 | 180000 | 1564 | 0.33 | 6 | 2 |
90 | 295000 | 2400 | 2 | 7 | 2 |
91 | 146000 | 1100 | 1.1 | 6 | 1 |
92 | 165000 | 1800 | 1 | 8 | 2.5 |
93 | 159000 | 1200 | 0.33 | 6 | 1 |
94 | 138500 | 1540 | 0.18 | 7 | 2 |
95 | 194900 | 1980 | 0.7 | 8 | 2.5 |
96 | 140000 | 1289 | 0.25 | 6 | 1 |
97 | 184000 | 1800 | 0.68 | 7 | 2 |
98 | 164000 | 1502 | 0.35 | 7 | 1.5 |
99 | 190000 | 2025 | 1.1 | 7 | 2 |
100 | 250000 | 3000 | 1.15 | 10 | 3.5 |
101 | 156500 | 1500 | 0.5 | 7 | 1.5 |
102 | 156500 | 1600 | 0.26 | 8 | 1.5 |
103 | 188000 | 1500 | 0.54 | 5 | 2.5 |
104 | 202000 | 2100 | 1 | 8 | 2.5 |
105 | 245000 | 2100 | 0.5 | 8 | 2.5 |
106 | 171900 | 1632 | 3 | 6 | 3 |
107 | 119900 | 1660 | 0.21 | 7 | 1 |
108 | 159900 | 1070 | 1.69 | 5 | 1 |
109 | 165000 | 1400 | 0.35 | 6 | 2 |
110 | 165000 | 1800 | 0.5 | 7 | 2 |
111 | 152500 | 1100 | 0.37 | 7 | 1 |
112 | 265000 | 3150 | 0.3 | 11 | 4 |
113 | 164500 | 2000 | 0.7 | 8 | 1 |
114 | 156500 | 1700 | 0.3 | 8 | 2 |
115 | 210000 | 1800 | 1.52 | 8 | 2.5 |
116 | 157500 | 1850 | 0.26 | 9 | 2 |
117 | 195000 | 2320 | 0.4 | 8 | 2.5 |
118 | 127000 | 1300 | 0.37 | 5 | 1 |
119 | 130000 | 1338 | 0.12 | 6 | 1 |
120 | 238000 | 2288 | 1.2 | 8 | 2.5 |
121 | 212000 | 2400 | 0.5 | 8 | 2.5 |
122 | 205000 | 2400 | 0.7 | 8 | 3 |
123 | 174900 | 1900 | 0.44 | 6 | 2 |
124 | 207000 | 2010 | 0.68 | 8 | 1.5 |
125 | 261750 | 2981 | 1.3 | 10 | 3.5 |
126 | 195000 | 1725 | 1.53 | 8 | 2.5 |
127 | 108000 | 821 | 2.3 | 4 | 1 |
128 | 209000 | 3060 | 0.75 | 8 | 2 |
129 | 115000 | 875 | 0.26 | 5 | 1 |
130 | 190000 | 1760 | 0.05 | 7 | 2 |
131 | 171000 | 2000 | 0.65 | 7 | 1 |
132 | 215000 | 2600 | 0.75 | 8 | 2 |
133 | 143500 | 1624 | 1.8 | 7 | 1.5 |
134 | 220000 | 2473 | 1.25 | 9 | 2.5 |
135 | 137000 | 1100 | 0.17 | 5 | 1 |
136 | 247000 | 3100 | 0.54 | 10 | 3.5 |
137 | 224500 | 2300 | 0.91 | 8 | 2.5 |
138 | 182000 | 1450 | 0.3 | 6 | 1.5 |
139 | 240000 | 2100 | 0.5 | 8 | 2.5 |
140 | 170000 | 1650 | 0.5 | 8 | 2.5 |
141 | 150500 | 1600 | 0.4 | 6 | 2 |
142 | 209900 | 2790 | 0.75 | 13 | 2.5 |
143 | 182500 | 1786 | 0.3 | 8 | 2 |
144 | 189000 | 1728 | 0.5 | 8 | 1.5 |
145 | 198500 | 1900 | 1.06 | 7 | 2.5 |
146 | 128000 | 1165 | 0.12 | 6 | 1 |
147 | 147500 | 1300 | 0.29 | 6 | 1 |
148 | 145000 | 1080 | 0.31 | 5 | 1 |
149 | 305000 | 2820 | 1 | 9 | 2.5 |
150 | 220000 | 2100 | 1.3 | 8 | 1.5 |
The regression analysis is done in Excel using the following steps,
Step 1: Enter the data values in Excel. The screenshot is shown below,
Step 2: DATA > Data Analysis > Regression > OK. The screenshot is shown below,
Step 3: Select Input Y Range: the Price column, Input X Range: all the independent variables, and tick Confidence Level with the value set to 95%. The screenshot is shown below,
The result is obtained. The screenshot is shown below,
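For readers without Excel, the same kind of fit can be sketched in Python. This is only an illustration: it assumes NumPy is installed and uses just the first 12 rows of the table above, so its coefficients and R square will not match the Excel output for all 150 homes. The names `rows`, `coef`, and `r2` are chosen here for the example.

```python
import numpy as np

# First 12 homes from the table: (Price, Home Size, Lot Size, Rooms, Bathrooms).
# Using a subset keeps the example short; the full analysis uses all 150 rows.
rows = [
    (102000, 600, 0.5, 3, 1),
    (146300, 1050, 0.43, 5, 1.5),
    (182000, 1800, 0.68, 7, 1.5),
    (110500, 922, 0.3, 5, 1),
    (171900, 1950, 0.75, 8, 2.5),
    (154000, 1783, 0.22, 8, 1.5),
    (147000, 1008, 0.5, 6, 1),
    (195900, 1840, 1.16, 8, 2),
    (183500, 3700, 1.1, 10, 3),
    (156500, 1092, 0.26, 6, 1),
    (152000, 1950, 0.5, 7, 1.5),
    (170000, 1403, 0.5, 6, 2),
]
data = np.array(rows, dtype=float)
y = data[:, 0]                                       # response: Price
X = np.column_stack([np.ones(len(y)), data[:, 1:]])  # intercept + 4 predictors

# Ordinary least squares: minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ coef
sse = float(resid @ resid)                 # unexplained variation
sst = float(((y - y.mean()) ** 2).sum())   # total variation
r2 = 1.0 - sse / sst

names = ["Intercept", "Home Size", "Lot Size", "Rooms", "Bathrooms"]
for name, b in zip(names, coef):
    print(f"{name:10s} {b:12.3f}")
print(f"R square   {r2:12.4f}")
```

Replacing `rows` with all 150 observations should give the same coefficients that Excel reports, since both use ordinary least squares.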
The multiple regression equation is,
The R-square value is 0.682428, which means the model explains 68.2428% of the variation in price.
The F statistic value = 77.89748, and its p-value (Significance F in the Excel output) is less than 0.05. The independent variables, taken together, have a significant effect on price.
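The F statistic can be cross-checked from R square alone. The sketch below assumes n = 150 observations and k = 4 predictors (Home Size, Lot Size, Rooms, Bathrooms) in this first model; the function names are chosen for illustration.

```python
def f_statistic(r2, n, k):
    """Overall F statistic of a regression with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def adjusted_r2(r2, n, k):
    """R square penalised for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2, n, k = 0.682428, 150, 4   # values taken from the output discussed above
f_value = f_statistic(r2, n, k)
adj = adjusted_r2(r2, n, k)
print(f"F = {f_value:.4f}, adjusted R square = {adj:.4f}")
```

Under these assumptions the computed F is about 77.9, in line with the 77.89748 reported by Excel.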
The P-values of the independent variables are,
Variable | P-value | Compared with 0.05 |
Home Size | 0.000 | < 0.05 |
Lot Size | 0.000 | < 0.05 |
Rooms | 0.724 | > 0.05 |
Bathrooms | 0.0008 | < 0.05 |
The P-value is significant for every variable except Rooms. Rooms is an insignificant variable here, so it should be eliminated from the regression equation.
Suppose there are interactions among Home Size, Lot Size, and Bathrooms. A new regression model is fitted by adding the interaction variables Home Size * Bathrooms, Home Size * Lot Size, and Lot Size * Bathrooms. The screenshot of the result summary is shown below,
The interactions Home Size * Lot Size and Lot Size * Bathrooms are significant, while Home Size * Bathrooms does not have a significant effect.
The adjusted R square value is 0.7091, which is 70.91%.
The model is then refitted after removing the insignificant variable Home Size * Bathrooms. The screenshot of the summary output is shown below,
R square value = 0.708452
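An interaction variable is simply the product of two existing columns, added as a new column before running the regression again. A minimal sketch, using the first three homes from the table and a helper name (`interaction_terms`) chosen for illustration:

```python
def interaction_terms(home_size, lot_size, bathrooms):
    """Return the three pairwise products used as interaction variables."""
    return {
        "Home Size * Bathrooms": home_size * bathrooms,
        "Home Size * Lot Size": home_size * lot_size,
        "Lot Size * Bathrooms": lot_size * bathrooms,
    }


# (Home Size, Lot Size, Bathrooms) for homes 1 to 3 in the table.
homes = [(600, 0.5, 1), (1050, 0.43, 1.5), (1800, 0.68, 1.5)]
for home in homes:
    print(home, interaction_terms(*home))
```

In Excel the equivalent step is a new column with a formula that multiplies the two source cells, filled down for all 150 rows.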
(Note: Use the forward stepwise method to add the variables one by one.)
The same model is obtained whether the forward stepwise method or the backward stepwise method is selected.
Forecast using the model with interaction terms
For Home Size = 800, Lot Size = 0.75, Bathrooms = 1.5
Substituting these values into the regression line of the last model,
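The substitution can be sketched as follows, assuming the last model contains an intercept, Home Size, Lot Size, Bathrooms, Home Size * Lot Size, and Lot Size * Bathrooms, in that order. The coefficients are not reproduced in the text above, so `predict` takes them as an argument; they would have to be read from the Excel summary output. Both function names are chosen for illustration.

```python
def design_row(home_size, lot_size, bathrooms):
    """Inputs of the assumed final model, in coefficient order."""
    return [
        1.0,                    # intercept
        home_size,
        lot_size,
        bathrooms,
        home_size * lot_size,   # Home Size * Lot Size interaction
        lot_size * bathrooms,   # Lot Size * Bathrooms interaction
    ]


def predict(coefficients, row):
    """Point forecast: sum of each coefficient times its input value."""
    if len(coefficients) != len(row):
        raise ValueError("need exactly one coefficient per model term")
    return sum(b * x for b, x in zip(coefficients, row))


# Inputs for the forecast: Home Size = 800, Lot Size = 0.75, Bathrooms = 1.5.
row = design_row(800, 0.75, 1.5)
print(row)
```

Calling `predict` with the six coefficients from the summary output and this `row` gives the forecast price for that home.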