Question

Eclipse Engineering provides services of structural engineering. They just opened a new branch in...

Eclipse Engineering provides services of structural engineering. They just opened a new branch in Portland, OR. One of their products is called Structural Insulated Panels, or SIPs. It is a type of foam insulation that replaces typical wall and roof framing for residential or commercial buildings. Clients are billed a fixed fee for each project based on square footage of the building, square footage of the panel area, and number of panels. Depending on how long a project takes to complete, there usually ends up being additional cost not billed to the client (fee burned). Eclipse engineering is using simple linear regression to model their costs. We will investigate the data to find the best model that can be used to predict what the fee burned will be for a particular project!

  1. Create a scatter plot for each explanatory variable and show the linear equation on each of the 3 scatter plots. (Hint: you will have to rearrange your data to create the scatter plot to list x first, then y for each.)
  2. Remove the one obvious outlier you can see in the Panel Area. (Your scatter plots should update automatically.)
  3. Pull 3 simple linear regression models (with outlier removed) and determine which is the best model.
  4. Pull 2 more models. One with all three explanatory variables and one with some combination of 2 of the variables. Identify your best model.
  5. Write out the sample multiple regression equation and use it to predict the fee burned for taking on a project with panel area = 5,000, floor area = 2500, and panels = 11.
Fee Burned Panel Area Floor Area # Panels
1759.92 2832 2482 12
2225 5670 2566 10
2885 6740 2632 23
1732.5 4478 1204 20
1662.5 4303 1680 11
390 2493 805 6
683 1804 1012 10
1138 14459 1296 9
683 2350 1184 18
748 2198 1215 10
130 3215 1209 9
163 2497 987 12
1108 4168 2270 14
857.5 2988 1160 9
928 4481 1853 11
458 1040 326 7
683 3350 1610 8

Please show all working by using Excel formulas/commands. Please write out any explanations where necessary. Please show all formulas and working out. Thank you. Please only use Excel commands and show which was used. Please show where you got your numbers from when talking about equations and etc. Thank you.

0 0
Add a comment Improve this question Transcribed image text
Answer #1

SCATTERPLOT

panel area(X1) floor area(X2) # panel(X3) free burned(y)
2832 2482 12 1759.92
5670 2566 10 2225
6740 2632 23 2885
4478 1204 20 1732.5
4303 1680 11 1662.5
2493 805 6 390
1804 1012 10 683
14459 1296 9 1138
2350 1184 18 683
2198 1215 10 748
3215 1209 9 130
2497 987 12 163
4168 2270 14 1108
2988 1160 9 857.5
4481 1853 11 928
1040 326 7 458
3350 1610 8 683

LINEAR EQUATION : Y = a +bx

a = interceot

b = slope

b= (som of x.y * mean of x * mean of y)/ standard deviation of x * standard deviation of y

a = y bar - b * x bar

between panel area and fee burned- linear equation

panel area(X1) fee burned(y)
2832 1759.92
5670 2225
6740 2885
4478 1732.5
4303 1662.5
2493 390
1804 683
14459 1138
2350 683
2198 748
3215 130
2497 163
4168 1108
2988 857.5
4481 928
1040 458
3350 683

y 0.0996x + 667.98 scatter plot 01617 3500 2500 1000 500 0 2000 4000 6000 8000 10000 12000 14000 16000 panel area

linear equation : y= 0.0996*x +667.98

between floor area and fee burned- linear equation

floor area(X2) free burned(y)
2482 1759.92
2566 2225
2632 2885
1204 1732.5
1680 1662.5
805 390
1012 683
1296 1138
1184 683
1215 748
1209 130
987 163
2270 1108
1160 857.5
1853 928
326 458
1610 683

scatter plot y 0.8838x-252.57 R2 0.6048 3500 2500 2000 u 1500 1000 500 500 1000 1500 2000 2500 floor area

linear equation : y = 0.8838*x - 252.57

between# panel and fee burned- linear equation

# panel(X3) free burned(y)
12 1759.92
10 2225
23 2885
20 1732.5
11 1662.5
6 390
10 683
9 1138
18 683
10 748
9 130
12 163
14 1108
9 857.5
11 928
7 458
8 683

scatter plot y98.472x-80.088 R2 0.3655 3500 2500 2000 u 1500 1000 500 10 15 20 25 # panel

linear equation : y = 98472*x -80.088

after removing the outlier in the panel area

14459-panel area 1138- fee burned

y 0.4256x 383.86 scatter plot P250676586 3500 2500 E 2000 1500 1000 500 0 1000 2000 3000 4000 5000 6000 7000 8000 panel area

y = 0.4256*x -383.86

between panel area and fee burned after removal outlier

Y = -383.8617 + 0.4256 X1

Source

DF

Sum of Squares Mean Square F Statistic P-value
Regression
(between ŷi andyi bar)

1

5953159.9054

5953159.9054

26.8510

0.0001391

Residual
(between yi and ŷi)

14

3103955.7000

221711.1214

Total(between yi andyi bar)

15

9057115.6054

603807.7070

regression: sum of square = ( y hat - y bar)^2

residual : sum of square = (y - y hat )^2

F = mean square of regression/mean square of residual

Coeff

SE t-stat lower t0.025(14) upper t0.975(14)

Stand Coeff

p-value

VIF

intercept -383.8617 304.0027 -1.2627 -1035.8825 268.1592 0.000 0.2273
slope 0.4256 0.08212 5.1818 0.2494 0.6017 0.8107 0.0001391 1.0000

SE of intercept =

2 1(F) 2

X = independent variable

x bar = mean of the independent variable

Se = square root of [(sum of square of residual / (n-2)]

t test of intercept = intercept/ SE

SE of slope =

YX

Syx = square root of [(sum of square of residual / (n-2)]

slope t test = slope/ SE

Y and X relationship
R square (R2) equals 0.6573. It means that the predictors (Xi) explain 65.7% of the variance of Y.
Adjusted R square equals 0.6328.
The coefficient of correlation (R) equals 0.8107. It means that there is a very strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F(1,14) = 26.8510, p-value = 0.0001391. Since p-value < α (0.05), we reject the H0.
The linear regression model, Y = b0+ b1X1provides a better fit

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -1.2627, p-value = 0.2273. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

between floor area and fee burned - there is no outlier

Y = -252.5679 + 0.8838 X1

Source

DF

Sum of Squares Mean Square F Statistic P-value
Regression
(between ŷi andyi bar)

1

5480818.3840

5480818.3840

22.9589

0.0002378

Residual
(between yi and ŷi)

15

3580839.9055

238722.6604

Total(between yi andyibar)

16

9061658.2896

566353.6431

Coeff

SE t-stat lower t0.025(15) upper t0.975(15)

Stand Coeff

p-value

VIF

intercept -252.5679 300.8844 -0.8394 -893.8878 388.7521 0.000 0.4144
slope 0.8838 0.1844 4.7915 0.4906 1.2769 0.7777 0.0002378 1.0000

Y and X relationship
R square (R2) equals 0.6048. It means that the predictors (Xi) explain 60.5% of the variance of Y.
Adjusted R square equals 0.5785.
The coefficient of correlation (R) equals 0.7777. It means that there is a strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F(1,15) = 22.9589, p-value = 0.0002378. Since p-value < α (0.05), we reject the H0.
The linear regression model, Y = b0+ b1X1 provides a better fit t

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -0.8394, p-value = 0.4144. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

between # panel and fee burned - there is no outlier

y = -80.0880 + 98.4719 X1

Source

DF

Sum of Squares Mean Square F Statistic P-value
Regression
(between ŷi andyibar)

1

3311716.2806

3311716.2806

8.6393

0.01015

Residual
(between yi and ŷi)

15

5749942.0089

383329.4673

Total(between yi andyibar)

16

9061658.2896

566353.6431

Coeff

SE t-stat lower t0.025(15) upper t0.975(15)

Stand Coeff

p-value

VIF

intercept -80.0880 419.9374 -0.1907 -975.1634 814.9873 0.000 0.8513
slope 98.4719 33.5021 2.9393 27.0639 169.8800 0.6045 0.01015 1.0000

Y and X relationship
R square (R2) equals 0.3655. It means that the predictors (Xi) explain 36.5% of the variance of Y.
Adjusted R square equals 0.3232.
The coefficient of correlation (R) equals 0.6045. It means that there is a strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F(1,15) = 8.6393, p-value = 0.01015. Since p-value < α (0.05), we reject the H0.
The linear regression model, Y = b0+ b1X1, provides a better fit.

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -0.1907, p-value = 0.8513. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

so, the best model is panel area and fee burned relationship because there is a strong correlation after removal of the outlier and F value is highest p-value0.0001391.

6)

with three explanatory variable- panel area(x1) , floor area(x2) , # panel(x3) and fee burned

y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3

Variable Parameter S.D. T-STAT
H0: parameter = 0
2-tail p-value 1-tail p-value
(Intercept) -740.4 336.6 -2.2000e+00 0.04651 0.02326
X1 +0.04548 0.03758 +1.2100e+00 0.2478 0.1239
X2 +0.6596 0.1887 +3.4960e+00 0.003946 0.001973
X3 +54.6 26.01 +2.1000e+00 0.05586 0.02793
Multiple Linear Regression - Regression Statistics
Multiple R 0.8532
R-squared 0.7279
Adjusted R-squared 0.6651
F-TEST (value) 11.59
F-TEST (DF numerator) (4-1) =3
F-TEST (DF denominator) 13
p-value 0.0005619
Multiple Linear Regression - Residual Statistics
Residual Standard Deviation 435.5
Sum Squared Residuals 2.465e+06

WITH two explanatory variable - floor area and # panel

y = -650.516 + 0.722435X2 + 54.6614V3X3

Variable Parameter S.D. T-STAT
H0: parameter = 0
2-tail p-value 1-tail p-value
(Intercept) -650.5 333.7 -1.9500e+00 0.07155 0.03578
X2 +0.7224 0.1844 +3.9170e+00 0.001548 0.0007738
X3 +54.66 26.43 +2.0680e+00 0.05766 0.02883
Multiple Linear Regression - Regression Statistics
Multiple R 0.835
R-squared 0.6973
Adjusted R-squared 0.654
F-TEST (value) 16.12
F-TEST (DF numerator) 2
F-TEST (DF denominator) 14
p-value 0.0002329
Multiple Linear Regression - Residual Statistics
Residual Standard Deviation 442.6
Sum Squared Residuals 2.743e+06

Based on F test and slope test of two models we can say that the second model is best ( two variable) because 69 % explained by this two variable if I include third variable help to explained extra 3 %. and coefficient of panel area is insignificant.

y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3

x1 =5000

x2= 2500

x3 =11

y = -740.43 + 0.0454749*5000 + 0.659653*2500 + 54.6017*11

y = 1736.6957

Add a comment
Know the answer?
Add Answer to:
Eclipse Engineering provides services of structural engineering. They just opened a new branch in...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • Eclipse Engineering provides services of structural engineering. They just opened a new branch in...

    Eclipse Engineering provides services of structural engineering. They just opened a new branch in Portland, OR. One of their products is called Structural Insulated Panels, or SIPs. It is a type of foam insulation that replaces typical wall and roof framing for residential or commercial buildings. Clients are billed a fixed fee for each project based on square footage of the building, square footage of the panel area, and number of panels. Depending on how long a project takes to...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT