Eclipse Engineering provides services of structural engineering. They just opened a new branch in...

Question 1

Question

Eclipse Engineering provides services of structural engineering. They just opened a new branch in...

Eclipse Engineering provides services of structural engineering. They just opened a new branch in Portland, OR. One of their products is called Structural Insulated Panels, or SIPs. It is a type of foam insulation that replaces typical wall and roof framing for residential or commercial buildings. Clients are billed a fixed fee for each project based on square footage of the building, square footage of the panel area, and number of panels. Depending on how long a project takes to complete, there usually ends up being additional cost not billed to the client (fee burned). Eclipse engineering is using simple linear regression to model their costs. We will investigate the data to find the best model that can be used to predict what the fee burned will be for a particular project!

Create a scatter plot for each explanatory variable and show the linear equation on each of the 3 scatter plots. (Hint: you will have to rearrange your data to create the scatter plot to list x first, then y for each.)
Remove the one obvious outlier you can see in the Panel Area. (Your scatter plots should update automatically.)
Pull 3 simple linear regression models (with outlier removed) and determine which is the best model.
Pull 2 more models. One with all three explanatory variables and one with some combination of 2 of the variables. Identify your best model.
Write out the sample multiple regression equation and use it to predict the fee burned for taking on a project with panel area = 5,000, floor area = 2500, and panels = 11.

Fee Burned	Panel Area	Floor Area	# Panels
1759.92	2832	2482	12
2225	5670	2566	10
2885	6740	2632	23
1732.5	4478	1204	20
1662.5	4303	1680	11
390	2493	805	6
683	1804	1012	10
1138	14459	1296	9
683	2350	1184	18
748	2198	1215	10
130	3215	1209	9
163	2497	987	12
1108	4168	2270	14
857.5	2988	1160	9
928	4481	1853	11
458	1040	326	7
683	3350	1610	8

Please show all working by using Excel formulas/commands. Please write out any explanations where necessary. Please show all formulas and working out. Thank you. Please only use Excel commands and show which was used. Please show where you got your numbers from when talking about equations and etc. Thank you.

math Statistics-And-Probability

Add a comment Improve this question Transcribed image text

Answer

Answer #1

SCATTERPLOT

panel area(X1)	floor area(X2)	# panel(X3)	free burned(y)
2832	2482	12	1759.92
5670	2566	10	2225
6740	2632	23	2885
4478	1204	20	1732.5
4303	1680	11	1662.5
2493	805	6	390
1804	1012	10	683
14459	1296	9	1138
2350	1184	18	683
2198	1215	10	748
3215	1209	9	130
2497	987	12	163
4168	2270	14	1108
2988	1160	9	857.5
4481	1853	11	928
1040	326	7	458
3350	1610	8	683

LINEAR EQUATION : Y = a +bx

a = interceot

b = slope

b= (som of x.y * mean of x * mean of y)/ standard deviation of x * standard deviation of y

a = y bar - b * x bar

between panel area and fee burned- linear equation

panel area(X1)	fee burned(y)
2832	1759.92
5670	2225
6740	2885
4478	1732.5
4303	1662.5
2493	390
1804	683
14459	1138
2350	683
2198	748
3215	130
2497	163
4168	1108
2988	857.5
4481	928
1040	458
3350	683

y 0.0996x + 667.98 scatter plot 01617 3500 2500 1000 500 0 2000 4000 6000 8000 10000 12000 14000 16000 panel area

linear equation : y= 0.0996*x +667.98

between floor area and fee burned- linear equation

floor area(X2)	free burned(y)
2482	1759.92
2566	2225
2632	2885
1204	1732.5
1680	1662.5
805	390
1012	683
1296	1138
1184	683
1215	748
1209	130
987	163
2270	1108
1160	857.5
1853	928
326	458
1610	683

scatter plot y 0.8838x-252.57 R2 0.6048 3500 2500 2000 u 1500 1000 500 500 1000 1500 2000 2500 floor area

linear equation : y = 0.8838*x - 252.57

between# panel and fee burned- linear equation

# panel(X3)	free burned(y)
12	1759.92
10	2225
23	2885
20	1732.5
11	1662.5
6	390
10	683
9	1138
18	683
10	748
9	130
12	163
14	1108
9	857.5
11	928
7	458
8	683

scatter plot y98.472x-80.088 R2 0.3655 3500 2500 2000 u 1500 1000 500 10 15 20 25 # panel

linear equation : y = 98472*x -80.088

after removing the outlier in the panel area

14459-panel area

1138- fee burned

y 0.4256x 383.86 scatter plot P250676586 3500 2500 E 2000 1500 1000 500 0 1000 2000 3000 4000 5000 6000 7000 8000 panel area

y = 0.4256*x -383.86

between panel area and fee burned after removal outlier

Y = -383.8617 + 0.4256 X₁

Source	DF	Sum of Squares	Mean Square	F Statistic	P-value
Regression (between ŷ_i andy_{i bar})	1	5953159.9054	5953159.9054	26.8510	0.0001391
Residual (between y_i and ŷ_i)	14	3103955.7000	221711.1214
Total(between y_i andy_i bar)	15	9057115.6054	603807.7070

regression: sum of square = ( y hat - y bar)^2

residual : sum of square = (y - y hat )^2

F = mean square of regression/mean square of residual

	Coeff	SE	t-stat	lower t_0.025(14)	upper t_0.975(14)	Stand Coeff	p-value	VIF
intercept	-383.8617	304.0027	-1.2627	-1035.8825	268.1592	0.000	0.2273
slope	0.4256	0.08212	5.1818	0.2494	0.6017	0.8107	0.0001391	1.0000

SE of intercept =

2 1(F) 2

X = independent variable

x bar = mean of the independent variable

Se = square root of [(sum of square of residual / (n-2)]

t test of intercept = intercept/ SE

SE of slope =

Syx = square root of [(sum of square of residual / (n-2)]

slope t test = slope/ SE

Y and X relationship
R square (R²) equals 0.6573. It means that the predictors (X_i) explain 65.7% of the variance of Y.
Adjusted R square equals 0.6328.
The coefficient of correlation (R) equals 0.8107. It means that there is a very strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F_(1,14) = 26.8510, p-value = 0.0001391. Since p-value < α (0.05), we reject the H₀.
The linear regression model, Y = b₀+ b₁X₁provides a better fit

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -1.2627, p-value = 0.2273. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

between floor area and fee burned - there is no outlier

Y = -252.5679 + 0.8838 X₁

Source	DF	Sum of Squares	Mean Square	F Statistic	P-value
Regression (between ŷ_i andy_{i bar})	1	5480818.3840	5480818.3840	22.9589	0.0002378
Residual (between y_i and ŷ_i)	15	3580839.9055	238722.6604
Total(between y_i andy_ibar)	16	9061658.2896	566353.6431

	Coeff	SE	t-stat	lower t_0.025(15)	upper t_0.975(15)	Stand Coeff	p-value	VIF
intercept	-252.5679	300.8844	-0.8394	-893.8878	388.7521	0.000	0.4144
slope	0.8838	0.1844	4.7915	0.4906	1.2769	0.7777	0.0002378	1.0000

Y and X relationship
R square (R²) equals 0.6048. It means that the predictors (X_i) explain 60.5% of the variance of Y.
Adjusted R square equals 0.5785.
The coefficient of correlation (R) equals 0.7777. It means that there is a strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F_(1,15) = 22.9589, p-value = 0.0002378. Since p-value < α (0.05), we reject the H₀.
The linear regression model, Y = b₀+ b₁X₁ provides a better fit t

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -0.8394, p-value = 0.4144. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

between # panel and fee burned - there is no outlier

y = -80.0880 + 98.4719 X₁

Source	DF	Sum of Squares	Mean Square	F Statistic	P-value
Regression (between ŷ_i andy_ibar)	1	3311716.2806	3311716.2806	8.6393	0.01015
Residual (between y_i and ŷ_i)	15	5749942.0089	383329.4673
Total(between y_i andy_ibar)	16	9061658.2896	566353.6431

	Coeff	SE	t-stat	lower t_0.025(15)	upper t_0.975(15)	Stand Coeff	p-value	VIF
intercept	-80.0880	419.9374	-0.1907	-975.1634	814.9873	0.000	0.8513
slope	98.4719	33.5021	2.9393	27.0639	169.8800	0.6045	0.01015	1.0000

Y and X relationship
R square (R²) equals 0.3655. It means that the predictors (X_i) explain 36.5% of the variance of Y.
Adjusted R square equals 0.3232.
The coefficient of correlation (R) equals 0.6045. It means that there is a strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F_(1,15) = 8.6393, p-value = 0.01015. Since p-value < α (0.05), we reject the H₀.
The linear regression model, Y = b₀+ b₁X₁, provides a better fit.

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -0.1907, p-value = 0.8513. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

so, the best model is panel area and fee burned relationship because there is a strong correlation after removal of the outlier and F value is highest p-value0.0001391.

6)

with three explanatory variable- panel area(x1) , floor area(x2) , # panel(x3) and fee burned

y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3

Variable	Parameter	S.D.	T-STAT H0: parameter = 0	2-tail p-value	1-tail p-value
(Intercept)	-740.4	336.6	-2.2000e+00	0.04651	0.02326
X1	+0.04548	0.03758	+1.2100e+00	0.2478	0.1239
X2	+0.6596	0.1887	+3.4960e+00	0.003946	0.001973
X3	+54.6	26.01	+2.1000e+00	0.05586	0.02793

Multiple Linear Regression - Regression Statistics
Multiple R	0.8532
R-squared	0.7279
Adjusted R-squared	0.6651
F-TEST (value)	11.59
F-TEST (DF numerator)	(4-1) =3
F-TEST (DF denominator)	13
p-value	0.0005619
Multiple Linear Regression - Residual Statistics
Residual Standard Deviation	435.5
Sum Squared Residuals	2.465e+06

WITH two explanatory variable - floor area and # panel

y = -650.516 + 0.722435X2 + 54.6614V3X3

Variable	Parameter	S.D.	T-STAT H0: parameter = 0	2-tail p-value	1-tail p-value
(Intercept)	-650.5	333.7	-1.9500e+00	0.07155	0.03578
X2	+0.7224	0.1844	+3.9170e+00	0.001548	0.0007738
X3	+54.66	26.43	+2.0680e+00	0.05766	0.02883

Multiple Linear Regression - Regression Statistics
Multiple R	0.835
R-squared	0.6973
Adjusted R-squared	0.654
F-TEST (value)	16.12
F-TEST (DF numerator)	2
F-TEST (DF denominator)	14
p-value	0.0002329
Multiple Linear Regression - Residual Statistics
Residual Standard Deviation	442.6
Sum Squared Residuals	2.743e+06

Based on F test and slope test of two models we can say that the second model is best ( two variable) because 69 % explained by this two variable if I include third variable help to explained extra 3 %. and coefficient of panel area is insignificant.

y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3

x1 =5000

x2= 2500

x3 =11

y = -740.43 + 0.0454749*5000 + 0.659653*2500 + 54.6017*11

y = 1736.6957

Add a comment

Answer