a) The scatter plot of power consumption vs. average product purity is shown below. The relationship appears non-linear.
b) Correlation(Y, X1) = 0.0488
Correlation(Y, X2) = -0.0092
Looking at the low correlation coefficient in either case, both the independent variables show a weak linear relationship with the dependent variable.
c) Running a regression of Y on X1 and X2 in Excel (Data tab -> Data Analysis -> Regression) gives the following output:
Regression Statistics | |
Multiple R | 0.050571764 |
R Square | 0.002557503 |
Adjusted R Square | -0.219096385 |
Standard Error | 27.16048318 |
Observations | 12 |
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 235.4441624 | 337.9871826 | 0.696606778 | 0.503641767 | -529.1359637 | 1000.024288 |
X1 (AvgProdPurity) | 0.540644691 | 3.619346314 | 0.149376336 | 0.884550424 | -7.646885498 | 8.72817488 |
X2 (TonsProduced) | -0.050252565 | 1.27775449 | -0.039328811 | 0.969486837 | -2.940734037 | 2.840228907 |
Hence, least squares regression line: Y = 235.44 + 0.541 * X1 - 0.0503 * X2
d) Coefficient of purity (X1) = 0.541, which is positive, suggesting a positive linear relationship with power consumed. However, the high p-value of 0.8845 means this coefficient is not statistically distinguishable from zero.
Coefficient of tons produced (X2) = -0.0503, which is negative, suggesting a negative linear relationship with power consumed. However, the high p-value of 0.9695 means this coefficient is likewise not statistically distinguishable from zero.
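The t statistics and 95% intervals in the Excel output can be checked by hand from each coefficient and its standard error. The sketch below is one way to do that, not Excel's own procedure. The helper name `t_stat_and_ci` is hypothetical, and the critical value 2.262157 is an assumption taken from a standard Student's t table for 12 - 2 - 1 = 9 degrees of freedom.

```python
# Coefficient estimates and standard errors copied from the Excel output.
coefs = {
    "Intercept": (235.4441624, 337.9871826),
    "X1 (AvgProdPurity)": (0.540644691, 3.619346314),
    "X2 (TonsProduced)": (-0.050252565, 1.27775449),
}

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (n - k - 1 = 12 - 2 - 1). Assumption: value read from a standard t table.
T_CRIT_9DF = 2.262157


def t_stat_and_ci(estimate, std_err, t_crit=T_CRIT_9DF):
    """Return (t statistic, lower bound, upper bound) for one coefficient."""
    t_stat = estimate / std_err       # test of H0: coefficient = 0
    half_width = t_crit * std_err     # half-width of the 95% interval
    return t_stat, estimate - half_width, estimate + half_width


results = {name: t_stat_and_ci(b, se) for name, (b, se) in coefs.items()}
for name, (t_stat, lower, upper) in results.items():
    print(f"{name}: t = {t_stat:.4f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Both slope intervals straddle zero, which is another way of seeing that neither coefficient is significant at the 5% level.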
e) Coefficient of determination, R-squared = 0.00256
=> only 0.256% of the variation in Y (Power consumption) is explained by the variation in X1 and X2
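The negative Adjusted R Square in the Excel output follows directly from R-squared once the number of observations and predictors is taken into account. A minimal sketch of the standard adjustment formula, using the values from the output (the function name `adjusted_r_squared` is hypothetical):

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjust R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square and Observations from the Excel output; two predictors (X1, X2).
adj_r2 = adjusted_r_squared(0.002557503, n_obs=12, n_predictors=2)
print(f"Adjusted R-squared = {adj_r2:.6f}")
```

A negative adjusted value means the two predictors explain less variation than would be expected by chance for a model of this size.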
f) Since the p-values for both avg product purity (X1) and tons of product produced (X2) are high (>> 0.05), neither is a useful predictor of power consumption.
g) The p-values of both coefficients are extremely high (0.8845 and 0.9695), much higher than 0.01, so this is a very poor linear regression model: neither predictor variable explains the dependent variable at any conventional significance level.
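The significance of the model as a whole can also be judged with the overall F test, which can be reconstructed from R-squared alone. The ANOVA part of the Excel output is not reproduced in this solution, so the values below are derived here rather than quoted. This is a sketch under the assumption of exactly two predictors, in which case the F distribution's upper tail has a simple closed form; the function name `overall_f_test` is hypothetical.

```python
def overall_f_test(r2, n_obs, n_predictors=2):
    """Return (F, p_value) for H0: all slope coefficients are zero.

    The closed-form p-value used here holds only for two predictors,
    i.e. an F distribution with 2 numerator degrees of freedom.
    """
    if n_predictors != 2:
        raise ValueError("closed-form p-value is valid only for 2 predictors")
    df_resid = n_obs - n_predictors - 1
    f_stat = (r2 / n_predictors) / ((1.0 - r2) / df_resid)
    # Upper tail of F(2, df_resid): (1 + 2F/df_resid) ** (-df_resid / 2)
    p_value = (1.0 + 2.0 * f_stat / df_resid) ** (-df_resid / 2.0)
    return f_stat, p_value


# R Square and Observations from the Excel output.
f_stat, p_value = overall_f_test(0.002557503, n_obs=12)
print(f"F = {f_stat:.5f}, p-value = {p_value:.4f}")
```

With these inputs the F statistic is about 0.012 and the p-value about 0.99, consistent with the conclusion that the model as a whole is not significant.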
h) For X1 = 90% and X2 = 98, we have
Y = 235.44 + 0.541 * 90 - 0.0503 * 98 = 279.2 (1000 kWh)
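The same prediction can be reproduced with the unrounded coefficients from the Excel output. A minimal sketch (the function name `predict_power` is hypothetical, and purity is entered in percentage points, as in the hand calculation):

```python
def predict_power(purity_pct, tons,
                  b0=235.4441624, b1=0.540644691, b2=-0.050252565):
    """Predicted power consumption (1000 kWh) from the fitted line."""
    return b0 + b1 * purity_pct + b2 * tons


y_hat = predict_power(purity_pct=90, tons=98)
print(f"Predicted power consumption = {y_hat:.1f} (1000 kWh)")
```

Using full-precision coefficients changes the result only in the second decimal place, so it still rounds to 279.2.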
i) The fitted model assumes that the errors are normally distributed. The prediction errors (Y_predicted - Y_observed) plotted below look far from normal, which suggests this assumption is violated and that the p-values and confidence intervals above should be treated with caution.