The regression model being estimated is
$\text{Salary} = \beta_0 + \beta_1\,\text{Master\_Degree} + \varepsilon$
where Salary is the monthly salary (in dollars) and
Master_Degree is the number of years since being awarded a master's degree.
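a) The error term is $\varepsilon$, the part of the monthly salary that is not explained by Master_Degree.
The assumptions are that the errors have mean zero, have constant variance $\sigma^2$, are independent of one another, and are normally distributed, that is $\varepsilon \sim N(0, \sigma^2)$.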
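b) The estimated values of the coefficients are as follows.
The estimated value of the intercept is $\hat\beta_0 \approx 6589.12$ (the value implied by the prediction in part g: $9608.12 - 150.95 \times 20$).
The estimated value of the slope is $\hat\beta_1 = 150.95$.
Ans: The estimated equation is $\widehat{\text{Salary}} = 6589.12 + 150.95\,\text{Master\_Degree}$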
c) The estimated value of the slope coefficient is 150.95. The positive value indicates that the number of years since being awarded a master's degree and the monthly salary move in the same direction. For each additional year since the degree was awarded, the estimated monthly salary increases by $150.95 on average.
This is reasonable, provided the person is gainfully employed in the field of interest (as a data scientist?) during these years. In that case each additional year since the degree corresponds to an additional year of experience working as a data scientist.
d) A 90% confidence interval corresponds to a significance level of $\alpha = 1 - 0.90 = 0.10$.
The critical t value is obtained using $\alpha/2 = 0.05$ in each tail.
The number of observations is n=80. The degrees of freedom for t is n-2=80-2=78.
Using the t tables we can get the critical value of t for degrees of freedom df=60 as 1.671 and for df=120 as 1.658.
The value for df=78 will be somewhere between these two values. We can either interpolate or use Excel to get the exact value.
Using =T.INV.2T(0.1,78) we get a t value of 1.665.
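The interpolation option mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper name `interpolate_t` is hypothetical, and plain linear interpolation in df is an assumption (it is a rough approximation, which is why it lands slightly above the Excel value of 1.665).

```python
def interpolate_t(df, df_lo, t_lo, df_hi, t_hi):
    """Linearly interpolate a critical t value between two table rows."""
    weight = (df - df_lo) / (df_hi - df_lo)
    return t_lo + weight * (t_hi - t_lo)

# Table rows for 0.05 in each tail: df=60 -> 1.671, df=120 -> 1.658
t_crit_approx = interpolate_t(78, 60, 1.671, 120, 1.658)
print(round(t_crit_approx, 3))  # 1.667
```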
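We know the standard error of the slope estimate from the output: $s_{\hat\beta_1} = 13.52$.
The 90% confidence interval is $\hat\beta_1 \pm t_{0.05,\,78}\, s_{\hat\beta_1} = 150.95 \pm 1.665 \times 13.52 = 150.95 \pm 22.51$
Ans: The 90% confidence interval for the slope is [128.44, 173.46].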
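As a quick check of the interval arithmetic, here is a minimal Python sketch. It assumes a slope standard error of 13.52, which is the value implied by the reported interval and test statistic rather than one read directly from the regression output; the variable names are illustrative.

```python
slope = 150.95      # estimated slope from part c
se_slope = 13.52    # assumed: back-calculated from the reported interval
t_crit = 1.665      # =T.INV.2T(0.1,78)

margin = t_crit * se_slope
lower, upper = slope - margin, slope + margin
print(f"[{lower:.2f}, {upper:.2f}]")  # [128.44, 173.46]
```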
e) There is a positive relationship between monthly salary and number of years if the slope coefficient is positive (that is, >0).
That is, we want to test the following hypotheses:
$H_0: \beta_1 \le 0$ versus $H_1: \beta_1 > 0$
The hypothesized value of the slope is $\beta_1 = 0$.
The test statistic is $t = \dfrac{\hat\beta_1 - 0}{s_{\hat\beta_1}} = \dfrac{150.95}{13.52} = 11.165$
This is a one-tailed (right-tailed) test (the alternative hypothesis has ">"),
so the p-value is P(T>11.165).
Using the Excel function =T.DIST.RT(11.165,78) we get p-value=3.83E-18.
We will reject the null hypothesis if the p-value is less than the significance level.
Here the p-value (3.83E-18, effectively 0.000) is less than the significance level 0.05.
Hence we reject the null hypothesis.
We conclude that there is a significant positive relationship between monthly salary and number of years.
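The same decision can be reached by comparing the test statistic with a critical value instead of a p-value. The sketch below takes that route, which is a swapped-in alternative to the p-value approach used in the answer. It assumes a slope standard error of 13.52 (implied by the reported t statistic) and uses 1.665 as the upper 5% point of t with 78 degrees of freedom, the same number returned by =T.INV.2T(0.1,78).

```python
slope = 150.95
se_slope = 13.52          # assumed: implied by the reported t statistic
hypothesized_slope = 0.0  # value of the slope under the null hypothesis

t_stat = (slope - hypothesized_slope) / se_slope
t_crit_one_tail = 1.665   # upper 5% point of t with df = 78

# Right-tailed test: reject H0 when the statistic exceeds the critical value
reject_null = t_stat > t_crit_one_tail
print(round(t_stat, 3), reject_null)  # 11.165 True
```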
f) Using the ANOVA table
the degrees of freedom for Residuals is df=n-2=80-2=78
The Mean square residuals is MSE=126408.2.
The sum of squared residuals is $SSE = MSE \times df = 126408.2 \times 78 = 9859839.6$
Sum of square Total is SST=20514650.04
The coefficient of determination is $R^2 = 1 - \dfrac{SSE}{SST} = 1 - \dfrac{9859839.6}{20514650.04} = 0.5194$
The value of the coefficient of determination is 0.5194. It indicates that 51.94% of the variation in monthly salary is explained by the variation in the number of years since being awarded a master's degree.
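The ANOVA arithmetic above can be reproduced with a short Python sketch. The variable names are illustrative, and the inputs are the MSE, SST and sample size quoted in this part.

```python
n = 80
mse = 126408.2        # mean square of the residuals
sst = 20514650.04     # total sum of squares

df_resid = n - 2              # residual degrees of freedom
sse = mse * df_resid          # residual sum of squares
r_squared = 1 - sse / sst     # coefficient of determination
print(round(sse, 1), round(r_squared, 4))  # 9859839.6 0.5194
```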
g) The expected value of salary for Master_Degree=20 is $\widehat{\text{Salary}} = 6589.12 + 150.95 \times 20 = 6589.12 + 3019.00 = 9608.12$
The expected monthly salary of a data scientist who was awarded a master's degree 20 years ago is $9,608.12.
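A minimal Python sketch of this prediction follows. The function name `predict_salary` is hypothetical, and the intercept of 6589.12 is an assumption: it is the value implied by the reported prediction and slope (9608.12 - 150.95 × 20), not a figure taken directly from the regression output.

```python
INTERCEPT = 6589.12   # assumed: implied by the reported prediction
SLOPE = 150.95        # estimated slope from part c

def predict_salary(years_since_degree):
    """Expected monthly salary (dollars) from the fitted line."""
    return INTERCEPT + SLOPE * years_since_degree

print(round(predict_salary(20), 2))  # 9608.12
```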