Question

1. A researcher uses a simple linear regression to measure the relationship between the monthly salary (Salary, measured in dollars) of data scientists and the number of years since being awarded a Master degree (Master_Degree). A random sample of 80 observations was collected for the analysis. The researcher used an econometric model with the following specification:

\begin{align*} \text{Salary}_i = \beta_0 + \beta_1 \text{Master\_Degree}_i + \varepsilon_i, \quad i = 1, \ldots, 80 \qquad (1) \end{align*}

The (incomplete) Excel output of equation (1) is presented below.

SUMMARY OUTPUT

Regression Statistics
R-Square        XxX
Observations    80

ANOVA (columns: df, MS; rows: Regression, Residual, Total): XxX 126408.2 XxX 20514650.04

                 Coefficients   Standard error   t-stat
Intercept        6589.12        136.14
Master_Degree    150.95         13.52

a) What are the assumptions on the error term (εi) a researcher has made in the simple linear regression (1)? (3 marks)

b) State the estimated equation. (2 marks)

c) Interpret the slope coefficient. Is the slope coefficient in line with your expectation? (3 marks)

d) Determine the 90% confidence interval for the slope coefficient. (4 marks)

e) Perform a test to determine whether there is a significant positive relationship between the monthly salary and the number of years since being awarded a Master degree (at the 5% level of significance). (4 marks)

f) Calculate and interpret the coefficient of determination.

g) Predict the expected monthly salary of a data scientist who has been awarded with a master degree for 20 years. (3 marks)

Simple Linear regression

Answer #1

The regression model being estimated is

\begin{align*} \text{Salary}_i = \beta_0 + \beta_1 \text{Master\_Degree}_i + \varepsilon_i \end{align*}

where Salary is the Monthly salary (in Dollars)

Master_Degree is the number of years since being awarded a master degree.

a) The error term satisfies \( \varepsilon_i \stackrel{iid}{\sim} \mathcal{N}(0,\sigma^2) \).

The assumptions are

  • The error terms are i.i.d., that is, they are independent of one another and identically distributed.
  • The error terms are normally distributed with mean 0 and constant variance σ².

b) The estimated values of the coefficients are

                 Coefficients
Intercept        6589.12
Master_Degree    150.95

The estimated value of the intercept is \( \hat{\beta}_0 = 6589.12 \).

The estimated value of the slope is \( \hat{\beta}_1 = 150.95 \).

Ans: The estimated equation is

\begin{align*} \widehat{\text{Salary}} = 6589.12 + 150.95 \times \text{Master\_Degree} \end{align*}

c) The estimated value of the slope coefficient is 150.95. The positive value indicates that the number of years since being awarded a master degree and the monthly salary move in the same direction. That is, the monthly salary tends to increase as the number of years since the degree increases. For each additional year since being awarded a master degree, the expected monthly salary increases by an estimated $150.95.

This is in line with expectation: monthly salary should rise with the number of years since the degree, provided the person has been working in the field (as a data scientist) during those years, so that years since the degree correspond to years of work experience.

d) A 90% confidence level corresponds to a significance level of α = 1 − 90/100 = 0.10.

The critical value \( t_{\alpha/2} \) satisfies P(T > \( t_{\alpha/2} \)) = α/2 = 0.10/2 = 0.05.

The number of observations is n = 80, so the degrees of freedom for t are n − 2 = 80 − 2 = 78.

From the t tables, the critical value is \( t_{\alpha/2} = 1.671 \) for df = 60 and 1.658 for df = 120.

The value for df = 78 lies between these two values. We can either interpolate or use Excel to get the exact value.

Using =T.INV.2T(0.1,78) we get a t value of 1.665.

\( t_{\alpha/2} = 1.665 \)
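The interpolation option mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the two schemes shown (linear in df, and linear in 1/df, a common choice for t tables) are assumptions about how one might interpolate, not something the question prescribes.

```python
# Tabulated upper-5% critical values of t quoted in the text.
DF_LOW, T_LOW = 60, 1.671
DF_HIGH, T_HIGH = 120, 1.658
DF_TARGET = 78  # n - 2 = 80 - 2


def interpolate_linear(df):
    """Interpolate linearly in the degrees of freedom (hypothetical helper)."""
    weight = (df - DF_LOW) / (DF_HIGH - DF_LOW)
    return T_LOW + weight * (T_HIGH - T_LOW)


def interpolate_reciprocal(df):
    """Interpolate linearly in 1/df (hypothetical helper)."""
    weight = (1 / DF_LOW - 1 / df) / (1 / DF_LOW - 1 / DF_HIGH)
    return T_LOW + weight * (T_HIGH - T_LOW)


t_linear = interpolate_linear(DF_TARGET)
t_reciprocal = interpolate_reciprocal(DF_TARGET)
print(f"linear in df:   {t_linear:.4f}")      # 1.6671
print(f"linear in 1/df: {t_reciprocal:.4f}")  # 1.6650
```

Both results round to about 1.67, and the 1/df version agrees with the Excel value of 1.665 to three decimals.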

We know the standard error of the slope estimate from the output

\( s.e(\hat{\beta}_1) = 13.52 \)

The 90% confidence interval is

\begin{align*} &\hat{\beta}_1 \pm t_{\alpha/2}\, s.e(\hat{\beta}_1) \\ \implies &150.95 \pm 1.665 \times 13.52 \\ \implies &[128.44, 173.46] \end{align*}

Ans: The 90% confidence interval for the slope is [128.44, 173.46].
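As a quick arithmetic check of the interval, here is a minimal Python sketch. The variable names are chosen for this example only, and the critical value 1.665 is taken from the Excel result above rather than recomputed.

```python
# Values read from the Excel output and the t critical value found above.
slope = 150.95      # estimated slope coefficient
se_slope = 13.52    # standard error of the slope
t_crit = 1.665      # t critical value for df = 78, two-sided 90% level

margin = t_crit * se_slope
ci_lower = slope - margin
ci_upper = slope + margin

print(f"margin of error: {margin:.4f}")                           # 22.5108
print(f"90% CI for the slope: [{ci_lower:.2f}, {ci_upper:.2f}]")  # [128.44, 173.46]
```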

e) There would be a positive relationship between monthly salary and number of years, if the slope coefficient is positive (that is >0).

That is we want to test the following hypotheses

H0: β1 = 0 ← null hypothesis: There is no positive relationship between salary and number of years

Ha: β1 > 0 ← alternative hypothesis: There is a positive relationship between salary and number of years

α = 0.05 ← level of significance to test the hypotheses

The hypothesized value of the slope is

\begin{align*} \beta_{1H_0} = 0 \end{align*}

The test statistic is

\begin{align*} t = \frac{\hat{\beta}_1 - \beta_{1H_0}}{s.e(\hat{\beta}_1)} = \frac{150.95 - 0}{13.52} = 11.165 \end{align*}

This is a one-tailed (right-tailed) test, because the alternative hypothesis has ">".

The p-value is P(T > 11.165).

Using the Excel function =T.DIST.RT(11.165,78) we get p-value = 3.83E-18.

We will reject the null hypothesis if the p-value is less than the significance level.

Here the p-value of 3.83E-18 (0.000 to three decimal places) is less than the significance level of 0.05.

Hence we reject the null hypothesis.

We conclude that there is a significant positive relationship between monthly salary and number of years.
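The same decision can be reached with the critical-value approach, sketched below in Python. This is an illustrative alternative to the p-value calculation, not part of the original working. It assumes the right-tail 5% critical value for df = 78 is 1.665, which is the value returned by =T.INV.2T(0.1,78) in part d), since a two-sided 10% level leaves 5% in the upper tail. Variable names are chosen for this example only.

```python
# Values read from the Excel output.
slope = 150.95           # estimated slope coefficient
se_slope = 13.52         # standard error of the slope
slope_under_h0 = 0.0     # hypothesized slope under H0

# Assumed right-tail critical value at the 5% level for df = 78
# (same number as the two-sided 10% critical value used in part d).
t_crit_one_tailed = 1.665

t_stat = (slope - slope_under_h0) / se_slope
reject_h0 = t_stat > t_crit_one_tailed

print(f"t statistic: {t_stat:.3f}")  # 11.165
print(f"reject H0:   {reject_h0}")   # True
```

Because 11.165 is far above 1.665, the null hypothesis is rejected, which matches the p-value conclusion.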

f) Using the ANOVA table

Residual: MS = 126408.2
Total: 20514650.04

The degrees of freedom for the residuals are df = n − 2 = 80 − 2 = 78.

The Mean square residuals is MSE=126408.2.

The sum of square residuals is

\begin{align*} &\text{MSE} = \frac{\text{SSE}}{df} \\ \implies &\text{SSE} = \text{MSE} \times df \\ \implies &\text{SSE} = 126408.2 \times 78 = 9859839.6 \end{align*}

Sum of square Total is SST=20514650.04

The coefficient of determination is

\begin{align*} R^2 = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{9859839.6}{20514650.04} = 0.5194 \end{align*}

The value of the coefficient of determination is 0.5194. It indicates that 51.94% of the variation in monthly salary is explained by the variation in the number of years since being awarded a master degree.
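The steps in part f) can be reproduced with a short Python sketch. It follows the reading used in this answer, namely that 126408.2 is the residual mean square and 20514650.04 is the total sum of squares. Variable names are chosen for this example only.

```python
# Values from the ANOVA table, as interpreted in this answer.
n_obs = 80
mse = 126408.2        # mean square of the residuals
sst = 20514650.04     # total sum of squares

df_residual = n_obs - 2          # 78
sse = mse * df_residual          # residual sum of squares
r_squared = 1 - sse / sst        # coefficient of determination

print(f"SSE: {sse:.1f}")         # 9859839.6
print(f"R^2: {r_squared:.4f}")   # 0.5194
```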

g) The expected value of salary for master_degree=20 is

\begin{align*} \widehat{\text{Salary}} = 6589.12 + 150.95 \times 20 = 9608.12 \end{align*}

The expected monthly salary of a data scientist who has been awarded with a master degree for 20 years is $9,608.12.
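For completeness, the prediction can be written as a small Python function. The function name `predict_salary` is hypothetical and used only for this illustration; the coefficients are the estimates from part b).

```python
INTERCEPT = 6589.12   # estimated intercept
SLOPE = 150.95        # estimated slope (dollars per year since the degree)


def predict_salary(years_since_degree):
    """Return the fitted monthly salary for a given number of years (hypothetical helper)."""
    return INTERCEPT + SLOPE * years_since_degree


predicted = predict_salary(20)
print(f"predicted monthly salary: ${predicted:,.2f}")  # $9,608.12
```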
