Pick a minimum of 20 observations on any subject. This will include a dependent variable plus two independent variables that you may think are either negatively or positively correlated with the dependent variable. List the observed data (include the source). Then do the following:
a. State before doing any calculations whether you think they are positively or negatively correlated. What is your rationale?
Example: I test for a correlation between the quantity of coffee that people buy (Y) with the price of coffee (X1) and the household income (X2).
I hypothesize that there is a negative correlation between quantity and price because people like to buy goods at lower rather than higher prices. I also hypothesize that there is also a positive correlation between the quantity of coffee and household income because people can buy more coffee when their income increases.
b. Draw a graph of each of the two independent variables with the dependent variable either by hand or by using Excel. (Do this by inserting an XY/Scatter chart.)
c. Use Excel to do the necessary regression. Give the values for the y-intercept, b1 and b2. Write out the equation. Also show R-square, the F-statistic and its p-value and the t-statistics with their respective p-values.
d. Test for multicollinearity using the rule that the two independent variables are multicollinear if their correlation coefficient is .70 or greater (implying r-square is .49 or greater). If they are multicollinear, give a brief statement on why you think that is the case.
e. Pretend that this was an assignment from your manager and communicate your findings to the manager in 100 words or less. You should assume the following in preparing this memo:
I ONLY NEED HELP WITH (E). THE REST IS JUST FOR REFERENCE. Thank you!!!
Answer: In a regression model, Y is known as the dependent variable, whose value depends upon some Xs, which are independent variables. In this case, the quantity of coffee bought depends on the price and the household income. Here, Y is the amount of coffee bought and X1 = price of coffee and X2 = household income.
In order to see whether there is a relationship between the given Y and Xs, we fit a regression model. This is a statistical model that tells us whether a set of variables affects a particular dependent variable and, if so, how much the dependent variable changes for a one-unit change in each independent variable. To fit this model, we use the following steps:
a. Draw the scatter plot of X vs Y. This helps us see whether there is a linear relationship between the variables, since linear regression is appropriate only when the relationship between the dependent variable and the independent variable/s is approximately linear.
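The scatter-plot step can be sketched in Python with matplotlib; the price and quantity numbers below are made up purely for illustration, not real observations:

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical observations (illustration only): price vs quantity of coffee
price = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5])
quantity = np.array([9.0, 8.2, 7.9, 7.1, 6.4, 6.0, 5.1, 4.8])

fig, ax = plt.subplots()
ax.scatter(price, quantity)
ax.set_xlabel("Price of coffee (X1)")
ax.set_ylabel("Quantity bought (Y)")
ax.set_title("Scatter plot: check for a linear relationship")
fig.savefig("scatter.png")
```

A roughly straight downward band of points in this plot would support fitting a linear model with a negative price coefficient.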
b. Fit the regression model. The regression model in this case will be given as
y = β0 + β1X1 + β2X2. Here, β0 is the y-intercept: it is the value of Y when both X1 and X2 are 0. It is the baseline amount of coffee purchased, irrespective of the price or the household income.
Now, X1 = price of coffee. β1 is the amount by which Y changes when X1 changes by 1 unit. Thus, in this case, if the price of coffee increases by 1 unit, the amount of coffee purchased changes by β1 units.
X2 = household income. β2 is the amount by which Y changes when X2 changes by 1 unit. Thus, in this case, if the household income increases by 1 unit, the amount of coffee purchased changes by β2 units.
Now, in order for these independent variables to affect the dependent variable, β1 and β2 must not be equal to 0. To test this, we use the t-test for the coefficients. We hypothesize that a coefficient is 0 (the null hypothesis) and use the t-test to see whether the data support that hypothesis. If any one of the βs is 0, we conclude that there is no relationship between Y and the corresponding X. The decision is based on the p-value: the probability of obtaining a t-statistic at least as extreme as the one observed if the null hypothesis were true. For most tests we use a 5% significance level, so if the p-value is less than 0.05, we conclude that the β is not 0; otherwise we cannot rule out that it is 0. The p-value for each coefficient is part of the regression output.
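The fitting and t-test steps above can be sketched with NumPy. Everything here is an assumption for illustration: the data are simulated, and the "true" coefficients (10, -1.5, 0.05) are invented so the example has a negative price effect and a positive income effect, as hypothesized:

```python
import numpy as np

# Simulated data (illustration only): Y = quantity of coffee bought,
# X1 = price of coffee, X2 = household income (in thousands)
rng = np.random.default_rng(0)
n = 20
x1 = rng.uniform(2, 8, n)                 # hypothetical prices
x2 = rng.uniform(20, 100, n)              # hypothetical incomes
y = 10 - 1.5 * x1 + 0.05 * x2 + rng.normal(0, 1, n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2])

# OLS estimates: beta = (X'X)^{-1} X'y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance and standard errors of the coefficients
resid = y - X @ beta
df = n - X.shape[1]                       # degrees of freedom
s2 = resid @ resid / df
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

# t-statistic for each coefficient: estimate / standard error
t_stats = beta / se
print("coefficients (b0, b1, b2):", beta)
print("t-statistics:", t_stats)
# Compare |t| to the critical value (about 2.11 for df = 17 at the 5% level);
# larger |t| means a smaller p-value and a coefficient significantly != 0.
```

Excel's Data Analysis > Regression tool reports the same quantities (coefficients, standard errors, t Stat, P-value) without any coding.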
Also, since this is a model, it can be used to predict values of the dependent variable for any given values of the independent variables. For such predictions to be trustworthy, the model must fit the data well. One summary measure of fit is R2.
R-squared (R2) is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. Whereas correlation explains the strength of the relationship between an independent and dependent variable, R-squared explains to what extent the variance of one variable explains the variance of the second variable. So, if the R2 of a model is 0.50, then approximately half of the observed variation can be explained by the model's inputs.
For a model to be a good fit, R2 must be high.
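The definition of R2 above can be verified directly from its formula, 1 minus the ratio of unexplained to total variation. The observed and fitted values below are made up solely to illustrate the arithmetic:

```python
import numpy as np

# Hypothetical observed values and model-fitted values (illustration only)
y = np.array([5.0, 7.0, 6.5, 8.0, 9.5, 4.0, 7.5, 8.5])
y_hat = np.array([5.2, 6.8, 6.6, 8.3, 9.0, 4.5, 7.2, 8.6])

ss_res = np.sum((y - y_hat) ** 2)      # unexplained (residual) variation
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))             # close to 1 here: fits track the data
```

An R2 near 1 means the fitted values track the observations closely; an R2 near 0 means the model explains little more than the mean of Y does.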
There's also the concept of multicollinearity. This arises when the independent variables are correlated with each other. In that case, the individual coefficient estimates become unstable and hard to interpret, because the model cannot cleanly separate the effect of one predictor from the other. Thus, in this case, if the price of coffee and household income were strongly correlated, we might drop one of the variables, or model their joint behavior explicitly, for example with an interaction term Price*Household Income, which captures how the effect of price on the amount of coffee purchased varies with income.
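The assignment's rule of thumb in part (d), flagging multicollinearity when the correlation between the two predictors is .70 or greater, is a one-line check. The price and income values below are hypothetical, chosen only to demonstrate the calculation:

```python
import numpy as np

# Hypothetical values of the two independent variables (illustration only)
price = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5])
income = np.array([30, 45, 38, 60, 52, 70, 65, 80, 75, 90])

# Correlation coefficient between the two predictors
r = np.corrcoef(price, income)[0, 1]
print("r =", round(r, 3), "| r^2 =", round(r ** 2, 3))

# Rule of thumb from the assignment: multicollinear if |r| >= 0.70
if abs(r) >= 0.70:
    print("The predictors are multicollinear by the 0.70 rule.")
```

In Excel the same number comes from the CORREL function applied to the two predictor columns.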