Question

Question 2 (15p): Multicollinearity and heteroscedasticity
a. What does heteroscedasticity mean? How does it affect estimates?
b. What is multicollinearity and how does it affect estimates?

Question 3 (15p): In estimating the effect of a beer tax on traffic accidents, assume as in the textbook
Answer #1

What is multicollinearity?

Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.
Collinearity (or multicollinearity) is the undesirable situation where the correlations among the independent variables are strong (Central Michigan University, 2014).
Causes of multicollinearity

Multicollinearity can be caused by the following:

1) Improper use of dummy variables, e.g. failure to exclude one reference category (a minimal sketch of this appears after this list).

2) Including a variable that is computed from other variables in the equation (e.g. family income = husband’s income + wife’s income, and the regression includes all 3 income measures).
3) In effect, including the same or almost the same variable twice (height in feet and height in inches; or, more commonly, two different operationalizations of the same concept).
4) The causes above all imply some sort of error on the researcher's part, but it may also simply be that the variables really are highly correlated.
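As a hypothetical illustration of cause 1), the "dummy variable trap", here is a minimal sketch (not part of the original answer) showing that including a dummy for every category alongside a constant makes the regressor matrix perfectly collinear, whereas dropping one category restores full column rank:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
group = rng.integers(0, 3, size=n)                      # three categories: 0, 1, 2
dummies = np.eye(3)[group]                              # one dummy column per category
X_bad = np.column_stack([np.ones(n), dummies])          # constant + ALL three dummies
X_ok = np.column_stack([np.ones(n), dummies[:, 1:]])    # constant + two dummies

# The three dummy columns sum to the constant column, so X_bad is rank deficient.
print("rank of X_bad:", np.linalg.matrix_rank(X_bad), "out of", X_bad.shape[1], "columns")
print("rank of X_ok: ", np.linalg.matrix_rank(X_ok), "out of", X_ok.shape[1], "columns")
```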
Consequences of multicollinearity

Multicollinearity inflates the standard errors of the coefficients. Inflated standard errors, in turn, mean that the coefficients of some independent variables may be found not to be significantly different from 0, whereas without multicollinearity (and with lower standard errors) those same coefficients might have been significant. In other words, multicollinearity misleadingly inflates the standard errors and can make variables appear statistically insignificant when they would otherwise be significant.

When multicollinearity is present, some of the regressors (independent variables) are highly correlated with each other. This makes the estimates unstable and increases their variance: a small change in the data can produce a large change in the estimated coefficients.
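To make this concrete, here is a minimal sketch (hypothetical data, not from the original answer) showing how near-collinear regressors inflate the OLS standard errors relative to a design with independent regressors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_collinear = x1 + rng.normal(scale=0.05, size=n)   # almost a copy of x1
x2_independent = rng.normal(size=n)                  # unrelated to x1

def fit(x2):
    # True model: y = 1 + 2*x1 + 1*x2 + u
    y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    return sm.OLS(y, X).fit()

print("SEs with independent regressors:", fit(x2_independent).bse.round(3))
print("SEs with collinear regressors:  ", fit(x2_collinear).bse.round(3))
# The standard errors on x1 and x2 are roughly 20 times larger in the collinear
# case, even though both regressions fit the data well overall.
```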

Effects of Multicollinearity

It becomes difficult to identify the relevant predictors from the set of candidate predictors, and difficult to isolate the precise effect of each individual predictor.

If the coefficient estimates are unreliable, the fitted model may not generalize well beyond the sample, and prediction accuracy on new (test) data can suffer.
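A standard diagnostic for this situation is the variance inflation factor (VIF); values well above roughly 10 are usually taken as a warning sign. The sketch below (hypothetical data, not part of the original answer) uses statsmodels to compute VIFs for a design in which x1 and x2 are nearly collinear while x3 is unrelated:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # highly correlated with x1
x3 = rng.normal(size=n)                    # unrelated to the others

X = sm.add_constant(np.column_stack([x1, x2, x3]))
names = ["const", "x1", "x2", "x3"]
for i, name in enumerate(names[1:], start=1):   # skip the constant
    print(name, "VIF:", round(variance_inflation_factor(X, i), 1))
# x1 and x2 get VIFs in the hundreds; x3 stays close to 1.
```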

What is heteroscedasticity?
Heteroscedasticity means a situation in which the variance of the dependent variable varies across the data (The Institute for Statistics Education, 2014). Heteroscedasticity complicates analysis because many methods in regression analysis are based on an assumption of equal variance.
A collection of random variables is heteroscedastic if there are sub-populations that have different variabilities from others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity. The possible existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, because the presence of heteroscedasticity can invalidate statistical tests of significance that assume that the modelling errors are uncorrelated and normally distributed and that their variances do not vary with the effects being modelled. Similarly, in testing for differences between subpopulations using a location test, some standard tests assume that variances within groups are equal.
Assumption 5 of the CLRM states that the disturbances should have a constant (equal) variance independent of t:

Var(u_t) = σ²

Having an equal variance means that the disturbances are homoskedastic. If the homoskedasticity assumption is violated, then

Var(u_t) = σ_t²

where the only difference is the subscript t attached to σ², meaning that the variance can change for every observation in the sample, t = 1, 2, 3, ..., n.

Causes of heteroscedasticity
Heteroskedasticity arises most often with cross-sectional data. It may occur under circumstances such as the following:
i.) Suppose 100 students enroll in a typing class, some of whom have typing experience and some of whom do not. After the first class there would be a great deal of dispersion in the number of typing mistakes; after the final class the dispersion would be smaller. The error variance is non-constant: it falls as time increases.
ii.) If we gathered data on the income and food expenditures of a large number of families, those with high incomes may show greater dispersion in food expenditures than those with low incomes: high-income families can afford to eat whatever individual tastes dictate, while low-income families are forced to eat the cheapest foods (a small simulated example appears after this list).
iii.) Errors may also increase as the values of an IV become more extreme in either direction, e.g. with attitudes that range from extremely negative to extremely positive.
iv.) Measurement error can cause heteroscedasticity. Some respondents might provide more accurate responses than others. (Note that this problem arises from the violation of another assumption, that variables are measured without error.)
v.) Heteroscedasticity can also occur if there are subpopulation differences or other interaction effects (e.g. the effect of income on expenditures differs for whites and blacks). (Again, the problem arises from violation of the assumption that no such differences exist or have already been incorporated into the model.)
vi.) Other model misspecifications can produce heteroskedasticity. For example, it may be that instead of using Y you should be using log(Y), or instead of using X you should be using X², or both X and X². Important variables may be omitted from the model. If the model were correctly specified, you might find that the pattern of heteroskedasticity disappears.
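The following minimal sketch (hypothetical data, not from the original answer) simulates cause ii.) above: the disturbance in a food-expenditure equation has a standard deviation that grows with income, so its spread differs sharply between low- and high-income families:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
income = rng.uniform(20, 200, size=n)        # income in thousands per year
u = rng.normal(scale=0.05 * income)          # error sd grows with income
food = 5.0 + 0.1 * income + u                # food expenditure

# Compare the spread of the disturbances in the bottom and top income quartiles
low = u[income < np.quantile(income, 0.25)]
high = u[income > np.quantile(income, 0.75)]
print("sd of u, low-income quartile: ", round(low.std(), 2))
print("sd of u, high-income quartile:", round(high.std(), 2))
```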
Consequences of heteroscedasticity

The consequences of heteroscedasticity can be summarized as follows (Asteriou & Hall):
1. The OLS estimators are still unbiased and consistent. This is because none of the explanatory variables is correlated with the error term, so a correctly specified equation will give values of the estimated coefficients that are very close to the real parameters.
2. Heteroskedasticity affects the distribution of the estimated coefficients, increasing the variances of those distributions and therefore making the OLS estimators inefficient.
3. The conventional OLS formulas underestimate the variances of the estimators, leading to inflated t and F statistics.
4. In addition, the standard errors are biased when heteroskedasticity is present. This in turn leads to bias in the test statistics and confidence intervals.
5. Fortunately, unless heteroskedasticity is "marked," significance tests are virtually unaffected, and thus OLS estimation can be used without concern of serious distortion. Severe heteroskedasticity, however, can be a real problem.

Heteroscedasticity does not affect your parameter estimates: the beta coefficients are still unbiased. The problem, however, lies in the standard errors. The conventional standard errors are no longer valid, so the usual t and F statistics do not follow their stated distributions, which results in invalid inference and invalid confidence intervals. If one simply ignores the heteroscedasticity and assumes none exists, the resulting confidence intervals will be too narrow (a false result).

If you run a simple test such as the Breusch-Pagan test and reject the null hypothesis of homoskedasticity, you most likely have this problem. Econometricians want to draw valid conclusions, so conducting this type of check is an important step when running regressions. Furthermore, you can correct for the problem by using heteroskedasticity-robust standard errors, which come built in with most traditional econometrics software; a small sketch of that workflow follows.
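Below is a minimal sketch (hypothetical data, not the question's beer-tax data) of that workflow in statsmodels: detect heteroskedasticity with the Breusch-Pagan test, then re-estimate with heteroskedasticity-robust (HC1) standard errors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)     # error variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Breusch-Pagan: the null hypothesis is homoskedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print("Breusch-Pagan LM p-value:", round(lm_pvalue, 4))   # small p-value -> reject the null

# Re-estimate with heteroskedasticity-robust (White/HC1) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")
print("conventional SE of slope:", round(ols.bse[1], 4))
print("robust SE of slope:      ", round(robust.bse[1], 4))
```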
