Question

2. In a multiple regression analysis, describe how to detect each of the following phenomena and indicate the steps you would take to remedy them.
Answer #1

Lack of Fit
Lack of fit means that the regression model describes the data poorly. This may be because we made a poor choice of predictor variables, because important terms were left out of the model, or because of poor experimental design. Unusually large residuals when the model is fitted are a sign of lack of fit.

Tests Used to Determine Lack of Fit

A variety of tests can be used to identify lack-of-fit in statistical models. These include:

Goodness-of-fit statistics (e.g., residual plots)
Lack-of-fit F-test / lack-of-fit sum of squares (sketched below; requires replicate observations)
Ljung–Box test (for autocorrelated residuals)
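
The lack-of-fit F-test compares the residual sum of squares of the fitted model with the pure-error sum of squares obtained from replicate observations at the same x values. Below is a minimal Python sketch (assuming pandas, scipy, and statsmodels are available); the data are made up purely to illustrate the calculation.

# Pure-error lack-of-fit F-test for a simple linear regression.
# Requires replicate observations at (some of) the same x values.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Hypothetical data with two replicates at each x level (illustrative only)
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "y": [2.1, 2.3, 4.8, 5.1, 9.5, 9.9, 16.2, 15.8, 25.1, 24.7],
})

# Fit the straight-line model and take its residual sum of squares
X = sm.add_constant(df["x"])
fit = sm.OLS(df["y"], X).fit()
sse = fit.ssr

# Pure-error SS: variation of y around the group mean at each x level
group_means = df.groupby("x")["y"].transform("mean")
ss_pe = ((df["y"] - group_means) ** 2).sum()
df_pe = len(df) - df["x"].nunique()

# Lack-of-fit SS is the remainder of the residual SS
ss_lof = sse - ss_pe
df_lof = df["x"].nunique() - 2      # number of x levels minus model parameters

F = (ss_lof / df_lof) / (ss_pe / df_pe)
p = stats.f.sf(F, df_lof, df_pe)
print(f"lack-of-fit F = {F:.2f}, p = {p:.4f}")   # small p => the line lacks fit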

Correcting Lack of Fit

Correcting lack of fit usually involves respecifying the model so that it fits the data better, for instance by adding a quadratic term, i.e., turning a linear regression model into a polynomial regression model.

Sometimes lack of fit points to poor experimental design. In that case we may need to redesign the experiment to obtain more accurate data, or expand the sampling to obtain more data points and a more complete picture. If the chosen form of the model is in fact appropriate for the situation, a combination of these measures will restore a good fit.
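
As a concrete illustration of the first remedy, here is a minimal Python sketch (using the statsmodels formula interface and the same made-up replicate data as in the lack-of-fit sketch above) of moving from a straight-line model to one with a quadratic term; the data and the choice of a quadratic term are illustrative assumptions, not a general prescription.

import pandas as pd
import statsmodels.formula.api as smf

# Same hypothetical replicate data as above
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "y": [2.1, 2.3, 4.8, 5.1, 9.5, 9.9, 16.2, 15.8, 25.1, 24.7],
})

linear = smf.ols("y ~ x", data=df).fit()
quadratic = smf.ols("y ~ x + I(x**2)", data=df).fit()

# R^2 rises sharply if the added term captures the curvature the line missed
# (rerunning the lack-of-fit test on the new model would confirm the improvement).
print("linear    R^2:", round(linear.rsquared, 3))
print("quadratic R^2:", round(quadratic.rsquared, 3))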

Heteroscedasticity:

Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).

Detection :

Levene's test
Goldfeld–Quandt test
Park test
Glejser test
Brown–Forsythe test
Harrison–McCabe test
Breusch–Pagan test
White test (this and the Breusch–Pagan test are sketched after this list)
Cook–Weisberg test
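
Of the tests listed above, the Breusch–Pagan and White tests are readily available in statsmodels. A minimal sketch, using simulated data in which the error spread grows with x (all names and parameter values are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
y = 3 + 2 * x + rng.normal(0, 0.5 * x, n)   # error standard deviation grows with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

bp_lm, bp_pvalue, bp_f, bp_fpvalue = het_breuschpagan(resid, X)
w_lm, w_pvalue, w_f, w_fpvalue = het_white(resid, X)

print(f"Breusch-Pagan p-value: {bp_pvalue:.4f}")   # small p => reject homoscedasticity
print(f"White test    p-value: {w_pvalue:.4f}")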

Remedial measures :

  • Work with the data on a logarithmic scale. Non-logged series that grow exponentially often appear to have increasing variability as the series rises over time; in percentage terms, however, the variability may be fairly stable.
  • Use a different specification for the model (different X variables, or perhaps non-linear transformations of the X variables).
  • Apply a weighted least squares estimation method, in which OLS is applied to transformed or weighted values of X and Y. The weights vary over observations, usually depending on the changing error variances. In one variation the weights are directly related to the magnitude of the dependent variable, which corresponds to least squares percentage regression.
  • Use heteroscedasticity-consistent standard errors (HCSE). Although biased in finite samples, HCSE is a consistent estimator of the standard errors in regression models with heteroscedasticity; it corrects the standard errors without altering the coefficient estimates. This can be preferable to conventional OLS inference because, if heteroscedasticity is present, it is corrected for, while if the data are in fact homoscedastic the robust standard errors are close to the conventional ones. Several modifications of White's method of computing heteroscedasticity-consistent standard errors have been proposed with better finite-sample properties. (Weighted least squares and HCSE are sketched after this list.)
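
Two of these remedies, weighted least squares and heteroscedasticity-consistent standard errors, can be sketched in a few lines of statsmodels. The 1/x² weights assume the error standard deviation is proportional to x (true for the simulated data below, but something to judge case by case), and "HC3" is just one of the available robust covariance estimators:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
y = 3 + 2 * x + rng.normal(0, 0.5 * x, n)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                       # conventional (non-robust) standard errors
hc = sm.OLS(y, X).fit(cov_type="HC3")          # same coefficients, robust standard errors
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()   # downweight the high-variance observations

print("OLS standard errors:", ols.bse.round(3))
print("HC3 standard errors:", hc.bse.round(3))
print("WLS standard errors:", wls.bse.round(3))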

Multicollinearity:

Multicollinearity occurs when independent variables in a regression model are correlated with one another. This is a problem because it makes it difficult to separate the individual effect of each predictor; if the degree of correlation is high enough, the coefficient estimates become unstable and hard to interpret when you fit the model.

Detection:

  1. Large changes in the estimated regression coefficients when a predictor variable is added or deleted
  2. Insignificant regression coefficients for the affected variables in the multiple regression, but a rejection of the joint hypothesis that those coefficients are all zero (using an F-test)
  3. Variance Inflation Factor (VIF), sketched together with the condition number after this list
  4. Farrar–Glauber test
  5. Condition number
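
The variance inflation factor and the condition number are both easy to compute; here is a minimal sketch with statsmodels and numpy, where x1 and x2 are simulated to be nearly collinear (the thresholds quoted in the comments are common rules of thumb, not hard cutoffs):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)    # nearly collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))

# Rule of thumb: VIF above roughly 5-10 signals problematic multicollinearity
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, "VIF:", round(variance_inflation_factor(X, i), 1))

# Condition number of the design matrix; large values (often > 30) are suspect
print("condition number:", round(np.linalg.cond(X), 1))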

Remedial measures :

  1. Make sure you have not fallen into the dummy variable trap; including a dummy variable for every category (e.g., summer, autumn, winter, and spring) and including a constant term in the regression together guarantee perfect multicollinearity.
  2. Try seeing what happens if you use independent subsets of your data for estimation and apply those estimates to the whole data set. Theoretically you should obtain somewhat higher variance from the smaller datasets used for estimation, but the expectation of the coefficient values should be the same. Naturally, the observed coefficient values will vary, but look at how much they vary.
  3. Leave the model as is, despite multicollinearity. The presence of multicollinearity doesn't affect the efficiency of extrapolating the fitted model to new data provided that the predictor variables follow the same pattern of multicollinearity in the new data as in the data on which the regression model is based.
  4. Drop one of the variables. An explanatory variable may be dropped to produce a model with significant coefficients. However, you lose information (because you've dropped a variable). Omission of a relevant variable results in biased coefficient estimates for the remaining explanatory variables that are correlated with the dropped variable.
  5. Obtain more data, if possible. This is the preferred solution. More data can produce more precise parameter estimates (with lower standard errors), as seen from the formula in variance inflation factor for the variance of the estimate of a regression coefficient in terms of the sample size and the degree of multicollinearity.
  6. Mean-center the predictor variables. Generating polynomial terms (i.e., x1, x1^2, x1^3, etc.) or interaction terms (i.e., x1 × x2, etc.) can cause some multicollinearity if the variable in question has a limited range (e.g., [2, 4]). Mean-centering will eliminate this special kind of multicollinearity. In general, however, it has no effect; it can be useful for avoiding problems arising from rounding and other computational steps if a carefully designed computer program is not used.
  7. Standardize your independent variables. This may help reduce a false flagging of a condition index above 30.
  8. It has also been suggested that using the Shapley value, a game theory tool, the model could account for the effects of multicollinearity. The Shapley value assigns a value for each predictor and assesses all possible combinations of importance.
  9. Ridge regression, principal component regression, or partial least squares regression can be used (ridge regression is sketched after this list).
  10. If the correlated explanators are different lagged values of the same underlying explanator, then a distributed lag technique can be used, imposing a general structure on the relative values of the coefficients to be estimated.
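
As an illustration of remedy 9, here is a minimal ridge regression sketch using scikit-learn (the library choice and the penalty strength alpha=1.0 are assumptions for illustration; in practice alpha would be chosen by cross-validation):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)    # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# With nearly collinear predictors the OLS coefficients have inflated variance;
# ridge shrinks them, trading a little bias for much more stable estimates.
print("OLS   coefficients:", ols.coef_.round(2))
print("Ridge coefficients:", ridge.coef_.round(2))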

Influential Points :

An influential point is an observation, often an outlier with high leverage, whose inclusion or removal substantially changes the fitted regression (for example, its slope). One way to assess the influence of a suspect point is to compute the regression equation with and without that point and compare the results.

Detection:

  1. Difference in fits (DFFITS)
  2. Cook's distance (both measures are sketched below)
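
Both statistics are available from the influence object of a fitted statsmodels OLS model. A minimal sketch, with one observation deliberately simulated far from the rest so that it shows up as influential:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.append(rng.uniform(0, 10, 30), 25.0)                 # last point: high leverage
y = np.append(3 + 2 * x[:30] + rng.normal(size=30), 10.0)   # ...and a surprising y value

X = sm.add_constant(x)
influence = sm.OLS(y, X).fit().get_influence()

cooks_d, _ = influence.cooks_distance
dffits, dffits_threshold = influence.dffits

print("largest Cook's distance:", round(cooks_d.max(), 2))
print("largest |DFFITS|:", round(np.abs(dffits).max(), 2),
      " suggested threshold:", round(dffits_threshold, 2))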

Remedial measures :

  1. Research the observations: check whether the influential points reflect data-entry or measurement errors.
  2. Bootstrapping, to see how sensitive the fitted coefficients are to the particular sample drawn.
  3. Robust estimation methods, which downweight extreme observations rather than deleting them (sketched below).
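
As a sketch of the third remedy, robust regression with a Huber M-estimator (available in statsmodels as RLM) downweights extreme observations instead of deleting them. The data below reuse the single influential point simulated in the detection sketch above:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.append(rng.uniform(0, 10, 30), 25.0)
y = np.append(3 + 2 * x[:30] + rng.normal(size=30), 10.0)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
robust = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()

# The OLS slope is dragged toward the influential point; the robust fit
# stays much closer to the slope of the bulk of the data (about 2).
print("OLS slope:   ", round(ols.params[1], 2))
print("Robust slope:", round(robust.params[1], 2))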