Would the traditional R squared be a good measure of fit when the dependent variable is...

Question

Question

Would the traditional R squared be a good measure of fit when the dependent variable is...

Would the traditional R squared be a good measure of fit when the dependent variable is dichotomous? Why? Provide an example please thank you!

business economics

Add a comment Improve this question Transcribed image text

Answer 1

Answer #1

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale.

After fitting a linear regression model, you need to determine how well the model fits the data. Does it do a good job of explaining changes in the dependent variable? There are a several key goodness-of-fit statistics for regression analysis. In this post, we’ll examine R-squared (R2 ), highlight some of its limitations, and discover some surprises. For instance, small R-squared values are not always a problem, and high R-squared values are not necessarily good!

Assessing Goodness-of-Fit in a Regression Model

Residuals are the distance between the observed value and the fitted value.

Linear regression identifies the equation that produces the smallest difference between all of the observed values and their fitted values. To be precise, linear regression finds the smallest sum of squared residuals that is possible for the dataset.

Statisticians say that a regression model fits the data well if the differences between the observations and the predicted values are small and unbiased. Unbiased in this context means that the fitted values are not systematically too high or too low anywhere in the observation space.

However, before assessing numeric measures of goodness-of-fit, like R-squared, you should evaluate the residual plots. Residual plots can expose a biased model far more effectively than the numeric output by displaying problematic patterns in the residuals. If your model is biased, you cannot trust the results. If your residual plots look good, go ahead and assess your R-squared and other statistics.

Read my post about checking the residual plots.

R-squared and the Goodness-of-Fit

R-squared evaluates the scatter of the data points around the fitted regression line. It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression. For the same data set, higher R-squared values represent smaller differences between the observed data and the fitted values.

R-squared is the percentage of the dependent variable variation that a linear model explains.

$R2 Variance explained by the model Total variance$

R-squared is always between 0 and 100%:

0% represents a model that does not explain any of the variation in the response variable around its mean. The mean of the dependent variable predicts the dependent variable as well as the regression model.
100% represents a model that explains all of the variation in the response variable around its mean.

Usually, the larger the R2, the better the regression model fits your observations. However, this guideline has important caveats that I’ll discuss in both this post and the next post.

Visual Representation of R-squared

To visually demonstrate how R-squared values represent the scatter around the regression line, you can plot the fitted values by observed values.

Graph that illustrates a regression model with a low R-squared.

Graph that illustrates a model with a high R-squared.

The R-squared for the regression model on the left is 15%, and for the model on the right it is 85%. When a regression model accounts for more of the variance, the data points are closer to the regression line. In practice, you’ll never see a regression model with an R2 of 100%. In that case, the fitted values equal the data values and, consequently, all of the observations fall exactly on the regression line.

R-squared has Limitations

You cannot use R-squared to determine whether the coefficient estimates and predictions are biased, which is why you must assess the residual plots.

R-squared does not indicate if a regression model provides an adequate fit to your data. A good model can have a low R2 value. On the other hand, a biased model can have a high R2 value!

Are Low R-squared Values Always a Problem?

No! Regression models with low R-squared values can be perfectly good models for several reasons.

Some fields of study have an inherently greater amount of unexplainable variation. In these areas, your R2 values are bound to be lower. For example, studies that try to explain human behavior generally have R2 values less than 50%. People are just harder to predict than things like physical processes.

Fortunately, if you have a low R-squared value but the independent variables are statistically significant, you can still draw important conclusions about the relationships between the variables. Statistically significant coefficients continue to represent the mean change in the dependent variable given a one-unit shift in the independent variable. Clearly, being able to draw conclusions like this is vital.

There is a scenario where small R-squared values can cause problems. If you need to generate predictions that are relatively precise (narrow prediction intervals), a low R2 can be a show stopper.

How high does R-squared need to be for the model produce useful predictions? That depends on the precision that you require and the amount of variation present in your data. A high R2 is necessary for precise predictions, but it is not sufficient by itself, as we’ll uncover in the next section.

Add a comment

Answer 2

Answer #2

Add a comment

Answer 3

Would the traditional R squared be a good measure of fit when the dependent variable is...

Homework Answers

Add Answer to:
Would the traditional R squared be a good measure of fit when the dependent variable is...

Post as a guest

Earn Coins

What information does R (that is, r-squared) provide in general about the fit of a regression...

Which statement is not correct? Multiple Choice R-squared is a measure of the degree of variability...

When do we look at Adjusted R-squared rather than regular R-squared? What would you do if...

1. Define the coefficient of determination (r squared). a. measure of the proportion of variation in...

When evaluating a multiple regression model, for example when we regress dependent variable Y on two...

Find the appropriate measure of center. Discuss why the chosen measure is most appropriate. Why did...

Tell us why you would be a good fit to work with older adults with memory...

We have explored the use of logistic regression for when the dependent variable has two classes,...

Please think about a real-life manufacturing company (big or small) that would be a good fit...

Suppose we have the following values for a dependent variable, Y, and three independent variables, X1,...

Would the traditional R squared be a good measure of fit when the dependent variable is...

Homework Answers

Add Answer to: Would the traditional R squared be a good measure of fit when the dependent variable is...

Post as a guest

Earn Coins

Add Answer to:
Would the traditional R squared be a good measure of fit when the dependent variable is...