Question

Would the traditional R squared be a good measure of fit when the dependent variable is...

Would the traditional R squared be a good measure of fit when the dependent variable is dichotomous? Why? Provide an example please thank you!

1 0
Add a comment Improve this question Transcribed image text
Answer #1

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale.

After fitting a linear regression model, you need to determine how well the model fits the data. Does it do a good job of explaining changes in the dependent variable? There are a several key goodness-of-fit statistics for regression analysis. In this post, we’ll examine R-squared (R2 ), highlight some of its limitations, and discover some surprises. For instance, small R-squared values are not always a problem, and high R-squared values are not necessarily good!

Assessing Goodness-of-Fit in a Regression Model

Residuals are the distance between the observed value and the fitted value.

Linear regression identifies the equation that produces the smallest difference between all of the observed values and their fitted values. To be precise, linear regression finds the smallest sum of squared residuals that is possible for the dataset.

Statisticians say that a regression model fits the data well if the differences between the observations and the predicted values are small and unbiased. Unbiased in this context means that the fitted values are not systematically too high or too low anywhere in the observation space.

However, before assessing numeric measures of goodness-of-fit, like R-squared, you should evaluate the residual plots. Residual plots can expose a biased model far more effectively than the numeric output by displaying problematic patterns in the residuals. If your model is biased, you cannot trust the results. If your residual plots look good, go ahead and assess your R-squared and other statistics.

Read my post about checking the residual plots.

R-squared and the Goodness-of-Fit

R-squared evaluates the scatter of the data points around the fitted regression line. It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression. For the same data set, higher R-squared values represent smaller differences between the observed data and the fitted values.

R-squared is the percentage of the dependent variable variation that a linear model explains.

R2 Variance explained by the model Total variance

R-squared is always between 0 and 100%:

  • 0% represents a model that does not explain any of the variation in the response variable around its mean. The mean of the dependent variable predicts the dependent variable as well as the regression model.
  • 100% represents a model that explains all of the variation in the response variable around its mean.

Usually, the larger the R2, the better the regression model fits your observations. However, this guideline has important caveats that I’ll discuss in both this post and the next post.

Visual Representation of R-squared

To visually demonstrate how R-squared values represent the scatter around the regression line, you can plot the fitted values by observed values.

Graph that illustrates a regression model with a low R-squared.

Graph that illustrates a model with a high R-squared.

The R-squared for the regression model on the left is 15%, and for the model on the right it is 85%. When a regression model accounts for more of the variance, the data points are closer to the regression line. In practice, you’ll never see a regression model with an R2 of 100%. In that case, the fitted values equal the data values and, consequently, all of the observations fall exactly on the regression line.

R-squared has Limitations

You cannot use R-squared to determine whether the coefficient estimates and predictions are biased, which is why you must assess the residual plots.

R-squared does not indicate if a regression model provides an adequate fit to your data. A good model can have a low R2 value. On the other hand, a biased model can have a high R2 value!

Are Low R-squared Values Always a Problem?

No! Regression models with low R-squared values can be perfectly good models for several reasons.

Some fields of study have an inherently greater amount of unexplainable variation. In these areas, your R2 values are bound to be lower. For example, studies that try to explain human behavior generally have R2 values less than 50%. People are just harder to predict than things like physical processes.

Fortunately, if you have a low R-squared value but the independent variables are statistically significant, you can still draw important conclusions about the relationships between the variables. Statistically significant coefficients continue to represent the mean change in the dependent variable given a one-unit shift in the independent variable. Clearly, being able to draw conclusions like this is vital.

There is a scenario where small R-squared values can cause problems. If you need to generate predictions that are relatively precise (narrow prediction intervals), a low R2 can be a show stopper.

How high does R-squared need to be for the model produce useful predictions? That depends on the precision that you require and the amount of variation present in your data. A high R2 is necessary for precise predictions, but it is not sufficient by itself, as we’ll uncover in the next section.

Add a comment
Know the answer?
Add Answer to:
Would the traditional R squared be a good measure of fit when the dependent variable is...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • What information does R (that is, r-squared) provide in general about the fit of a regression...

    What information does R (that is, r-squared) provide in general about the fit of a regression model? It is exactly equal to the correlation between X and Y. It tells us the proportion of variability in the dependent variable, Y, that is explained by the model. It tells us the proportion of variability in the independent variable, X, that is explained by the model We should keep choosing different independent variables until R equals 1 O The closer R is...

  • Which statement is not correct? Multiple Choice R-squared is a measure of the degree of variability...

    Which statement is not correct? Multiple Choice R-squared is a measure of the degree of variability in the dependent variable about its sample mean explained by the regression line. The adjusted R-squared measure should be used in the case of more than one independent variable. The null hypothesis that R2 = 0 can be tested using the F-statistic. Forecasters should always select independent variables on the basis of R2. All of the options are correct.

  • When do we look at Adjusted R-squared rather than regular R-squared? What would you do if...

    When do we look at Adjusted R-squared rather than regular R-squared? What would you do if adding an independent variable decreased the Adjusted R-squared value?

  • 1. Define the coefficient of determination (r squared). a. measure of the proportion of variation in...

    1. Define the coefficient of determination (r squared). a. measure of the proportion of variation in y using x as the explanatory variable b. measure of direction of association between dependent and independent variables c. a and b d. none of the above 1a.) Scatter plot can be sued to study a. direction of association between gender and income b. percentage variation of association between gender and income c. strength and direction of association between minutes doing exercise and body...

  • When evaluating a multiple regression model, for example when we regress dependent variable Y on two...

    When evaluating a multiple regression model, for example when we regress dependent variable Y on two independent variables X1 and X2, a commonly used goodness of fit measure is: A. Correlation between Y and X1 B. Correlation between Y and X2 C. Correlation between X1 and X2 D. Adjusted-R2 E. None of the above

  • Find the appropriate measure of center. Discuss why the chosen measure is most appropriate. Why did...

    Find the appropriate measure of center. Discuss why the chosen measure is most appropriate. Why did you decide against other possible measures of center? Find the appropriate measure of variation. The measure of variation chosen here should match the measure of center chosen in Part 1. Find the graph(s) needed to appropriately describe the data. These may be done by hand and inserted into the Word document. Define a random variable (X) so that your chosen data set represents values...

  • Tell us why you would be a good fit to work with older adults with memory...

    Tell us why you would be a good fit to work with older adults with memory loss or other chronic conditions. Continue

  • We have explored the use of logistic regression for when the dependent variable has two classes,...

    We have explored the use of logistic regression for when the dependent variable has two classes, Yes or No, Admit or Not, etc. Suppose that there are m (>2) classes. For example, Buy, Sell, or Hold. How will you model this using logistic regression? You don't have to solve it, but provide a strategy to model this.

  • Please think about a real-life manufacturing company (big or small) that would be a good fit...

    Please think about a real-life manufacturing company (big or small) that would be a good fit for the Process Costing method. Explain why you feel that your company would be a good candidate for Process Costing (hint: look at the criteria). Also, describe how you believe the units of product move through the processing department(s) and how to direct labor, direct material, and manufacturing overhead cost are added in each department (keep in mind that direct labor and material in...

  • Suppose we have the following values for a dependent variable, Y, and three independent variables, X1,...

    Suppose we have the following values for a dependent variable, Y, and three independent variables, X1, X2, and X3. The variable X3 is a dummy variable where 1 = male and 2 = female: X1       X2       X3       Y 0          40        1          30 0          50        0          10 2          20        0          40 2          50        1          50 4          90        0          60 4          60        0          70 4          70        1          80 4          40        1          90 6          40        0          70 6          50        1          90 8          80       ...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT