Suppose we are interested in the effect of drinking on student achievement. We survey students at the University of Rhode Island about their drinking habits, specifically the number of times they “binge” drank (had 5+ drinks in one sitting) in the previous semester, and about their previous semester’s GPA. We also have data on their gender, race, parents’ education, and family income. We estimate the following regression:
$$\text{GPA}_i = \beta_0 + \beta_1\,\text{BingeEvents}_i + \gamma X_i + \varepsilon_i$$
where $\text{GPA}_i$ is student $i$’s GPA last semester, $\text{BingeEvents}_i$ is the number of times they reported binge drinking last semester, $X_i$ is a set of controls for gender, race, and family variables, and $\varepsilon_i$ is the error term. Suppose we find a statistically significant negative correlation between binge drinking and GPA: more binge drinking is associated with a lower GPA ($\beta_1 < 0$). Why does this finding not imply a causal effect of binge drinking on GPA?
The question states that binge drinking is negatively correlated with GPA and that the correlation is statistically significant. The key point, however, is that a correlation between two variables does not by itself imply causation. A simple example illustrates this:
Findings have shown a negative correlation between a student's anxiety before an exam and the student's score on that exam. But we cannot conclude that anxiety causes the lower score, because there could be other explanations. For example, a student who has not studied well may be both more anxious and more likely to score poorly. So here the correlation does not imply causation.
On the other hand, there is a positive correlation between the number of hours a student spends studying for a test and the grade they receive. Here there is plausibly causation as well: spending more time studying leads to a higher grade. The correlation alone, however, cannot tell us which of these two situations we are in.
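For the binge-drinking regression, the standard omitted-variable-bias formula makes this concrete. As a sketch, suppose (hypothetically) that GPA also depends on an unobserved trait $M_i$, such as motivation, which the survey does not record and which therefore sits inside $\varepsilon_i$:

$$
\text{GPA}_i = \beta_0 + \beta_1\,\text{BingeEvents}_i + \gamma X_i + \delta M_i + u_i,
\qquad
M_i = \pi_0 + \pi_1\,\text{BingeEvents}_i + \pi_2 X_i + v_i,
$$

where the second equation is simply the linear projection of $M_i$ on the included regressors and $u_i$ is assumed to be uncorrelated with them. Estimating the original regression without $M_i$ then gives

$$
\operatorname{plim}\,\hat\beta_1 = \beta_1 + \delta\,\pi_1 .
$$

If motivation raises GPA ($\delta > 0$) and more motivated students binge less ($\pi_1 < 0$), the bias term $\delta\pi_1$ is negative, so $\hat\beta_1$ can be negative and statistically significant even when the true causal effect $\beta_1$ is zero.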
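So, although binge drinking and GPA are negatively correlated, we cannot infer that binge drinking has a causal effect on GPA. The controls in $X_i$ only cover gender, race, and family background; unobserved traits such as motivation, conscientiousness, or mental health could both reduce drinking and raise GPA. Causation could also run the other way: students who are struggling academically may drink more. In either case $\beta_1$ would be negative even if binge drinking itself had no effect on grades.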
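The same logic can be checked with a small simulation. The sketch below is hypothetical throughout: the data-generating process, the variable names, and the `motivation` confounder are assumptions chosen for illustration, not features of the actual survey. Binge drinking is given no causal effect on GPA, yet regressing GPA on binge events and a control still yields a negative coefficient, which disappears once the (normally unobservable) confounder is added. To stay dependency-free, the helper `ols` solves the normal equations by hand rather than calling a statistics library.

```python
import random


def ols(y, X):
    """Return OLS coefficients b solving (X'X) b = X'y via Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: bring the row with the largest entry in this column up
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= f * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, seed=0):
    """Hypothetical survey in which binge drinking has NO causal effect on GPA.

    Returns the estimated beta_1 without and with the confounder in the regression.
    """
    rng = random.Random(seed)
    gpa, x_short, x_long = [], [], []
    for _ in range(n):
        female = float(rng.random() < 0.5)  # observed control, a stand-in for X_i
        motivation = rng.gauss(0, 1)        # hypothetical confounder the survey never records
        # More motivated students binge less often (count rounded and truncated at zero)
        binge = float(max(0, round(3 - 1.5 * motivation + rng.gauss(0, 1))))
        # GPA depends on motivation and the control, but NOT on binge (true beta_1 = 0)
        g = 3.0 + 0.3 * motivation + 0.1 * female + rng.gauss(0, 0.3)
        gpa.append(g)
        x_short.append([1.0, binge, female])             # regression the researcher can run
        x_long.append([1.0, binge, female, motivation])  # infeasible: confounder observed
    return ols(gpa, x_short)[1], ols(gpa, x_long)[1]


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"beta_1 without motivation: {naive:.3f}")    # expected: clearly negative
    print(f"beta_1 with motivation:    {adjusted:.3f}")  # expected: close to zero
```

In a real survey only the first regression can be run, because motivation is not observed; that is exactly why a negative $\hat\beta_1$ cannot be read as a causal effect.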