Solved using R.
a)
Analysis for the explanatory variable x= 'Budget' and response variable y= 'USRevenue'
> x=scan("clipboard")
Read 35 items
> y=scan("clipboard")
Read 35 items
i)
> plot(x,y)
ii)
> cor(x,y)
[1] 0.791832
iii)
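The correlation coefficient r = 0.79 indicates a moderately strong positive association between Budget and USRevenue. Note that r is not a percentage: r² ≈ 0.63, so Budget alone would account for about 63% of the variability in USRevenue. The scatter plot, however, does not show a clear linear trend.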
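For reference, cor(x, y) returns the Pearson correlation coefficient, a unitless number between -1 and 1. The sketch below reproduces the calculation by hand. The vectors budget_demo and revenue_demo are made-up illustration values, not the HW 6.2 data.

```r
# Hypothetical demo data; NOT the HW 6.2 values
budget_demo  <- c(10, 25, 40, 60, 80, 120)
revenue_demo <- c(15, 30, 95, 70, 160, 210)

# Pearson r by hand: sum of cross-deviations divided by
# the square root of the product of the summed squared deviations
dx <- budget_demo - mean(budget_demo)
dy <- revenue_demo - mean(revenue_demo)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent
r_builtin <- cor(budget_demo, revenue_demo)

print(c(manual = r_manual, builtin = r_builtin))
```

Both values agree, and squaring r gives the share of variability a simple linear fit would explain.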
b)
Analysis for the explanatory variable x1= 'Opening' and response variable y= 'USRevenue'
> y=scan("clipboard")
Read 35 items
> x1=scan("clipboard")
Read 35 items
> plot(x1,y)
> cor(x1,y)
[1] 0.9840788
The scatter plot shows a clear linear relationship between Opening and USRevenue. The correlation coefficient r = 0.98 indicates a very strong positive linear association.
c)
Analysis for the explanatory variable x2= 'Theaters' and response variable y= 'USRevenue'
> x2=scan("clipboard")
Read 35 items
> y=scan("clipboard")
Read 35 items
> plot(x2,y)
> cor(x2,y)
[1] 0.7152985
The scatter plot does not show a clear linear relationship between Theaters and USRevenue. The correlation coefficient r = 0.72 indicates a moderate positive association, the weakest of the three.
d)
Based on parts a), b) and c), 'Opening' is the most appropriate explanatory variable for predicting the response variable 'USRevenue': it has the highest correlation with USRevenue (r = 0.98), and its scatter plot shows a clear linear trend.
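The comparison in d) can be sketched by collecting the three reported correlations in one named vector and ranking them. The names r_values and best are illustrative only.

```r
# Correlations with USRevenue as reported in parts a), b) and c)
r_values <- c(Budget = 0.791832, Opening = 0.9840788, Theaters = 0.7152985)

# Rank the predictors by the strength of their linear association
ranked <- sort(abs(r_values), decreasing = TRUE)
print(ranked)

# Predictor with the largest absolute correlation
best <- names(ranked)[1]
print(best)
```

Ranking by |r| only measures linear association, so the scatter plots should still be checked, as done above.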
e)
> m=lm(y~x1)
> summary(m)
Call:
lm(formula = y ~ x1)

Residuals:
    Min      1Q  Median      3Q     Max
-35.038 -11.898   1.720   7.827  46.250

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -6.918      4.378   -1.58    0.124
x1             3.278      0.103   31.81   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.17 on 33 degrees of freedom
Multiple R-squared: 0.9684, Adjusted R-squared: 0.9675
F-statistic: 1012 on 1 and 33 DF, p-value: < 2.2e-16
i)
The regression equation is
USRevenue = -6.918 + 3.278 Opening
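As a usage sketch, the fitted line can be evaluated for a given opening value. The coefficients are copied from the Coefficients table of the summary(m) output (intercept -6.918, slope 3.278 for x1). The input opening_new = 20 is a hypothetical value, assumed to be in the same units as the Opening column.

```r
# Coefficients copied from the summary(m) output
b0 <- -6.918   # intercept
b1 <-  3.278   # slope for Opening (x1)

# Hypothetical opening value, same units as the Opening column
opening_new <- 20

# Predicted USRevenue from the fitted line
revenue_hat <- b0 + b1 * opening_new
print(revenue_hat)

# With the fitted model object m still in the session, the equivalent call is:
# predict(m, newdata = data.frame(x1 = opening_new))
```

Predictions are only trustworthy for opening values inside the range observed in the sample.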
ii)
The null hypothesis H0: slope = 0 is rejected (t = 31.81, p-value < 2e-16), so the slope of the regression line is significantly different from zero, i.e. Opening is a significant predictor of USRevenue in the regression model.
iii)
The null hypothesis H0: intercept = 0 is not rejected (t = -1.58, p-value = 0.124 > 0.05), so the y-intercept of the regression line is not significantly different from zero.
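The two tests in ii) and iii) can be checked from the Coefficients table: each t value is the estimate divided by its standard error, compared with a t distribution on 33 degrees of freedom. The sketch below uses the rounded values printed by summary(m), so the slope's t value may differ slightly from the printed 31.81. The 5% significance level is an assumption.

```r
# Estimates and standard errors copied from the coefficient table
est <- c(intercept = -6.918, slope = 3.278)
se  <- c(intercept =  4.378, slope = 0.103)
df  <- 33   # residual degrees of freedom (n - 2 = 35 - 2)

# t statistic for H0: coefficient = 0
t_stat <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 2))
print(signif(p_val, 3))

# Decision at the (assumed) 5% level: TRUE means H0 is rejected
print(p_val < 0.05)
```

With the model object available, confint(m) gives the matching confidence intervals for both coefficients.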
iv)
Multiple R-squared = 0.9684
It shows that about 96.8% of the variability in USRevenue is explained by the variable Opening.
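In simple linear regression, the multiple R-squared equals the square of the correlation coefficient, so the value can be cross-checked against cor(x1, y) from part b). A minimal sketch using the reported numbers (the variable names are illustrative):

```r
# Values copied from the output above
r <- 0.9840788   # cor(x1, y) from part b)
n <- 35          # number of movies in the sample

# In simple linear regression, R-squared is the squared correlation
r_squared <- r^2

# Adjusted R-squared with one predictor: 1 - (1 - R^2) * (n - 1) / (n - 2)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(round(c(R2 = r_squared, adjusted = adj_r_squared), 4))
```

Both values should match the Multiple R-squared and Adjusted R-squared lines of the summary output to four decimals.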
2.) The data set named "HW 6.2" contains a random sample of 35 movies released in 2008 collected from the Internet Movie Database (IMDb). The goal of this problem is to explore if the informa...