2. The data set prostate in the faraway package is from a study on 97 men...

Question


Answer 1


R code with explanations (all statements starting with # are comments)

a) R code

#install the faraway package if it is not already installed
if (!requireNamespace("faraway", quietly = TRUE)) install.packages("faraway")

library(faraway)
names(prostate)
#a) Draw a scatter plot
plot(prostate$lcavol,prostate$lpsa,xlab="lcavol",ylab="lpsa",main="lpsa vs lcavol")

This gives the following plot:

[Figure: scatter plot "lpsa vs lcavol", with lcavol on the x-axis and lpsa on the y-axis]

We can see an overall positive linear relationship between lpsa and lcavol: the log of prostate specific antigen (lpsa) tends to increase as the log of cancer volume (lcavol) increases.

A simple linear regression model seems reasonable.
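A scatter plot is a visual check; a numeric companion is Pearson's correlation coefficient r, where values close to +1 indicate a strong positive linear relationship. The sketch below assumes the built-in `cars` data set (speed vs stopping distance) as a stand-in so it runs without faraway; with the prostate data you would use `cor(prostate$lcavol, prostate$lpsa)` instead.

```r
# Sketch: measure the strength of a linear relationship with Pearson's r.
# Assumption: the built-in 'cars' data set stands in for prostate.
x <- cars$speed
y <- cars$dist

# Pearson correlation from its definition: Sxy / sqrt(Sx * Sy)
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# The same quantity from base R
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

Both values agree; a clearly positive r supports fitting a straight line.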

b) The regression line that we want to fit is

$y=\beta_0+\beta_1x+\epsilon$

where y = lpsa is the response,

$\beta_0$ is the intercept of the regression line,

$\beta_1$ is the slope coefficient corresponding to x = lcavol, and

$\epsilon \stackrel{iid}{\sim} \mathcal{N}(0,\sigma^2)$ is a random error.
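To make the model concrete, here is a minimal sketch that simulates data from it and checks that `lm()` recovers estimates close to the true parameters. The parameter values and the uniform distribution for x are hypothetical choices for illustration, not values from the prostate study; only the sample size n = 97 matches it.

```r
# Sketch: simulate from y = beta0 + beta1*x + epsilon, epsilon iid N(0, sigma^2).
# All parameter values here are hypothetical.
set.seed(1)
n     <- 97     # same sample size as the prostate study
beta0 <- 2      # hypothetical intercept
beta1 <- 0.5    # hypothetical slope
sigma <- 1      # hypothetical error standard deviation

x   <- runif(n, min = -1, max = 4)     # hypothetical predictor values
eps <- rnorm(n, mean = 0, sd = sigma)  # iid normal errors
y   <- beta0 + beta1 * x + eps         # responses generated by the model

# Least squares estimates should land near beta0 and beta1
fit <- lm(y ~ x)
print(coef(fit))
```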

We calculate the following quantities:

$\begin{align*} \bar{x}&=\frac{\sum x_i}{n} \\ \bar{y}&=\frac{\sum y_i}{n} \\ S_x&=\sum(x_i-\bar{x})^2 \\ S_y&=\sum(y_i-\bar{y})^2 \\ S_{xy}&=\sum(x_i-\bar{x})(y_i-\bar{y}) \end{align*}$

and then estimate the slope and intercept using

$\begin{align*} \hat{\beta}_1&=\frac{S_{xy}}{S_x} \\ \hat{\beta}_0&=\bar{y}-\hat{\beta}_1\bar{x} \end{align*}$

The fitted value of y is

$\begin{align*} \hat{y}=\hat{\beta}_0+\hat{\beta}_1x \end{align*}$
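The same estimates can also be obtained by solving the normal equations in matrix form, $(X^TX)\hat{\beta}=X^Ty$; this matrix formulation is a swapped-in alternative shown only as a cross-check of the closed-form formulas. The sketch assumes the built-in `cars` data set as a stand-in for prostate.

```r
# Sketch: closed-form simple regression vs. the normal equations (X'X) b = X'y.
# Assumption: the built-in 'cars' data set stands in for prostate.
x <- cars$speed
y <- cars$dist

# Normal equations: design matrix with an intercept column
X <- cbind(1, x)
b <- solve(crossprod(X), crossprod(X, y))  # crossprod(X) = t(X) %*% X

# Closed-form estimates from the formulas above
Sx  <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
beta1hat <- Sxy / Sx
beta0hat <- mean(y) - beta1hat * mean(x)

print(cbind(normal_equations = drop(b), closed_form = c(beta0hat, beta1hat)))
```

The two columns agree, which is expected because the closed-form formulas are the solution of these two equations.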

The following R code does all of this:

#part b)
y<-prostate$lpsa
x<-prostate$lcavol
#sample means
xbar<-mean(x)
ybar<-mean(y)
#sums of squares
Sx<-sum((x-xbar)^2)
Sy<-sum((y-ybar)^2)
Sxy<-sum((x-xbar)*(y-ybar))
#estimate the value of slope
beta1hat<-Sxy/Sx
#Estimate the value of intercept
beta0hat<-ybar-beta1hat*xbar
sprintf('The estimated value of the intercept is %.4f',beta0hat)
sprintf('The estimated value of the slope is %.4f',beta1hat)
sprintf('The estimated regression line is %.4f+%.4fx',beta0hat,beta1hat)
#calculate the fitted values
yhat<-beta0hat+beta1hat*x
#Draw the fitted line on to the plot from part a)
lines(sort(x),yhat[order(x)],col="red")

We get these outputs:

> sprintf('The estimated value of the intercept is %.4f',beta0hat)
[1] "The estimated value of the intercept is 1.5073"
> sprintf('The estimated value of the slope is %.4f',beta1hat)
[1] "The estimated value of the slope is 0.7193"
> sprintf('The estimated regression line is %.4f+%.4fx',beta0hat,beta1hat)
[1] "The estimated regression line is 1.5073+0.7193x"

This gives the following plot:

[Figure: "lpsa vs lcavol" scatter plot with the fitted regression line drawn in red]

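c & d) An estimate of $\sigma^2$ is the mean squared error

$MSE=\frac{SSE}{n-2}, \quad SSE=\sum(y_i-\hat{y}_i)^2=S_y-\hat{\beta}_1S_{xy}$

The standard errors of the coefficients and the covariance between the two estimates are

$s.e.(\hat{\beta}_1)=\sqrt{\frac{MSE}{S_x}}, \quad s.e.(\hat{\beta}_0)=\sqrt{\frac{MSE\sum x_i^2}{nS_x}}, \quad \widehat{cov}(\hat{\beta}_0,\hat{\beta}_1)=-\frac{MSE\,\bar{x}}{S_x}$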

R code

#part c)
#get the number of observations
n<-length(x)
# get the sum of squared errors (SSE)
sse<-Sy-beta1hat*Sxy
#get mean square error, which is the estimate of sigma^2
mse<-sse/(n-2)
#estimates of standard errors
sb1<-sqrt(mse/Sx)
sb0<-sqrt(mse*sum(x^2)/(n*Sx))
sprintf('The estimated value of sigma^2 %.4f',mse)
sprintf('The standard error of beta1 %.4f',sb1)
sprintf('The standard error of beta0 %.4f',sb0)

#part d)
#estimated covariance between beta0hat and beta1hat
cov <- -mse*xbar/Sx
sprintf('The estimated covariance between beta0&beta 1 %.4f',cov)

We get the following outputs:

> sprintf('The estimated value of sigma^2 %.4f',mse)
> sprintf('The standard error of beta1 %.4f',sb1)
[1] "The standard error of beta1 0.0682"
> sprintf('The standard error of beta0 %.4f',sb0)
[1] "The standard error of beta0 0.1219"
> #part d)
> sprintf('The estimated covariance between beta0&beta 1 %.4f',cov)
[1] "The estimated covariance between beta0&beta 1 -0.0063"
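The hand-computed MSE, standard errors and covariance are exactly the entries of the estimated covariance matrix that R's `vcov()` returns for an `lm()` fit: the diagonal holds the squared standard errors and the off-diagonal holds $\widehat{cov}(\hat{\beta}_0,\hat{\beta}_1)$. The sketch below assumes the built-in `cars` data set as a stand-in; swap in `prostate$lcavol` and `prostate$lpsa` to check the numbers above.

```r
# Sketch: compare hand-computed sigma^2, s.e.'s and covariance with vcov().
# Assumption: the built-in 'cars' data set stands in for prostate.
x <- cars$speed
y <- cars$dist
n <- length(x)

Sx  <- sum((x - mean(x))^2)
Sy  <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
beta1hat <- Sxy / Sx

mse   <- (Sy - beta1hat * Sxy) / (n - 2)   # estimate of sigma^2
sb1   <- sqrt(mse / Sx)                    # s.e. of the slope
sb0   <- sqrt(mse * sum(x^2) / (n * Sx))   # s.e. of the intercept
cov01 <- -mse * mean(x) / Sx               # cov(beta0hat, beta1hat)

fit <- lm(y ~ x)
V <- vcov(fit)  # 2 x 2 estimated covariance matrix of (beta0hat, beta1hat)
print(V)
```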

e) For each coefficient $\beta_i$, $i=0,1$, we test the hypotheses

$H_0: \beta_i=0$ (null hypothesis)

$H_a: \beta_i\neq 0$ (alternative hypothesis)

using $\alpha=0.05$ as the level of significance.

The test statistic is

$t=\frac{\hat{\beta}_i-\beta_{i,H_0}}{s.e.(\hat{\beta}_i)}=\frac{\hat{\beta}_i-0}{s.e.(\hat{\beta}_i)}=\frac{\hat{\beta}_i}{s.e.(\hat{\beta}_i)}$

This is a two-tailed test (the alternative hypothesis has "not equal to").

The p-value is

$\text{p-value}=P(T>|t|)+P(T<-|t|)=2P(T>|t|)$

where T follows a t distribution with n-2 degrees of freedom.
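The absolute value matters: the two tail probabilities must be taken at |t|, otherwise a negative statistic would give a "p-value" above 1. The sketch below uses a hypothetical statistic t = -2.3 with df = 95 (that is, n - 2 for n = 97) to show that the two-term form used in the code equals $2P(T>|t|)$, and what goes wrong without the absolute value.

```r
# Sketch: two-sided p-value for a t statistic.
# The t value below is hypothetical; df = 95 corresponds to n - 2 with n = 97.
t_obs <- -2.3
df    <- 95

# P(T > |t|) + P(T < -|t|), the form used in the answer's code
p_two_terms <- pt(abs(t_obs), df = df, lower.tail = FALSE) +
               pt(-abs(t_obs), df = df, lower.tail = TRUE)

# Equivalent shortcut using the symmetry of the t distribution
p_doubled <- 2 * pt(-abs(t_obs), df = df)

# Wrong: plugging a negative t into P(T > t) + P(T < -t) without abs()
p_naive <- pt(t_obs, df = df, lower.tail = FALSE) + pt(-t_obs, df = df)

print(c(two_terms = p_two_terms, doubled = p_doubled, naive = p_naive))
```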

Following is the R code

#part e)
#test statistics for beta 0
tb0<-beta0hat/sb0
#p-value of beta0 = P(T>tb0)+P(T<-tb0)
pb0<-pt(abs(tb0),df=n-2,lower.tail=FALSE)+ pt(-abs(tb0),df=n-2,lower.tail=TRUE)
sprintf('The test statistics to test beta0=0 is %.4f, the p-value is %.4f',tb0,pb0)

#test statistics for beta 1
tb1<-beta1hat/sb1
#p-value of beta1 = P(T>tb1)+P(T<-tb1)
pb1<-pt(abs(tb1),df=n-2,lower.tail=FALSE)+ pt(-abs(tb1),df=n-2,lower.tail=TRUE)
sprintf('The test statistics to test beta1=0 is %.4f, the p-value is %.4f',tb1,pb1)

Running this prints the two test statistics and their p-values.

We reject the null hypothesis if the p-value is less than the significance level $\alpha=0.05$.

Here, for both $\beta_0$ and $\beta_1$, the p-values are less than 0.05.

Hence we reject both null hypotheses.

We conclude that there is sufficient evidence that both coefficients are significantly different from zero.

f) Use lm()

R code

#part f) use lm()
m<-lm(lpsa~lcavol,data=prostate)
summary(m)

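This prints the summary of the fitted model.

We can see that the values we calculated in parts b) to e) match this output: the Estimate, Std. Error, t value and Pr(>|t|) columns correspond to our estimates, standard errors, test statistics and p-values, and the residual standard error is the square root of the MSE.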
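Instead of comparing by eye, the hand computations can be placed next to the coefficient table from `summary(lm())` and checked programmatically. This is a sketch assuming the built-in `cars` data set as a stand-in; replacing `x` and `y` with `prostate$lcavol` and `prostate$lpsa` checks parts b) to e) directly.

```r
# Sketch: hand-computed estimates, s.e.'s, t values and p-values vs. summary(lm()).
# Assumption: the built-in 'cars' data set stands in for prostate.
x <- cars$speed
y <- cars$dist
n <- length(x)

Sx  <- sum((x - mean(x))^2)
Sy  <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b1  <- Sxy / Sx
b0  <- mean(y) - b1 * mean(x)
mse <- (Sy - b1 * Sxy) / (n - 2)

est  <- c(b0, b1)
se   <- c(sqrt(mse * sum(x^2) / (n * Sx)), sqrt(mse / Sx))
tval <- est / se
pval <- 2 * pt(-abs(tval), df = n - 2)   # two-sided p-values

by_hand <- cbind(est, se, tval, pval)
# Columns: Estimate, Std. Error, t value, Pr(>|t|)
from_lm <- summary(lm(y ~ x))$coefficients

print(by_hand)
print(from_lm)
```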
