Question

2. The data set prostate in the faraway package is from a study on 97 men with prostate cancer who were due to receive a radical prostatectomy. We are interested in predicting lpsa (log prostate specific antigen) with lcavol (log cancer volume).
(a) Draw a scatterplot. Does a simple linear regression model seem reasonable?
(b) Without using the R function lm(), compute the values $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{yy}$ and $S_{xy}$. Compute the ordinary least squares estimates of the intercept and slope for the simple linear regression model, and draw the fitted line on your plot from part (a).
(c) Obtain the estimate of $\sigma^2$ and find the estimated standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$.
(d) Find the estimated covariance between $\hat{\beta}_0$ and $\hat{\beta}_1$.
(e) Carry out t-tests for the two null hypotheses $\beta_0=0$ and $\beta_1=0$, reporting the value of the test statistic and a p-value in each case.
(f) Use the R function lm() to fit the regression of lpsa on lcavol.

Answer #1

R code with explanations (all statements starting with # are comments)

a) R code

#install the faraway package if it is not already installed
install.packages('faraway')

library(faraway)
names(prostate)
#a) Draw a scatter plot
plot(prostate$lcavol,prostate$lpsa,xlab="lcavol",ylab="lpsa",main="lpsa vs lcavol")

#get this plot

[Figure: scatterplot "lpsa vs lcavol", with lcavol on the x-axis and lpsa on the y-axis]

We can see that there is an overall positive linear relationship between lpsa and lcavol. The log of prostate specific antigen (lpsa) tends to increase as the log cancer volume (lcavol) increases.

A simple linear regression model seems reasonable.

b) The regression line that we want to fit is

$y=\beta_0+\beta_1x+\epsilon$

where $y$ = lpsa

$\beta_0$ is the intercept of the regression line

$\beta_1$ is the slope coefficient corresponding to $x$ = lcavol

$\epsilon \stackrel{iid}{\sim} \mathcal{N}(0,\sigma^2)$ is a random error

We calculate the following (here $S_x$ and $S_y$ are the quantities the question calls $S_{xx}$ and $S_{yy}$)

\begin{align*}
\bar{x}&=\frac{\sum x_i}{n}\\
\bar{y}&=\frac{\sum y_i}{n}\\
S_x&=\sum(x_i-\bar{x})^2\\
S_y&=\sum(y_i-\bar{y})^2\\
S_{xy}&=\sum(x_i-\bar{x})(y_i-\bar{y})
\end{align*}

and the estimates of slope and intercept using

\begin{align*}
\hat{\beta}_1&=\frac{S_{xy}}{S_x}\\
\hat{\beta}_0&=\bar{y}-\hat{\beta}_1\bar{x}
\end{align*}

The fitted value of y is

\begin{align*}
\hat{y}=\hat{\beta}_0+\hat{\beta}_1x
\end{align*}

The following R code carries out all of these calculations

#part b)
y<-prostate$lpsa
x<-prostate$lcavol
#sample means
xbar<-mean(x)
ybar<-mean(y)
#sums of squares and cross products
Sx<-sum((x-xbar)^2)
Sy<-sum((y-ybar)^2)
Sxy<-sum((x-xbar)*(y-ybar))
#estimate the value of slope
beta1hat<-Sxy/Sx
#Estimate the value of intercept
beta0hat<-ybar-beta1hat*xbar
sprintf('The estimated value of the intercept is %.4f',beta0hat)
sprintf('The estimated value of the slope is %.4f',beta1hat)
sprintf('The estimated regression line is %.4f+%.4fx',beta0hat,beta1hat)
#calculate the fitted values
yhat<-beta0hat+beta1hat*x
#Draw the fitted line on to the plot from part a)
lines(sort(x),yhat[order(x)],col="red")

# get these outputs

> sprintf('The estimated value of the intercept is %.4f',beta0hat)
[1] "The estimated value of the intercept is 1.5073"
> sprintf('The estimated value of the slope is %.4f',beta1hat)
[1] "The estimated value of the slope is 0.7193"
> sprintf('The estimated regression line is %.4f+%.4fx',beta0hat,beta1hat)
[1] "The estimated regression line is 1.5073+0.7193x"

get this plot

[Figure: scatterplot "lpsa vs lcavol" with the fitted regression line drawn in red; lcavol on the x-axis, lpsa on the y-axis]
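Part (b) also asks for the values of $\bar{x}$, $\bar{y}$, $S_x$, $S_y$ and $S_{xy}$ themselves, which the code above computes but does not print. The following is a minimal sketch of one way to report them together. The helper name `ols_by_hand` and the toy data are assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): returns the summary
# quantities and the least squares estimates for a simple linear regression.
ols_by_hand <- function(x, y) {
  xbar <- mean(x)
  ybar <- mean(y)
  Sx  <- sum((x - xbar)^2)             # Sxx in the question's notation
  Sy  <- sum((y - ybar)^2)             # Syy in the question's notation
  Sxy <- sum((x - xbar) * (y - ybar))
  beta1hat <- Sxy / Sx                 # slope
  beta0hat <- ybar - beta1hat * xbar   # intercept
  c(xbar = xbar, ybar = ybar, Sx = Sx, Sy = Sy, Sxy = Sxy,
    beta0hat = beta0hat, beta1hat = beta1hat)
}

# Toy data lying exactly on the line y = 1 + 2x, so the estimates
# should recover intercept 1 and slope 2.
x_toy <- c(1, 2, 3, 4, 5)
y_toy <- 1 + 2 * x_toy
res_toy <- ols_by_hand(x_toy, y_toy)
print(res_toy)
```

With the faraway package loaded, calling `ols_by_hand(prostate$lcavol, prostate$lpsa)` should report the same intercept and slope as the sprintf() output above, along with the five summary values.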

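c&d) An estimate of $\sigma^2$ is

\begin{align*}
MSE&=\frac{SSE}{n-2}\\
SSE&=S_y-\hat{\beta}_1S_{xy}
\end{align*}

The standard errors of the coefficients and the estimated covariance between them are

\begin{align*}
s.e(\hat{\beta}_1)&=\sqrt{\frac{MSE}{S_x}}\\
s.e(\hat{\beta}_0)&=\sqrt{\frac{MSE\sum x_i^2}{nS_x}}\\
\widehat{cov}(\hat{\beta}_0,\hat{\beta}_1)&=-\frac{MSE\,\bar{x}}{S_x}
\end{align*}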

R code

#part c)
#get the number of observations
n<-length(x)
# get the sum of square error
sse<-Sy-beta1hat*Sxy
#get mean square error, which is the estimate of sigma^2
mse<-sse/(n-2)
#estimates of standard errors
sb1<-sqrt(mse/Sx)
sb0<-sqrt(mse*sum(x^2)/(n*Sx))
sprintf('The estimated value of sigma^2 %.4f',mse)
sprintf('The standard error of beta1 %.4f',sb1)
sprintf('The standard error of beta0 %.4f',sb0)

#part d)
#estimated covariance (note the minus sign: this is cov <- -mse*xbar/Sx)
cov <- -mse*xbar/Sx
sprintf('The estimated covariance between beta0&beta 1 %.4f',cov)

#get the following outputs

> sprintf('The estimated value of sigma^2 %.4f',mse)
> sprintf('The standard error of beta1 %.4f',sb1)
[1] "The standard error of beta1 0.0682"
> sprintf('The standard error of beta0 %.4f',sb0)
[1] "The standard error of beta0 0.1219"
> #part d)
> sprintf('The estimated covariance between beta0&beta 1 %.4f',cov)
[1] "The estimated covariance between beta0&beta 1 -0.0063"
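One way to gain confidence in the hand formulas for parts (c) and (d) is to compare them with the covariance matrix that R reports for a fitted model through `vcov()`. The sketch below does this on simulated data, which is an assumption made so that the example runs without the faraway package. The name `cov01` is also an illustrative choice.

```r
# Simulated data (an assumption for illustration; not the prostate data)
set.seed(1)
n <- 50
x <- rnorm(n, mean = 1.35, sd = 1.2)
y <- 1.5 + 0.7 * x + rnorm(n, sd = 0.8)

xbar <- mean(x)
Sx  <- sum((x - xbar)^2)
Sy  <- sum((y - mean(y))^2)
Sxy <- sum((x - xbar) * (y - mean(y)))
beta1hat <- Sxy / Sx

# Hand formulas from parts (c) and (d)
mse   <- (Sy - beta1hat * Sxy) / (n - 2)
sb1   <- sqrt(mse / Sx)
sb0   <- sqrt(mse * sum(x^2) / (n * Sx))
cov01 <- -mse * xbar / Sx

# The same quantities taken from the covariance matrix of an lm() fit:
# diagonal entries are squared standard errors, the off-diagonal entry
# is the covariance between the intercept and slope estimates.
V <- vcov(lm(y ~ x))
check <- c(
  se_intercept = isTRUE(all.equal(sb0, sqrt(V[1, 1]))),
  se_slope     = isTRUE(all.equal(sb1, sqrt(V[2, 2]))),
  covariance   = isTRUE(all.equal(cov01, V[1, 2]))
)
print(check)
```

Each entry of `check` is TRUE when the hand formula agrees with `vcov()` up to floating point tolerance.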

e) We want to test the following hypotheses for $\beta_i$, where i=0,1

$H_0: \beta_i = 0$ (null hypothesis)
$H_a: \beta_i \neq 0$ (alternative hypothesis)
$\alpha = 0.05$ (level of significance to test the hypotheses)

The test statistic is

\begin{align*}
t=\frac{\hat{\beta}_i-\beta_{iH_0}}{s.e(\hat{\beta}_i)}=\frac{\hat{\beta}_i-0}{s.e(\hat{\beta}_i)}=\frac{\hat{\beta}_i}{s.e(\hat{\beta}_i)}
\end{align*}

This is a two-tailed test (the alternative hypothesis has "not equal to").

The p-value is

\begin{align*}
\text{p-value}=P(T>|t|)+P(T<-|t|)
\end{align*}

The degrees of freedom for the t statistic are n-2.
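Because the t distribution is symmetric about zero, the two tail probabilities are equal, so the p-value can also be written as twice a single tail. The sketch below checks this with `pt()`. The statistic and degrees of freedom are arbitrary illustrative values (assumptions), not taken from the prostate fit.

```r
# Illustrative values (assumptions, not taken from the prostate fit)
t_stat <- -2.1
df <- 20

# Sum of the two tail areas beyond |t|
p_two_tails <- pt(abs(t_stat), df = df, lower.tail = FALSE) +
  pt(-abs(t_stat), df = df, lower.tail = TRUE)

# Equivalent shortcut: twice one tail, by symmetry of the t distribution
p_shortcut <- 2 * pt(-abs(t_stat), df = df)

print(c(p_two_tails = p_two_tails, p_shortcut = p_shortcut))
```

Taking the absolute value first means the same expression works whether the observed statistic is positive or negative.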

Following is the R code

#part e)
#test statistics for beta 0
tb0<-beta0hat/sb0
#p-value of beta0 = P(T>tb0)+P(T<-tb0)
pb0<-pt(abs(tb0),df=n-2,lower.tail=FALSE)+ pt(-abs(tb0),df=n-2,lower.tail=TRUE)
sprintf('The test statistics to test beta0=0 is %.4f, the p-value is %.4f',tb0,pb0)

#test statistics for beta 1
tb1<-beta1hat/sb1
#p-value of beta1 = P(T>tb1)+P(T<-tb1)
pb1<-pt(abs(tb1),df=n-2,lower.tail=FALSE)+ pt(-abs(tb1),df=n-2,lower.tail=TRUE)
sprintf('The test statistics to test beta1=0 is %.4f, the p-value is %.4f',tb1,pb1)

# get these

We reject the null hypothesis if the p-value is less than the significance level $\alpha=0.05$.

Here, for both $\beta_0$ and $\beta_1$, the p-values are less than 0.05.

Hence we reject the null hypothesis in both cases.

We conclude that there is sufficient evidence that both the intercept and the slope differ from zero.

f) Use lm()

R code

#part f) use lm()
m<-lm(lpsa~lcavol,data=prostate)
summary(m)

# get these

Call:
lm(formula = lpsa ~ lcavol, data = prostate)

Residuals:
     Min       1Q   Median       3Q      Max
-1.67625 -0.41648  0.09859  0.50709  1.89673

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.50730    0.12194   12.36   <2e-16 ***
lcavol       0.71932    0.06819   10.55   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7875 on 95 degrees of freedom
Multiple R-squared:  0.5394, Adjusted R-squared:  0.5346
F-statistic: 111.3 on 1 and 95 DF,  p-value: < 2.2e-16

We can see that the estimates, standard errors and test statistics calculated by hand in parts b) to e) match this output.
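Rather than comparing the printed summary by eye, the individual pieces can be pulled out of a fitted model and compared numerically. The sketch below uses `coef(summary(...))`, `sigma()` and `df.residual()` on simulated data, which is an assumption made so that it runs without the faraway package. The object name `m_sim` is illustrative.

```r
# Simulated data (an assumption for illustration; not the prostate data)
set.seed(2)
x <- runif(40, 0, 4)
y <- 1.5 + 0.7 * x + rnorm(40, sd = 0.8)
m_sim <- lm(y ~ x)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(m_sim))

# Residual standard error squared, i.e. the estimate of sigma^2 (MSE)
sigma2_hat <- sigma(m_sim)^2

# Each t value is the estimate divided by its standard error
t_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
t_matches <- isTRUE(all.equal(t_by_hand, coef_table[, "t value"]))

# MSE equals the residual sum of squares divided by n - 2
mse_by_hand <- sum(resid(m_sim)^2) / df.residual(m_sim)
mse_matches <- isTRUE(all.equal(sigma2_hat, mse_by_hand))

print(c(t_matches = t_matches, mse_matches = mse_matches))
```

Applied to the model `m` from part f), the same calls would give values to compare against `beta0hat`, `beta1hat`, `sb0`, `sb1` and `mse` using `all.equal()`.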
