Solution:
R Programming:
a) R code
#install the faraway package if it is not already installed
install.packages('faraway')
library(faraway)
names(prostate)
#a) Draw a scatter plot
plot(prostate$lcavol,prostate$lpsa,xlab="lcavol",ylab="lpsa",main="lpsa vs lcavol")
[Plot: scatter plot of lpsa against lcavol]
We can see an overall positive linear relationship between lpsa and lcavol: the log of prostate specific antigen (lpsa) tends to increase as the log cancer volume (lcavol) increases.
A simple linear regression model seems reasonable.
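b) The regression line that we want to fit is
$y = \beta_0 + \beta_1 x + \epsilon$
where y = lpsa,
$\beta_0$ is the intercept of the regression line,
$\beta_1$ is the slope coefficient corresponding to x = lcavol, and
$\epsilon$ is a random error.
We calculate the following (named xbar, ybar, Sx, Sy and Sxy in the code):
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
and the estimates of the slope and intercept using
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
The fitted value of y is
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
The following R code does all of this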
#part b)
y<-prostate$lpsa
x<-prostate$lcavol
#sample means
xbar<-mean(x)
ybar<-mean(y)
#sums of squares and cross-products
Sx<-sum((x-xbar)^2)
Sy<-sum((y-ybar)^2)
Sxy<-sum((x-xbar)*(y-ybar))
#estimate the value of slope
beta1hat<-Sxy/Sx
#Estimate the value of intercept
beta0hat<-ybar-beta1hat*xbar
sprintf('The estimated value of the intercept is %.4f',beta0hat)
sprintf('The estimated value of the slope is %.4f',beta1hat)
sprintf('The estimated regression line is %.4f+%.4fx',beta0hat,beta1hat)
#calculate the fitted values
yhat<-beta0hat+beta1hat*x
#Draw the fitted line on to the plot from part a)
lines(sort(x),yhat[order(x)],col="red")
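[Output: the estimated intercept, slope and regression line printed by sprintf]
[Plot: the scatter plot from part a) with the fitted line drawn in red]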
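As a quick sanity check of the part b) formulas, the hand computation can be wrapped in a small function and compared with lm(). This is only a sketch under stated assumptions: the function name ols_by_hand is made up for illustration, and the built-in cars data set (not the prostate data) is used so that the block runs without extra packages.

```r
# Hypothetical helper (not part of the original solution):
# least squares estimates from the sums of squares, as in part b)
ols_by_hand <- function(x, y) {
  xbar <- mean(x)
  ybar <- mean(y)
  Sxx <- sum((x - xbar)^2)
  Sxy <- sum((x - xbar) * (y - ybar))
  beta1hat <- Sxy / Sxx
  beta0hat <- ybar - beta1hat * xbar
  c(beta0hat = beta0hat, beta1hat = beta1hat)
}

# Illustration on the built-in cars data (assumption: stand-in for prostate)
by_hand <- ols_by_hand(cars$speed, cars$dist)
by_lm <- coef(lm(dist ~ speed, data = cars))

# TRUE if the two sets of estimates agree up to numerical tolerance
estimates_match <- isTRUE(all.equal(unname(by_hand), unname(by_lm)))
print(by_hand)
print(estimates_match)
```

With faraway loaded, ols_by_hand(prostate$lcavol, prostate$lpsa) should reproduce beta0hat and beta1hat from the code above.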
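c&d) An estimate of $\sigma^2$ is
$\hat{\sigma}^2 = MSE = \frac{SSE}{n-2}, \quad SSE = S_{yy} - \hat{\beta}_1 S_{xy}$
where Syy and Sxy are the sums of squares and cross-products from part b).
The standard errors of the coefficients are
$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}, \quad se(\hat{\beta}_0) = \sqrt{\frac{\hat{\sigma}^2 \sum_{i=1}^{n} x_i^2}{n S_{xx}}}$
and the estimated covariance between the two coefficients (part d) is
$\widehat{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\hat{\sigma}^2 \bar{x}}{S_{xx}}$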
R code
#part c)
#get the number of observations
n<-length(x)
# get the sum of square error
sse<-Sy-beta1hat*Sxy
#get mean square error, which is the estimate of sigma^2
mse<-sse/(n-2)
#estimates of standard errors
sb1<-sqrt(mse/Sx)
sb0<-sqrt(mse*sum(x^2)/(n*Sx))
sprintf('The estimated value of sigma^2 %.4f',mse)
sprintf('The standard error of beta1 %.4f',sb1)
sprintf('The standard error of beta0 %.4f',sb0)
#part d)
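#estimated covariance between beta0hat and beta1hat
#(named cov_b0b1 so that it does not mask the base R function cov)
cov_b0b1 <- -mse*xbar/Sx
sprintf('The estimated covariance between beta0 and beta1 is %.4f',cov_b0b1)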
[Output: the estimate of sigma^2, the two standard errors and the covariance printed by sprintf]
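The variance formulas from parts c) and d) can be cross-checked against vcov(), which returns the estimated covariance matrix of the coefficients of an lm fit. The sketch below is illustrative only: slr_variances is a hypothetical helper name, and the built-in cars data stand in for the prostate data so that the block is self-contained.

```r
# Hypothetical helper: variance estimates for simple linear regression,
# using the same formulas as parts c) and d)
slr_variances <- function(x, y) {
  n <- length(x)
  xbar <- mean(x)
  ybar <- mean(y)
  Sxx <- sum((x - xbar)^2)
  Syy <- sum((y - ybar)^2)
  Sxy <- sum((x - xbar) * (y - ybar))
  beta1hat <- Sxy / Sxx
  mse <- (Syy - beta1hat * Sxy) / (n - 2)    # estimate of sigma^2
  list(
    mse = mse,
    sb0 = sqrt(mse * sum(x^2) / (n * Sxx)),  # standard error of the intercept
    sb1 = sqrt(mse / Sxx),                   # standard error of the slope
    cov01 = -mse * xbar / Sxx                # covariance of intercept and slope
  )
}

# Illustration on the built-in cars data (assumption: stand-in for prostate)
v <- slr_variances(cars$speed, cars$dist)
V <- vcov(lm(dist ~ speed, data = cars))

# Diagonal of V holds the squared standard errors, off-diagonal the covariance
variances_match <- isTRUE(all.equal(
  c(v$sb0^2, v$sb1^2, v$cov01),
  c(V[1, 1], V[2, 2], V[1, 2])
))
print(variances_match)
```

The covariance is negative whenever the mean of x is positive, which is why the sign in the part d) formula matters.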
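e) We want to test the following hypotheses for $\beta_i$, where i = 0, 1:
$H_0: \beta_i = 0 \quad \text{versus} \quad H_1: \beta_i \neq 0$
The test statistic is
$t_i = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)}$
This is a two-tailed test (the alternative hypothesis has "not equal to").
The p-value is
$p = P(T > |t_i|) + P(T < -|t_i|)$
where T follows a t distribution with n-2 degrees of freedom.
The R code follows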
#part e)
#test statistic for beta0
tb0<-beta0hat/sb0
#two-tailed p-value for beta0 = P(T>|tb0|)+P(T<-|tb0|)
pb0<-pt(abs(tb0),df=n-2,lower.tail=FALSE)+
pt(-abs(tb0),df=n-2,lower.tail=TRUE)
#%.4g keeps a very small p-value from printing as 0.0000
sprintf('The test statistic to test beta0=0 is %.4f, the p-value is %.4g',tb0,pb0)
#test statistic for beta1
tb1<-beta1hat/sb1
#two-tailed p-value for beta1 = P(T>|tb1|)+P(T<-|tb1|)
pb1<-pt(abs(tb1),df=n-2,lower.tail=FALSE)+
pt(-abs(tb1),df=n-2,lower.tail=TRUE)
sprintf('The test statistic to test beta1=0 is %.4f, the p-value is %.4g',tb1,pb1)
[Output: the test statistics and p-values for beta0 and beta1 printed by sprintf]
We reject the null hypothesis if the p-value is less than the significance level alpha = 0.05.
Here both p-values are less than 0.05, so we reject the null hypothesis for both the intercept and the slope.
We conclude that there is sufficient evidence that both coefficients are significantly different from zero.
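The p-value rule used above is equivalent to comparing |t| with a critical value from the t distribution, and the two-tail sum can be written more compactly using symmetry. The sketch below illustrates both points; the t statistic and degrees of freedom are hypothetical numbers chosen for illustration, not values from the prostate data.

```r
# Hypothetical values, for illustration only
t_stat <- 2.5   # an observed t statistic
df <- 20        # degrees of freedom (n - 2 in the text)
alpha <- 0.05   # significance level

# Two-tailed p-value written as the sum of both tails, as in part e)
p_sum <- pt(abs(t_stat), df = df, lower.tail = FALSE) +
  pt(-abs(t_stat), df = df, lower.tail = TRUE)

# Same p-value using the symmetry of the t distribution
p_sym <- 2 * pt(-abs(t_stat), df = df)

# Critical value for a two-tailed test at level alpha
t_crit <- qt(1 - alpha / 2, df = df)

# The two decision rules give the same answer
reject_by_p <- p_sym < alpha
reject_by_crit <- abs(t_stat) > t_crit
print(c(p_sum = p_sum, p_sym = p_sym, t_crit = t_crit))
print(c(reject_by_p = reject_by_p, reject_by_crit = reject_by_crit))
```

Replacing t_stat and df with tb0 or tb1 and n-2 from the code above applies the same check to the prostate fit.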
f) Use lm()
R code
#part f) use lm()
m<-lm(lpsa~lcavol,data=prostate)
summary(m)
[Output: the summary(m) coefficient table, residual standard error, R-squared and F-statistic]
We can see that the values calculated in parts b) to e) match this output.
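Rather than comparing by eye, the coefficient table can be pulled out of the summary and checked programmatically. The following is a minimal sketch, again on the built-in cars data as a stand-in (the object name m_cars is made up for this example); coef(summary(...)) returns a matrix with the columns named in the comment.

```r
# Fit on the built-in cars data (assumption: stand-in for the prostate fit)
m_cars <- lm(dist ~ speed, data = cars)

# Coefficient table: columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
tab <- coef(summary(m_cars))

# Recompute t statistics and two-tailed p-values from the first two columns
t_by_hand <- tab[, "Estimate"] / tab[, "Std. Error"]
p_by_hand <- 2 * pt(-abs(t_by_hand), df = df.residual(m_cars))

# TRUE if the recomputed values agree with the table
table_matches <- isTRUE(all.equal(t_by_hand, tab[, "t value"])) &&
  isTRUE(all.equal(p_by_hand, tab[, "Pr(>|t|)"]))
print(tab)
print(table_matches)
```

For the fit in part f), coef(summary(m)) gives the estimates, standard errors, t values and p-values to compare with beta0hat, beta1hat, sb0, sb1, tb0 and tb1, and vcov(m)[1, 2] gives the covariance from part d).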
2. The data set prostate in the faraway package is from a study on 97 men with prostate cancer who were due to receive a radical prostatectomy. We are interested in predicting lpsa (log prostate specific antigen) with lcavol (log cancer volume). (a) Draw a scatterplot - does a simple linear regression model seem reasonable? (b) Without using the R function lm(), compute the values $\bar{x}$, $\bar{y}$, Sxx, Syy and Sxy. Compute the ordinary least squares estimates of the...