Simulations are a very powerful tool data scientists use to test and verify statistical behaviors. Let's pretend we know that the true underlying population regression line is as follows (this is almost never the case in real life):

$Y_i = 2 + 3x_i + \varepsilon_i \quad (i = 1, \dots, n), \qquad \varepsilon_i \sim N(0, 2^2).$

This tells us that the parameters are $\beta_0 = 2$, $\beta_1 = 3$, and $\sigma = 2$.

a. Generate 100 observations $Y_i$ under this normal error model for the following X values: X = seq(0,10,length.out =100).
b. Draw a scatterplot of X and Y.
c. Design a simple simulation to show that $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$. Attach your code in the Appendix.
d. Plot a histogram of the sampling distribution of the $\hat{\beta}_1$'s you generated. Add a vertical line to the plot showing $\beta_1 = 3$.
b.
c. We use Monte Carlo simulation to estimate $\beta_1$: we repeatedly generate Y from the model, compute the least-squares slope each time, and average the slopes. The estimate for
d.
The vertical line is shown in red at $\beta_1 = 3$.
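As a rough cross-check on parts c and d, the simulated slopes can be compared with the sampling distribution that the normal error model implies, $\hat{\beta}_1 \sim N(\beta_1, \sigma^2 / S_{xx})$ with $S_{xx} = \sum_i (x_i - \bar{x})^2$. The sketch below is illustrative only: the seed, the number of replications `n_sim`, and the variable names are assumptions, not part of the original solution.

```r
# Sketch: compare simulated slopes with the theoretical sampling distribution.
# The seed and n_sim are arbitrary choices made for illustration.
set.seed(123)
x <- seq(0, 10, length.out = 100)
sigma <- 2
n_sim <- 5000

# Theoretical standard error of the slope: sigma / sqrt(Sxx)
sxx <- sum((x - mean(x))^2)
se_theory <- sigma / sqrt(sxx)   # about 0.069 for this design

# Simulate the least-squares slope n_sim times
slopes <- replicate(n_sim, {
  y <- 2 + 3 * x + rnorm(length(x), mean = 0, sd = sigma)
  cov(x, y) / var(x)
})

mean(slopes)   # should be close to 3 if the estimator is unbiased
sd(slopes)     # should be close to se_theory

# Histogram with the theoretical normal density overlaid
hist(slopes, freq = FALSE, main = "Simulated slopes vs. theoretical density")
b_grid <- seq(min(slopes), max(slopes), length.out = 200)
lines(b_grid, dnorm(b_grid, mean = 3, sd = se_theory), col = "blue")
abline(v = 3, col = "red")
```

If the histogram is centered on the red line and follows the blue curve, the simulation agrees with both the unbiasedness claim and the theoretical spread.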
Appendix:
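# Parts a and b: generate one sample from the model and plot it
set.seed(1)  # arbitrary seed, added only for reproducibility
x <- seq(from = 0, to = 10, length.out = 100)
eps <- rnorm(100, mean = 0, sd = 2)   # errors ~ N(0, 2^2)
y <- 2 + 3 * x + eps
plot(x, y, main = "Scatterplot of Y against X")

# Parts c and d: simulate the sampling distribution of the slope estimate
n_sim <- 100
beta <- numeric(n_sim)
for (s in 1:n_sim) {
  # draw a fresh sample of 100 responses at the same x values
  y_sim <- 2 + 3 * x + rnorm(100, mean = 0, sd = 2)
  # least-squares slope: sample covariance over sample variance
  beta[s] <- cov(x, y_sim) / var(x)
}
beta_hat <- mean(beta)   # Monte Carlo estimate of E[beta1-hat]
beta_hat
hist(beta, freq = FALSE, main = "Histogram of the sampling distribution")
abline(v = 3, col = "red")   # true slope beta1 = 3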
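The simulation computes the slope as the ratio of the sample covariance of x and y to the sample variance of x. A minimal sketch, assuming an arbitrary seed and illustrative variable names, that checks this ratio against the slope reported by R's `lm()`:

```r
# Sketch: the cov/var ratio matches the least-squares slope from lm().
set.seed(42)  # arbitrary seed, for reproducibility only
x <- seq(0, 10, length.out = 100)
y <- 2 + 3 * x + rnorm(100, mean = 0, sd = 2)

slope_ratio <- cov(x, y) / var(x)          # closed-form slope
slope_lm <- unname(coef(lm(y ~ x))["x"])   # slope fitted by lm()

all.equal(slope_ratio, slope_lm)           # agree up to floating-point error
```

Any common scaling of the covariance and the variance cancels in the ratio, so the slope is the same whether the divisor is n or n - 1.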