2.2 LEAST-SQUARES ESTIMATION OF THE PARAMETERS

The parameters β0 and β1 are unknown and must be estimated using sample data. Suppose that we have n pairs of data, say (y1, x1), (y2, x2), ..., (yn, xn). As noted in Chapter 1, these data may result either from a controlled experiment designed specifically to collect the data, from an observational study, or from existing historical records (a retrospective study).

2.2.1 Estimation of β0 and β1

The method of least squares is used to estimate β0 and β1. That is, we estimate β0 and β1 so that the sum of the squares of the differences between the observations yi and the straight line is a minimum. From Eq. (2.1) we may write

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n \qquad (2.3) $$

Equation (2.1) may be viewed as a population regression model while Eq. (2.3) is a sample regression model, written in terms of the n pairs of data (yi, xi), i = 1, 2, ..., n. Thus, the least-squares criterion is

$$ S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \qquad (2.4) $$

The least-squares estimators of β0 and β1, say β̂0 and β̂1, must satisfy

$$ \left.\frac{\partial S}{\partial \beta_0}\right|_{\hat\beta_0, \hat\beta_1} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 $$

and

$$ \left.\frac{\partial S}{\partial \beta_1}\right|_{\hat\beta_0, \hat\beta_1} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) x_i = 0 $$

Simplifying these two equations yields
$$ n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i $$

$$ \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i \qquad (2.5) $$

Equations (2.5) are called the least-squares normal equations. The solution to the normal equations is

$$ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \qquad (2.6) $$

and

$$ \hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \qquad (2.7) $$

where

$$ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{and} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$

are the averages of yi and xi, respectively. Therefore, β̂0 and β̂1 in Eqs. (2.6) and (2.7) are the least-squares estimators of the intercept and slope, respectively. The fitted simple linear regression model is then

$$ \hat{y} = \hat\beta_0 + \hat\beta_1 x \qquad (2.8) $$

Equation (2.8) gives a point estimate of the mean of y for a particular x.

Since the denominator of Eq. (2.7) is the corrected sum of squares of the xi and the numerator is the corrected sum of cross products of xi and yi, we may write these quantities in a more compact notation as

$$ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (2.9) $$

and

$$ S_{xy} = \sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) \qquad (2.10) $$
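The estimators in Eqs. (2.6), (2.7), (2.9), and (2.10) translate directly into code. A minimal sketch follows; the data values and the helper name `least_squares_fit` are made up for illustration and are not from the text.

```python
# Least-squares intercept and slope via Eqs. (2.6)-(2.10).
# The (x, y) data below are invented purely for illustration.

def least_squares_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals S(b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Corrected sums of squares and cross products, Eqs. (2.9)-(2.10)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum(yi * (xi - xbar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # slope, Eq. (2.7)
    b0 = ybar - b1 * xbar    # intercept, Eq. (2.6)
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_fit(x, y)
print(b0, b1)  # intercept near 0, slope near 2 for this made-up data
```

Because β̂0 = ȳ − β̂1 x̄, the fitted line always passes through the point (x̄, ȳ).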
Thus, a convenient way to write Eq. (2.7) is

$$ \hat\beta_1 = \frac{S_{xy}}{S_{xx}} \qquad (2.11) $$

The difference between the observed value yi and the corresponding fitted value ŷi is a residual. Mathematically the ith residual is

$$ e_i = y_i - \hat{y}_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i), \quad i = 1, 2, \ldots, n \qquad (2.12) $$

Residuals play an important role in investigating model adequacy and in detecting departures from the underlying assumptions. This topic is discussed in subsequent chapters.
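Eq. (2.12) has a useful numerical consequence: the two normal equations force the residuals to sum to zero and to be orthogonal to the xi. A quick check, again with made-up numbers:

```python
# Residuals e_i = y_i - (b0 + b1*x_i), Eq. (2.12).
# Data are illustrative only, not from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum(yi * (xi - xbar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)
# The normal equations imply sum(e_i) = 0 and sum(e_i * x_i) = 0:
print(sum(residuals))                                    # approximately 0
print(sum(e * xi for e, xi in zip(residuals, x)))        # approximately 0
```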
We know that the formula for the correlation coefficient is

$$ r = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}} $$

where cov(X, Y) = (1/n) Σ xi yi − x̄ ȳ, x̄ = (1/n) Σ xi, n = number of observations, var(X) = (1/n) Σ (xi − x̄)², and var(Y) = (1/n) Σ (yi − ȳ)².

For these data,

cov(X, Y) = 193.6343 − 16.6571 × 11.7857 = −2.6820

r = −2.6820 / √(2.1167 × 5.7469) = −0.7690

r² = 0.5913
We know that Ŷ = β̂0 + β̂1 X, where β̂1 = Sxy / Sxx and β̂0 = ȳ − β̂1 x̄, with Sxy = Σ (xi − x̄)(yi − ȳ) and Sxx = Σ (xi − x̄)².

For these data, Sxy = −18.7743 and Sxx = 14.8171, so

β̂1 = −18.7743 / 14.8171 = −1.2670

β̂0 = 11.7857 − (−1.2670)(16.6571) = 32.8914

and the fitted line is Ŷ = 32.8914 − 1.2670 X.
As a check, the fitted line passes through (x̄, ȳ): Ŷ(16.657) = 32.8914 − 1.267 × 16.657 ≈ 11.79 = ȳ. To find the bike speed X at which the predicted run pace is 10, set 10 = 32.8914 − 1.267 X, which gives X = (32.8914 − 10) / 1.267 ≈ 18.07.
Since r = −0.769, X and Y are negatively correlated: as bike speed increases, run pace decreases.
The worksheet computations, organized by column (x = bike speed, y = run pace, n = 7):

  x_i    y_i    x_i*y_i   (x_i-x̄)²   (y_i-ȳ)²   (x_i-x̄)(y_i-ȳ)
  14.8   14.4   213.12    3.44898    6.83449     -4.8551
  17.3   11.1   192.03    0.413265   0.470204    -0.4408
  17.1   12.6   215.46    0.196122   0.663061     0.3606
  15.0   11.1   166.50    2.746122   0.470204     1.1363
  18.8    8.8   165.44    4.591837   8.91449     -6.3980
  18.1    8.9   161.09    2.081837   8.327347    -4.1637
  15.5   15.6   241.80    1.33898   14.54878     -4.4137

Summary quantities:

  x̄ = 16.65714, ȳ = 11.78571, (1/n) Σ x_i y_i = 193.6343
  var(x) = 2.116735, var(y) = 5.746939, cov(x, y) = -2.68204
  Sxx = 14.81714, Sxy = -18.7743
  r = -0.76898, r² = 0.591327
  β̂1 = -1.26707, β̂0 = 32.8914
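All of the worksheet quantities can be reproduced in a few lines. A sketch using the seven (bike speed, run pace) pairs from the table, dividing variances and covariance by n as the worksheet does:

```python
# Reproduce the worksheet: x = bike speed, y = run pace, n = 7.
x = [14.8, 17.3, 17.1, 15.0, 18.8, 18.1, 15.5]
y = [14.4, 11.1, 12.6, 11.1, 8.8, 8.9, 15.6]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)                       # 14.81714
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # -18.7743
var_x, var_y = sxx / n, sum((yi - ybar) ** 2 for yi in y) / n
cov_xy = sxy / n                                              # -2.68204
r = cov_xy / (var_x * var_y) ** 0.5                           # -0.76898

b1 = sxy / sxx          # slope, Eq. (2.11): -1.26707
b0 = ybar - b1 * xbar   # intercept, Eq. (2.6): 32.8914
print(f"r = {r:.5f}, r^2 = {r * r:.5f}")
print(f"yhat = {b0:.4f} + ({b1:.4f}) x")
# Bike speed at which the predicted run pace equals 10:
print((10 - b0) / b1)   # about 18.07
```

Running this recovers the worksheet values x̄ = 16.65714, ȳ = 11.78571, r = -0.76898, β̂1 = -1.26707, and β̂0 = 32.8914.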