SOLUTION:
(a) Let us consider the sample correlation coefficients between each pair of variables.
The following figure is the correlation matrix obtained using Minitab.
Correlations: Weeks, Age, Educ, Married, Head, Tenure, Manager, Sales
            Weeks     Age    Educ Married    Head  Tenure Manager
Age         0.577
Educ        0.007   0.100
Married    -0.130  -0.209  -0.151
Head       -0.205   0.027  -0.158  -0.449
Tenure      0.398   0.459   0.174  -0.057  -0.046
Manager    -0.198   0.097   0.160   0.073  -0.200  -0.113
Sales      -0.134   0.137   0.124  -0.148  -0.013   0.097  -0.156
Looking at the sample correlation coefficients between Weeks and each of the independent variables gives a quick indication of which independent variables are, by themselves, good predictors. We see that the single best predictor of Weeks is Age, because its sample correlation coefficient with Weeks (0.577) is the largest in absolute value.
Age alone can explain (0.577)^2 × 100 = 33.29% of the variability in Weeks.
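The choice of Age and the 33.29% figure can be checked directly from the Weeks column of the correlation matrix. The short Python sketch below uses only the coefficients printed in the Minitab output; the dictionary name and layout are illustrative, not part of the original solution.

```python
# Sample correlations between Weeks and each candidate predictor,
# copied from the Minitab correlation matrix above.
corr_with_weeks = {
    "Age": 0.577,
    "Educ": 0.007,
    "Married": -0.130,
    "Head": -0.205,
    "Tenure": 0.398,
    "Manager": -0.198,
    "Sales": -0.134,
}

# The strongest single linear predictor has the largest |r|,
# so a negative correlation counts as much as a positive one.
best = max(corr_with_weeks, key=lambda name: abs(corr_with_weeks[name]))

# In simple linear regression, R-Sq equals r squared.
r = corr_with_weeks[best]
pct_explained = r ** 2 * 100

print(f"{best}: r = {r}, R-Sq = {pct_explained:.2f}%")
```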
So we construct an estimated regression equation using the variable Age:
Regression Analysis: Weeks versus Age
The regression equation is
Weeks = - 8.86 + 1.51 Age
Predictor     Coef  SE Coef      T      P
Constant     -8.86    11.01  -0.80  0.425
Age         1.5092   0.3080   4.90  0.000
S = 19.5342   R-Sq = 33.3%   R-Sq(adj) = 32.0%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   9161.4  9161.4  24.01  0.000
Residual Error  48  18316.1   381.6
Total           49  27477.5
The estimated regression equation used to predict the number of weeks a worker has been jobless due to a layoff, given the age of the worker, is:
Weeks = -8.86 + 1.51 Age
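The summary statistics in this output are tied together by a few standard identities: R-Sq = SSR/SST, R-Sq(adj) = 1 - (SSE/(n - k - 1))/(SST/(n - 1)), F = MSR/MSE, S = sqrt(MSE), and T = Coef/SE Coef. A minimal Python check, assuming only the numbers in the ANOVA and coefficient tables above (n = 50 workers, k = 1 predictor):

```python
import math

# Quantities from the Analysis of Variance table for Weeks versus Age.
n = 50           # observations
k = 1            # independent variables
ssr = 9161.4     # regression sum of squares
sse = 18316.1    # residual (error) sum of squares
sst = ssr + sse  # total sum of squares, 27477.5

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
s = math.sqrt(mse)                  # standard error of the estimate
r_sq = ssr / sst
r_sq_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# t ratio for the Age slope: coefficient divided by its standard error
t_age = 1.5092 / 0.3080

print(f"MSE = {mse:.1f}, F = {f_stat:.2f}, S = {s:.4f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, t(Age) = {t_age:.2f}")
```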
(b)
The following output shows the results obtained by using the Minitab stepwise regression procedure for the given data, with 0.05 for Alpha to remove and 0.05 for Alpha to enter.
Stepwise Regression: Weeks versus Age, Educ, Married, Head, Tenure, Manager, Sales
Alpha-to-Enter: 0.05  Alpha-to-Remove: 0.05
Response is Weeks on 7 predictors, with N = 50
Step                  1          2          3          4
Constant       -8.86002   -9.09741   -0.10922   -0.06890
Age                1.51       1.57       1.61       1.73
T-Value            4.90       5.30       5.74       6.51
P-Value           0.000      0.000      0.000      0.000
Manager                      -20.1      -24.6      -28.7
T-Value                      -2.26      -2.88      -3.53
P-Value                      0.029      0.006      0.001
Head                                    -14.3      -15.1
T-Value                                 -2.61      -2.95
P-Value                                 0.012      0.005
Sales                                              -17.4
T-Value                                            -2.79
P-Value                                            0.008
S                  19.5       18.7       17.7       16.5
R-Sq              33.34      39.87      47.64      55.38
R-Sq(adj)         31.95      37.31      44.22      51.41
Mallows C-p        22.5       17.8       11.8        5.9
The stepwise procedure terminated after four steps. The estimated regression equation identified by the Minitab stepwise regression procedure is:
Weeks = -0.06890 + 1.73 Age - 28.7 Manager - 15.1 Head - 17.4 Sales
R-Sq has increased from 33.34% to 55.38%, and the recommended estimated regression equation has an R-Sq(adj) value of 51.41%.
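Minitab's Mallows C-p row can be reproduced, to about one decimal place, from R-Sq alone. With p equal to the number of parameters including the constant, C-p = SSE_p / MSE_full - (n - 2p), where MSE_full comes from the model with all seven predictors (R-Sq = 59.14% at step 1 of the backward elimination output in part (d)). A C-p value close to p suggests the submodel has little bias; step 4 gives C-p = 5.9 with p = 5. The sketch below assumes SST = 27477.5 from part (a) and uses the rounded R-Sq values, so small rounding differences are expected.

```python
# Mallows C-p from R-Sq, using the Layoffs output (n = 50, SST = 27477.5).
n = 50
sst = 27477.5

# Full model: all 7 predictors plus the constant (8 parameters),
# R-Sq = 59.14% at step 1 of the backward elimination output.
r_sq_full = 0.5914
mse_full = (1 - r_sq_full) * sst / (n - 8)


def mallows_cp(r_sq, n_predictors):
    """Return Mallows C-p for a submodel with the given R-Sq."""
    p = n_predictors + 1              # parameters, including the constant
    sse_p = (1 - r_sq) * sst          # SSE = (1 - R-Sq) * SST
    return sse_p / mse_full - (n - 2 * p)


# (R-Sq, number of predictors) at stepwise steps 1-4
steps = [(0.3334, 1), (0.3987, 2), (0.4764, 3), (0.5538, 4)]
cp_values = [round(mallows_cp(r_sq, k), 1) for r_sq, k in steps]
print(cp_values)
```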
(c)
The following output shows the results obtained by using the Minitab forward selection procedure for the given data, with 0.05 for Alpha to enter.
Stepwise Regression: Weeks versus Age, Educ, Married, Head, Tenure, Manager, Sales
Forward selection. Alpha-to-Enter: 0.05
Response is Weeks on 7 predictors, with N = 50
Step                  1          2          3          4
Constant       -8.86002   -9.09741   -0.10922   -0.06890
Age                1.51       1.57       1.61       1.73
T-Value            4.90       5.30       5.74       6.51
P-Value           0.000      0.000      0.000      0.000
Manager                      -20.1      -24.6      -28.7
T-Value                      -2.26      -2.88      -3.53
P-Value                      0.029      0.006      0.001
Head                                    -14.3      -15.1
T-Value                                 -2.61      -2.95
P-Value                                 0.012      0.005
Sales                                              -17.4
T-Value                                            -2.79
P-Value                                            0.008
S                  19.5       18.7       17.7       16.5
R-Sq              33.34      39.87      47.64      55.38
R-Sq(adj)         31.95      37.31      44.22      51.41
Mallows C-p        22.5       17.8       11.8        5.9
The forward selection procedure terminated after four steps. The estimated regression equation identified by the Minitab forward selection procedure, which is the same as the one identified by the stepwise regression procedure, is:
Weeks = -0.06890 + 1.73 Age - 28.7 Manager - 15.1 Head - 17.4 Sales
R-Sq has increased from 33.34% to 55.38%, and the recommended estimated regression equation has an R-Sq(adj) value of 51.41%.
(d)
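The following output shows the results obtained by using the Minitab backward elimination procedure for the given data, with 0.05 for Alpha to remove.
Stepwise Regression: Weeks versus Age, Educ, Married, Head, Tenure, Manager, Sales
Backward elimination. Alpha-to-Remove: 0.05
Response is Weeks on 7 predictors, with N = 50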
Step                  1          2          3          4
Constant       22.85070   13.62308   13.06817   -0.06890
Age                1.51       1.52       1.64       1.73
T-Value            4.96       5.04       6.18       6.51
P-Value           0.000      0.000      0.000      0.000
Educ              -0.61
T-Value           -0.66
P-Value           0.516
Married           -10.7       -9.9       -9.8
T-Value           -1.79      -1.69      -1.69
P-Value           0.081      0.098      0.099
Head              -19.8      -19.0      -19.4      -15.1
T-Value           -3.39      -3.35      -3.44      -2.95
P-Value           0.002      0.002      0.001      0.005
Tenure             0.43       0.37
T-Value            0.91       0.82
P-Value           0.366      0.418
Manager           -26.7      -27.7      -29.0      -28.7
T-Value           -3.21      -3.40      -3.64      -3.53
P-Value           0.003      0.001      0.001      0.001
Sales             -18.6      -19.0      -19.0      -17.4
T-Value           -2.96      -3.06      -3.07      -2.79
P-Value           0.005      0.004      0.004      0.008
S                  16.3       16.2       16.2       16.5
R-Sq              59.14      58.72      58.08      55.38
R-Sq(adj)         52.33      52.96      53.32      51.41
Mallows C-p         8.0        6.4        5.1        5.9
The backward elimination procedure terminated after four steps. The estimated regression equation identified by the Minitab backward elimination procedure, which is the same as the one identified by the stepwise regression procedure, is:
Weeks = -0.06890 + 1.73 Age - 28.7 Manager - 15.1 Head - 17.4 Sales
R-Sq has decreased from 59.14% to 55.38%, and the recommended estimated regression equation has an R-Sq(adj) value of 51.41%.
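The backward elimination path can be replayed from the printed p-values. This sketch does not refit any regression; it only applies the removal rule (drop the predictor with the largest p-value while that p-value exceeds Alpha-to-Remove = 0.05) to the p-values Minitab reported at each step. The function name `replay_backward` is illustrative.

```python
ALPHA_TO_REMOVE = 0.05

# P-values reported by Minitab at each backward elimination step.
# Each step is a refit, so the p-values change after every removal.
step_pvalues = [
    {"Age": 0.000, "Educ": 0.516, "Married": 0.081, "Head": 0.002,
     "Tenure": 0.366, "Manager": 0.003, "Sales": 0.005},
    {"Age": 0.000, "Married": 0.098, "Head": 0.002, "Tenure": 0.418,
     "Manager": 0.001, "Sales": 0.004},
    {"Age": 0.000, "Married": 0.099, "Head": 0.001,
     "Manager": 0.001, "Sales": 0.004},
    {"Age": 0.000, "Head": 0.005, "Manager": 0.001, "Sales": 0.008},
]


def replay_backward(steps, alpha):
    """Apply the backward elimination rule to precomputed p-values.

    At each step the predictor with the largest p-value is dropped if that
    p-value exceeds alpha; otherwise the procedure stops.
    """
    removed = []
    for pvalues in steps:
        worst = max(pvalues, key=pvalues.get)
        if pvalues[worst] <= alpha:
            return removed, sorted(pvalues)
        removed.append(worst)
    raise ValueError("ran out of reported steps before stopping")


removed, final = replay_backward(step_pvalues, ALPHA_TO_REMOVE)
print("Removed in order:", removed)
print("Final model:", final)
```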
(e)
Best-subsets regression enables the user to find the best regression model given a specified number of independent variables.
The following output is a portion of the computer output obtained by using the best-subsets procedure for the Layoffs data set.
Results for: LAYOFFS.MTW
Best Subsets Regression: Weeks versus Age, Educ, ...
Response is Weeks
This output identifies the two best one-variable estimated regression equations, the two best two-variable equations, the two best three-variable equations, and so on. The criterion used to determine which estimated regression equations are best for any given number of predictors is the coefficient of determination (R-Sq).
For instance, Age, with R-Sq = 33.3%, provides the best estimated regression equation using only one independent variable; Age and Manager, with R-Sq = 39.9%, provide the best estimated regression equation using two independent variables; and Age, Head, and Manager, with R-Sq = 47.6%, provide the best estimated regression equation with three independent variables.
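Best-subsets regression is exhaustive: with seven candidate predictors it compares every non-empty subset, grouped by size. The sketch below counts those models and shows the ranking step; the `score` argument is a hypothetical placeholder for a function that fits each model and returns its R-Sq, which is not implemented here.

```python
from itertools import combinations
from math import comb

predictors = ["Age", "Educ", "Married", "Head", "Tenure", "Manager", "Sales"]

# Number of candidate models of each size that best-subsets must compare.
counts = {k: comb(len(predictors), k) for k in range(1, len(predictors) + 1)}
total = sum(counts.values())  # equals 2**7 - 1


def best_subsets(candidates, score, top=2):
    """Return the `top` highest-scoring subsets for each subset size.

    `score` is any function mapping a tuple of predictor names to a number,
    e.g. the R-Sq of an ordinary least-squares fit (hypothetical here).
    """
    return {
        k: sorted(combinations(candidates, k), key=score, reverse=True)[:top]
        for k in range(1, len(candidates) + 1)
    }


print(counts)
print("Total models:", total)
```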
The adjusted coefficient of determination (R-Sq(adj) = 53.3%) is largest for the model with five independent variables. The best-subsets procedure shows that the best five-variable model contains the independent variables Age, Married, Head, Manager, and Sales.
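Adjusted R-Sq penalizes extra predictors through R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - k - 1), which is why it can peak at five variables even though R-Sq keeps rising as variables are added. The sketch below recomputes it from the R-Sq values of the four models visited by backward elimination in part (d); it compares only those four models, not every subset.

```python
n = 50  # observations


def adjusted_r_sq(r_sq, k):
    """Adjusted R-Sq for a model with k independent variables."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


# Models visited by backward elimination in part (d): k -> (predictors, R-Sq)
models = {
    7: ("Age, Educ, Married, Head, Tenure, Manager, Sales", 0.5914),
    6: ("Age, Married, Head, Tenure, Manager, Sales", 0.5872),
    5: ("Age, Married, Head, Manager, Sales", 0.5808),
    4: ("Age, Head, Manager, Sales", 0.5538),
}

adj = {k: adjusted_r_sq(r_sq, k) for k, (_, r_sq) in models.items()}
best_k = max(adj, key=adj.get)

for k in sorted(adj):
    print(f"{k} predictors: R-Sq(adj) = {adj[k]:.2%}")
print("Largest R-Sq(adj):", models[best_k][0])
```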
The best estimated regression equation is obtained by using the regression routine of Minitab:
Regression Analysis: Weeks versus Age, Married, Head, Manager, Sales
The regression equation is
Weeks = 13.1 + 1.64 Age - 9.76 Married - 19.4 Head - 29.0 Manager - 19.0 Sales
Predictor      Coef  SE Coef      T      P
Constant      13.07    12.40   1.05  0.298
Age          1.6369   0.2651   6.18  0.000
Married      -9.764    5.794  -1.69  0.099
Head        -19.405    5.636  -3.44  0.001
Manager     -28.986    7.958  -3.64  0.001
Sales       -18.967    6.181  -3.07  0.004
S = 16.1794   R-Sq = 58.1%   R-Sq(adj) = 53.3%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5  15959.5  3191.9  12.19  0.000
Residual Error  44  11518.0   261.8
Total           49  27477.5
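The T column is Coef / SE Coef, and a coefficient is significant at the 0.05 level when |T| exceeds the two-tailed critical value with n - k - 1 = 50 - 5 - 1 = 44 degrees of freedom. The sketch below assumes that critical value is about 2.015 (read from a t table, not from the Minitab output); it shows that Married is the only predictor that is not significant, consistent with its p-value of 0.099 and with its removal at step 4 of backward elimination.

```python
# Coefficients and standard errors for Weeks versus Age, Married, Head, Manager, Sales
coefficients = {
    "Constant": (13.07, 12.40),
    "Age": (1.6369, 0.2651),
    "Married": (-9.764, 5.794),
    "Head": (-19.405, 5.636),
    "Manager": (-28.986, 7.958),
    "Sales": (-18.967, 6.181),
}

# Assumed two-tailed critical value t(0.025) with 44 degrees of freedom,
# taken from a t table (approximately 2.015).
T_CRIT = 2.015

t_ratios = {name: coef / se for name, (coef, se) in coefficients.items()}
for name, t in t_ratios.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name:8s} t = {t:6.2f}  {verdict} at alpha = 0.05")
```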