2. Data were collected at a large university on n = 224 computer science majors in a certain year. The purpose was to predict the cumulative GPA after three semesters in college. Among the predictors were high school grades in mathematics (HSM), science (HSS), and English (HSE). The following multiple linear regression model was considered:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (1)$$
where $x_1$ = HSM, $x_2$ = HSS, and $x_3$ = HSE.
(a) What does $\varepsilon$ stand for in (1), and what are the assumptions on $\varepsilon$?
(b) Partial results for the estimated regression coefficients in (1) are:
$b_0 = 0.590$, $b_1 = 0.169$, and $b_3 = 0.045$. (Note: the result for $b_2$ is not given.)
The corresponding standard errors are 0.294, 0.035, and 0.039, respectively. Which of these regression coefficients, if any, are NOT significant at the 5% level? Justify your answer.
(Note: a regression coefficient $b_j$ is significant at the 5% level if the null hypothesis $H_0: \beta_j = 0$ is rejected in favor of $H_1: \beta_j \neq 0$ at level $\alpha = 0.05$. Also, because the sample size $n$ is fairly large, the standard normal table can be used instead of the t-table.)
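A minimal Python sketch of the check described in the note, assuming the large-sample normal approximation it allows: each estimate is divided by its standard error and the resulting z statistic is compared with the two-sided 5% critical value of about 1.96. The dictionary and function names are illustrative only.

```python
from statistics import NormalDist

# Estimates and standard errors reported in part (b); b2 is not given.
estimates = {"b0": 0.590, "b1": 0.169, "b3": 0.045}
std_errors = {"b0": 0.294, "b1": 0.035, "b3": 0.039}

alpha = 0.05
# Two-sided critical value from the standard normal (large-n stand-in for t).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96


def z_tests(est, se, crit):
    """Return {name: (z statistic, two-sided p-value, significant?)}."""
    results = {}
    for name, b in est.items():
        z = b / se[name]                          # z = b_j / SE(b_j)
        p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
        results[name] = (z, p, abs(z) > crit)     # reject H0 if |z| > z_crit
    return results


results = z_tests(estimates, std_errors, z_crit)
for name, (z, p, sig) in results.items():
    print(f"{name}: z = {z:.3f}, p = {p:.4f}, significant at 5%: {sig}")
```

Note that $b_0$ lands close to the cutoff (z ≈ 2.01), so it is worth stating which critical value is used; the t(220) value (about 1.97) leads to the same decision here.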
(c) The following is a partial ANOVA table for fitting model (1):
Source      SS        d.f.   MS       F
Regression                   9.237
Error       107.750
Total
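The relevant parts of this table can be filled in from $n$ and the two reported entries. The sketch below assumes that 9.237 is the regression mean square and 107.750 the error sum of squares. The 9.237 cannot be the regression SS: model (2) below reports a regression entry of 27.303, which implies a regression SS of at least 27.303, and dropping a predictor can never increase the regression SS. Degrees of freedom follow the usual rules, d.f.(Regression) = p = 3 and d.f.(Error) = n − p − 1.

```python
n = 224          # sample size
p = 3            # predictors in model (1): HSM, HSS, HSE

msr = 9.237      # regression mean square (assumed column placement)
sse = 107.750    # error sum of squares (assumed column placement)

df_reg = p               # 3
df_err = n - p - 1       # 220
df_tot = n - 1           # 223

ssr = msr * df_reg       # back out SS from MS = SS / d.f.
sst = ssr + sse          # SS(Total) = SS(Regression) + SS(Error)
mse = sse / df_err       # error mean square
f_stat = msr / mse       # overall F statistic on (3, 220) d.f.
r_squared = ssr / sst

print(f"{'Source':<11}{'SS':>10}{'d.f.':>6}{'MS':>10}{'F':>8}")
print(f"{'Regression':<11}{ssr:>10.3f}{df_reg:>6}{msr:>10.3f}{f_stat:>8.2f}")
print(f"{'Error':<11}{sse:>10.3f}{df_err:>6}{mse:>10.4f}")
print(f"{'Total':<11}{sst:>10.3f}{df_tot:>6}")
print(f"R^2 = {r_squared:.3f}")
```

Under these assumptions, SS(Regression) ≈ 27.711, SS(Total) ≈ 135.461, MSE ≈ 0.490 and F ≈ 18.86 on (3, 220) d.f.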
In addition, another regression model was considered by dropping HSS from model (1), that is,
$$Y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \varepsilon \qquad (2)$$
The corresponding partial ANOVA table is given below:
Source      SS        d.f.   MS       F
Regression  27.303
Error
Total
Based on this information, is dropping HSS from model (1) the right decision? Test at level $\alpha = 0.05$.
(Note: You do not need to complete the entire ANOVA tables to answer the question; only the relevant entries are needed.)
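One way to make this comparison is the partial (extra-sum-of-squares) F test of $H_0: \beta_2 = 0$ in model (1), with $F = [SSR(1) - SSR(2)]/1 \div MSE(1)$. The sketch assumes 9.237 is the regression MS and 107.750 the error SS of model (1), and 27.303 is the regression SS of model (2); as an MS, 27.303 would imply a regression SS of about 54.6 for the smaller model, more than model (1) has. Because Python's standard library has no F quantile function, it uses the large-sample approximation F(1, d.f.) ≈ χ²₁, whose 5% critical value is 1.96² ≈ 3.84; the exact F(1, 220) value is slightly larger, about 3.88.

```python
from statistics import NormalDist

n = 224
alpha = 0.05

# Full model (1): HSM, HSS, HSE -> 3 predictors
p_full = 3
msr_full = 9.237                  # regression MS (assumed column placement)
ssr_full = msr_full * p_full      # SS = MS * d.f.
sse_full = 107.750                # error SS (assumed column placement)
df_err_full = n - p_full - 1      # 220
mse_full = sse_full / df_err_full

# Reduced model (2): HSM, HSE -> 2 predictors
p_reduced = 2
ssr_reduced = 27.303              # regression SS (assumed column placement)

# Partial F statistic for H0: beta_2 = 0 (HSS adds nothing given HSM, HSE)
q = p_full - p_reduced            # number of coefficients dropped (1)
f_partial = ((ssr_full - ssr_reduced) / q) / mse_full

# Large-sample critical value, valid for q = 1: F(1, df) -> chi^2_1 = z^2
f_crit_approx = NormalDist().inv_cdf(1 - alpha / 2) ** 2

reject_h0 = f_partial > f_crit_approx
print(f"partial F = {f_partial:.3f} on ({q}, {df_err_full}) d.f.")
print(f"approx. 5% critical value = {f_crit_approx:.3f}")
print("HSS contributes significantly" if reject_h0
      else "dropping HSS is supported at the 5% level")
```

Because an F statistic with 1 numerator d.f. is the square of a t statistic, this partial F test is equivalent to the two-sided t test of $\beta_2$ in model (1).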