STAT 361 Applied Methods in Statistics I (Fall 2023) Assignment 3: Linear regression

STAT 361 (Fall 2023) Assignment 3

The assignment is due on Nov. 04 (Saturday) at 23:00 (Kingston, Ontario time). Please submit to Crowdmark.

You can still submit your assignment after the scheduled submission deadline; the penalty for a late assignment is 1% per hour. Watch out for a Crowdmark technical feature: if you have clicked submit for a question before the deadline, then you CANNOT resubmit for that question once the deadline has passed.

Please read the course outline posted in Week 1, OnQ, if you need special accommodation for your assignment.
Requests to extend the submission deadline by less than 24 hours (say, 1 hour) will not be considered.

Guidelines for Preparing Solutions

For questions that need R coding, please include only the important R output and the necessary results in the main text of your solutions. Present them in a clear and concise fashion (for example, tabulate models and output).
If there is other long code and output related to your work and exploration, please put it in an Appendix at the end of EACH problem.

These Appendix sections will NOT be marked, but you could submit them as evidence of your independent work.
If you do not submit Appendix sections, make sure your assignment solutions are presented clearly and show your independent work.

Do not expect TAs to search everywhere for your answers from lengthy code and output. Identical solutions between students or copying from other sources will be investigated for academic integrity violations.

1. How is $R^2$ related to the sample correlation coefficient? Recall the correlation coefficient for random variables $X$ and $Y$, defined as

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{E\{[X - E(X)][Y - E(Y)]\}}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

The sample correlation coefficient for the observed data $x$ and $y$ is

$$\hat{\rho} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}.$$

Show that the $R^2$ of the simple linear regression, model (1) of Chapter 2, is the square of the sample correlation coefficient between $x$ and $y$,

$$R^2 = \hat{\rho}^2.$$
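Before attempting the proof, it can help to see the identity hold numerically. The following is a minimal sketch in Python (used here only for illustration; the course itself works in R). It assumes that model (1) of Chapter 2 is the usual simple linear regression with an intercept and a slope, and that $R^2$ is defined as 1 − SS(Res)/SS(Total). The data and all variable names are hypothetical.

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sums of squares and cross-products.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Sample correlation coefficient.
rho_hat = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares estimates for y = b0 + b1 * x (assumed model form).
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# R^2 computed as 1 - SS(Res) / SS(Total) (assumed definition).
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r_squared = 1.0 - ss_res / s_yy

print(r_squared, rho_hat ** 2)
```

The two printed values agree up to floating-point rounding. A numerical check of this kind is not a proof; the question still asks for the algebraic argument.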

2. Consider the multiple regression model $Y = X\beta + \varepsilon$, where $\varepsilon \sim \mathrm{MVN}_n(0, \sigma^2 I)$. See descriptions of model forms (1) and (2) in Chapter 4.

(a) Show that the residual vector $r = (I - P)Y$, where $P = X(X^T X)^{-1} X^T$, and show that $I - P$ is also a projection matrix.

(b) Let $U = (\hat{\beta}, r)^T$. Find the joint distribution of the random vector $U$. It may be helpful to notice that

$$U = \begin{pmatrix} (X^T X)^{-1} X^T \\ I - P \end{pmatrix} Y.$$

(c) Show that $\hat{\beta}$ and $r$ are independent.

Hint: For (b) and (c), properties of the multivariate normal distribution may be useful.
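The matrix facts behind Question 2 can be checked numerically on a small example before proving them in general. The sketch below, in plain Python with hand-written helpers, assumes a hypothetical 4 × 2 design matrix (an intercept column plus one covariate). The helper names (`transpose`, `matmul`, `inverse_2x2`, `max_abs_diff`) are illustrative only and do not come from any library.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]

def inverse_2x2(m):
    """Invert a 2 x 2 matrix using the closed-form formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

# Hypothetical design matrix: intercept column plus one covariate.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
n = len(X)
Xt = transpose(X)

A = matmul(inverse_2x2(matmul(Xt, X)), Xt)  # (X^T X)^{-1} X^T, size 2 x n
P = matmul(X, A)                            # P = X (X^T X)^{-1} X^T, size n x n
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
M = [[identity[i][j] - P[i][j] for j in range(n)] for i in range(n)]  # I - P

# I - P is idempotent and symmetric (up to rounding).
idempotent_gap = max_abs_diff(matmul(M, M), M)
symmetric_gap = max_abs_diff(M, transpose(M))

# The product (X^T X)^{-1} X^T (I - P)^T is numerically zero.
cross_gap = max(abs(v) for row in matmul(A, transpose(M)) for v in row)

print(idempotent_gap, symmetric_gap, cross_gap)
```

All three printed gaps are at the level of floating-point rounding. The last one concerns the two blocks of the matrix that multiplies $Y$ in part (b), which is where the multivariate normal properties mentioned in the hint come into play.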

3. Consider the “Savings.txt” data posted. It is an economic dataset collected in 48 different countries. The variable “sr” is the ratio of savings (aggregate personal saving divided by disposable income). The variables “pop15” and “pop75” are the percentages of population under 15 and over 75, respectively. The variable “dpi” is disposable income (per capita, in dollars), while the variable “ddpi” is the rate (percent) of change in disposable income (per capita).

(a) Draw a scatter plot matrix for all the variables involved. Comment on the possible relationships between variables, focusing on those that appear interesting to you.

(b) Fit a simple linear regression model with disposable income (“dpi”) as the response and the percentage of population under 15 (“pop15”) as the only covariate. Describe the model clearly in mathematical form. Report and interpret the fitted model: is there a significant association between the variables, and is this what you expect?

(c) Find the sample correlation coefficient between the two variables you studied in (b). How is it related to the $R^2$ of the model you fitted in (b)?

(d) Fit a regression model with the ratio of savings ($Y$, “sr”) as the response and all other variables as the covariates. Describe the model clearly in mathematical form, then report and discuss the fit of the model. Interpret the estimated coefficient for the rate of change in disposable income.

(e) Present the analysis of variance table for the model in (c), i.e., the ANOVA table in the form of Table 1 of Section 4.4. The model you specified in (d) assumes that the error terms are i.i.d. normal with mean 0 and variance $\sigma^2$. An estimate of $\sigma$, denoted by $\hat{\sigma}$, can be extracted from your fitted model (suppose it is named “fitd” in your code) with the R code “sigma(fitd)”. How is $\hat{\sigma}$ related to SS(Res), the residual sum of squares?
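As a rough illustration of the quantities that appear in an ANOVA table, the sketch below computes the sums of squares for a simple linear regression on hypothetical toy data (not the Savings data), again in plain Python rather than R. For an `lm` fit, R's `sigma()` reports the residual standard error, that is, the square root of the residual sum of squares divided by the residual degrees of freedom; the code mirrors that calculation by hand. The layout of Table 1 of Section 4.4 is not reproduced here, and all variable names are illustrative.

```python
import math

# Hypothetical toy data (not the Savings data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

n = len(x)
p = 2  # number of regression coefficients: intercept and slope

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Sums of squares: total (about the mean), regression, and residual.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Residual standard error on n - p degrees of freedom.
sigma_hat = math.sqrt(ss_res / (n - p))

print(ss_total, ss_reg + ss_res, sigma_hat)
```

The first two printed values agree up to rounding, showing the decomposition of the total sum of squares. For the model in (d), the same calculation applies with p equal to the number of coefficients in that model.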
