STAT 2000 - Assignment 3
knitr::opts_chunk$set(echo = TRUE)
Instructions
To properly view the assignment questions, knit this file to .PDF and view the output.
To enter your answers, add code as needed into the R code chunks given below and, where applicable, replace the "Delete me; ..." text with your own response. When adding text responses, never copy-paste symbols from outside of the document; use only the symbols on your keyboard. Do not delete the question text or modify any other part of the code except for the "author" in Line 3. All numerical and graphical answers must be done using R, unless stated otherwise.
You will have a link in your email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it to .PDF and upload your output to Crowdmark. Also, upload your .Rmd file to Crowdmark where prompted. To see where your .Rmd file is saved, click File > Save As in the top-left of your screen. Make sure you set your Name and Student Number in the Author section of this document (Line 3). Do not alter the title or the date. Please note that if you do not submit a knit .PDF file, you will be given a grade of zero.
After you knit your assignment to PDF, check your code chunks. If your code at any point runs off the page, find the nearest comma, click to the right of it, and press Enter (or Return if you are on a Mac). This will force a break in the code so that it goes onto the next line. All of your code must be readable in the final submission.
All calculations and output must be visible in the final document, and all text responses should be in complete English sentences. Your work should be done using the same formatting, functions, and packages as in your labs and course notes, unless otherwise specified. You may speak to your classmates about ideas and about which functions or optional arguments you may need to use, but you may not directly show your code or output to them.
Your full submission is due by 11:59 p.m. on Friday, March 24. Crowdmark may allow you to submit late, but you will be given an automatic grade of zero if you do. If you have an issue that you can't resolve without someone looking at your work (e.g., you get an error when knitting your document), please see the Help Centre in 311 Machray Hall.
Setup [1 mark]
- Import the FirstYear dataset, available on the UMLearn page. Make sure you have "Heading" set to "Yes" when you import the data, and make sure you name the object FirstYear. [1 mark]
#Insert code here; delete this line (including the # symbol) when done.
This dataset contains the final course grades for 4000 students in a large, first-year university class, as well as their faculty (either Arts, Business, Nursing, or Science). The grades are measured out of 100, and the faculty is recorded only by its first letter.
The code below will extract 200 random observations, which will be the dataset you work with. After importing the data, replace 1111111 with your seven-digit student ID number in the set.seed function below, and click the green arrow at the top-right hand side of the code chunk. This part is not worth marks, but you will receive a 5-mark deduction on your assignment if it is not completed correctly.
set.seed(1111111)
if(exists("FirstYear"))
{
FirstYear = FirstYear[sample(1:NROW(FirstYear), 200), ]
}
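The role of set.seed in the sampling step can be checked with a small sketch. The data frame demo below is hypothetical and only stands in for FirstYear; it assumes nothing about the real file. Resetting the same seed before each draw returns the same rows, which is why a given student ID always produces the same 200 observations.

```r
# Hypothetical stand-in for the imported data (not the real FirstYear file)
demo <- data.frame(Grade = seq(50, 99),
                   Faculty = rep(c("A", "B", "N", "S", "A"), 10))

# Draw 10 random rows, using the same indexing pattern as the chunk above
set.seed(1111111)
first_draw <- demo[sample(1:NROW(demo), 10), ]

# Reset the seed and draw again
set.seed(1111111)
second_draw <- demo[sample(1:NROW(demo), 10), ]

# Same seed, same rows
identical(first_draw, second_draw)
```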
Make sure you import your data and shuffle it (click the green arrow in the top-right) before beginning the assignment questions.
Questions [24 marks]
Now that you have completed the setup, the dataset FirstYear should contain only 200 observations. Check your environment in the top-right and verify that you see the dataset "FirstYear" and that it says "200 obs. of 2 variables".
Suppose you are the instructor of one of the sections of this first year class, and that this dataset consists of the grades of all of your students.
- Make a side-by-side boxplot comparing the final grades across the faculties. Use the main, xlab, and ylab arguments to set meaningful titles for the plot, the x-axis, and the y-axis. [2 marks]
Hint: you can add a title to the graph with the code boxplot(..., main = "Title", xlab = "X Label", ylab = "Y label").
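As an illustration of these arguments on different data, the sketch below uses PlantGrowth, a dataset that ships with R, rather than FirstYear. The formula interface (response ~ group) is one common way to get side-by-side boxplots; your labs may use a different but equivalent form.

```r
# PlantGrowth is built into R; it is used here only as a stand-in dataset
boxplot(weight ~ group, data = PlantGrowth,
        main = "Dried Plant Weight by Treatment Group",
        xlab = "Treatment group",
        ylab = "Dried weight")
```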
#Insert code here; delete this line (including the # symbol) when done.
- Comment on what you see in the previous boxplot. In terms of centre, which two groups are the most separated? Do the groups appear to be at least roughly symmetric? [2 marks]
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.
- We wish to conduct a hypothesis test at the 5% level to see if the mean grades are equal for each faculty. Use TeX formatting to produce the hypotheses for this test. [2 marks]
$$...$$
- Use aov to conduct the appropriate hypothesis test for this problem, and display the resulting ANOVA table. [3 marks]
#Insert code here; delete this line (including the # symbol) when done.
- Give a fully worded conclusion to this test. [2 marks]
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.
- Use qf to determine the critical value for this test. [1 mark]
Hint: to get the critical value for an ANOVA test at the $\alpha$ level of significance, you can enter qf(1 - alpha, df1, df2), where df1 is your numerator degrees of freedom and df2 is your denominator degrees of freedom.
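As a sketch of the hint with made-up numbers: suppose there were 3 groups and 30 observations in total, so that df1 = 3 - 1 = 2 and df2 = 30 - 3 = 27. These degrees of freedom are hypothetical and are not the values for this assignment.

```r
alpha <- 0.05
df1 <- 2    # numerator df: number of groups minus 1 (hypothetical)
df2 <- 27   # denominator df: total observations minus number of groups (hypothetical)

# Upper-tail critical value of the F distribution
crit <- qf(1 - alpha, df1, df2)
crit  # roughly 3.35

# Equivalent call that works directly with the upper-tail probability
crit_upper <- qf(alpha, df1, df2, lower.tail = FALSE)
all.equal(crit, crit_upper)
```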
#Insert code here; delete this line (including the # symbol) when done.
- Using the critical value method, what would your decision regarding $H_0$ be? Give your complete reasoning. [1 mark]
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.
- Use confint to produce a 99% confidence interval for the mean grades of the Arts students. [1 mark]
#Insert code here; delete this line (including the # symbol) when done.
- Type out the confidence interval below, along with an interpretation. [2 marks]
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.
- Use confint to produce a 99% confidence interval for the difference in mean grades between the Arts and Business students. Type out the interval below as well. [1 mark]
#Insert code here; delete this line (including the # symbol) when done.
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.
- Calculate a 99% confidence interval for the difference in mean grades between the Nursing and Science students (that is, $\text{Nursing} - \text{Science}$). You may either use relevel, factor, and confint, or use the ANOVA table along with aggregate and qt. [3 marks]
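The two routes can be sketched on PlantGrowth, a dataset built into R with groups ctrl, trt1, and trt2, used here only as a stand-in for FirstYear. The sketch assumes that confint is applied to an aov fit and that the interval for a difference uses the pooled MSE from the ANOVA table; check this against the method in your course notes. The target in this illustration is trt1 - trt2.

```r
# Route 1: relevel so that trt2 is the baseline, then use confint
pg <- PlantGrowth
pg$group <- relevel(pg$group, ref = "trt2")
fit <- aov(weight ~ group, data = pg)
ci <- confint(fit, level = 0.99)
ci["grouptrt1", ]   # row for trt1 minus the baseline trt2

# Route 2: ANOVA table with aggregate and qt
tab <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
mse <- tab[["Mean Sq"]][2]      # residual mean square (second row)
df_res <- tab[["Df"]][2]        # residual degrees of freedom

means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
sizes <- aggregate(weight ~ group, data = PlantGrowth, FUN = length)

diff_hat <- means$weight[means$group == "trt1"] -
  means$weight[means$group == "trt2"]
se <- sqrt(mse * (1 / sizes$weight[sizes$group == "trt1"] +
                  1 / sizes$weight[sizes$group == "trt2"]))
t_crit <- qt(1 - 0.01 / 2, df_res)

manual_ci <- c(diff_hat - t_crit * se, diff_hat + t_crit * se)
manual_ci
```

Both routes rest on the same pooled variance estimate, so the two intervals should agree up to rounding.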
#Insert code here; delete this line (including the # symbol) when done.
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.
- Use the TukeyHSD function to determine which groups are significantly different. Use a Family-Wise Error Rate of 5% (that is, set conf.level = 0.95 in the TukeyHSD function). [1 mark]
#Insert code here; delete this line (including the # symbol) when done.
- Type out below which groups are significantly different at this level, and give your reasoning. [1 mark]
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.
- Use aggregate to determine whether the equal-variances assumption appears to be satisfied for the ANOVA test conducted earlier. Give your full reasoning. [2 marks]
#Insert code here; delete this line (including the # symbol) when done.
Delete me; type your answer here. Do not copy-paste any symbols from outside sources, and do not remove the asterisks.