STATS 326/786 Assignment 01
- All the plots should be labelled appropriately.
- Please submit both your .Rmd, and the generated output file .html or .pdf on Canvas before the due date.
- Please make sure that the .Rmd file compiles without any errors. The marker will not spend time fixing the bugs in your code.
- Please avoid specifying absolute paths.
- Your submission must be original, and if we recognize that you have copied answers from another student in the course, we will deduct your marks.
- You will need to use tidyverse packages to answer the questions in this assignment. Please use dplyr for data manipulation, ggplot2 for data visualisation, and lubridate for dates/times. Some parts of Problem 2 will use plots from the fpp3 packages.
- IMPORTANT NOTE: There are some questions that are for STATS 786 only. Students taking STATS 326, while you are welcome to attempt these questions, please do not submit answers to them.
Problem 1: Kobe Bryant
The lakers data set (in the lubridate package) contains play-by-play statistics of each Los Angeles Lakers basketball game in the 2008-2009 regular season. It includes the date, opponent, and game type (home or away). Each play is described by the time on the game clock when the play was made, the period in which the play was attempted, the type of play made, the name of the player, the result of the play, and the location on the court. Most NBA games have four periods, each 12 minutes in duration. The time variable in this data set is the amount of time left in the period. If a game is tied at the end of the fourth period, the game goes into overtime, meaning some games have more than four periods.
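The date and game-clock formats described above can be explored with lubridate before touching the real data. The following is a minimal sketch on made-up values, assuming dates are stored as yyyymmdd integers and clock readings as "mm:ss" strings; the object names are hypothetical.

```r
library(lubridate)

# Made-up values in the assumed formats (not actual rows of the data set).
game_date  <- ymd(20081028)   # yyyymmdd integer -> Date
clock_left <- ms("11:24")     # "mm:ss" string   -> Period (time left in the period)

class(game_date)              # "Date"
minute(clock_left)            # 11
second(clock_left)            # 24
period_to_seconds(clock_left) # 684
```

Working in seconds via period_to_seconds is one convenient option when clock readings need to be added or subtracted.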
1. 4 Marks
- Read in the lakers data set and convert this into a tibble object.
- Keep only the rows relating to Kobe Bryant.
- Transform the date variable into a lubridate date format (noting that it is currently in integer format).
2. 7 Marks
- Calculate Kobe Bryant’s points per game by summing points by date.
- Create a time plot of Kobe Bryant’s points per game using ggplot.
- Include a horizontal line for Kobe Bryant’s average points per game for the entire 2008-2009 regular season (geom_hline may be useful).
- Comment briefly on any interesting features in your plot.
3. 7 Marks
- Calculate Kobe Bryant’s points per period per game. Note that the period variable is currently in integer format, but you need to transform this into factor format; you can do this using the base R factor function (or as_factor from the forcats package).
- Create a stacked bar chart where the x-axis is the date and the y-axis is the points per game, and coloured/filled by period. You will need to use geom_bar(stat = "identity") and you will also need to include a position argument within geom_bar.
- Comment briefly on any interesting features in your plot.
4. 8 Marks
- Kobe Bryant scored 61 points on the 2nd of February, 2009. We are interested in how his points accumulate throughout this game. The time variable in the data set measures the amount of time left in a period, where each period is 12 minutes in duration.
- Transform the time variable into time elapsed per period. The ms function may be helpful.
- Transform the time elapsed per period into the time elapsed in the entire game. You will need to think of a formula to convert this. The minutes function may be helpful.
- Calculate Kobe Bryant’s cumulative points throughout the game. The cumsum function may be helpful.
- Plot a step plot to display how Kobe Bryant’s points accumulate over the duration of the game. You will need to use geom_step. Note: You will also need to use the scale_x_time function to tell R that your x-axis is a time period.
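The mechanics of the functions mentioned above (ms, cumsum, geom_step, scale_x_time) can be tried on a small example first. The sketch below uses made-up scoring plays within a single 12-minute period; the column names and the use of hms::hms to build the x-axis values are assumptions for illustration, and it deliberately does not cover the conversion to time elapsed in the whole game.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical scoring plays within one 12-minute period (toy data).
toy <- tibble(
  time   = c("10:30", "07:15", "03:00", "00:45"),  # time left on the clock
  points = c(2, 3, 2, 1)
) %>%
  mutate(
    # time elapsed in the period = period length - time remaining (in seconds)
    elapsed_sec = period_to_seconds(minutes(12)) - period_to_seconds(ms(time)),
    elapsed     = hms::hms(seconds = elapsed_sec),  # class understood by scale_x_time
    cum_points  = cumsum(points)                    # running total of points
  )

# Step plot of the running total against elapsed time.
p <- ggplot(toy, aes(x = elapsed, y = cum_points)) +
  geom_step() +
  scale_x_time() +
  labs(x = "Time elapsed in period", y = "Cumulative points")
```

Printing p draws the plot. The rows are already in chronological order here; with real play-by-play data the order should be checked before calling cumsum.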
Total possible marks for Problem 1: 26 Marks
Problem 2: Sunspots
Sunspots are temporary dark spots on the surface of the Sun caused by strong magnetic fields. The data set sunspot.month counts the number of sunspots observed per month from January 1749 until September 2013. This data set is readily available in R (just type sunspot.month).
1. 1 Mark
- Read in the sunspot.month data set and turn it into a tsibble.
2. 5 Marks
- Plot the seasonal plot using gg_season and the seasonal subseries plot using gg_subseries for the monthly sunspot count.
- Comment on which plot is easier to interpret and why.
- Based on these plots, comment on whether there is monthly seasonality.
3. 9 Marks
- Create a new variable called year that finds the year of the date.
- Calculate the total annual sunspots and call this total. Note that when summarising time series, you need to use index_by, and not group_by.
- Plot the annual sunspot time series.
- Comment on any patterns you observe in your time series plot.
4. 3 Marks
- Create a variable called sqrt.total that computes the square-root of the annual sunspot count.
- Plot this time series.
- Comment on any similarities/differences that you observe when you compare this to the original time series from part (3).
5. 2 Marks
- A solar maximum is when the Sun has the most sunspots on its surface during its solar cycle. It is linked with more auroras in the sky. The most recent solar maximum was in 2014.
- Based on your plots in (3) and (4), discuss approximately when you believe the next solar maximum will be.
6. STATS 786 only 10 Marks
- In this question, you will recreate the seasonal subseries plot from (2).
- Instead of using gg_subseries, you will use functions from the dplyr and ggplot2 packages to create your own seasonal subseries plot.
- Try to get your plot as close as possible to the gg_subseries plot. You will get full marks if your plot is exactly the same as what you get with gg_subseries.
- Things you will need to consider include: how to calculate and plot the means for each month, how to rotate the x-axis text, and how to facet your plot.
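The three ingredients listed above (per-month means, rotated x-axis text, and faceting) can be tried on a small synthetic data set. This is a rough sketch under stated assumptions, not a reproduction of gg_subseries: the data are made up, the column names are hypothetical, and the facet layout, colours, and labels are illustrative choices only.

```r
library(dplyr)
library(ggplot2)

# Synthetic monthly values for three years (made-up data, hypothetical names).
toy <- tibble(
  year  = rep(2001:2003, each = 12),
  month = factor(rep(month.abb, times = 3), levels = month.abb),
  value = rep(1:12, times = 3) + rep(c(0, 5, 10), each = 12)
) %>%
  group_by(month) %>%
  mutate(month_mean = mean(value)) %>%  # one mean per month, repeated on each row
  ungroup()

p <- ggplot(toy, aes(x = year, y = value)) +
  geom_line() +
  geom_hline(aes(yintercept = month_mean), colour = "blue") +  # mean line per panel
  facet_grid(cols = vars(month)) +                             # one panel per month
  theme(axis.text.x = element_text(angle = 90, hjust = 1))     # rotate x-axis text
```

Storing the mean as a column means each facet picks up its own horizontal line without a separate summary table.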
Total possible marks for Problem 2: 20 Marks for 326, 30 Marks for 786
Total possible marks for Assignment 1: 46 Marks for 326, 56 Marks for 786