Assignment 1
Please attach R code and output together with your answers. (Try to combine the answers, R code and output into one file if you can.)
This data set contains the details of confirmed/probable cases of COVID-19 in Hong Kong. Note: in ALL questions below, we treat both “confirmed cases” and “probable cases” as “cases”.
-
Load the data enhanced_sur_covid_19_eng.csv into R. (i) Get a pie chart for the cases based on HK/Non-HK resident classification. (ii) Get a pie chart for the cases based on Case classification. (iii) Summarize the proportions of cases in each Age group (<=19, 20-39, 40-59, 60-79, >=80). Then obtain a bar plot for the proportions.
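Hint: one possible way to form the age groups in (iii) is `cut()` followed by `prop.table(table(...))`. The sketch below uses a made-up `age` vector rather than the real data, so the numbers it prints mean nothing. Replace `age` with the age column of your loaded data frame (the column name depends on how the CSV is read).

```r
# Toy ages only (an assumption for illustration); use the real age column instead.
age <- c(5, 19, 20, 39, 40, 59, 60, 79, 80, 95)

# Right-closed intervals: (-Inf,19], (19,39], (39,59], (59,79], (79,Inf]
age_group <- cut(age,
                 breaks = c(-Inf, 19, 39, 59, 79, Inf),
                 labels = c("<=19", "20-39", "40-59", "60-79", ">=80"))

# Proportion of cases in each age group
age_prop <- prop.table(table(age_group))
print(age_prop)

# Bar plot of the proportions
barplot(age_prop, xlab = "Age group", ylab = "Proportion of cases")
```

The same `table()` result can be passed to `pie()` for parts (i) and (ii).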
-
Re-summarize the data by report date and store the result in a new data frame. The new data frame looks like the one below:
Then answer the questions below: (i) Over the whole 23/01/2020 – 21/01/2021 period, what are the mean, minimum, maximum and median of the daily total reported cases? (ii) Get a scatterplot of the daily total reported cases vs. Date. (iii) Get a scatterplot of Epidemiologically linked with imported case vs. Date.
Add the line of Epidemiologically linked with local case vs. Date to the above plot. Add the line of Epidemiologically linked with possibly local case vs. Date to the above plot. Add the line of Imported case vs. Date to the above plot. Add the line of Local case vs. Date to the above plot. Add the line of Possibly local case vs. Date to the above plot. (Note: Use a different color or line type for each case group.)
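Hint: a minimal sketch of one way to build the per-date data frame and overlay lines on a scatterplot. The toy data frame `cases` and its column names `report_date` and `classification` are hypothetical stand-ins, and only two case groups are shown. Adapt the names to the columns in your loaded data.

```r
# Toy case-level data (hypothetical column names, NOT the real data set)
cases <- data.frame(
  report_date = c("23/01/2020", "23/01/2020", "24/01/2020",
                  "24/01/2020", "24/01/2020", "25/01/2020"),
  classification = c("Imported case", "Local case", "Imported case",
                     "Imported case", "Local case", "Local case")
)

# Parse day/month/year strings so that dates sort chronologically
cases$report_date <- as.Date(cases$report_date, format = "%d/%m/%Y")

# Cross-tabulate: one row per report date, one column per case group
counts <- table(cases$report_date, cases$classification)
daily <- as.data.frame.matrix(counts)
daily$Total <- rowSums(counts)
daily$Date <- as.Date(rownames(daily))

# Summary statistics of the daily totals
print(summary(daily$Total))

# Scatterplot for one group, then overlay another group with lines()
plot(daily$Date, daily[["Imported case"]], col = "blue", pch = 16,
     xlab = "Date", ylab = "Daily cases", ylim = c(0, max(daily$Total)))
lines(daily$Date, daily[["Local case"]], col = "red", lty = 2)
legend("topright", legend = c("Imported case", "Local case"),
       col = c("blue", "red"), pch = c(16, NA), lty = c(NA, 2))
```

Note that dates on which no case was reported do not appear as rows in a table built this way.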
- Study the daily total reported cases over the whole 23/01/2020 – 21/01/2021 period.
a. Obtain the density histogram for the daily total reported cases. Just by looking at the plot, is the distribution of daily total reported cases skewed? If yes, is it left- or right-skewed?
b. What are the IQR, range, skewness and kurtosis of the daily total reported cases?
c. Obtain the QQ plot for the distribution of daily total reported cases. Does it look normal?
d. Test the normality of the distribution of daily total reported cases using the Shapiro-Wilk test and the Kolmogorov-Smirnov test. State the null and alternative hypotheses, test statistic, p-value and your conclusion clearly. Use α = 0.01.
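Hint: the sketch below shows the base-R calls that could be used for parts a to d. It runs on simulated values `x`, not on the real daily totals, so its output says nothing about the answer. Skewness and kurtosis are computed here from the moment definitions, which is one common convention; add-on packages may use slightly different formulas.

```r
set.seed(1)
# Simulated right-skewed values (an assumption for illustration, NOT the real data)
x <- rexp(200, rate = 0.1)

# a. Density histogram (freq = FALSE puts density on the y-axis)
hist(x, freq = FALSE, main = "Density histogram", xlab = "x")

# b. IQR, range, and moment-based skewness and kurtosis
m  <- mean(x)
m2 <- mean((x - m)^2)
skew <- mean((x - m)^3) / m2^1.5
kurt <- mean((x - m)^4) / m2^2   # equals 3 for a normal distribution
print(c(IQR = IQR(x), range = diff(range(x)), skewness = skew, kurtosis = kurt))

# c. Normal QQ plot with a reference line
qqnorm(x)
qqline(x)

# d. Normality tests; H0: the data come from a normal distribution
alpha <- 0.01
sw <- shapiro.test(x)
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
print(c(shapiro_W = unname(sw$statistic), shapiro_p = sw$p.value,
        ks_D = unname(ks$statistic), ks_p = ks$p.value))
print(sw$p.value < alpha)   # TRUE means reject H0 at level alpha
```

Because the mean and standard deviation passed to `ks.test()` are estimated from the same sample, its p-value is only approximate. With count data, `ks.test()` may also warn about ties.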
- Obtain a side-by-side boxplot for daily total reported cases against gender. What is your observation?
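Hint: one possible approach is to count cases per report date and gender with `aggregate()` and then pass the result to `boxplot()` with a formula. The toy data frame and its column names `report_date` and `gender` below are hypothetical and must be adapted to the real data.

```r
# Toy case-level data (hypothetical column names, NOT the real data set)
cases <- data.frame(
  report_date = as.Date(c("2020-01-23", "2020-01-23", "2020-01-23",
                          "2020-01-24", "2020-01-24", "2020-01-25")),
  gender = c("M", "M", "F", "F", "M", "F")
)

# Each row is one case, so summing a column of ones gives the daily count
cases$n <- 1
daily_by_gender <- aggregate(n ~ report_date + gender, data = cases, FUN = sum)
print(daily_by_gender)

# Side-by-side boxplot of the daily counts, one box per gender
boxplot(n ~ gender, data = daily_by_gender,
        xlab = "Gender", ylab = "Daily reported cases")
```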