
MATH20811 Practical Statistics: Coursework 1 - Exploratory data analysis and correlation



The marks awarded for this coursework constitute 30% of the total assessment for the module.

Your solution to the coursework should be fairly concise (maximum of about 10 pages) and it should take, on average, about 15 hours to complete.

Please read all the instructions and advice given below carefully.

The submission deadline is 10:00 am on Wed 1 November 2023.

Late Submission of Work: Any student’s work that is submitted after the given deadline will be classed as late, unless an extension has already been agreed via mitigating circumstances or a DASS extension.

The following rules for the application of penalties for late submission are quoted from the University Guidance on late submission document (dated July 2021):

Any work submitted at any time within the first 24 hours following the published submission deadline will receive a penalty of 10% of the maximum amount of marks available. Any work submitted at any time between 24 hours and up to 48 hours late will receive a deduction of 20% of the marks available, and so on, at the rate of an additional 10% of available marks deducted per 24 hours, until the assignment is submitted or no marks remain.

Your submitted solutions should all be in one document. This must be prepared using LaTeX. Failure to use LaTeX will result in a 5-mark penalty. For each part of the question you should provide explanations as to how you completed what is required, show your workings and also comment on computational results, where applicable.

When you include a plot, be sure to give it a title and label the axes correctly.

When you have written or used R code to answer any of the parts, then you should list this R code after the particular written answer to which it applies. This may be the R code for a function you have written and/or code you have used to produce numerical results, plots and tables. R code should also be clearly annotated.

Do not use screenshots of R code/output or plots. Instead, to include R code use the verbatim environment, summarise R output in tables using the table environment and use the figure environment to display graphics, as demonstrated in the solution of Example Sheet 2.

Your file should be submitted through the module site on Blackboard to the Turnitin assessment in the Coursework folder entitled “MATH20811 CW1” by the above time and date. The work will be marked anonymously on Blackboard so please ensure that your filename is clear but that it does not contain your name and student ID number. Similarly, do not include your name and ID number in the document itself.

There is a basic LaTeX template file on Blackboard which you may choose to use for typing-up your solutions. The file is called CW1_submitted_work.tex.


Turnitin will generate a similarity report for your submitted document and indicate matches to other sources, including billions of internet documents (both live and archived), a subscription repository of periodicals, journals and publications, as well as submissions from other students. Please ensure that the document you upload represents your own work and is written in your own words. The Turnitin report will be available for you to see shortly after the due date.

Marking rubric: There are 4 questions to complete in the coursework, with a total of 25 marks to be obtained. An additional 5 marks are awarded for the presentation of the report, where we assess the clarity of writing, graphs, diagrams, tables and code, and the use of consistent notation.

This coursework should help to reinforce some of the methodology you have been studying, as well as the skills in R you have been developing in the module. Correct interpretation and meaningful discussion of the results (i.e. an attempt to put the results into context) are as important as correct calculation of the results, in order to achieve a high mark for the coursework.


The data in the file white_wine.csv (Cortez et al., 2009) contain various measurements on white wine variants of the Portuguese Vinho Verde wine. Import the data into R from your default folder using the command:

white_wine=read.table("white_wine.csv", sep = ";", header = TRUE)

The object white_wine contains measurements on 11 continuous variables: fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol, plus one discrete, ordinal variable: quality.

For the purposes of this coursework we will just use the variables in columns 7, 8 and 12, which are:

total.sulfur.dioxide
density
quality

Note that total.sulfur.dioxide and density are both numeric variables, while quality is a discrete, ordinal variable.
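As a rough illustration of the import and column selection described above, the following sketch writes a small semicolon-separated file and reads it back in the same way. The file contents are random placeholder values, and the object names vars, demo, wine_demo and wine_sub are hypothetical; only the column names, the separator and the column positions 7, 8 and 12 come from the text.

```r
# Column names as listed in the brief (12 variables in total)
vars <- c("fixed.acidity", "volatile.acidity", "citric.acid", "residual.sugar",
          "chlorides", "free.sulfur.dioxide", "total.sulfur.dioxide",
          "density", "pH", "sulphates", "alcohol", "quality")

# Placeholder data (NOT the wine data): 5 rows of made-up values
set.seed(1)
demo <- as.data.frame(matrix(round(runif(5 * 12), 3), nrow = 5,
                             dimnames = list(NULL, vars)))
demo$quality <- c(5L, 6L, 6L, 7L, 5L)

# Write a semicolon-separated file with a header row, then read it back
tmp <- tempfile(fileext = ".csv")
write.table(demo, tmp, sep = ";", row.names = FALSE, quote = FALSE)
wine_demo <- read.table(tmp, sep = ";", header = TRUE)

# Keep only columns 7, 8 and 12
wine_sub <- wine_demo[, c(7, 8, 12)]
str(wine_sub)
```

Selecting by name, for example wine_demo[, c("total.sulfur.dioxide", "density", "quality")], gives the same result and is less sensitive to column order.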

(i) Using selected summary statistics and graphical displays from those discussed in weeks 1 and 2 of this module, explore the univariate empirical distribution of total.sulfur.dioxide. Comment on your results.

[4]

(ii) Using box-plots, look at the distributions of the total.sulfur.dioxide data at the different values of quality. Comment on the results, taking into account the differing sample sizes for each distribution.

[4]

(iii) Produce a scatterplot of the total.sulfur.dioxide and density data. On the same plot, superimpose the contours from a bivariate Normal density with appropriately estimated parameters. Comment (with justification) on your impression of the bivariate Normal distribution as a suitable probability model for these data.
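One possible way to overlay bivariate Normal contours on a scatterplot in base R is sketched below. It uses simulated stand-in data rather than the wine data, and the helper dbvnorm and all variable names are hypothetical. The parameters are estimated by the sample means and the sample covariance matrix, which is an assumption about what "appropriately estimated" means here.

```r
# Simulated stand-in data (NOT the wine data)
set.seed(1)
n <- 500
x <- rnorm(n, mean = 140, sd = 40)
y <- 0.99 + 0.00004 * x + rnorm(n, sd = 0.002)

# Estimate the bivariate Normal parameters from the sample
mu <- c(mean(x), mean(y))
Sigma <- cov(cbind(x, y))

# Bivariate Normal density at the points (u, v); hypothetical helper
dbvnorm <- function(u, v, mu, Sigma) {
  d <- cbind(u - mu[1], v - mu[2])
  q <- rowSums((d %*% solve(Sigma)) * d)  # squared Mahalanobis distance
  exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
}

# Evaluate the density on a grid covering the data
gx <- seq(min(x), max(x), length.out = 100)
gy <- seq(min(y), max(y), length.out = 100)
dens <- outer(gx, gy, dbvnorm, mu = mu, Sigma = Sigma)

# Scatterplot with the fitted contours superimposed
plot(x, y, pch = 20, col = "grey60",
     main = "Simulated data with fitted bivariate Normal contours",
     xlab = "x (stand-in variable)", ylab = "y (stand-in variable)")
contour(gx, gy, dens, add = TRUE, col = "blue")
```

Points falling well outside the outer contours, or a point cloud that is clearly not elliptical, would be informal evidence against the bivariate Normal model.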

[4]

2. Using the function cor, calculate both Pearson’s and Spearman’s correlation between:

total.sulfur.dioxide and density
log(total.sulfur.dioxide) and log(density)

Comment on the resulting estimates and give an explanation for any similarities or discrepancies between them.
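The behaviour of the two coefficients under a log transform can be explored on simulated data first. The sketch below uses made-up positive variables x and y (not the wine data), and the object names are hypothetical. Spearman's coefficient depends only on ranks, so a strictly increasing transform such as log leaves it unchanged, whereas Pearson's coefficient measures linear association and generally changes.

```r
# Simulated positive, right-skewed data (NOT the wine data)
set.seed(1)
x <- rlnorm(200, meanlog = 5, sdlog = 0.4)
y <- exp(0.001 * x + rnorm(200, sd = 0.05))

# Correlations on the original scale
p_raw <- cor(x, y, method = "pearson")
s_raw <- cor(x, y, method = "spearman")

# Correlations after taking logs of both variables
p_log <- cor(log(x), log(y), method = "pearson")
s_log <- cor(log(x), log(y), method = "spearman")

# Collect the four estimates in a small table
round(rbind(raw = c(pearson = p_raw, spearman = s_raw),
            log = c(pearson = p_log, spearman = s_log)), 4)
```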

  3. Let ρ1 denote the correlation in the joint distribution of total.sulfur.dioxide and density. Based on using Pearson’s correlation coefficient, perform a DIY (i.e. write your own code to do the calculations) hypothesis test for

    H0: ρ1 = 0.6 vs HA: ρ1 ≠ 0.6

    at the 5% significance level using Fisher’s Z-transform. Compute the p-value and use it to decide whether to reject the null hypothesis in favour of the alternative.

    Calculate DIY an approximate 95% confidence interval (CI) for ρ1 based on Fisher’s Z-transform and verify that your calculations agree with the CI produced by cor.test.
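The calculation can be sketched on simulated data as follows. This is an illustration under stated assumptions, not a model answer: the data are simulated with a true correlation of 0.6, all object names are hypothetical, and the sketch relies on the standard large-sample result that z = atanh(r) is approximately Normal with mean atanh(ρ) and variance 1/(n − 3).

```r
# Simulated data with true correlation 0.6 (NOT the wine data)
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)

r <- cor(x, y)            # Pearson's sample correlation
rho0 <- 0.6               # value of the correlation under H0

# Fisher's Z-transform: atanh(r) = 0.5 * log((1 + r) / (1 - r))
z <- atanh(r)
se <- 1 / sqrt(n - 3)     # approximate standard error of z

# Two-sided test of H0: rho = rho0
z_stat <- (z - atanh(rho0)) / se
p_value <- 2 * pnorm(-abs(z_stat))

# Approximate 95% CI: build it on the z scale, then back-transform
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Compare with the interval reported by cor.test
ci_builtin <- as.numeric(cor.test(x, y)$conf.int)
rbind(diy = ci, cor.test = ci_builtin)
```

Note that the p-value reported by cor.test refers to the null hypothesis of zero correlation, so only the confidence interval is directly comparable here.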

  4. Write a function in R to verify via simulation that the distribution of Fisher’s Z-transform statistic, z, for a given sample size n, is approximately Normal. Your function should produce a plot comparing the sampling distribution of Fisher’s Z-transform statistic, z, and the appropriate approximate Normal distribution the statistic has under the assumption that the true correlation parameter equals zero. In your simulation, you may assume sample data pairs (x,y) come from independent Normal distributions having user-input parameter values.

    As your solution to this part, please submit the code for your function, and also run it in R to produce the plot described in the paragraph above and comment on the plot.
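A minimal outline of such a simulation is given below. The function name sim_fisher_z, its arguments and the default of 5000 replications are hypothetical choices, and the reference curve assumes the usual approximation that z is Normal with mean 0 and variance 1/(n − 3) when the true correlation is zero.

```r
# Hypothetical helper: simulate Fisher's z for independent Normal samples
sim_fisher_z <- function(n, n_sim = 5000,
                         mu_x = 0, sd_x = 1, mu_y = 0, sd_y = 1) {
  # Each replicate draws n independent (x, y) pairs and computes z = atanh(r)
  z <- replicate(n_sim, {
    x <- rnorm(n, mu_x, sd_x)
    y <- rnorm(n, mu_y, sd_y)
    atanh(cor(x, y))
  })

  # Histogram of the simulated z values on the density scale
  hist(z, breaks = 40, freq = FALSE,
       main = paste("Simulated Fisher's z, n =", n),
       xlab = "z")

  # Approximate Normal reference density: N(0, 1 / (n - 3))
  curve(dnorm(x, mean = 0, sd = 1 / sqrt(n - 3)),
        add = TRUE, col = "red", lwd = 2)

  invisible(z)
}

set.seed(1)
z_sim <- sim_fisher_z(n = 30)
c(mean = mean(z_sim), sd = sd(z_sim), approx_sd = 1 / sqrt(30 - 3))
```

A Normal Q-Q plot of the returned values, for example via qqnorm(z_sim), is another way to make the same comparison.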

References

[1] P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
