BIA B452F Assignment 1
Learning outcomes:
- Explain and select analytic techniques for business intelligence and big data analysis.
- Apply data visualization tools and predictive analytics to summarize and analyze business data.
Task
In this assignment, you need to perform an exploratory analysis of the world’s top-ten highest-paid athletes as listed by Forbes since 1990. The sample dataset “Forbes Richest Athletes (1990-2020).csv” consists of 301 observations of the following 7 features. The dataset has no records for 2001 because the reporting period changed from the full calendar year to June-to-June. (Source: https://www.kaggle.com/datasets/parulpandey/forbes-highest-paid-athletes-19902019).
- name – Name
- nationality – Nationality
- current_rank – Current worldwide ranking
- previous_year_rank – Worldwide ranking in last year
- sport – Type of sport
- year – Year
- earnings – Earnings (in millions of US dollars)
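A first look at the data could be sketched as follows. This is only a minimal illustration: it assumes the columns are named exactly as in the list above (the header of the actual CSV file may differ), and it reads a small inline stand-in whose rows are placeholders, not Forbes figures.

```r
# Toy stand-in for the real file. The rows are placeholders for illustration
# only and are NOT actual Forbes figures.
csv_text <- "name,nationality,current_rank,previous_year_rank,sport,year,earnings
Athlete A,USA,1,,boxing,1990,10.0
Athlete B,Brazil,2,,Auto Racing,1990,8.0
Athlete C,USA,1,1,Boxing,1991,12.5
Athlete D,France,2,not ranked,auto racing,1991,9.0"

# Treat empty fields as missing values.
athletes <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                     na.strings = c("", "NA"))
# With the real dataset (assumed to sit in the working directory):
# athletes <- read.csv("Forbes Richest Athletes (1990-2020).csv",
#                      stringsAsFactors = FALSE, na.strings = c("", "NA"))

str(athletes)                        # column types and a preview of the values
summary(athletes$earnings)           # five-number summary and mean of earnings
colSums(is.na(athletes))             # number of missing values per column
table(athletes$year)                 # number of records per year
setdiff(1990:2020, athletes$year)    # years with no records at all
```

On the real file, the last two lines give a quick check of the record count per year and of which years are absent.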
You must use R to perform exploratory analysis on the Forbes Richest Athletes (1990-2020) dataset. You must define your own research questions (or hypotheses) and use summary statistics and data visualization to answer them. For example, you may hypothesize that US athletes dominate sports earnings. To collect evidence for or against the hypothesis, you could use a stacked column chart to present the total earnings of athletes by nationality and year, and then draw your conclusion about the hypothesis.
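The stacked column chart in the example hypothesis could be produced along these lines. This is a sketch in base R under stated assumptions: the column names nationality, year and earnings follow the feature list, the nationality label "USA" is assumed, and the data frame holds placeholder values rather than real Forbes figures. Other plotting packages would work equally well.

```r
# Placeholder data for illustration only (not actual Forbes figures).
athletes <- data.frame(
  nationality = c("USA", "USA", "Brazil", "USA", "France"),
  year        = c(1990, 1990, 1990, 1991, 1991),
  earnings    = c(10, 6, 8, 12.5, 9)
)

# Total earnings per nationality (rows) and year (columns).
totals <- xtabs(earnings ~ nationality + year, data = athletes)

# barplot() stacks the rows of a matrix within each column by default,
# so each bar is one year and each segment is one nationality.
barplot(totals,
        legend.text = rownames(totals),
        xlab = "Year",
        ylab = "Total earnings (US$ million)",
        main = "Total top-ten earnings by nationality and year")

# A numeric companion to the chart: share of each year's total earned
# by athletes labelled "USA" (label assumed).
usa_share <- totals["USA", ] / colSums(totals)
round(usa_share, 2)
```

The share per year gives a figure to quote alongside the chart when arguing for or against the hypothesis.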
You must pre-process the data and select appropriate visualization methods in the analysis. You may need to handle missing data, re-code variables, and perform data aggregation. You may use any appropriate approach to handle the missing data and make reasonable assumptions in the analysis, if necessary. You must justify the methods you use and the assumptions you make. You must analyze the statistical and graphical output in detail and write up your interpretation.
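The three pre-processing steps mentioned above (handling missing data, re-coding, and aggregation) might look like the following sketch. The data frame again holds placeholder values, the column names are assumed to match the feature list, and the derived column was_ranked_last_year is a hypothetical name introduced here. Treating a missing or non-numeric previous rank as "not ranked last year" is one possible assumption, and it would need to be justified in the report.

```r
# Placeholder data for illustration only (not actual Forbes figures).
athletes <- data.frame(
  name               = c("Athlete A", "Athlete B", "Athlete C", "Athlete D"),
  sport              = c("boxing", "Auto Racing", "Boxing", "auto racing"),
  previous_year_rank = c(NA, "3", "1", "not ranked"),
  year               = c(1990, 1990, 1991, 1991),
  earnings           = c(10, 8, 12.5, 9),
  stringsAsFactors   = FALSE
)

# Re-coding: harmonise inconsistent spellings of the same sport.
athletes$sport <- tolower(trimws(athletes$sport))

# Missing data: convert the previous rank to an integer. Non-numeric entries
# become NA; the coercion warning is suppressed on purpose.
athletes$previous_year_rank <-
  suppressWarnings(as.integer(athletes$previous_year_rank))

# Assumption (to be justified in the report): NA means the athlete was not
# ranked in the previous year.
athletes$was_ranked_last_year <- !is.na(athletes$previous_year_rank)

# Aggregation: total earnings per sport, largest first.
by_sport <- aggregate(earnings ~ sport, data = athletes, FUN = sum)
by_sport <- by_sport[order(-by_sport$earnings), ]
by_sport
```

Keeping the indicator column, rather than dropping rows with a missing rank, preserves every observation for the earnings analysis.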
The following two references should be a good start for preparing this assignment:
- “Who earned the most in Sports in 2020” at https://www.kaggle.com/code/parulpandey/whoearned-the-most-in-sports-in-2020/notebook
- “Assignment 1 Sample Analysis” on OLE (Note: The sample analysis only illustrates how to write up an analysis report on using R to perform the exploratory analysis of credit card usage. The program and analysis are not directly applicable to the given problem. You are expected to provide more in-depth discussion of the findings in your analysis.)
Write a report to present and discuss the findings of your exploratory analysis. You are strongly recommended to use R Markdown to prepare the report. The report must include an overview of the problem, a descriptive analysis of the data, your hypotheses, R programs/outputs, and your analysis. This individual assignment will be graded on the following components (for further details, please see the rubrics on OLE):
- Descriptive analysis (20 marks)
- Research questions and data analysis (60 marks)
- Organization and writing skills (20 marks)
Submission Details
Your completed work should be uploaded to OLE before the deadline (Monday, 6 March), as follows:
- Analysis report – “Assignment 1”
- R program (or R markdown) – “R program”
Marks will be deducted for any non-compliance with the submission requirements.