EBIS 2023 Business Analytics - Individual Assignment 1: Data manipulation and visualization

Individual Assignment 1 (Total 100 Marks, 3 pages)

This assignment walks you through data manipulation and basic visualization techniques for given research questions. You will also be asked to answer several questions about data distribution using real-world donorschoose.org data.

Now, suppose we are interested in the characteristics of projects proposed on the donorschoose.org platform. Specifically, we have two research interests: (a) whether teachers get a higher optional support ratio as they complete more donation projects over time, and (b) whether teachers get better at attracting a higher amount of donations when the poverty level of their schools is high.

Here are the steps you should follow:

1. Load the projects.csv data into a data frame or tibble named “dt”. [5 marks]

2. Before analyzing the data, create the new variables needed for the analysis. [5 marks each, total 20 marks]

(1) Change the data type of the *date_posted* and *date_ended* columns to date, and store the converted values in new columns named *post_date* and *end_date*. (Hint: use the lubridate package.)
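
As a rough illustration of the lubridate hint, the sketch below parses date strings on a small made-up data frame. The values and the "YYYY-MM-DD" format are assumptions for illustration only; the actual format in projects.csv may differ, in which case another parser from the same family (for example mdy() or dmy()) would be needed.

```r
library(lubridate)

# Toy data frame with made-up values. Assumption: dates are stored as
# "YYYY-MM-DD" strings; check the real file before relying on this.
toy <- data.frame(
  date_posted = c("2012-03-01", "2012-11-15"),
  date_ended  = c("2012-06-30", "2013-01-20"),
  stringsAsFactors = FALSE
)

# ymd() parses year-month-day strings into Date objects.
toy$post_date <- ymd(toy$date_posted)
toy$end_date  <- ymd(toy$date_ended)

class(toy$post_date)           # "Date"
toy$end_date - toy$post_date   # date arithmetic now works (differences in days)
```

Storing the parsed values in new columns keeps the original strings available for checking that nothing was parsed incorrectly.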

(2) Remove projects from the dataset if the value of their *resource_type* column is *Other*. Then recode *NA* in the *margin* and *margin_percentage* columns to 0. In other words, after the manipulation, the *resource_type* column should not contain *Other* among its values, and the *margin* and *margin_percentage* columns should not contain any missing values (NA).

Tips:

- For this step, you may use ifelse() inside mutate() to recode the values.

- is.na() returns TRUE for each element that is NA, and FALSE otherwise.
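
The two tips can be seen together on a toy tibble. This is only a sketch: the column names `category` and `amount` and all values are hypothetical and are not taken from projects.csv.

```r
library(dplyr)

# Hypothetical toy data, not the assignment's columns.
toy <- tibble(
  category = c("Books", "Other", "Supplies", "Other"),
  amount   = c(10, NA, NA, 5)
)

toy_clean <- toy %>%
  filter(category != "Other") %>%                      # drop rows in the unwanted category
  mutate(amount = ifelse(is.na(amount), 0, amount))    # replace NA with 0, keep other values

toy_clean$amount   # 10 0
```

One thing to keep in mind: filter() keeps only rows where the condition is TRUE, so a row whose `category` is itself NA would also be dropped by `category != "Other"`. If such rows should be kept, the condition can be written as `category != "Other" | is.na(category)`.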

(3) Create a new column called *completed_project_no* that shows how many completed projects a teacher has in the dataset. For example, when teacher A has his/her first completed project (a project with *completed* in the *funding_status* column), that project's record should have 1 as the value in the *completed_project_no* column, his/her second completed project should have 2, etc.

Tips:

- For counting with conditions, you may use cumsum(condition_column == "value") to calculate the count.
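
A minimal sketch of the cumsum() tip on made-up data is shown below. The names `person`, `when`, `status`, and `n_completed` are hypothetical, and sorting by date within each person is an assumption about what "first" and "second" should mean.

```r
library(dplyr)

# Hypothetical toy data: two people with a few dated records each.
toy <- tibble(
  person = c("A", "A", "A", "B", "B"),
  when   = as.Date(c("2012-01-01", "2012-02-01", "2012-03-01",
                     "2012-01-15", "2012-04-01")),
  status = c("completed", "expired", "completed", "completed", "completed")
)

toy_counted <- toy %>%
  arrange(person, when) %>%                               # order records in time within person
  group_by(person) %>%                                    # restart the count for each person
  mutate(n_completed = cumsum(status == "completed")) %>% # running count of matching rows
  ungroup()

toy_counted$n_completed   # 1 1 2 1 2
```

Note that a row that does not match the condition carries the previous count forward (the "expired" row above shows 1), so such rows may need separate handling depending on how the count is used later.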

(4) Create a new column called *optional_ratio* that contains the ratio (in %) of the amount of optional support relative to the total donation. Use “summed_donations_excluding_optional_support” and “summed_donations_including_optional_support” for the calculation.

3. Graph the density of the *optional_ratio* column for completed_project_no == 1, completed_project_no == 2, and completed_project_no == 4. If you use the ggplot2 library, you can use the *geom_density* function.

- Do you think they are close to a normal density?

- Interpret any differences you notice among these three densities.

[A correct visualization: 10 marks / Interpretation: 10 marks]

Tips:

- %in% can be used for matching values; it returns a logical vector indicating, for each element of its first argument, whether that element appears in its second argument.

- xlim(0, 50) can be used to limit the x-axis to the range 0 to 50 for visualization. This does not remove large values from the variable in the data frame.
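
To make the two tips concrete, here is a small sketch on simulated data. The data frame `toy`, its columns `group` and `value`, and the simulated values are all hypothetical and only serve to show how %in%, geom_density(), and xlim() fit together.

```r
library(ggplot2)

set.seed(1)
# Simulated toy data: four groups of 50 right-skewed values each.
toy <- data.frame(
  group = rep(c(1, 2, 3, 4), each = 50),
  value = rexp(200, rate = 0.1)
)

# %in% returns a logical vector; match() is the function that returns positions.
c(1, 3, 5) %in% c(1, 2, 4)      # TRUE FALSE FALSE
match(c(1, 3, 5), c(1, 2, 4))   # 1 NA NA

# Keep only the groups of interest.
subset_toy <- toy[toy$group %in% c(1, 2, 4), ]
nrow(subset_toy)                # 150

# One density curve per group, with the x-axis limited to 0..50.
p <- ggplot(subset_toy, aes(x = value, colour = factor(group))) +
  geom_density() +
  xlim(0, 50)
```

The data frame itself is unchanged by xlim(). In ggplot2, however, observations outside the limits are dropped from the plotted layer (with a warning) before the density is estimated; coord_cartesian(xlim = c(0, 50)) is an alternative that zooms the view without dropping them.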

4. Now, using the *dt* data frame (or tibble), create a new dataset named *ts*. For each teacher, this table should have the following two columns. [10 marks each, total 20 marks]

(a) *avg_donation*: the average donation to the completed projects created by each teacher.

(b) *poverty_level*: the poverty level of the area where the teacher is located.

Tips:

- Use the total_donations column to measure the amount of donations received for each project.

5. Using the new “ts” table, graph the densities of *avg_donation* only for poverty_level == minimal and poverty_level == high.

- Do you think they are close to a normal density?

- Interpret any differences you notice among these densities.

[A correct visualization: 10 marks / Interpretation: 10 marks]

Tips:

- %in% can be used for matching values; it returns a logical vector indicating, for each element of its first argument, whether that element appears in its second argument.

- xlim(0, 1000) can be used to limit the x-axis to the range 0 to 1000 for visualization. This does not remove large values from the variable in the data frame.

6. Now, interpret the differences between the outputs of step 3 and the outputs of step 5. What would you conclude concerning the two research interests presented at the beginning of this assignment? Please try to explain why. [7.5 marks each, total 15 marks]

Instructions:

1. Use R.

You do not have to use tidyverse, but it is recommended. If you use tidyverse, here is a useful online resource that has most of what you need to finish this assignment:

https://r4ds.had.co.nz/transform.html

2. Use R Markdown, and compile your code, results, and explanations into an HTML or PDF file. You should submit only one final compiled report file (other formats will NOT be graded).

3. Do not include more than twenty lines of output from your code. For example, if you want to show that your code successfully transforms the entire table, show only, say, the first ten rows of the table.

4. Do not create a new data frame unless you are instructed to do so. When you create a new column, use the instructed column name. If you have to make your own, you need to justify it.

5. The use of any generative AI tool is strictly prohibited for this assignment. If such use is detected, it will be considered an attempt at plagiarism.
