MGT1181 Introduction to Integration & Professional Decision Making – Fall 2023 Group Assignment 2
Sustainability Group Assignment
Due date: As stated on Quercus
Deliverables:
1. Report including a detailed description of insights. At most 10 pages + appendices.
2. All Python source code either in a Jupyter Notebook (*.ipynb) or a Python file (*.py). One file!
3. One Power BI file showing your dashboard and other visualizations as needed (*.pbix).
4. Presentation submitted in form of a video file. At most 10 minutes duration.
Background
Related to the Sustainability Week, consider the given data set FFF.csv.
“At the recent Sustainability: Transdisciplinary Theory, Practice, and Action (STTPA) Conference, there was an emphasis on collaborative solutions to complex societal issues, including the challenges of sustainability.
One such avenue for promoting sustainability is through sustainable investment.
Directly correlating with the themes of the conference, our project seeks to objectively analyze the financial viability of ESG funds. These funds are designed with an environmental focus, aiming to provide both financial returns and positive environmental impacts. Using data from https://fossilfreefunds.org/, you will have the opportunity to evaluate financial data to understand if environmentally conscious investments earn higher financial returns.
This project aligns with the STTPA Conference's goals, offering a hands-on approach to understanding the intersections of sustainability and finance.”
Data Manipulation, Descriptive Statistics, Data Visualization, and Estimation
Missing Data
Investigate whether any data are missing.
(What do you do with columns/rows that contain missing data? Do you delete them or fill them? If you fill them, what do you fill them with? Explain your approach.)
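As a starting point for the questions above, here is a minimal pandas sketch of three common options: dropping mostly empty columns, dropping rows without a target value, and filling numeric gaps with the median. The DataFrame is a toy stand-in for FFF.csv; the column names, the values, and the 50% threshold are all assumptions made for illustration, not recommendations.

```python
import numpy as np
import pandas as pd

# Toy stand-in for FFF.csv; column names and values are hypothetical.
funds = pd.DataFrame({
    "fund_name": ["A", "B", "C", "D", "E"],
    "return_1y": [5.2, np.nan, 3.1, 7.4, np.nan],
    "expense_ratio": [0.5, 0.7, np.nan, 0.2, 0.9],
    "notes": [np.nan, np.nan, np.nan, np.nan, "x"],
})

# Share of missing values per column, largest first.
missing_share = funds.isna().mean().sort_values(ascending=False)

# Option 1: drop columns that are mostly empty (the 50% cut-off is a judgment call).
mostly_empty = missing_share[missing_share > 0.5].index
cleaned = funds.drop(columns=mostly_empty)

# Option 2: drop rows where the variable you want to predict is missing.
cleaned = cleaned.dropna(subset=["return_1y"])

# Option 3: fill the remaining numeric gaps with the column median.
median_expense = cleaned["expense_ratio"].median()
cleaned = cleaned.fillna({"expense_ratio": median_expense})

print(missing_share)
print(cleaned)
```

Whichever option you choose, state in the report how many rows or columns were affected and why the choice suits the variable in question.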
Creating/Deleting and Transforming Data
Investigate whether there are any attributes that you want to add based on the existing data.
(In-class example: Our Online Retail Data: Calculate Revenue; Transform Pound Sterling amounts into Euro amounts.)
Explain why you are creating these new attributes and what you want to use them for.
Hint: With “dataset = pd.get_dummies(dataset, columns=['<some column>'], drop_first=True)”, you can create k-1 dummy variables for “some column” with k levels.
Do you want to delete any variables? Explain why.
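The steps in this subsection can be sketched as follows: derive a new attribute, encode a categorical column with k-1 dummy variables as in the hint, and drop a variable that is not needed for modelling. The column names and values are hypothetical and are not taken from FFF.csv.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
funds = pd.DataFrame({
    "fund_name": ["A", "B", "C", "D"],
    "fossil_fuel_holdings": [0.0, 12.5, 3.0, 0.0],  # percent of assets (assumed unit)
    "asset_class": ["Equity", "Bond", "Equity", "Mixed"],
})

# New attribute: flag funds that hold no fossil fuel assets at all.
funds["is_fossil_free"] = funds["fossil_fuel_holdings"] == 0

# k-1 dummy variables for a categorical column with k levels;
# the first level (alphabetically "Bond") becomes the reference category.
funds = pd.get_dummies(funds, columns=["asset_class"], drop_first=True)

# Drop an identifier column that carries no predictive information.
model_data = funds.drop(columns=["fund_name"])

print(model_data)
```

Dropping the first level avoids perfectly collinear dummy columns, which matters for linear models in particular.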
Grouping, Aggregating and Sorting Data
What kind of groupings will be useful?
(In-class example: Our Online Retail Data: Group by country / SKU / customer / ... Find the country with the most sales based on: number of transactions, number of customers, total quantity, total revenue, ...)
Explain what you are looking for and why, and comment on your results.
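One possible grouping, shown here as a sketch on toy data, compares fund categories by number of funds, mean one-year return, and total assets, then sorts by mean return. The grouping key and all column names are assumptions; choose groupings that match the questions you want to answer.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
funds = pd.DataFrame({
    "asset_class": ["Equity", "Equity", "Bond", "Bond", "Mixed"],
    "return_1y": [8.0, 4.0, 2.0, 3.0, 5.0],
    "net_assets": [100.0, 300.0, 50.0, 150.0, 200.0],
})

# Group, aggregate several measures at once, and sort by mean return.
summary = (
    funds.groupby("asset_class")
    .agg(
        n_funds=("return_1y", "count"),
        mean_return=("return_1y", "mean"),
        total_assets=("net_assets", "sum"),
    )
    .sort_values("mean_return", ascending=False)
)

print(summary)
```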
Outliers
Identify outliers.
(In-class example: Our Online Retail Data: Large negative unit prices are outliers: They are in fact fees that have to be paid.)
Explain whether or not you want to include those outliers and why.
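A common way to flag candidate outliers is the 1.5 x IQR rule, sketched below on a toy series. This is only one possible rule, and the values are invented for illustration; whether a flagged value is an error or a genuine extreme observation still requires judgment about the data.

```python
import pandas as pd

# Toy one-year returns in percent; values are hypothetical.
returns = pd.Series([4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 45.0], name="return_1y")

# Interquartile range and the conventional 1.5 * IQR fences.
q1 = returns.quantile(0.25)
q3 = returns.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask of values outside the fences, and the flagged values themselves.
is_outlier = (returns < lower) | (returns > upper)
outliers = returns[is_outlier]

print(f"fences: [{lower}, {upper}]")
print(outliers)
```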
Distributions: Center, Spread and Shape
What can we say about the mean, median, and mode?
What about the standard deviation, variance, range, and interquartile range?
Are the distributions skewed?
What observations can you make about the distributions?
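The measures of center, spread, and shape listed above can all be computed with pandas. The following sketch uses a small invented series; in the assignment you would apply the same calls to the relevant numeric columns of the data set.

```python
import pandas as pd

# Toy one-year returns in percent; values are hypothetical.
returns = pd.Series([2.0, 3.0, 3.0, 4.0, 5.0, 13.0], name="return_1y")

stats = {
    # Center
    "mean": returns.mean(),
    "median": returns.median(),
    "mode": returns.mode().tolist(),  # mode() can return several values
    # Spread (pandas uses the sample versions, ddof=1, by default)
    "std": returns.std(),
    "variance": returns.var(),
    "range": returns.max() - returns.min(),
    "iqr": returns.quantile(0.75) - returns.quantile(0.25),
    # Shape: positive skewness indicates a longer right tail
    "skewness": returns.skew(),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

In this toy series the mean exceeds the median and the skewness is positive, which is the typical signature of a right-skewed distribution.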
Data Visualization and Dashboard
Create suitable univariate and bivariate plots to perform exploratory data analysis.
By creating a dashboard, show added insights and put further visualizations into context.
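For the Python part of the exploratory analysis, a univariate and a bivariate plot could be produced along these lines with matplotlib. The column names and values are hypothetical, and the choice of a histogram and a scatter plot is only one reasonable option; the dashboard itself is built separately in Power BI.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter Notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; column names and values are hypothetical.
funds = pd.DataFrame({
    "return_1y": [2.0, 3.5, 4.0, 4.5, 5.0, 6.5, 7.0, 9.0],
    "fossil_fuel_holdings": [15.0, 12.0, 9.0, 10.0, 6.0, 3.0, 4.0, 0.0],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: distribution of one-year returns.
ax_hist.hist(funds["return_1y"], bins=5)
ax_hist.set_xlabel("One-year return (%)")
ax_hist.set_ylabel("Number of funds")
ax_hist.set_title("Distribution of returns")

# Bivariate: returns against fossil fuel exposure.
ax_scatter.scatter(funds["fossil_fuel_holdings"], funds["return_1y"])
ax_scatter.set_xlabel("Fossil fuel holdings (%)")
ax_scatter.set_ylabel("One-year return (%)")
ax_scatter.set_title("Return vs. fossil fuel exposure")

fig.tight_layout()
```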
Note: “A dashboard is a type of graphical user interface which often provides at-a-glance views of key performance indicators (KPIs) relevant to a particular objective or business process.” (Wikipedia)
Behind the scenes, a dashboard connects to your data files, but on the surface displays all this data in the form of tables, line charts, bar charts, gauges, etc. A data dashboard is the most efficient way to track multiple data sources because it provides a central location for businesses to monitor and analyze performance.
Machine Learning
Create a machine learning model that estimates the “Financial performance: Month end trailing returns, year 1” variable.
At least one model should be used, but you are encouraged to explore different models. Use at least one evaluation metric to explain the model performance.
Insights and Summarizing Results
Clearly explain your results. Are your predictions meaningful? If not, how could you improve them?
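For the machine learning task, one simple baseline workflow is sketched below: split the data, fit a linear regression with scikit-learn, and compare its error with a naive prediction of the training mean. Only the target column name comes from the assignment; the feature names are hypothetical, and the data are synthetic with an invented relationship, so the numbers say nothing about real funds.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

TARGET = "Financial performance: Month end trailing returns, year 1"

# Synthetic stand-in for the cleaned data set; feature names are hypothetical
# and the relationship to the target is invented purely for illustration.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "fossil_fuel_holdings": rng.uniform(0, 20, n),
    "expense_ratio": rng.uniform(0.1, 1.5, n),
})
data[TARGET] = (
    6.0
    - 0.2 * data["fossil_fuel_holdings"]
    - 1.0 * data["expense_ratio"]
    + rng.normal(0, 0.5, n)
)

# Hold out a test set so the evaluation uses data the model has not seen.
X = data.drop(columns=[TARGET])
y = data[TARGET]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluation metrics: root mean squared error and R^2 on the test set.
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)

# Naive baseline: always predict the mean of the training target.
baseline_pred = np.full(len(y_test), y_train.mean())
baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline_pred))

print(f"model RMSE: {rmse:.3f}, baseline RMSE: {baseline_rmse:.3f}, R^2: {r2:.3f}")
```

Comparing against a naive baseline is one way to judge whether predictions are meaningful: a model that barely beats the training mean has learned little from the features.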
Report and Presentation
Report
The report should tell a coherent story explaining the individual steps, their relationships, the visualizations, and deeper insights.
The report should be limited to ten pages.
Appendices are permitted if necessary. There is no strict requirement on the formatting.
Number the pages.
Presentation
Your group must present the assigned case in the form of a video.
Each group member must be actively involved in the presentation. Presentations will be limited to ten minutes.
The goal of the presentation is to encourage students to learn how to communicate their findings collaboratively, effectively, and concisely. Specifically, students have the opportunity to practice collaborating online and preparing presentations together. Most jobs in the future will likely entail some form of remote work and joint projects with colleagues that require collective problem solving and presentation. Presentations must be submitted online through Quercus.