[2022] Applied Data Science (MAST30034) - Project 1: Quantitative Analysis

School of Mathematics and Statistics

Applied Data Science (MAST30034) Project 1: Quantitative Analysis

Due date: 22nd of August 09:00 AM (AEST)

Project Weight: 30%

Project Overview

This project aims to conduct a quantitative analysis of the New York City Taxi and Limousine Commission (TLC) Trip Record Data. The dataset covers trips taken in various types of taxi and for-hire vehicle services in the New York City area. The data, in parquet format, is directly downloadable from here, with the corresponding usage guide linked here. You will need to choose at least 6 months of data if working with Spark, or 3 months if working with pandas, from 2016 or later (ensure your data includes Zones, not coordinates).
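
As a minimal sketch of the data-selection rules above, the following Python helpers list the monthly files for a chosen period and check the minimum-months and Zones requirements. The file-name pattern, the zone column names (`PULocationID`, `DOLocationID`), and all function names are assumptions for illustration and should be checked against the download page and usage guide.

```python
from datetime import date

# Assumed naming pattern for the monthly trip files; verify it against the
# download page before relying on it.
FILE_PATTERN = "{taxi}_tripdata_{year}-{month:02d}.parquet"

# Assumed zone-based location columns (as opposed to latitude/longitude).
ZONE_COLUMNS = {"PULocationID", "DOLocationID"}

# Minimum number of months stated in the project overview.
MIN_MONTHS = {"spark": 6, "pandas": 3}


def monthly_file_names(taxi: str, start: date, n_months: int) -> list[str]:
    """Return one file name per month, starting at the month of `start`."""
    if start.year < 2016:
        raise ValueError("The project requires data from 2016 or later.")
    names = []
    year, month = start.year, start.month
    for _ in range(n_months):
        names.append(FILE_PATTERN.format(taxi=taxi, year=year, month=month))
        # Roll over to January of the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def enough_months(tool: str, n_months: int) -> bool:
    """True if `n_months` meets the stated minimum for the chosen tool."""
    return n_months >= MIN_MONTHS[tool]


def uses_zones(columns) -> bool:
    """True if the column names include the assumed zone ID columns."""
    return ZONE_COLUMNS.issubset(set(columns))
```

For example, `monthly_file_names("yellow", date(2018, 11, 1), 3)` yields three names spanning November 2018 to January 2019, which `enough_months("pandas", 3)` accepts but `enough_months("spark", 3)` does not.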

Students will be required to prepare a self-contained report of at most 8 pages, including figures and excluding references.

Project Expectations

Please refer to the Canvas Subject Overview for expectations and further information.

We understand that the page limit is strict and quite short. This project aims to get students to summarise information concisely and professionally. This matters because the results of Project 1 will be used to allocate which project student groups get for Project 2 (Industry Project).

Lastly, we know that the best way to learn new tools is to use and apply them in a project - this is “the project”. Please try your best; the tutor team will be here to support you where possible.

Project Assumptions

·       Students are free to choose any software, language, or package that is deemed useful to complete this project, although it is strongly recommended that Python and Apache Spark be used.

·       A LaTeX report template will be provided and students are not allowed to change the margins or font size. Students who prepare their own document templates will be required to add margin commands to adhere to the requirements. Otherwise, there will be penalties.

·       Students must maintain a GitHub repository with an appropriate and documented README.md file. A template repository has been provided for your benefit under Canvas Modules Project 1 Links Templates via GitHub Classrooms.

·       Students are free to choose the timeline to analyze, the type of licensed taxi to focus on (e.g. Yellow vs Green Taxi, Taxi vs For-Hire Vehicles), and the attributes for their area of study. Once again, make sure the time frame chosen is 2016 onward.

·       Students may use any external datasets which are deemed sufficiently relevant to support the analysis and attributes of the study.

·       The timeline and dataset must be sufficiently “large” to support your research goal. Students may subsample the data when visualizing or fitting a model (please state this in the report or you will be penalised), but must use the full dataset when analyzing the distribution, aggregating attributes, or performing outlier analysis.
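
The subsampling rule in the last point above could be sketched as follows, assuming pandas and NumPy are installed. The data is synthetic, the `trip_distance` column name and the 5% fraction are hypothetical, and the 1.5 x IQR rule is just one common choice of outlier criterion, not one the project prescribes.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a trips table; the column name is hypothetical.
rng = np.random.default_rng(seed=0)
trips = pd.DataFrame({"trip_distance": rng.exponential(scale=3.0, size=10_000)})

# Summary statistics and outlier bounds are computed on the FULL data.
full_summary = trips["trip_distance"].describe()
q1, q3 = trips["trip_distance"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (trips["trip_distance"] < q1 - 1.5 * iqr) | (
    trips["trip_distance"] > q3 + 1.5 * iqr
)
outliers = trips[is_outlier]

# A seeded subsample is used only for plotting or model fitting; the
# fraction used should be stated in the report.
SAMPLE_FRACTION = 0.05
plot_sample = trips.sample(frac=SAMPLE_FRACTION, random_state=0)
```

Fixing `random_state` keeps the subsample reproducible when the repository is re-run during marking.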

Report Format
The report must be at most 8 pages (including figures, excluding references), covering at least, but not limited to, the following items:

·       First and foremost, there should be no code present in the report. Please see the sample solutions for examples.

·       Identify the taxi dataset, external datasets, attributes, timeline, target audience, and relevant research goal. Justification is required for each point.

·       Outline the high-level methodology and preprocessing for visualization and statistical modelling for the research goal. We will be reading your code for the detailed preprocessing steps, so make sure it is well commented and documented.

·       Preliminary data analysis with interpretation and discussion.

·       A modelling section with at least two contrasting models and approaches with relevant evaluation metrics.

·       Make practical and realistic recommendations based on the final results for the identified audience.

·       Tables and figures should be referenced where appropriate. Here are some examples: “From (Figure 3) we find ...” or “... the Gini Impurity Metric [3] suggests that ...” or “(Table 3) shows the ...”.

·       Ensure that figures are reasonably placed and readable, as illegible figures or tables will be ignored.

·       Less is more: choose the information you present carefully. Irrelevant information will make the report hard to follow and lead to significant reductions in marks.

Finally, the report should be proofread several times before submission to minimise grammatical and spelling errors.
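
For the modelling item in the list above, the sketch below shows one way to evaluate two models on the same held-out data with the same metrics. It is only an illustration of the comparison pattern: the data is synthetic, the attribute names are hypothetical, and a real report would likely contrast richer approaches than a constant baseline and a straight-line fit. Such code belongs in the repository, not in the report.

```python
import math
import random

random.seed(0)
# Synthetic, hypothetical data: fare grows roughly linearly with distance.
distance = [random.uniform(0.5, 20.0) for _ in range(500)]
fare = [3.0 + 2.5 * d + random.gauss(0.0, 2.0) for d in distance]

# Hold out the last 100 points for evaluation.
x_train, x_test = distance[:400], distance[400:]
y_train, y_test = fare[:400], fare[400:]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Model A: constant baseline that always predicts the training mean.
mean_y = sum(y_train) / len(y_train)
baseline_pred = [mean_y for _ in x_test]

# Model B: simple least-squares line fitted in closed form.
mean_x = sum(x_train) / len(x_train)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train)) / sum(
    (x - mean_x) ** 2 for x in x_train
)
intercept = mean_y - slope * mean_x
linear_pred = [intercept + slope * x for x in x_test]

# Same metrics, same test set, for both models.
results = {
    "baseline": {"rmse": rmse(y_test, baseline_pred), "mae": mae(y_test, baseline_pred)},
    "linear": {"rmse": rmse(y_test, linear_pred), "mae": mae(y_test, linear_pred)},
}
```

Reporting both metrics side by side in a table makes the contrast between the models easy to reference in the text.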

The LaTeX template is available via Overleaf or found under Canvas Modules Project 1 Links Templates. You can download the source code and upload the main.tex to Overleaf or copy the project under Menu Actions Copy Project (located top left corner) on Overleaf. If you wish to use your own LaTeX template, ensure your margins and document class adhere to our requirements by adding the following commands:

\documentclass[11pt]{article}
\usepackage[top=0.9in, left=0.9in, bottom=0.9in, right=0.9in]{geometry}

GitHub Requirement

The GitHub repository template is available via GitHub Classrooms or found under Canvas Modules Project 1 Links Repo Template. You must use GitHub Classrooms and not your own personal repository.

All repositories will be cloned, executed (run), and used during marking, so please ensure the code is reproducible and readable. For example, if a student uses Python and uses external libraries, then a requirements.txt for a pip installation should be provided, such that anyone can run the command, install the packages, and run the code without errors. Repositories that fail to run will incur a penalty.
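
One way to reduce the risk of a repository failing to run is to pin exact package versions in requirements.txt, which is typically installed with `pip install -r requirements.txt`. Pinning is a suggestion here, not a stated requirement. The helper below is a hypothetical sketch that flags lines without an exact `==` pin; the package versions in the example are illustrative only.

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Illustrative file contents; the versions are examples, not recommendations.
example = """\
pandas==1.4.3
pyspark==3.3.0
# plotting
matplotlib
"""

print(unpinned_requirements(example))
```

Running a check like this before submission, followed by a fresh install in a clean environment, gives some assurance that markers can reproduce the results.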

Assessment

This project is worth 30% of your final grade with the following requirements:

1.     If no external dataset is used OR the student has chosen an insufficient dataset size, then the maximum number of marks is limited to 22.5/30 marks.

o   For example, if a student achieved 28/30 overall without meeting the requirements, their mark will be reduced to a maximum of 22.5/30.

o   If for some very unexpected reason you are unable to parse more than 6 months of data with Spark (or 3 months with pandas), you must let us know in advance via email with your reasoning.

o   We will provide a JupyterHub server to students with insufficient resources in a first-in, best-dressed manner.

2.     If the chosen external datasets are relevant, justified, and used to complement the research goal, then full marks are awarded.

o   Some examples of suitable external datasets may be ongoing sports events, protests, weather forecasts (such as the impact of snow), vehicle crashes, etc.

o   There are several sources and some may require web scraping or direct contact with the owner of the dataset. It is up to students to choose and find one.

Strictly speaking, more marks will be available for students who perform additional analysis, with the highest marks available for students who perform exceptional analysis by drawing upon several external resources.
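
The mark cap described in requirement 1 amounts to taking the smaller of the raw mark and 22.5. A minimal sketch, with hypothetical function and parameter names:

```python
MAX_MARKS = 30.0
CAPPED_MAX = 22.5  # cap when no external dataset is used or the data is too small


def final_mark(raw_mark: float, meets_requirements: bool) -> float:
    """Apply the 22.5/30 cap when the dataset requirements are not met."""
    if not 0 <= raw_mark <= MAX_MARKS:
        raise ValueError("raw_mark must be between 0 and 30")
    return raw_mark if meets_requirements else min(raw_mark, CAPPED_MAX)
```

With the example from the text, `final_mark(28, meets_requirements=False)` gives 22.5, while a raw mark already below the cap is left unchanged.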

Hurdle Requirement

There is a hurdle requirement for you to submit a working GitHub repository and report. We have provided a template GitHub and Latex report for your benefit. Please ensure you do not leave this until the last minute to sort out as the submission deadline is strict.

Submission Details

·       Report submissions must be made via Turnitin on Canvas in PDF format written using Latex. We will not be accepting and marking any other format.

·       Your final code must be in the GitHub repository and hyperlinked in the report. Any submission without a GitHub link will fail this component.

·       Late submissions will incur a deduction of 10% (3 marks) per 24 hours past the submission deadline. If you submit late, you must email Calvin Huang (head tutor) at calvin.huang@unimelb.edu.au with your reason.
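
The late deduction in the last point above can be worked through as follows. This sketch assumes that every started 24-hour period counts as a full period, which the text does not spell out, and it ignores time zones; the function name is hypothetical.

```python
import math
from datetime import datetime, timedelta

PENALTY_PER_PERIOD = 3.0  # 10% of 30 marks per 24 hours


def late_penalty(deadline: datetime, submitted: datetime) -> float:
    """Marks deducted, assuming each started 24-hour period counts in full."""
    if submitted <= deadline:
        return 0.0
    periods = math.ceil((submitted - deadline) / timedelta(hours=24))
    return periods * PENALTY_PER_PERIOD


# Deadline from the project header (year taken from the title).
deadline = datetime(2022, 8, 22, 9, 0)
```

Under that assumption, a submission one hour late loses 3 marks and one 25 hours late loses 6 marks.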

Extension Policy

If you have a valid reason with proof to request an extension, you must email Calvin Huang (head tutor) sufficiently before the submission deadline. Requests for extensions are not automated and will be carefully considered on a case-by-case basis. You must provide sufficient supporting evidence such as a medical certificate. Additionally, we will consider your git commits from your repository to illustrate the progress made on the project until the date of your request.

Getting Started

(This is an example approach for the bare minimum marks.)

1.     You could perform some basic geospatial visualizations on the Taxi data, compute descriptive statistics, and analyze summary statistics for your chosen attributes.

2.     Then, you might formulate a relevant research goal and identify your client/stakeholder for your quantitative analysis.

3.     Following this, you can build a Statistical Model to explain relationships between your input and response variables or use a Machine Learning model to classify/predict an attribute of choice.

4.     Afterwards, you might investigate the correlation and feature relevance between your attributes, refine your model, and highlight key findings backed by your statistical analysis.

5.     Finally, you should summarise and give recommendations to your identified clients or stakeholders.
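
Steps 1 and 4 above (descriptive statistics and correlation between attributes) could start from something like the sketch below. The values are toy numbers and the attribute names are hypothetical; on real data the same quantities would normally come from pandas or Spark aggregations over the full dataset.

```python
import math
import statistics

# Hypothetical toy values standing in for two trip attributes.
trip_distance = [1.2, 3.4, 0.8, 7.5, 2.1, 5.0, 10.3, 4.4]
fare_amount = [6.5, 13.0, 5.0, 24.5, 9.0, 17.5, 31.0, 15.5]


def describe(values):
    """Basic descriptive statistics for one attribute."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


distance_summary = describe(trip_distance)
distance_fare_corr = pearson(trip_distance, fare_amount)
```

A strong correlation like the one in this toy data would suggest distance is a relevant feature for a fare model, but correlation alone does not establish which attributes matter most.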

If your results are unexpected, you should aim to discuss why they occurred and what they entail. This scenario happens quite commonly, so it’s still in your best interest to make recommendations that are consistent with your unexpected results!
