1. Homepage
  2. Programming
  3. Assignment 1: International Airlines - Traffic by city pairs

Assignment 1: International Airlines - Traffic by city pairs

Engage in a Conversation
International AirlinesData AnalysisPython

Assignment 1 (15 marks)

The assignment is based on two datasets: CourseNana.COM

International Airlines - Traffic by city pairs (city_pairs.csv)

This dataset contains information about international flights to and from Australia in monthly intervals from January 1985 until September 2022. The data is provided by the Australian Government and covers passenger, freight, and mail carried between city pairs connected by single flight number services. https://data.gov.au/data/dataset/international-airlines-traffic-by-city-pairs/resource/ebcafd839514-4f72-a995-fe7ee90cb9da CourseNana.COM

International Airlines - Operated Flights and Seats (seats.csv)

This dataset contains information about international airlines to and from Australia in monthly intervals from September 2003 until September 2022. The data is provided by the Australian Government and covers airline activity, airport locations, and maximum seat capacity. https://data.datahub.freightaustralia.gov.au/dataset/international-airlines-operated-flights-andseats CourseNana.COM

You are expected to manually inspect the datasets and answer the questions. Attributes are self explanatory or elaborated below further. Please ask for clarification on Webcms if needed. CourseNana.COM

The data has been pulled directly from public sources, and we are not responsible for political correctness or other data inaccuracies as this is purely a learning exercise for data services engineering. CourseNana.COM

The output and marking scheme are included with each question, and penalties are given for issues such as incorrect values, row/column names, logic, errors, etc. Marks are awarded independently for each question – errors do not carry through between questions. CourseNana.COM

Question 1 (1.5 Marks) Description: Using the “city_pairs.csv” dataset, load it in a data frame and add three new columns to it. The values should be one of “IN”, “OUT”, or “SAME”, depending on whether inbound or outbound numbers are higher, or they are the same: “passenger_in_out” – whether “Passengers_In” or “Passengers_Out” passenger numbers are higher, or otherwise the same “freight_in_out” - whether “Freight_In” or “Freight_Out” freight numbers are higher, or otherwise the same “mail_in_out” - whether “Mail_In” or “Mail_Out” mail numbers are higher, or otherwise the same Output: a data frame with all the columns you have loaded from city_pairs.csv with three new columns. These should both have the added three columns "passenger_in_out", "freight_in_out", and “mail_in_out”. Marks: CourseNana.COM

0.5 mark – loaded data frame correctly 1 mark – all three column values correct Penalties of -0.25 mark per error are given. CourseNana.COM

Question 2 (1.5 Marks) Description: Using the data frame from Question 1, create a new data frame with unique “AustralianPort” values and their total count of the “IN” or “OUT” labels derived in Question 1 for passengers, freight, and mail since January 1985 in separate columns. This must be sorted in descending order by "PassengerInCount” (highest to lowest). There is no need to include counts for “SAME”. CourseNana.COM

“PassengerInCount” - count of “IN” for passengers for a unique “AustralianPort” “PassengerOutCount” - count of “OUT” for passengers for a unique “AustralianPort” “FreightInCount” - count of “IN” for freight for a unique “AustralianPort” “FreightOutCount” - count of “OUT” for freight for a unique “AustralianPort” “MailInCount” - count of “IN” for mail for a unique “AustralianPort” “MailOutCount” - count of “OUT” for mail for a unique “AustralianPort” Output: a data frame (7 columns) with a row for each unique “AustralianPort” value, and the following columns: “PassengerInCount”, “PassengerOutCount”, “FreightInCount”, “FreightOutCount”, “MailInCount”, “MailOutCount”. CourseNana.COM

Marks: 0.25 mark – each column with fully correct values and column name The 0.25 mark is not awarded for a column if the values or column name is wrong CourseNana.COM

Question 3 (2 marks) Description: Using the data frame from Question 1, create a new data frame with unique “Country” values and their average monthly passenger, freight, and mail values (including 2dp) for their respective "_In” and “_Out” columns separately. This must be sorted in ascending order by “Passengers_in_average” (lowest to highest). If a country does not appear in a month, they need to be considered as 0 and still contribute to the average. CourseNana.COM

Take the following example input data frame which is from Jan 2022 to Feb 2022: CourseNana.COM

Month AustralianPort ForeignPort PassengerIn PassengerOut ...
Jan 2022 Sydney Shenzhen 10 20
Jan 2022 Brisbane Toronto 5 10
Jan 2022 Melbourne Singapore 8 12
Feb 2022 Adelaide Shenzhen 6 15
Feb 2022 Sydney Montreal 1 25
An example output data frame would be as follows: Country Passengers_in_average Passengers_out_average ...
Canada (5 + 1) / 2 = 3.00 (10 + 25) / 2 = 17.50
Singapore (8 + 0) / 2 = 4.00 (12 + 0) / 2 = 6.00
China (10 + 6) / 2 = 8.00 (20 + 15) / 2 = 17.50

Output: a data frame (7 columns) with a row for each unique “Country” value, and the following columns: “Passengers_in_average”, “Passengers_out_average”, “Freight_in_average”, “Freight_out_average”, “Mail_in_average”, “Mail_out_average”. CourseNana.COM

Marks: 0.25 mark – each column with fully correct values and column name 0.5 mark for the correct country list / 0.25 mark for a partially correct country list CourseNana.COM

Question 4 (2 Marks) Description: Using the data frame from Question 1, create a new data frame with a descending (highest to lowest) column of the top 5 “Unique_ForeignPort_Count”. If the counts are the same for a country, use alphabetical order. CourseNana.COM

“Unique_ForeignPort_Count” is the count where routes with at least 1 passenger exist in a unique month between one unique “AustralianPort” to one unique “Country”, but with more than one “ForeignPort” used in that month by that “AustralianPort”. For example, if the input table had the below routes, they would be counted as 5 separate occurrences: CourseNana.COM

  1. Sydney to Dubai, United Arab Emirates in September 2022 CourseNana.COM

  2. Sydney to Abu Dhabi, United Arab Emirates in September 2022 CourseNana.COM

  3. Sydney to Dubai, United Arab Emirates in August 2022 CourseNana.COM

  4. Sydney to Abu Dhabi, United Arab Emirates in August 2022 CourseNana.COM

  5. Melbourne to Dubai, United Arab Emirates in August 2022 CourseNana.COM

  6. Melbourne to Abu Dhabi, United Arab Emirates in August 2022 CourseNana.COM

  7. Melbourne to Auckland, New Zealand in September 2022 CourseNana.COM

  8. Auckland to Melbourne, New Zealand in September 2022 CourseNana.COM

  9. Christchurch to Melbourne, New Zealand in September 2022 CourseNana.COM

  10. Melbourne to Queenstown, New Zealand in September 2022 CourseNana.COM

  11. Melbourne to Wellington, New Zealand in September 2022 CourseNana.COM

  12. Brisbane to Shenzhen, China in November 2021 CourseNana.COM

  13. Brisbane to Shanghai, China in November 2021 CourseNana.COM

  14. Brisbane to Hong Kong, Hong Kong in November 2021 (Note – this is not counted as there is only one ForeignPort used in Hong Kong in this month) CourseNana.COM

The output for the above would therefore be: CourseNana.COM

Output: a data frame with five rows and two columns for “Country” and “Unique_ForeignPort_Count”. Marks: 1 mark – “Country” column fully correct 1 mark – “Unique_ForeignPort_Count” column fully correct 0.2 mark penalty per incorrect value or incorrect column name CourseNana.COM

Question 5 (1 marks) Description: Using the “seats.csv dataset”, load it in a data frame and add two new columns “Source_City” and “Destination_City” to it: CourseNana.COM

If “In_Out” is “I”, then this flight is inbound to Australia. The “Source_City” is set to the “International_City” and the “Destination_City” is set to the “Australian_City”. If “In_Out” is “O”, then this flight is outbound from Australia. The “Source_City” is set to the “Australian_City” and the “Destination_City” is set to the “International_City”. Output: a data frame with all the columns you have loaded from “seats.csv” and the two new columns. The ordering should be preserved from the original CSV file. CourseNana.COM

Marks: 0.5 mark for each column fully correct -0.25 mark penalty each if column values are partially correct, a column name is incorrect, or ordering is incorrect CourseNana.COM

Question 6 (3 marks) CourseNana.COM

Context: Airlines invest resources to plan specific routes around passenger numbers and competing airlines. For example, Fiji, Jetstar, Qantas, and Virgin fly from Sydney to Nadi. A new airline operating this route may face greater competition than Sydney to Port Vila, which only has Air Vanuatu operating it. However, passenger trends and numbers may have changed over time and a greater number of travellers to Nadi means profitability should still be explored. CourseNana.COM

Planes are often used on multi city routes. For example, Jetstar can fly to Tokyo via the Gold Coast through Melbourne or Sydney. You may focus on the Australian_City and International_City, as international flight segments of the same plane are already considered in this data set. For example, MEL > SYD > SGN is considered separately for Australian_City values Melbourne and Sydney based on where passengers board. CourseNana.COM

Description: Using the data frame from Question 5, create a data frame to consolidate statistics for a new or existing airline to understand routes. Your data frame must only include information relevant for airlines to understand the market and how well origin and destination pairs are being serviced. You should add a 200-word comment with optional assumptions to explain your thinking. You may consider dropping unused columns, combining and/or adding values and/or rows, or focusing on regions as you wish. CourseNana.COM

Output: a data frame. Marks: 1 mark – table data provides insights that can be used by an airline to understand the market 1 mark – table data is correctly calculated and aligns with the objectives outlined in the comment 0.5 mark – a convincing comment strictly within the word limit that articulates the solutions objectives and how it provides meaningful insight to a new or existing airline 0.5 mark – table data is concise and does not include irrelevant values, rows, or columns Note: This data set does not include routes with layovers on two different flight numbers. For example, Emirates Airlines is a popular choice for Sydney to London stopovers through Dubai, but they are on two different flight numbers, and only the Sydney to Dubai stopover is included. However, British Airways maintains the same flight number for Sydney to London with a stopover in Singapore. You may focus only on the datasets in this CSV file. CourseNana.COM

Question 7 (4 marks) Context: Seat utilisation refers to how full flights are, and airlines optimise their operations to maximise seat utilisation. This has garnered international attention as many airlines oversell seats (https://time.com/6197994/airlines-overbook-flights-negotiate/) on the probability of no shows. To increase profitability, airlines need statistics to bring their passenger numbers closer to the max seat numbers as much as possible. If passenger numbers / maximum seats is < 100%, the airline is running a loss on empty seats. If passenger numbers / maximum seats is > 100%, the airline is running a loss for accommodation, compensation, etc on rebooking passengers. Airlines need statistics to understand their performance and whether their oversell margins or seat numbers need to be adjusted. CourseNana.COM

Description: From the “city_pairs.csv” and “seats.csv” datasets, across September 2003 to September 2022, and/or the data frames you have created in previous questions, create a visualisation to understand seat utilisation based on the following metrics:

“Passengers_In” “Passengers_Out” “Max_Seats” “Port_Region” CourseNana.COM

Output: an image titled “YOUR_ZID_q7.png”. You will be marked based on how good your plot demonstrates trends over time for seat utilisation around the world. You may optionally include multiple plots within one image. Quality of the visualization(s) such as choice of visualization, clarity, choice of colors, labeling/legends, etc. are to be considered in the marking. Be selective about which data you choose to use to uncover your desired insights, e.g. region specific metrics. Include a 200word comment with optional assumptions to explain your thinking. Marks: 2 marks – visualisation(s) provide insights that can be used by an airline to understand seat utilisation 1 mark – data used in visualisation(s) are correctly calculated and align with the objectives outlined in the comment 0.5 mark – a convincing comment strictly within the word limit that articulates the solutions objectives and how it provides meaningful insight to airlines looking to optimise seat utilisation 0.5 mark – visualisation(s) are concise and does not include irrelevant values, rows, or columns Note: 0 marks will be awarded for this question if the image does not save. CourseNana.COM

Get in Touch with Our Experts

WeChat WeChat
Whatsapp WhatsApp
International Airlines代写,Data Analysis代写,Python代写,International Airlines代编,Data Analysis代编,Python代编,International Airlines代考,Data Analysis代考,Python代考,International Airlineshelp,Data Analysishelp,Pythonhelp,International Airlines作业代写,Data Analysis作业代写,Python作业代写,International Airlines编程代写,Data Analysis编程代写,Python编程代写,International Airlinesprogramming help,Data Analysisprogramming help,Pythonprogramming help,International Airlinesassignment help,Data Analysisassignment help,Pythonassignment help,International Airlinessolution,Data Analysissolution,Pythonsolution,