1. Homepage
  2. Programming
  3. MTM5004 Maritime Data Analytics - Assignment 1 - Exploratory Data Analysis and SQLite

MTM5004 Maritime Data Analytics - Assignment 1 - Exploratory Data Analysis and SQLite

Engage in a Conversation
SingaporeNational University of SingaporeNUSMTM5004Maritime Data AnalyticsExploratory Data Analysis and SQLiteData MiningData AnalysisPython

Assignment 1 - Exploratory Data Analysis and SQLite CourseNana.COM

MTM5004 (AY2022/23) CourseNana.COM


CourseNana.COM

European Maritime Safety Agency's Report CourseNana.COM

The European Maritime Safety Agency reports carbon dioxide emissions from ships, mrv.emsa.europa.eu , according to the European Community Regulation 2015/757. The data set CourseNana.COM

MRV 2019, provided in the CSV file MRV_2019.csv , contains technical data on vessel fuel consumption, carbon dioxide emissions, and technical efficiency. The dataset has 12,344 records and 14 variables. CourseNana.COM

The columns of the CSV file are: 'IMO Number Name', 'Ship type', 'Technical efficiency [gCO2/t·nm]', 'Port of Registry', 'DoC issue date', 'DoC expiry date', 'Verifier Name', 'Verifier Country', 'Total fuel consumption [m tonnes]', 'Total CO2 emissions [m tonnes]', 'Annual Total time spent at sea [hours]', 'Annual average Fuel consumption per distance [kg / n mile]', 'Annual average CO2 emissions per distance [kg CO2 / n mile]'. CourseNana.COM

They correspond to the following data. CourseNana.COM

1. IMO Number,
2. Name,
3. Ship type,
4. Technical efficiency (g/tM) 5. Port of Registry,
CourseNana.COM

6. DoC issue,
7. DoC expiry,
8. Verifier Name, 9. Verifier Country,
CourseNana.COM

10. Total fuel consumption (t),
11. Total CO2 emissions (t),
12. Annual Total time spent at sea (h),
13. Annual average Fuel consumption per distance (kg/M), 14. Annual average CO2 emissions per distance(kg/M).
CourseNana.COM

The symbols for the units of measurements are M for nautical mile, g for gram, kg for kilogram, t for tonne, h for hours. CourseNana.COM

Each record corresponds to a vessel, usually a ship. The ship is identified by its matriculation number by the International Maritime Organization (IMO). Other fields of a ship's records indicate the name of the ship, its type, its technical efficiency, namely, the ship’s energy efficiency design index (EEDI) or its estimated value (EIV), its port of registry, the issue and expiry dates of its declaration of CourseNana.COM

Instructions to submit the assignment CourseNana.COM

- Name your jupyter notebook as `Assignment1_[StudentID].ipynb`.
- Your solution notebook must contain the python code that we can run to verify the answers.
CourseNana.COM

conformity (DoC), the name and country of the verifier of the data in the record, its total consumption, its total carbon dioxide (CO2) emissions, its annual time at sea, and its annual average fuel consumption and annual average carbon dioxide emissions per distance. CourseNana.COM

It is recommended that you further research the above-mentioned terms and notions as they are frequently used in the maritime industry. CourseNana.COM

Part 1: Data Loading and Cleaning with Pandas CourseNana.COM

Let's begin by importing required libraries. CourseNana.COM

Inspect the dataset carefully and identify the anomalies in the dataset. Print the number of missing values in each column of the dataset. CourseNana.COM

import pandas as pd import numpy as np CourseNana.COM

Import the dataset from the table mrv2019 into a Pandas DataFrame. Load all the data as strings. Use the following names for the columns of the DataFrame. CourseNana.COM

1. 'IMO',
2. 'Name',
3. 'Type',
4. 'Technical Efficiency', 5. 'Port of Registry',
6. 'DoC issue',
7. 'DoC expiry',
8. 'Verifier Name',
9. 'Verifier Country',
CourseNana.COM

10. 'Total Fuel',
11. 'Total CO2', 12. 'Time at Sea', 13. 'Fuel per Mile', 14. 'CO2 per Mile'.
CourseNana.COM

You would have observed that the data even contains anomalous values such as Division by zero! and Not Applicable. Remove the rows with missing values and other anomalies. CourseNana.COM

Capitalise (change to uppercase) the names of the ports of registry. CourseNana.COM

Change the types of the columns 'Total Fuel', 'Total CO2', 'Time at Sea', 'Fuel per Mile', and 'CO2 per Mile' to pynum float type np.float64 . CourseNana.COM

Replace the column Technical Efficiency by two new columns Technical Efficiency Index and Technical Efficiency Value that contains the acronym of the technical efficiency index EEDI or EIV as a string and the value of the technical effieciency index as pynum float type CourseNana.COM

np.float64 ., respectively. CourseNana.COM

You would have observed that there are anomalied in the format of Technical Efficiency. We want to esure that the dataset contains only two indices EEDI and EIV . We do so by executing following code. CourseNana.COM

df = df[df['Technical Efficiency Index'].isin(['EEDI', 'EIV'])] CourseNana.COM

Part 2: Energy Efficiency Analysis CourseNana.COM

The technical energy efficiency or design energy efficiency of a vessel measures the CO2 emission relative to the amount of transport work done. The Energy Efficiency Design Index (EEDI) is commonly used to represent the technical efficiency of a vessel. The Estimated Index Value (EIV) is a simplified form of EEDI. Interested students are encouraged to read more about these values here (https://www.linkedin.com/pulse/eiv-another-irrelevant-shipping-acronym-poul-woodall). CourseNana.COM

Partition the dataset into two subsets, one for ships for which EEDI is available and another subset for EIV. To do so, create two Pandas DataFrames: df_eedi , which contains data for the ships for which Technical Efficiency Index is EEDI, and df_eiv , which contains data for the ships for which Technical Efficiency Index is EIV. Both of these DataFrames must be indexed by IMO number of the ships. CourseNana.COM

(2 points) CourseNana.COM

df_eedi = df[df['Technical Efficiency Index'].isin(['EEDI'])] df_eiv = df[df['Technical Efficiency Index'].isin(['EIV'])] CourseNana.COM

We import the Matplotlib library for drawing plots. CourseNana.COM

Remove the datapoints from df_eedi and df_eiv that have Technical Efficiency Value smaller than the lower bound and larger than the upper bound. CourseNana.COM

import matplotlib.pyplot as plt CourseNana.COM

We want to compare distribution of EEDI an EIV values. We used matplotlib boxplot to draw two boxplots side by side to each other as shown below (code not provided!). We do not see anything since the code also raised multiple errors. The errors are due to presence of outliers in the Technical Efficiency Values. Let's use the IQR method to remove such outliers. CourseNana.COM

EEDI versus EIV CourseNana.COM

IQR Method: The interquartile range (IQR) is calculated as the difference between the 75th and the 25th percentiles of the data. The IQR method considers observations with values 1.5 times the IQR below that of the 25th percentile or above that of the 75th percentile as outliers. CourseNana.COM

For each each DataFrame, df_eedi and df_eiv , calculate the lower and upper limit for the Technical Efficiency Value. The lower limit is 1.5 times of IQR below 25th percentile whereas the upper limit is 1.5 times of IQR above 75th percentile. CourseNana.COM

Write the code to draw boxplots to compare Technical Efficiency Values under two indices. The final visualisation must follow specifications in the boxplot shown earlier such as, two boxplots next to each other with a common Y-axis. CourseNana.COM

Generate the bivariate Pearson correlation matrix for the following five variables for each subset of the data between technical efficiency, fuel consumption, CO2 emissions, fuel consumption per mile, CO2 emissions per mile. CourseNana.COM

Generate the boxplot of Technical Efficiency Value for EEDI indexed data acroos different types of ships. This boxplots comprises of technical efficiency values on the Y-axis and ship types on the X- axis. CourseNana.COM

Qualitative Analysis CourseNana.COM

Please provide your observations/comments (up to two lines for each) on: CourseNana.COM

Boxplot
Correlation matrix
Technical efficiency values across ship types
CourseNana.COM

Part 3: SQLite Integration CourseNana.COM

(4 points) CourseNana.COM

If we had done the data preparation and cleaning in a database with SQL, our code could scale to a much larger data set without putting much stress on the main memory of our computer. You may want to try and redo the above questions entirely in SQL with SQLite by first loading the original CSV file into a table, rawmrv2019 using sqlite3 interactive terminal command .import -csv MRV_2019.csv rawmrv2019 . This is not part of this assignment. CourseNana.COM

Instead, we can create a table and store the data from the data frame into the table. We use SQLite and the sqlite3 Python library. CourseNana.COM

import sqlite3 CourseNana.COM

Create create and connect to a database mrv2019.db . In [ ]: CourseNana.COM

Create a table mrv2019 with appropriate domains to load the records of the data frame. Use the following names for the columns of the table. Choose the appropriate domain for the columns among CourseNana.COM

TEXT , DATE , and NUMERIC . We underscore instead of spaces for the column names in order to avoid having to escape double quotes in SQL and Python. CourseNana.COM

1. 'IMO',
2. 'Name',
3. 'Type',
4. 'Technical_Efficiency_Index', 5. 'Technical Efficiency_Value', 6. 'Port_of_Registry',
7. 'DoC_issue',
8. 'DoC_expiry',
CourseNana.COM

connection = sqlite3.connect('mrv2019.db') CourseNana.COM

9. 'Verifier_Name', 10. 'Verifier_Country', 11. 'Total_Fuel',
12. 'Total_CO2',
13. 'Time_at_Sea', 14. 'Fuel_per_Mile', 15. 'CO2_per_Mile'.
CourseNana.COM

Note that SQL is actually case insensitive. Capitalise SQL keywords for readability. CourseNana.COM

 (1 point) CourseNana.COM

Insert the data from the DataFrame into the table. CourseNana.COM

cursor = connection.cursor() CourseNana.COM

for index, row in df.iterrows():
cursor.execute("INSERT INTO mrv2019 VALUES (? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,?, ?) ))
CourseNana.COM

Let us do a simple exploratory analysis. In this dataset, we have different ships, each of which is registered in different port. CourseNana.COM

Now that the analysis is over, we should drop the table, commit changes to the database and gracefully close the database connections. CourseNana.COM

Write an SQL query that finds the ports with the top 21 highest number of ship registrations. Draw a barplot of the result. The X-axis is labeled by the names of the ports whereas the Y-axis is labeled by the corresponding number of ship registrations. CourseNana.COM

(3 points) CourseNana.COM

cursor = connection.cursor() cursor.execute("DROP TABLE mrv2019") cursor.close() CourseNana.COM

connection.commit() connection.close() CourseNana.COM

Get in Touch with Our Experts

WeChat WeChat
Whatsapp WhatsApp
Singapore代写,National University of Singapore代写,NUS代写,MTM5004代写,Maritime Data Analytics代写,Exploratory Data Analysis and SQLite代写,Data Mining代写,Data Analysis代写,Python代写,Singapore代编,National University of Singapore代编,NUS代编,MTM5004代编,Maritime Data Analytics代编,Exploratory Data Analysis and SQLite代编,Data Mining代编,Data Analysis代编,Python代编,Singapore代考,National University of Singapore代考,NUS代考,MTM5004代考,Maritime Data Analytics代考,Exploratory Data Analysis and SQLite代考,Data Mining代考,Data Analysis代考,Python代考,Singaporehelp,National University of Singaporehelp,NUShelp,MTM5004help,Maritime Data Analyticshelp,Exploratory Data Analysis and SQLitehelp,Data Mininghelp,Data Analysishelp,Pythonhelp,Singapore作业代写,National University of Singapore作业代写,NUS作业代写,MTM5004作业代写,Maritime Data Analytics作业代写,Exploratory Data Analysis and SQLite作业代写,Data Mining作业代写,Data Analysis作业代写,Python作业代写,Singapore编程代写,National University of Singapore编程代写,NUS编程代写,MTM5004编程代写,Maritime Data Analytics编程代写,Exploratory Data Analysis and SQLite编程代写,Data Mining编程代写,Data Analysis编程代写,Python编程代写,Singaporeprogramming help,National University of Singaporeprogramming help,NUSprogramming help,MTM5004programming help,Maritime Data Analyticsprogramming help,Exploratory Data Analysis and SQLiteprogramming help,Data Miningprogramming help,Data Analysisprogramming help,Pythonprogramming help,Singaporeassignment help,National University of Singaporeassignment help,NUSassignment help,MTM5004assignment help,Maritime Data Analyticsassignment help,Exploratory Data Analysis and SQLiteassignment help,Data Miningassignment help,Data Analysisassignment help,Pythonassignment help,Singaporesolution,National University of Singaporesolution,NUSsolution,MTM5004solution,Maritime Data Analyticssolution,Exploratory Data Analysis and SQLitesolution,Data Miningsolution,Data Analysissolution,Pythonsolution,