FIT3176 Advanced database design - S2 2023
Assignment 1: NoSQL database design - Document-Oriented
[MongoDB]
FIT3176 Individual Assignment - Sem 2/2023 (Weight: 40%)
The assignment is divided into THREE main tasks:
A. Data Modelling – Monash Travel Agency (MTA) B. CRUD – Monash Nature Club (MNC)
B.1. Modifying the Database
B.2. Querying the Database
C. Charts and Drivers – Monash Nature Club (MNC)
A. Data Modelling – Monash Travel Agency (MTA) - 12%
Case Study
Monash Travel Agency (MTA), headquartered near Monash University Clayton Campus, is a new and popular travel initiative from Monash University. MTA offers a wide range of travel services, including transportations, hotel reservations, activity itineraries, and more for Monash Staff and Students required to travel domestic and international locations. However, with the recent increase in travel MTA is facing challenges in managing their operations efficiently due to the use of relational database management systems which is hindered by fixed data structures, manual processes, and long query processing times. As a result, to maintain their competitive advantage in the travel business and provide better customer experience, MTA has recognized the need for MongoDB, a modern NoSQL database solution.
The data required to be stored in MTA’s MongoDB comes from their use case detailed
below:
MTA allows booking to be made by customers and managed by agents. The agency has
a fixed set of itineraries (e.g. travel to Malaysia Campus and Explore Perth on the way)
which customers are able to book for particular dates throughout the year.
Itineraries contain details of accommodations, activities, transportations and travel dates. Accommodations are usually a fixed set of hotels for each location specific to the itinerary. Itineraries have a fixed set of activities to happen between a fixed number of days (e.g. Explore Malaysia for 7 days).
Transportations can be by car, flights, ships etc and each transportations has its own set
of features and the details of the Transportations may change for each activity depending
on availability.
Agents are also responsible for managing the finance side of the agency where they
have to determine the monthly profit/loss of the agency by analysing all bookings and
forecasting which itineraries are profitable to keep and which to remove due unpopularity
or additional expenses.
Given the above data requirements, create a Data Model Diagram for a MongoDB Database which can efficiently store the data required by Monash Travel Agency (MTA).
As part of the data model you are required to create at least one collection(s) / index(or indices) in the MongoDB database (named <your_atuthcate>_MTA) using the data provided in appendix A. Then, using MongoDB Compass provide screenshots of all created collection(s), index(/indices) and at least one document(s). Note each collection and index should be given relevant names (and are not required to include your authcate).
Task Outputs |
1. Data Model Diagram index(/indices) with each screenshot clearly displaying:
|
Software to be use |
MongoDB Compass/Atlas |
Restrictions |
Screenshots should not be taken from MongoDB Shell |
Output files to use |
<your authcate>_task_A.js code added for creating the collection(s), document(s) and index(/indices) <your authcate>_FIT3176_Assignment_1.gdoc Code and screenshot of collection(s), document(s) and index(/indices) |
B. CRUD – Monash Nature Club (MNC) -14%
Monash Nature Club (MNC) is recently studying the parks and different wildlife habitats across the United States in hopes of finding any links between the parks and wildlife of Australia.
The club has hired your team of Advanced Database Experts to use the following sample data files to help with the parks and wildlife analysis:
● wildlife.json ● parks.csv
Note: These data are raw data that does not follow any particular schema. For more information about the fields/columns in the data please refer to Appendix C.
For the analysis MAC has asked your team to perform the following tasks:
B.1. Loading and Modifying the Database - 7%
Create a MongoDB database named using the format: <your_atuthcate>_MNC Load the wildlife.json and parks.csv files using MongoDB Atlas/Compass into the newly created MongoDB database after creating one collection called parks and another collection called wildlife.
Using the newly created collections and one command for each subquestion (e.g. B1.1.) below perform the following modifications:
-
B.1.1. for each document in the wildlife collection, modify the common names by converting the comma separated values to array elements.
-
B.1.2. modify each document in the wildlife collection, to add a new field called ts with the time each document was created.
<your authcate>_Assignment_1.pdf Code and screenshot of collection(s), document(s) and index(/indices) |
Note: you will need to find out first when each record in the document was
created, not when they were inserted into the database e.g. try to find any
field that indicates the creation date.
B.1.3. modify the parks collection to combine the date data (from parkEstDay,
parkEstMonth and parkEstYear) to dateEst a date data type storing the
combined year, month and data. Afterwards, remove the fields for year, month
and day.
-
Note: For the date data other than day, month and year, the remaining components of the time can be taken as any value.
-
B.1.4. combine the data for parks into the wildlife collection and store the output in the wildlife collection, replacing the previous data in the wildlife collection. In the updated wildlife collection the park details should reside inside an array and should not contain a separate _id field.
-
B.1.5. Modify the database to ensure that all new data contains no new values (allowing only values already in the database) for recordStatus, occurrence, nativeness and abundance. Demonstrate the new modification is working by incorrectly adding new data to the wildlife collection.
B.2. Querying the Database - 7%
For each question, use one aggregation query to answer using the resulting collection from B.1. Unless stated otherwise the queries must not add/remove/modify fields in the database.
Note: Marks for this section depend on the query efficiency e.g. the processing speed, the storage, the number of queries used etc.Therefore, using too many queries/indices to answer a section or using temporary variables (such as const, var etc), collections, for each loop may incur mark penalties.
For each question in this section you are expected to:
use your own judgement when it comes to query efficiency provide justification for each index (if any are created) include query execution plans from MongoDB compass
B.2.1. What was the total number of parks established each year? The output should be ordered by highest to lowest year and display the distinct year and number of parks established.
Note: you will need to first check what are the existing categories/values for
recordStatus, occurrence, nativeness and abundance, and then ensue if new
data is entered it only contains values from the existing categories
B.2.2. What was the average number of Bird category species in each park? The
output should display the distinct park name and the average number of Birds.
B.2.3. List all parks within 25 km and 400km of the park with the most Uncommon seasonality abundance species. The list should not show duplicate park names. (if required for this query only the collection can be modified).
-
B.2.4. Display the species with the most common names that belong to the same category.
-
B.2.5. Display the total number of distinct categories, orders and families in each park for all species with common names starting with the letter “a”. The output should display the distinct category, order and family names and the count rounded to 2 decimal places.
The output should display the species id, the category and number
of common names. distinct park name and the average number of Birds.
Task Inputs |
The data for this task is contained in the following files:
(i) wildlife.json |
Task Outputs |
Screenshots for each question
|
C. Charts and Drivers - 14%
C.1. MongoDB Charts - 6%
Create a MongoDB Charts dashboard named <your authcate>_Assignment1
For each question in task B.2. Querying the Database, create a mongoDB charts visualisation to visualise the results. You may use your own judgement with appropriate justifications provided in the report when selecting the type of chart to clearly display your findings.
C.2. MongoDB Driver - 4%
Create a runnable python script using the pymongo driver to display the result of each question in task B.2 and save it named <your authcate>_task_C2.py (must not be a Jupyter notebook file)
Your script should:
be executable using the terminal and command prompt,
contain adequate comments
output the query description and query results on the terminal and command prompt
when run.
Task Inputs |
The data for this task is contained in the following files:
(i) wildlife.json |
Task Outputs |
Screenshots for each question
|
Software to be use |
MongoDB Atlas for Charts Screenshots. A software that can run scripts such as Python script commands (e.g. VS Code etc.). |
Output files to use |
<your authcate>_task_C.py Code added for each sub question <your authcate>_FIT3176_Assignment_1.gdoc Code and screenshot for each sub question <your authcate>_Assignment_1.pdf Code and screenshot for each sub question |
C.3. Attention to Detail - 4%
Less than 4 updates on different days in google doc -4% OR Incorrect numbers of files/Incorrect file types -4%
OR Inconsistent data in screenshots and/or code between files -4%
D. General Information and Submission Checklist
o Submission method: Submission is online through Moodle.
o Penalty for late submission: 10% deduction for each day (including
weekends).
o Assignment Cover Sheet: You will need to sign and attach the assignment
cover sheet as part of the pdf file.
o Please carefully read the requirements for EACH section, especially the Task
Outputs and Restrictions.
o You will be required to regularly upload your work to Google Docs (for tracking
the history of your work)
-
D.1. One combined pdf file containing all tasks mentioned above and named <your authcate>_Assignment_1.pdf
The pdf file should contain:
-
Cover page
-
A signed cover sheet
-
A report that combines all the tasks (including explanations, code and screenshots)
Note: It is expected that this PDF document is quite large due to incorporating the above components. Please note that the report readability also contributes to your assignment mark.
-
-
D.2. Two executable javascript files for Task A and B
Note: All of the above files must be runnable in MongoDBShell.
You should also clearly indicate each task using comments and follow appropriate naming conventions e.g. as specified in the MongoDB documentations.