CSC8112 - Internet of Things
Assessment 1
Assessment Overview
This assessment contributes 40% towards the total mark for this module. Out of this, 70% of the marks are assigned for the implementation tasks and 30% for the final report. It is an individual exercise: no group work is permitted for the assessment. You are advised to read and view all the instructional tutorial resources before you start implementing the solutions for coursework Tasks 1 to 3. Each task has been assigned a specific mark, which you will be awarded once you successfully demonstrate its completion.
Once you complete Tasks 1-3, you will need to prepare the final Report (Task 4). This coursework Report must be submitted on NESS by 4 pm on November 17, 2023. In this Report, you will need to provide an in-depth discussion of how you implemented the solutions (e.g., code and commands) to solve Tasks 1-3. Additionally, you will be required to demonstrate successful executions of Tasks 1-3. Before the report submission deadline, you will be provided with a 15-minute slot to conduct a live demonstration. In case of unforeseen disruptions (e.g., further lockdowns), we may also allow recorded demonstrations.
Final marks for this coursework will be decided by your performance in the live demonstration and the technical details you provide in the final report. While the final report needs to be submitted to NESS by 4 pm on November 17, 2023, the live demonstration session will be organised on November 16, 2023.
You are required to complete Tasks 1-3 using the command line interface ([https://docs.docker.com/engine/reference/commandline/cli/]) provided by the Docker Engine, as well as by implementing programmatic solutions in the Python language.
Objectives
The learning outcomes of this coursework include the following:
• Understand how to process Internet of Things (IoT) sensor data in the edge-cloud setting.
• Be able to develop a machine learning-based IoT data processing pipeline (data collection, data preprocessing, prediction and visualisation) in the edge-cloud setting.
• Be able to use a lightweight virtualisation technology stack, such as Docker, to implement an IoT data processing pipeline in the edge-cloud setting.
A high-level picture of the overall system design scope of the coursework is shown in Figure 1; a short explanation of the components is given below:
IoT tier:
• Newcastle Urban Observatory (NCL UO) [https://urbanobservatory.ac.uk/]: The largest set of publicly available real-time urban data in the UK. NCL UO sensors gather data across Newcastle city. With over 50 data types and counting, there is plenty of live data for you to access.
Edge tier:
• Data Injector: A software component that you will design and implement in Task 1, focusing on (i) reading data from the Urban Observatory API and (ii) transmitting data to the machine learning pipeline.
• EMQX: An MQTT message broker, given to you as a Docker image, which forms the basis for asynchronous service-to-service communication in a complex Machine Learning (ML)-based IoT data processing pipeline.
• Data Preprocessing Operator: A software component that you will develop in Task 2, responsible for preparing the training data for the machine learning model.
Cloud tier:
• RabbitMQ: A cloud-based message queuing system.
• Machine Learning Model/Classifier/Engine: A software component that can be trained to predict particular types of future events.
• Visualization: A component that will visualize the trend of the raw time-series data and the prediction results (input from the Machine Learning Model/Classifier/Engine).
After successfully completing the coursework, you will gain hands-on experience with the following interrelated technology stacks:
• configuring a Docker-based IoT data processing pipeline;
• pulling images from Docker Hub (a global repository of software components’ images maintained by their developers);
• creating and deploying self-contained IoT data processing services, often referred to as microservices;
• training a machine-learning-based predictor based on real-world data streams available from Newcastle Urban Observatory;
• implementing a machine learning-based air-quality prediction microservice;
• visualizing time-series data using graphs.
[Figure 1 depicts three tiers: a Cloud tier (Azure Cloud VM hosting the Machine Learning Engine, Visualization and RabbitMQ), an Edge tier (Azure Edge VM hosting the Data Preprocessing Operator, EMQX and the Data Injector), and an IoT tier (air-quality sensors), with data flowing from the IoT tier to the Edge and on to the Cloud.]
Figure 1: Overview
Pre-Requisites
Before starting the coursework, you are advised to carefully go through the training content covered in Lecture 1 and extra supplements provided in the Yuque Document (an online document platform) at . Together, these provide in-depth detail on:
• how to access and start Azure VMs, as shown in Figure 2;
• how to download and run a docker image on Azure Labs;
• how to run your experiments on Azure Labs;
• some hints for system structures of every task.
[Figure 2 depicts the Azure Lab hosting two Ubuntu VMs: one Edge VM and one Cloud VM.]
Figure 2: Relationship structure of Azure Lab and Ubuntu VMs.
Specification of Tasks
The coursework consists of 4 tasks. Please note that Tasks 1-3 need to be completed both via the command line and by implementing the logic in the Python language.
Task 1: Design a data injector component by leveraging Newcastle Urban Observatory IoT data streams (20 Marks)
Task Objectives : Understand and learn how to pull and run a Docker image from Docker Hub using the command line interface; how to collect real-world IoT data streams by invoking the Application Programming Interface (API) of Newcastle Urban Observatory; how to publish data to EMQX (a scalable MQTT broker for IoT applications); and how to re-compile and build a Docker image using the command line and programmatic interfaces.
Hints : You are advised to carefully read and view the tutorial content relevant to Task 1 that we have provided in the Yuque Doc [https://github.com/ncl-iot-team/CSC8112]. To download the EMQX docker image, please go to the following link [https://hub.docker.com/r/emqx/emqx]. To install the Python dependency package "requests" for sending HTTP requests, use [https://pypi.org/project/requests/]. Finally, the Python MQTT SDK "paho-mqtt" is available from [https://pypi.org/project/paho-mqtt/].
1. Pull and run the Docker image "emqx/emqx" from Docker Hub in the virtual machine running on Azure lab (Edge). Perform this task first using the command line interface (CLI).
2. Develop a data injector component with the following functions (code) in Azure Lab (Edge) or on the Azure Lab localhost.
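As a starting point, the injector could be sketched as below. This is a minimal illustration, not a required design: the Urban Observatory endpoint URL, query parameters, response shape, and MQTT topic are assumptions for illustration only, so check the Yuque Doc and the real API response before relying on them.

```python
import json


def extract_pm25(sensor_json):
    """Keep only PM2.5 readings as {"timestamp": ..., "value": ...} dicts.

    The response shape assumed here (sensors -> data -> "PM2.5" list with
    "Timestamp"/"Value" keys) is hypothetical -- inspect the real Urban
    Observatory API response and adjust the keys accordingly.
    """
    readings = []
    for sensor in sensor_json.get("sensors", []):
        for entry in sensor.get("data", {}).get("PM2.5", []):
            readings.append({"timestamp": entry["Timestamp"],
                             "value": entry["Value"]})
    return readings


def run_injector(broker_host="localhost", broker_port=1883):
    """Fetch PM2.5 readings from the UO API and publish each one to EMQX.

    Third-party imports live inside the function so extract_pm25 stays
    importable and testable without these dependencies installed.
    """
    import requests
    import paho.mqtt.publish as publish

    # Hypothetical endpoint and parameters -- replace with those in the Yuque Doc.
    url = "https://newcastle.urbanobservatory.ac.uk/api/v1.1/sensors/data/json/"
    resp = requests.get(url, params={"data_variable": "PM2.5"})
    resp.raise_for_status()

    for reading in extract_pm25(resp.json()):
        # Publish each reading to the EMQX broker running on the Edge VM.
        publish.single(topic="uo/pm25", payload=json.dumps(reading),
                       hostname=broker_host, port=broker_port)
```

In a container, `run_injector()` would be invoked from the image's entry point; keeping the parsing logic in a separate pure function makes it easy to verify without a live broker.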
Task 2: Data preprocessing operator design (30 Marks)
Task Objectives : Understand how to clean and prepare data for machine learning training by applying data processing operations such as outlier cleaning and data reformatting. Moreover, you will also learn how to collect/send data from/to message queuing systems (e.g., EMQX and RabbitMQ), which are central to IoT data stream management. This task will also help you understand how native Docker Compose techniques can be leveraged to manage and deploy a complex IoT application stack/pipeline.
Hints : You are advised to carefully view the content relevant to Task 2 in Yuque Doc, which is given at . To install a Python dependency package for sending messages to RabbitMQ (a message queue broker), please download "pika" from [https://pypi.org/project/pika/].
1. Define a Docker Compose file which contains the necessary configuration and instructions for deploying and instantiating the following Docker images (as shown in Figure 1) on Azure lab (Cloud):
(a) Download and run RabbitMQ image (rabbitmq:management);
2. Design a data preprocessing operator with the following functions (code) in Azure Lab (Edge):
(a) Collect all PM2.5 data published by Task 1.2 (c) from the EMQX service, and print the PM2.5 data to the console (this operator will run as a Docker container, so the output can be seen automatically in the docker logs console).
(b) Filter out outliers (values greater than 50), and print the outliers to the console (docker logs console).
(c) The original PM2.5 readings are collected every 15 minutes, so implement Python code to calculate the average value of the PM2.5 data on a daily basis (every 24 hours). Use the start of each day (24-hour interval) as the new timestamp of the averaged PM2.5 data, and print the results to the console (docker logs console).
(d) Transfer all results (the averaged PM2.5 data) to the RabbitMQ service on Azure lab (Cloud), where they will be consumed by Task 3.2 (a).
3. Define a Dockerfile to migrate your "data preprocessing operator" source code into a Docker image, and then define a docker-compose file to run it as a container locally on the Azure lab (Edge). If you need example code, please refer to the Yuque Doc.
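For step 1 above, a Docker Compose file along the following lines would suffice. This is a minimal sketch: the service name is illustrative, and the port mappings shown are RabbitMQ's defaults rather than required values.

```yaml
# docker-compose.yml on the Cloud VM -- minimal sketch, not a required layout.
version: "3"
services:
  rabbitmq:
    image: rabbitmq:management
    ports:
      - "5672:5672"    # AMQP port used by pika clients
      - "15672:15672"  # management web UI
```

Running `docker compose up -d` in the directory holding this file pulls the image (if needed) and starts the broker in the background.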
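The preprocessing logic of steps (a)-(d) can be sketched as follows. This is an illustration only: the topic, queue, and host names are placeholders, and the timestamp unit (epoch milliseconds) is an assumption to be checked against the actual payloads.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

OUTLIER_THRESHOLD = 50.0  # per step (b): values greater than 50 are outliers


def split_outliers(readings):
    """Split readings into (kept, outliers) by the PM2.5 threshold."""
    kept = [r for r in readings if r["value"] <= OUTLIER_THRESHOLD]
    outliers = [r for r in readings if r["value"] > OUTLIER_THRESHOLD]
    return kept, outliers


def daily_average(readings):
    """Average 15-minute readings per calendar day (UTC).

    Timestamps are assumed to be epoch milliseconds (check the real payload);
    the start of each day becomes the new timestamp, per step (c).
    """
    buckets = defaultdict(list)
    for r in readings:
        day = datetime.fromtimestamp(r["timestamp"] / 1000, tz=timezone.utc).date()
        buckets[day].append(r["value"])
    result = []
    for day in sorted(buckets):
        start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        result.append({"timestamp": int(start.timestamp() * 1000),
                       "value": sum(buckets[day]) / len(buckets[day])})
    return result


def run_operator(emqx_host="emqx", rabbit_host="<cloud-vm-ip>"):
    """Subscribe to EMQX, filter and average, then forward to RabbitMQ.

    Third-party imports live in here so the pure functions above stay
    testable; hostnames and queue names are placeholders.
    """
    import pika
    import paho.mqtt.client as mqtt

    readings = []

    def on_message(client, userdata, msg):
        readings.append(json.loads(msg.payload))
        print("received:", readings[-1])  # step (a): print to docker logs

    client = mqtt.Client()
    client.on_message = on_message
    client.connect(emqx_host, 1883)
    client.subscribe("uo/pm25")
    client.loop_start()
    # ... wait until all Task 1 data has arrived, then:
    kept, outliers = split_outliers(readings)
    print("outliers:", outliers)  # step (b)

    conn = pika.BlockingConnection(pika.ConnectionParameters(host=rabbit_host))
    channel = conn.channel()
    channel.queue_declare(queue="pm25_daily")
    for avg in daily_average(kept):  # steps (c)/(d)
        channel.basic_publish(exchange="", routing_key="pm25_daily",
                              body=json.dumps(avg))
    conn.close()
```

Keeping the filtering and averaging in pure functions makes them easy to verify locally before wiring up the brokers.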
Task 3: Time-series data prediction and visualization (20 Marks)
Task Objectives : Understand how to use a machine learning model/classifier with the time-series sensor data you prepared in Task 2 to make predictions, and how to visualize those data and the predicted results.
Hints : You are advised to carefully view the relevant content of Task 3 in Yuque Doc, given at . To install the Python dependency package "matplotlib" (a data visualization tool), use [https://pypi.org/project/matplotlib/]. To download the package "prophet" (a machine learning tool), use [https://pypi.org/project/prophet/].
1. Download the pre-defined Machine Learning (ML) engine code from
2. Design a PM2.5 prediction operator with the following functions (code) in Azure Lab (Cloud) or on the Azure Lab localhost:
(a) Collect all the averaged daily PM2.5 data computed by Task 2.2 (d) from the RabbitMQ service, and print them out to the console.
(b) Convert the timestamp to date-time format (year-month-day hour:minute:second), and print the PM2.5 data with the reformatted timestamp to the console.
(c) Use the line chart component of matplotlib to visualize the averaged daily PM2.5 data; either display the figure directly or save it as a file.
(d) Feed the averaged PM2.5 data to the machine learning model to predict the trend of PM2.5 for the next 15 days (this prediction period is the default setting of the provided machine learning predictor/classifier model).
(e) Visualize the predicted results from the Machine Learning predictor/classifier model; either display the figure directly or save it as a file (pre-defined in the provided Machine Learning code).
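Steps (a)-(e) could be sketched as below. This is a minimal illustration rather than the provided ML engine code: the queue and host names are placeholders, the timestamp unit (epoch milliseconds) is an assumption, and Prophet is called directly using its documented `ds`/`y` input format.

```python
import json
from datetime import datetime, timezone


def format_timestamp(ts_ms):
    """Convert epoch milliseconds (assumed unit) to 'year-month-day hour:minute:second' (UTC)."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def run_predictor(rabbit_host="localhost", queue="pm25_daily"):
    """Drain the averaged-PM2.5 queue, plot it, and forecast 15 days ahead.

    Third-party imports live in here so format_timestamp stays testable;
    the host and queue names are placeholders, not required values.
    """
    import pika
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # headless VM: render to files rather than a display
    import matplotlib.pyplot as plt
    from prophet import Prophet

    # (a) collect all messages published by Task 2.2 (d)
    conn = pika.BlockingConnection(pika.ConnectionParameters(host=rabbit_host))
    channel = conn.channel()
    channel.queue_declare(queue=queue)
    rows = []
    while True:
        method, _props, body = channel.basic_get(queue=queue, auto_ack=True)
        if method is None:  # queue drained
            break
        msg = json.loads(body)
        # (b) print the data with the reformatted timestamp
        rows.append({"ds": format_timestamp(msg["timestamp"]), "y": msg["value"]})
        print(rows[-1])
    conn.close()

    df = pd.DataFrame(rows)
    # (c) line chart of the averaged daily data
    plt.plot(pd.to_datetime(df["ds"]), df["y"])
    plt.xlabel("date")
    plt.ylabel("PM2.5 (daily average)")
    plt.savefig("pm25_daily.png")

    # (d)/(e) Prophet expects columns 'ds' (datetime) and 'y' (numeric value)
    model = Prophet()
    model.fit(df)
    future = model.make_future_dataframe(periods=15)  # next 15 days
    forecast = model.predict(future)
    model.plot(forecast).savefig("pm25_forecast.png")
```

Saving the figures to files (via the Agg backend) sidesteps the lack of a display on the Azure VMs; the default `make_future_dataframe(periods=15)` matches the 15-day horizon in step (d).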
Task 4: Report (30 Marks)
Prepare the Final Report in plain English. There is no word or page limit; however, we appreciate a clear, concise and focused presentation style. The report should consist of:
1. A detailed response to each task and related sub-tasks.
2. Screenshots of running services in the Docker Environment.
3. Screenshots of Code Snippets and/or Docker console.
4. Plots of data and prediction results by using Matplotlib.
5. Analytical discussion of the results and related conclusions.