This assessment is worth 20% of your overall grade. Due: 05 March 2023, 23:59 (Week 5).
Course Learning Outcomes
· CLO 2: Apply suitable algorithms for particular data mining problems.
· CLO 3: Design and develop processes and products to solve business problems related to data mining.
· CLO 5: Communicate effectively in a variety of forms using appropriate terminology.
Task Description
Purpose:
To practice the basic flow of machine learning and to apply regression techniques to solve a practical problem.
Task description:
The task is to predict future energy use in a household based on weather conditions by building an advanced regression model.
You need to write Python or R code to predict the energy use and analyse the impact of different factors based on your model.
Instructions
Please read and follow the instructions below to complete the task. Download the dataset and code template provided.
1. Read the paper below before you start working on the task; you may find the information useful. Candanedo, LM, Feldheim, V & Deramaix, D 2017, 'Data driven prediction models of energy use of appliances in a low-energy house', Energy and Buildings, vol. 140, pp. 81-97.
2. Write your code in a Python Jupyter notebook or an R Notebook/Markdown file. Python is preferred.
3. Analyse and visualise the data (word limit: 200 words).
· Identify data dependencies that might be useful for this task and visualise those dependencies using suitable techniques and charts.
· Use this analysis to select suitable prediction models for experimentation and justify your selection.
· Include the charts and diagrams together with the code, e.g., in Jupyter Notebook.
4. Pre-process data: apply suitable processing techniques such as scaling, conversion and imputation of missing values.
5. Based on your analysis:
· Implement and train at least two prediction models.
You can use the paper and the paper's code in your assignment. If you use code from the paper, clearly identify which part of the code is used and where, and how it has been adapted to your task. You can also use common Python and R libraries. Do not use any other code except the code from the seminar, the workshop and the above-mentioned paper.
· Use a suitable training/testing methodology, such as a training/test data split or cross-validation, and justify your decision (write up to 100 words).
· Use suitable model performance metrics and justify your selection (write up to 200 words).
6. Test the models and print/include results for all models using machine learning methodology.
7. Compare the results from all candidate models, choose the best model, justify your choice and discuss the results (word limit: 200 words).
· Show the results of all models in the form of suitable charts and tables.
· Select the best performing model, show the final results for this model and justify your selection.
8. Reflect on what you have learned by completing this assignment (word limit: 200 words).
Submission requirements:
You are required to submit all runnable code, analysis and results in one file (e.g., .ipynb). Do not zip it; submit that file as it is.
Every submitted file name must be in the form "<your_id>_<your_name>_assign1" (as in the template).
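As a rough illustration of the training/testing methodology and performance metrics mentioned in step 5, the sketch below runs k-fold cross-validation on synthetic data and compares two deliberately simple placeholder models using RMSE. Everything in it is an assumption made for illustration: the variable names (temperature, energy), the synthetic data, and the helper functions (rmse, mae, r2, fit_mean, fit_line, k_fold_rmse) are hypothetical and do not come from the assignment dataset, the code template or the paper. It uses only the Python standard library so that the mechanics are visible; in a notebook you would normally rely on common libraries instead.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 is perfect, 0 matches predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_mean(x, y):
    """Placeholder model 1: always predict the mean of the training targets."""
    mean_y = sum(y) / len(y)
    return lambda xi: mean_y


def fit_line(x, y):
    """Placeholder model 2: least-squares line through one input feature."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda xi: intercept + slope * xi


def k_fold_rmse(fit, x, y, k=5, seed=0):
    """Average RMSE over k folds; each fold is held out for testing exactly once."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = [model(x[i]) for i in test_idx]
        scores.append(rmse([y[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)


# Synthetic stand-in data (NOT the assignment dataset): energy falls as temperature rises.
rng = random.Random(42)
temperature = [rng.uniform(-5, 30) for _ in range(200)]
energy = [120.0 - 2.5 * t + rng.gauss(0, 5) for t in temperature]

baseline_rmse = k_fold_rmse(fit_mean, temperature, energy)
linear_rmse = k_fold_rmse(fit_line, temperature, energy)
print(f"mean baseline RMSE: {baseline_rmse:.2f}")
print(f"linear model RMSE:  {linear_rmse:.2f}")
```

Both models are scored on the same folds (same seed), so the comparison is like for like. Reporting a trivial baseline alongside the candidate models is one way to show that a model has learned something beyond the average energy use. Which metric to report is a choice you still need to justify for your own data.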