Coursework 1: Experimental Comparison of Different Supervised Machine Learning Algorithms Using a UCI Dataset
For Coursework 1, you are required to evaluate and compare five supervised machine learning algorithms on a UCI dataset using Python. Each student is assigned an individual dataset according to their class grouping. Coursework 1 is worth 30% of the module mark.
Learning Outcomes
- Evaluate and articulate the issues and challenges in machine learning, including model selection, complexity and feature selection.
- Demonstrate a working knowledge of the variety of mathematical techniques normally adopted for machine learning problems, and of their application to creating effective solutions.
- Critically evaluate the performance and drawbacks of a proposed solution to a machine learning problem.
- Create solutions to machine learning problems using appropriate software.
Data set
This coursework is designed to let you work freely and to ensure that your report is unique, which avoids collusion. No two students should have identical or comparable datasets. Each student will be assigned a different UCI dataset at random, and you will need to download it from the student website as designated by the module leader. You must use the dataset you have been given and no other. The purpose of this instruction is to encourage students to work independently and to avoid cheating and collusion; any infringement will result in a deduction of twenty points.
Machine Learning and Evaluation
For this coursework you will evaluate five supervised learning methods on a UCI dataset in Python. The first algorithm is linear regression, the second is logistic regression, the third is a neural network, the fourth is a decision tree and the fifth is k-nearest neighbour.
You may implement these algorithms using the inbuilt classifiers; however, you are strongly encouraged to implement the training functions yourself. In addition, inbuilt functions for error measurement are not allowed.
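Since inbuilt error measurement is not allowed, the error functions have to be written by hand. The following is a minimal sketch, assuming labels and predictions are held in plain Python sequences; the function names `misclassification_rate` and `mean_squared_error` are illustrative choices, not names prescribed by this coursework.

```python
from typing import Sequence


def _check_lengths(y_true: Sequence, y_pred: Sequence) -> None:
    """Reject empty or mismatched inputs before computing any error."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def misclassification_rate(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that differ from the true labels."""
    _check_lengths(y_true, y_pred)
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between targets and predictions."""
    _check_lengths(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

The misclassification rate suits the classifiers, while the mean squared error suits a model with continuous output such as linear regression. The same functions can be applied to the training set and to the test set to obtain both errors.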
The objective of this coursework is to investigate experimentally which supervised algorithm is best suited to the dataset, and which parameter values are best. To answer this question, you need to evaluate the error rate and any other performance evaluation metrics you can provide.
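One way to investigate which parameter values are best is to train the same model with several candidate values and record the training and test error for each. The sketch below does this for k-nearest neighbour with a hand-written classifier and Euclidean distance. The helper names (`knn_predict`, `error_rate`, `sweep_k`) are hypothetical, and the choice of distance and of majority voting is an assumption for illustration rather than a requirement of this coursework.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x.
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, X, y, k):
    """Fraction of points in (X, y) that the k-NN model misclassifies."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != t for x, t in zip(X, y))
    return wrong / len(y)


def sweep_k(train_X, train_y, test_X, test_y, ks):
    """Return {k: (training error, test error)} for each candidate k."""
    return {
        k: (
            error_rate(train_X, train_y, train_X, train_y, k),
            error_rate(train_X, train_y, test_X, test_y, k),
        )
        for k in ks
    }
```

Calling `sweep_k(train_X, train_y, test_X, test_y, [1, 3, 5, 7])` on your own split gives one pair of errors per value of k, which can then be tabulated or plotted. The same pattern applies to the other models, for example tree depth for a decision tree or learning rate for a neural network.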
Experiments must at least:
- Report the training and test error for all the models.
- Include appropriate data handling code.
- Measure error without inbuilt error measurement functions, which are not allowed for this coursework.
- Compare different hyper-parameters experimentally.
- Provide a visualization of how the data was classified for each method (or parameter value), for example a scatter plot of two of the features. You may use any inbuilt visualization routines you like, such as plot or scatter.
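The data handling requirement in the list above could be met along the following lines. This is a minimal sketch, assuming the dataset has already been loaded into a list of numeric feature rows and a parallel list of labels; the function names, the 30% test fraction and the fixed seed are illustrative assumptions, not values set by this coursework.

```python
import random
import statistics


def split_train_test(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle indices with a fixed seed, then split into train and test parts."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def fit_standardizer(train_rows):
    """Compute per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Replace a zero deviation with 1.0 so constant features do not divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Scale each feature to zero mean and unit variance using the given statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

Fitting the scaling statistics on the training rows and reusing them for the test rows keeps information from the test set out of training, which matters for distance-based methods such as k-nearest neighbour.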
The entire experiment must be submitted as a Jupyter Notebook file (.ipynb) from which all results and figures can be reproduced.
Report structure and assessment (30% of module mark)
1) Write a brief introduction that covers the following (5%):
a) Provide a brief introduction of the supervised learning problem as it relates to real-life challenges.
b) Give details of the dataset and other information that describe the dataset.
c) Briefly explain the five models as well as possible parameters.
d) Briefly explain how the models can be individually applied to the dataset.
2) Carry out and describe the experiment that evaluates the error rate for all the models on your specific dataset. Explain the choice (or necessity) of your error measurement method. Make sure you use appropriate illustrations and diagrams as well as statistics. What evaluation metrics other than the error measurement could be important in deciding which method is most suitable? In addition, discuss the results for the chosen evaluation metrics. (20%)
3) Write a brief conclusion on the results. State the algorithm that provides the best result and the hyper-parameters used. Also provide a comparison of the performance results of all the models. (5%)
Submission
Submit your report following the report structure provided above. Include step-by-step descriptions of the tasks you performed and the results obtained during the experiment. Ensure that your report is well organized, clearly written, and includes all the necessary evaluation metrics and graphs as specified in the coursework requirements. The submission deadline is week 9, November 2023, by 16:00. Late submissions may incur a penalty of up to 10 marks, so plan your work accordingly. Failure to submit your coursework will result in a mark of zero. In the case of exceptional circumstances, contact the Award Administrator in advance.
Submission Format:
The coursework submission should be compressed into a .zip or .rar file containing the following files:
- A report as a Microsoft Word document.
File name format: ‘Student ID_MLCoursework1_Report.docx’
- A .zip or .rar file containing the experiments for the report: all program sources, including the code, graphs, model architecture, results, and diagrams from the experiments. All implementation source code must be submitted as a Jupyter Notebook file (.ipynb) for easy reproducibility. Your final zipped folder should be submitted digitally to the student website.
File name format: ‘Student ID_MLCoursework1_Files.zip/rar’
This is an individual coursework. The university rules on academic conduct including collusion and plagiarism apply.