
# DTS205TC High Performance Computing - Group Project Assignment 1: Random Forest



# Group Project Assignment 1

## Overview

Random forest is an ensemble learning method for classification that operates by constructing a multitude of decision trees at training time.¹ Random forests often have very good predictive accuracy and have been widely used in many applications. In this task, you will manually implement a random forest algorithm and parallelize it.

## Team policy

You are free to form teams of two to three members. One team member must fill in an online document containing all team members' information before 6th March, 23:59. Students who fail to do so will be randomly assigned to a team. Changes will not be allowed once teams are settled.

Avoid plagiarism:

- Do not submit work from other teams.
- Do not share code or work with students other than your own team members.
- Do not read code or work from other teams; discussions between teams should remain high level.
- Do not use open-source code from the Web or code from textbooks.

## 1. Group Tasks (60 marks)

### Dataset

We will provide a dataset, `data.csv`, which can be used to test your program. It is a 2-category, 10-feature dataset with 5×10⁵ samples.

In order to implement a parallel random forest, the following tasks should be accomplished:

✓ Decision tree (without a stop-split-early condition). It includes the following components:

1) Calculate information gain. (10 marks)
2) Split the data by finding the best feature based on task 1). (10 marks)
3) Create branches recursively based on task 2) until every leaf contains only a single category. (10 marks)

✓ Random forest

4) Bagging. Perform bootstrapping on the data 100 times, generate a different decision tree from each bootstrap sample based on task 3), and perform majority voting on their prediction results. (10 marks)
5) When each node of the decision tree is split, 3 features are randomly selected by sampling without replacement, and task 2) is performed on those features. (10 marks)

✓ Report

6) For the task 2) single-layer decision tree, the task 3) decision tree, the task 4) bagging tree, and the task 5) random forest, perform 5-fold cross-validation (CV) and compare their average prediction accuracy² on the validation set. (10 marks)

| Models | Accuracy |
| --- | --- |
| Single-layer Decision Tree | |
| Decision Tree | |
| Bagging Tree | |
| Random Forest | |
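As an illustration of the quantities named in tasks 1) and 4), the following is a minimal sketch rather than a reference solution. It assumes numeric features split by a threshold (`x <= threshold`), which the brief does not specify, and all function names (`entropy`, `information_gain`, `bootstrap_indices`, `majority_vote`) are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`.

    Assumption: the feature is numeric and split by a single threshold.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Return the most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    labels = [0, 0, 1, 1]
    # A threshold of 2 separates the two classes perfectly.
    print(information_gain(values, labels, threshold=2))
    print(bootstrap_indices(len(values), random.Random(0)))
    print(majority_vote([1, 0, 1]))
```

The best split in task 2) would then be the feature and threshold pair with the largest information gain; for task 5) the search is restricted to the 3 randomly selected features.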

## 2. Individual Challenge Tasks (40 marks)

7) For tasks 1)-4) above (excluding task 5), if each needs to be parallelized, which parallel method do you think should be used for each? Please explain the reasons for your choice. (4×5 marks)
8) Choose one (not all!) of tasks 1)-4) and implement the parallelization of the random forest based on your solution in task 7). (10 marks)

   NOTE: Marks are given based on the correctness and clarity of your code. Which approach you prefer will not affect your score.
9) Let the number of decision trees in the random forest be fixed at 100, change the number of processors, and measure the running time of the program for each setting. Record the results in a table (number of processors against running time). Then estimate the speedup of your program. Does it achieve a linear speedup? If not, why? (10 marks)

NOTE: There are no restrictions on the programming language or the parallelization library. You can use Python, C, or any other language, and you can use MPI, OpenMP, multiprocessing, coroutines, or various other parallelization libraries. To reiterate, no matter what language and library you use, you cannot directly call an off-the-shelf machine learning library; otherwise you will not get a score.
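For the speedup question in task 9), the standard definitions are speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p, where T(p) is the running time on p processors; linear speedup means S(p) is close to p. The sketch below computes these from measured times and, as a comparison point, the Amdahl's law bound for a given parallel fraction. The function names are hypothetical and the brief does not prescribe this form.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p) from two measured running times."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means linear speedup."""
    return speedup(t_serial, t_parallel) / n_procs


def amdahl_speedup(parallel_fraction, n_procs):
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction is the share of the serial running time that
    can be parallelized (between 0 and 1).
    """
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)
```

An efficiency well below 1.0 usually points to a serial portion of the program, communication or process start-up overhead, or uneven load across processors.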

## 3. Submission

### Group Submission

One of the group members must submit the following files:

1) A cover letter with the student IDs and names of all group members (template can be found on LMO).
2) All runnable source code, organised by folders.
3) A report (PDF) file containing all your answers, source code and charts.
4) A statement explaining which part of the work each person did.

Once you have all the files, please put them in a single directory (named groupid-assign1) and compress it to a .zip file.

### Individual Challenge Submission

1) A cover letter with your student ID and name (template can be found on LMO).
2) All runnable source code, organised by folders.
3) A report (PDF) file containing all your answers, source code and charts.

Once you have all the files, please put them in a single directory (named studentID-challenge) and compress it to a .zip file.
