INT303 Big Data Analytics - Assignment 2: Will your employees leave?


Goals

The main focus of INT303 is to give you the fundamental knowledge of big data so that you can tackle a variety of situations yourself. At the same time, you should not always need to reinvent the wheel from scratch when others may have spent years or decades perfecting the wheel you need.

In this assignment you will work with:

  • The Python programming language and its libraries NumPy (for matrix operations) and scikit-learn (for applying machine learning algorithms).
  • Summarizing a potentially complex topic into usable information, distilling it down to the important points.
  • Determining which modern big data libraries and tools are available for your project goals.
  • Several machine learning algorithms (decision trees, random forests, extra trees, linear regression).
  • Feature engineering techniques.
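One common feature engineering technique is one-hot encoding, which turns a categorical column into numeric 0/1 indicator columns that tree-based and linear models can consume. The sketch below is a minimal pure-Python illustration, not a required approach. The column name `Department` and its values are hypothetical, since the competition's actual columns are not listed in this brief. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job.

```python
from typing import Dict, List, Optional, Sequence


def one_hot(values: Sequence[str], prefix: str,
            categories: Optional[Sequence[str]] = None) -> Dict[str, List[int]]:
    """Expand one categorical column into 0/1 indicator columns, one per category.

    Pass the categories learned from the training set when encoding the test set,
    so both sets end up with identical columns; unseen values become all zeros.
    """
    if categories is None:
        categories = sorted(set(values))
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical categorical column (name and values are illustrative only).
train_dept = ["Sales", "Research", "Sales"]
print(one_hot(train_dept, "Department"))
# {'Department_Research': [0, 1, 0], 'Department_Sales': [1, 0, 1]}
```

Reusing the training categories, e.g. `one_hot(test_values, "Department", categories=sorted(set(train_dept)))`, keeps the test-set columns aligned with the training set even if a category is missing from one of them.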

Problem

Employee attrition has become a focus of researchers and human resources departments because of its effects on organizational performance, regardless of geography, industry, or size. The goal of this project is to predict whether an employee is likely to quit their job, based on a set of employee data. We use the Kaggle competition "Will your employees leave?" (see https://www.kaggle.com/competitions/int303-big-data-analysis-2223-s1/data) to retrieve the necessary data and to evaluate the accuracy of our predictions. A fictional dataset created by IBM has been split into two groups, a 'training set' and a 'test set'. For the training set, we are provided with the outcome (whether or not each employee quit). We use this set to build a model that generates predictions for the test set: for each employee in the test set, we have to predict whether or not that employee quit their job. The score is the percentage of correct predictions.
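Since the score is simply the percentage of correct predictions, it can be estimated locally before submitting, for example on a held-out part of the training set. The sketch below assumes the labels are encoded as 1 (quit) and 0 (stayed); that encoding is an assumption for illustration, not something stated by the competition. scikit-learn's `accuracy_score` returns the same quantity as a fraction between 0 and 1.

```python
from typing import Sequence


def percentage_correct(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the share of predictions that match the true labels, as a percentage."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


# Hypothetical labels (assumed encoding: 1 = employee quit, 0 = employee stayed).
actual = [0, 1, 0, 0, 1]
predicted = [0, 1, 1, 0, 0]
print(percentage_correct(actual, predicted))  # 60.0
```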

Competition Page

https://www.kaggle.com/competitions/int303-big-data-analysis-2223-s1/overview


Task 1 (40 Marks)
1. Create an account on https://www.kaggle.com/.

3. Submit your predictions for the test set (‘submission.csv’) to Kaggle. You are also required to include your Kaggle score in your report (see Task 2 below). (30 Marks)
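A minimal sketch of producing `submission.csv` with Python's standard `csv` module follows. The column names `id` and `Attrition` and the 0/1 label encoding are placeholders assumed for illustration only; the exact header and value format must be taken from the competition's submission instructions (Kaggle competitions usually provide a sample submission file).

```python
import csv
import io
from typing import Iterable, Tuple

# ASSUMPTION: "id" and "Attrition" are placeholder column names; copy the exact
# header and label encoding from the competition's sample submission file.
HEADER = ("id", "Attrition")


def format_submission(rows: Iterable[Tuple[int, int]]) -> str:
    """Render (employee id, predicted label) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for employee_id, label in rows:
        writer.writerow((employee_id, label))
    return buffer.getvalue()


def write_submission(rows: Iterable[Tuple[int, int]], path: str = "submission.csv") -> None:
    """Save the predictions to disk in the format produced by format_submission."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(format_submission(rows))


# Hypothetical predictions for three test-set employees (assumed: 1 = quit, 0 = stayed).
print(format_submission([(1, 0), (2, 1), (3, 0)]), end="")
```

Keeping the formatting in `format_submission` separate from the file I/O makes it easy to inspect the exact text before uploading it.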

Task 2 (60 Marks)
Write a 1-page report, which must contain 2 or 3 tables or figures.

The report must cover:

· Introduction: (6 Marks)

Why should we care about this technology? How is it related to Big Data?

· Methodology: (14 Marks)

A. Data Preprocessing
What data preprocessing steps did you explore before training? Options include data visualization, data cleaning and reduction, normalization and discretization, feature selection, and handling imbalanced data. You do not need to cover all of them.
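Normalization, discretization, and random oversampling for imbalanced data can each be sketched in plain Python as below. These are minimal illustrations rather than a required pipeline; in practice scikit-learn's `MinMaxScaler` and `KBinsDiscretizer(strategy="uniform")` provide equivalent transforms. Oversampling should be applied to the training data only, never to the test set, so that evaluation reflects the real class distribution. The example column is hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Discretize values into n_bins equal-width intervals, returning bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin instead of a bin of its own.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def random_oversample(rows: Sequence[Tuple], labels: Sequence[int],
                      seed: int = 0) -> Tuple[List[Tuple], List[int]]:
    """Duplicate randomly chosen rows of the smaller classes until every class
    has as many rows as the largest class (apply to training data only)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Tuple]] = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows: List[Tuple] = []
    out_labels: List[int] = []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical numeric column, e.g. employee ages.
ages = [20, 30, 40, 60]
print(min_max_normalize(ages))    # [0.0, 0.25, 0.5, 1.0]
print(equal_width_bins(ages, 4))  # [0, 1, 2, 3]
```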
B. Classification Algorithm
How does it work? Explain the algorithm or framework.
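As an example of explaining how a classifier works: the core step of a CART-style decision tree is choosing the split that makes the child nodes as pure as possible. The sketch below scores candidate thresholds on a single numeric feature with Gini impurity; the feature and labels in the example are hypothetical. A full tree repeats this search over every feature and then recursively on each child. A random forest trains many such trees on bootstrap samples with random feature subsets and takes a majority vote, and extra trees (extremely randomized trees) go further by drawing candidate split thresholds at random. scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default (`criterion="gini"`).

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_k ** 2) over class proportions p_k; 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(feature: Sequence[float],
               labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Search thresholds t on one numeric feature for the split (x <= t | x > t)
    with the lowest size-weighted Gini impurity of the two children.

    Returns (threshold, weighted impurity); threshold is None if no split
    lowers the impurity of the parent node.
    """
    n = len(labels)
    best_t, best_score = None, gini(labels)
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(feature))[:-1]:
        left = [y for x, y in zip(feature, labels) if x <= t]
        right = [y for x, y in zip(feature, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical data: years at the company vs. whether the employee quit (1) or stayed (0).
years = [1, 2, 3, 8, 10, 12]
quit_ = [1, 1, 1, 0, 0, 0]
print(best_split(years, quit_))  # (3, 0.0)
```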

· Results: (14 Marks)

Are there benchmarks for its use? How does it compare to similar technology?

· Discussion: (8 Marks)

What are the good aspects, and what are the bad aspects? Be sure to add a sentence on “contributor thoughts”: what are your own unique thoughts on the pros and cons of the technology? Do you envision an extension that might be
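For the Results section, a score is easier to interpret next to a simple benchmark. One common baseline is a majority-class classifier: if most employees in the training data did not quit, always predicting "stayed" already scores well, and a useful model should beat it. The sketch below estimates such scores with k-fold cross-validation in plain Python; the function names and toy data are hypothetical, and scikit-learn's `DummyClassifier(strategy="most_frequent")` and `cross_val_score` are library equivalents. Any model can be plugged in as a `fit` function that takes training rows and labels and returns a prediction function.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], int]
FitFn = Callable[[Sequence[Row], Sequence[int]], Predictor]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_majority(X_train: Sequence[Row], y_train: Sequence[int]) -> Predictor:
    """Baseline 'model': always predict the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return lambda row: most_common


def cross_val_accuracy(fit: FitFn, X: Sequence[Row], y: Sequence[int],
                       k: int = 5, seed: int = 0) -> float:
    """Mean percentage of correct predictions over k train/validation splits."""
    scores = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in fold)
        scores.append(100.0 * correct / len(fold))
    return sum(scores) / len(scores)


# Hypothetical toy data: one numeric feature, 8 employees stayed (0) and 2 quit (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
print(cross_val_accuracy(fit_majority, X, y))  # 80.0
```

Reporting a model's cross-validated score alongside this baseline (and alongside its Kaggle score) shows how much it actually gains over guessing the majority class.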

· Conclusion: (8 Marks)

Summarize the 2 to 4 points you think are most important.

Concise, information-rich content. For each of the sections above, you will be graded not simply on having content but on the quality of the content and on how well it answers the questions in concise, clear, and engaging terms.

Style. (10 Marks)
To make your report consistent and visually appealing, and to make the evaluation of your work fair, each page should conform to the following specifications:

·  Margins: approx. 0.5” on all 4 sides.

·  Columns: 2, with approx. 0.3” spacing between the columns; justified text.

·  Fonts:
    ·  Body text: Times New Roman, 11pt.
    ·  Section headings: Calibri 13pt, bold italic.
    ·  Within captions, tables, figures, or images: Calibri 9-11pt.

·  Line spacing:
    ·  Body text: single (1.0).
    ·  Section headings: 6pt spacing above each heading.

Academic Honesty. Copying chunks of code or problem-solving answers from other students, online sources, or other resources is prohibited. You are responsible both for (1) not copying others’ work and (2) making sure your work is not accessible to others. Assignments will be extensively checked for copying of others’ work. Problem-solving solutions are expected to be original, using concepts discussed in the book, class, or supplemental materials, but not using any direct code or answers. Please see the syllabus for additional policies.

