# Lab 6: Manipulate and process the data (data preprocessing, exploration, and analysis)

# Lab 6&7&8

Professor Julien Maitre, Ph.D. Winter 2023

This lab is graded. It is worth 100 points and counts for 20% of the final grade for this course. You must form groups of 7 or 8 people to complete this lab.

## 1. General Description

The general objective of Lab 6 is to manipulate and process the data, which is the step that precedes applying machine learning algorithms for knowledge extraction or classification. This lab will therefore allow you to learn the Python programming language and its data science libraries (e.g., NumPy, Pandas, Matplotlib, SciPy). In addition, you will have to produce a scientific report that analyzes/explores the data and describes the processing steps you have applied. The scientific report is limited to 10 pages, and the names of all group members must appear on the first page.

## 2. Formalities

The deadline for submitting your work is March 16th, 2023, at 11:59 p.m. (China time). Late submissions incur a penalty of 10% per day of delay. Submit your work by emailing me a WeTransfer link containing the scientific report and the code.

## 3. What is expected?

The scientific report should include:

- a description of your dataset
  - for example:
    - what are the variables?
    - the meaning of each variable;
    - the number of instances;
    - the number of classes (if applicable);
    - the values (e.g., min-max interval) that each variable can take.
- data checking and pre-processing
  - for example:
    - how many missing values does your dataset have?
    - what method(s) did you use to handle these missing values?
    - a summary of the number of instances per class (if applicable);
    - the statistics (e.g., mean, variance, standard deviation) of each variable.
  - In this part, you could use data visualization tools.
- a statistical study of the data and an analysis/interpretation of this study
  - for example:
    - a statistical study of each variable with respect to each class;
    - a statistical study of each variable with respect to each other variable;
    - hypothesis tests;
    - correlation between two or more variables;
    - Chi-square tests.
  - In this part, do not hesitate to use data visualization tools.
- a conclusion of the study
  - Summarize the essential information from your data analysis/exploration. What did you learn about the data, and thanks to the data?
- a general conclusion
  - Summarize what you appreciated, what you learned, and what you appreciated less in this lab.
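The dataset-description and data-checking items in the list above map onto a handful of pandas calls. The sketch below is one possible approach, not a required method: it uses a small hypothetical DataFrame (the columns `age`, `income`, `city`, and `label` are invented for illustration) and two hypothetical helpers, `describe_dataset` and `impute_missing`. Median/mode imputation is only one option; dropping incomplete rows or model-based imputation may suit your data better, and your report should justify whichever you choose.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for your chosen dataset; in practice you
# would load it with something like pd.read_csv("your_dataset.csv").
df = pd.DataFrame({
    "age": [23, 35, np.nan, 41, 29, 52],
    "income": [32000.0, 48000.0, 51000.0, np.nan, 39000.0, 61000.0],
    "city": ["A", "B", "B", None, "A", "B"],
    "label": ["yes", "no", "no", "yes", "yes", "no"],
})


def describe_dataset(df: pd.DataFrame, target: str) -> dict:
    """Summarize variables, instance count, classes, and numeric value ranges."""
    numeric = df.select_dtypes(include="number")
    return {
        "n_instances": len(df),
        "n_variables": df.shape[1] - 1,  # excludes the class column
        "dtypes": df.dtypes.astype(str).to_dict(),
        "n_classes": df[target].nunique(),
        # min() and max() skip missing values by default
        "ranges": {c: (numeric[c].min(), numeric[c].max()) for c in numeric},
    }


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the column median, categorical gaps with the mode."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


summary = describe_dataset(df, target="label")
missing_per_column = df.isna().sum()          # missing values per variable
clean = impute_missing(df)
class_counts = clean["label"].value_counts()  # instances per class
stats = clean.select_dtypes(include="number").agg(["mean", "var", "std"])

print(summary)
print(missing_per_column)
print(class_counts)
print(stats)
```

Counting missing values before imputing (rather than after) keeps the report honest about how incomplete the raw data was.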
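For the statistical study of each variable with respect to each class and the correlation between variables, a minimal sketch follows, assuming the data has already been cleaned. The `height`, `weight`, and `label` columns are hypothetical toy values chosen for illustration; `DataFrame.corr` gives the full Pearson matrix, while `scipy.stats.pearsonr` adds a p-value for one pair (null hypothesis: no linear correlation).

```python
import pandas as pd
from scipy import stats

# Hypothetical, already-cleaned toy data; column names are illustrative only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 155.0, 175.0],
    "weight": [50.0, 58.0, 66.0, 80.0, 54.0, 72.0],
    "label": ["a", "a", "b", "b", "a", "b"],
})

# Statistics of every numeric variable with respect to each class.
per_class = df.groupby("label")[["height", "weight"]].agg(["mean", "std"])

# Pairwise Pearson correlation matrix between the numeric variables.
corr = df[["height", "weight"]].corr(method="pearson")

# Pearson r plus a p-value for a single pair of variables.
r, p_value = stats.pearsonr(df["height"], df["weight"])

print(per_class)
print(corr)
print(f"r = {r:.3f}, p = {p_value:.4g}")
```

Pearson correlation only captures linear relationships; `method="spearman"` in `DataFrame.corr` is a rank-based alternative when the relationship is monotonic but not linear.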
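The hypothesis tests and Chi-square tests listed above can be run with SciPy. The sketch below uses hypothetical toy data (`score`, `smoker`, `label` are invented columns) to show two common tests: Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) comparing a numeric variable between two classes, and a Chi-square test of independence (`scipy.stats.chi2_contingency`) on a contingency table built with `pd.crosstab`. The significance level of 0.05 is an assumption; choose and justify your own.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: one numeric variable, one categorical variable, a class.
df = pd.DataFrame({
    "score": [12.1, 11.8, 12.5, 12.0, 14.2, 14.8, 13.9, 14.5],
    "smoker": ["yes", "yes", "no", "yes", "no", "no", "no", "yes"],
    "label": ["A", "A", "A", "A", "B", "B", "B", "B"],
})
alpha = 0.05  # significance level (an assumption, not prescribed by the lab)

# Welch's t-test. H0: the mean score is the same in classes A and B.
a = df.loc[df["label"] == "A", "score"]
b = df.loc[df["label"] == "B", "score"]
t_stat, t_p = stats.ttest_ind(a, b, equal_var=False)

# Chi-square test of independence. H0: smoker status is independent of class.
table = pd.crosstab(df["smoker"], df["label"])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(f"t = {t_stat:.2f}, p = {t_p:.4g}, reject H0: {t_p < alpha}")
print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {chi_p:.4g}, reject H0: {chi_p < alpha}")
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default. With expected counts this small (all equal to 2 here), the Chi-square approximation is unreliable and `scipy.stats.fisher_exact` is usually preferred; for comparing a variable across more than two classes, `scipy.stats.f_oneway` (one-way ANOVA) is a common choice.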
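For the visualization suggested in these parts, a histogram of a variable and a box plot of the same variable per class are common starting points. This Matplotlib sketch uses hypothetical toy data; the `matplotlib.use("Agg")` line is only needed when running without a display (for example on a server), and the figure is written to an in-memory buffer here, whereas for the report you would save it to an image file instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 155.0, 175.0],
    "label": ["a", "a", "b", "b", "a", "b"],
})

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(9, 3.5))

# Distribution of one variable over the whole dataset.
ax_hist.hist(df["height"], bins=5, edgecolor="black")
ax_hist.set_xlabel("height")
ax_hist.set_ylabel("count")

# The same variable split by class: one box per class label.
classes = sorted(df["label"].unique())
groups = [df.loc[df["label"] == c, "height"].to_numpy() for c in classes]
ax_box.boxplot(groups)
ax_box.set_xticks(range(1, len(classes) + 1))
ax_box.set_xticklabels(classes)
ax_box.set_xlabel("class")
ax_box.set_ylabel("height")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")  # e.g. fig.savefig("height.png") for the report
plt.close(fig)
```

Placing the overall distribution next to the per-class view makes it easy to see whether a variable separates the classes before confirming it with a formal test.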

## 4. Details

Regarding the dataset, there is only one restriction: the number of variables should be more than 7 and fewer than 12. I also recommend selecting a dataset that has classes. If the dataset has too many variables, you can remove some of them to stay within this limit. Finally, search the Web for a dataset in a field that interests you, to make the lab more "fun" (e.g., bioinformatics, marketing, commerce). Here is a sample of web links that provide access to datasets: