Homepage
Programming
Assignment 2: Part 2. Data Cleaning - Craft Beer Bar Sales Kaggle dataset

Assignment 2: Part 2. Data Cleaning - Craft Beer Bar Sales Kaggle dataset

Engage in a Conversation

Assignment 2: Part 2. Data Cleaning

The purpose of this assignment is to clean a data set so that it can be analyzed. The dataset is a modified version of the Craft Beer Bar Sales Kaggle dataset. The original dataset is available here: https://www.kaggle.com/datasets/podsyp/sales-in-craft-beer-bar. CourseNana.COM

Directions

rename this file as A2_W23_DataCleaning_CUID.ipynb where CUID Is your Carleton University Identification
pay attention to where your file is saved as you will need to upload it to Brightspace
follow the instructions below
when completed, upload your .ipynb file to Brightspace.
The data set should contain the following attributes
Product_code: unique product identifier
Vendor_code: Manufacturer's name
Name: SKU
Retail_price: Catalog price
Country_of_Origin: Manufacturer country
Size: sale item size
Vendor: Manufacturer's name
ABV: alcohol by volume
Product_type: the type of product
Please type your name here:

Please type your student number here:
```
import pandas as pd
import numpy as np
```
Step 1a. (1 pt) Read the CraftBeerV1.csv file into a dataframe named df and list its columns. This can be done in the same cell. CourseNana.COM

Step 1b. (2 pts) Remove column(s) from your dataframe that do not exist in the data dictionary above. Use as many cells as you need and ensure you save the results back to your dataframe df. Then, display the data frame. This can be done in the same cell. CourseNana.COM

Step 1c. (2 pts) Determine the number of unique values for each column in the dataframe. (1 pt) If an attribute has only a single unique value, remove it from the dataframe (1 pt). Show all steps. Use as many cells as you need and ensure you save the results back to your dataframe df. CourseNana.COM

Step 1d. (2 pts) Compare the Vendor_code column with the Vendor column (1 pt). If they contain the same data, delete one of the columns (1 pt). Show all steps (ie. prove that the two columns do/do not contain the same data). Use as many cells as you need and ensure you save the results back to your dataframe df. CourseNana.COM

Step 1.e. (3 pts) Determine whether there are any duplicate rows (1 pt). If there are duplicates, display them to the screen (1 pt). If duplicate rows exist, remove them from the dataset and reset the index of the dataframe (1 pt). Show all steps. Use as many cells as you need and ensure you save the results (if there any) back to your dataframe df. CourseNana.COM

Step 2a. (2 pts) Display basic statistical information for the dataframe's numeric attributes. Ensure that the numeric attributes are shown vertically on the left and the various statistical attributes are columns. CourseNana.COM

Step 2b. (4 pts) Look through the min column to determine if there are any 0s. If there are, think about whether a 0 is a valid value for the attribute in question. If you think it's an error, first determine how many occurrences of 0 exist for the attribute in the dataframe (1 pt). Next display only the rows that have a 0 for the attribute in question (1 pt). Assign Nan using the numpy library to all occurrences of 0 for the attribute under question (1 pt). Finally, display value counts for every unique value of the attribute in question ensuring that NaN gets counted (1 pt). CourseNana.COM

Step 2c. (3 pts) Create box plots for Size, and ABV. CourseNana.COM

Step 2d. (14 pts) Visually inspect each boxplot and determine if there are any statistical outliers for each of the two attributes. You can assume a statistical outlier is any data point that exceeds the upper or lower whisker. For each attribute that contains outlier(s): CourseNana.COM

calculate and display the IQR
calculate and display the necessary min and/or max values that are the thresholds for determining outliers (use the 75th + 1.5 IQR and 25th - 1.5 IQR formula discussed in class)
use the above information to display the row(s) that contain the outlier

Step 2e. (6 pts). For each of the attribute(s) that had outliers, would it make sense to impute values to replace the outliers? Why or why not? Answer in the cell below. Be sure to identify which attribute you are referring to in answering the questions. CourseNana.COM

Answer 2e here: CourseNana.COM

Step 3a. (3 pts) Determine how many missing values there are in each column and how many rows have missing values. Show all your steps. CourseNana.COM

Step 3b. (2 pts) Remove rows from the dataframe using the .dropNA() method that have more than 3 missing values. Use as many cells as you need and ensure you save the results back to your dataframe df. CourseNana.COM

Get in Touch with Our Experts

WeChat (微信)

Last: Computer Skills and Sustainability: MATLAB Assignment: Matrix Operation

Next: A5: Qt Simon Game

Data Mining代写,Data Analysis代写,Data Preprocessing代写,Data Exploration代写,Data Cleaning代写,Python代写,Craft Beer Bar Sales Kaggle dataset代写,Outlier代写,Canada代写,Carleton University代写,Data Mining代编,Data Analysis代编,Data Preprocessing代编,Data Exploration代编,Data Cleaning代编,Python代编,Craft Beer Bar Sales Kaggle dataset代编,Outlier代编,Canada代编,Carleton University代编,Data Mining代考,Data Analysis代考,Data Preprocessing代考,Data Exploration代考,Data Cleaning代考,Python代考,Craft Beer Bar Sales Kaggle dataset代考,Outlier代考,Canada代考,Carleton University代考,Data Mininghelp,Data Analysishelp,Data Preprocessinghelp,Data Explorationhelp,Data Cleaninghelp,Pythonhelp,Craft Beer Bar Sales Kaggle datasethelp,Outlierhelp,Canadahelp,Carleton Universityhelp,Data Mining作业代写,Data Analysis作业代写,Data Preprocessing作业代写,Data Exploration作业代写,Data Cleaning作业代写,Python作业代写,Craft Beer Bar Sales Kaggle dataset作业代写,Outlier作业代写,Canada作业代写,Carleton University作业代写,Data Mining编程代写,Data Analysis编程代写,Data Preprocessing编程代写,Data Exploration编程代写,Data Cleaning编程代写,Python编程代写,Craft Beer Bar Sales Kaggle dataset编程代写,Outlier编程代写,Canada编程代写,Carleton University编程代写,Data Miningprogramming help,Data Analysisprogramming help,Data Preprocessingprogramming help,Data Explorationprogramming help,Data Cleaningprogramming help,Pythonprogramming help,Craft Beer Bar Sales Kaggle datasetprogramming help,Outlierprogramming help,Canadaprogramming help,Carleton Universityprogramming help,Data Miningassignment help,Data Analysisassignment help,Data Preprocessingassignment help,Data Explorationassignment help,Data Cleaningassignment help,Pythonassignment help,Craft Beer Bar Sales Kaggle datasetassignment help,Outlierassignment help,Canadaassignment help,Carleton Universityassignment help,Data Miningsolution,Data Analysissolution,Data Preprocessingsolution,Data Explorationsolution,Data Cleaningsolution,Pythonsolution,Craft Beer Bar Sales Kaggle datasetsolution,Outliersolution,Canadasolution,Carleton Universitysolution,

Assignment 2: Part 2. Data Cleaning - Craft Beer Bar Sales Kaggle dataset

Assignment 2: Part 2. Data Cleaning

Directions

The data set should contain the following attributes

Please type your name here:

Please type your student number here:

Get in Touch with Our Experts