Assignment 2: Part 2. Data Cleaning - Craft Beer Bar Sales Kaggle dataset

Assignment 2: Part 2. Data Cleaning

The purpose of this assignment is to clean a dataset so that it can be analyzed. The dataset is a modified version of the Craft Beer Bar Sales Kaggle dataset. The original dataset is available here: https://www.kaggle.com/datasets/podsyp/sales-in-craft-beer-bar.

Directions

  • rename this file as A2_W23_DataCleaning_CUID.ipynb, where CUID is your Carleton University Identification
  • pay attention to where your file is saved as you will need to upload it to Brightspace
  • follow the instructions below
  • when completed, upload your .ipynb file to Brightspace.

    The dataset should contain the following attributes:

  • Product_code: unique product identifier
  • Vendor_code: Manufacturer's name
  • Name: SKU
  • Retail_price: Catalog price
  • Country_of_Origin: Manufacturer country
  • Size: sale item size
  • Vendor: Manufacturer's name
  • ABV: alcohol by volume
  • Product_type: the type of product

    Please type your name here:

    Please type your student number here:

    import pandas as pd
    import numpy as np

Step 1a. (1 pt) Read the CraftBeerV1.csv file into a dataframe named df and list its columns. This can be done in the same cell.
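
As a minimal sketch of the read-and-list pattern, the example below loads a small in-memory CSV instead of the real file. The three columns and their values are hypothetical stand-ins, since the contents of CraftBeerV1.csv are not shown here.

```python
import io

import pandas as pd

# Hypothetical stand-in for CraftBeerV1.csv (made-up rows, for illustration only).
csv_text = "Product_code,Name,Retail_price\n1,Pale Ale,5.5\n2,Stout,6.0\n"

# With the real file this would be: df = pd.read_csv("CraftBeerV1.csv")
df = pd.read_csv(io.StringIO(csv_text))

# List the column names.
columns = list(df.columns)
print(columns)
```

In a notebook, ending the cell with `df.columns` displays the same information without `print`.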

Step 1b. (2 pts) Remove column(s) from your dataframe that do not exist in the data dictionary above. Use as many cells as you need and ensure you save the results back to your dataframe df. Then, display the dataframe. This can be done in the same cell.
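
One possible way to do this is to compare the column names against the data dictionary and drop whatever is left over. The sketch below assumes a toy frame with a hypothetical extra column named `Unnamed: 0`; the real file may contain different extra columns.

```python
import pandas as pd

# Attribute names taken from the data dictionary above.
data_dictionary = [
    "Product_code", "Vendor_code", "Name", "Retail_price",
    "Country_of_Origin", "Size", "Vendor", "ABV", "Product_type",
]

# Toy frame; "Unnamed: 0" is a hypothetical column that is not in the dictionary.
df = pd.DataFrame({
    "Product_code": [1, 2],
    "Name": ["Pale Ale", "Stout"],
    "Unnamed: 0": [0, 1],
})

# Find columns that are not in the data dictionary and drop them.
extra = [c for c in df.columns if c not in data_dictionary]
df = df.drop(columns=extra)
print(df)
```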

Step 1c. (2 pts) Determine the number of unique values for each column in the dataframe (1 pt). If an attribute has only a single unique value, remove it from the dataframe (1 pt). Show all steps. Use as many cells as you need and ensure you save the results back to your dataframe df.
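
A sketch of the unique-value check on toy data follows. The values are made up, and `Product_type` is constant here only for illustration. Note that `nunique()` ignores NaN by default.

```python
import pandas as pd

# Toy frame (made-up values); "Product_type" holds a single repeated value.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Product_type": ["Beer", "Beer", "Beer"],
    "Size": [0.33, 0.5, 0.33],
})

# Number of unique values per column (NaN is not counted by default).
unique_counts = df.nunique()
print(unique_counts)

# Columns with exactly one unique value carry no information; drop them.
constant_cols = unique_counts[unique_counts == 1].index.tolist()
df = df.drop(columns=constant_cols)
```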

Step 1d. (2 pts) Compare the Vendor_code column with the Vendor column (1 pt). If they contain the same data, delete one of the columns (1 pt). Show all steps (i.e., prove that the two columns do/do not contain the same data). Use as many cells as you need and ensure you save the results back to your dataframe df.
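
One way to show whether two columns match is `Series.equals`, which treats missing values in the same position as equal, combined with a listing of any rows that differ. The sketch below uses made-up vendor names, and dropping `Vendor_code` rather than `Vendor` is an arbitrary choice.

```python
import pandas as pd

# Toy frame (made-up vendor names, including one missing value in both columns).
df = pd.DataFrame({
    "Vendor_code": ["Acme", "Brew Co", None],
    "Vendor": ["Acme", "Brew Co", None],
})

# True only if every value matches (NaN/None in the same position counts as a match).
same = df["Vendor_code"].equals(df["Vendor"])

# Rows where the two columns differ, ignoring rows where both are missing.
both_missing = df["Vendor_code"].isna() & df["Vendor"].isna()
mismatches = df[df["Vendor_code"].ne(df["Vendor"]) & ~both_missing]
print(same, len(mismatches))

# If the columns are identical, keep only one of them.
if same:
    df = df.drop(columns=["Vendor_code"])
```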

Step 1e. (3 pts) Determine whether there are any duplicate rows (1 pt). If there are duplicates, display them to the screen (1 pt). If duplicate rows exist, remove them from the dataset and reset the index of the dataframe (1 pt). Show all steps. Use as many cells as you need and ensure you save the results (if there are any) back to your dataframe df.
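
The duplicate-handling steps can be sketched on toy data as follows. The rows are made up; `keep=False` is used so that every copy of a duplicated row is displayed, not just the repeats.

```python
import pandas as pd

# Toy frame (made-up values); rows 0 and 2 are identical.
df = pd.DataFrame({
    "Name": ["A", "B", "A", "C"],
    "Size": [0.33, 0.5, 0.33, 0.75],
})

# Count rows that repeat an earlier row.
n_dupes = df.duplicated().sum()
print(n_dupes)

# Display every row involved in a duplication.
dupes = df[df.duplicated(keep=False)]
print(dupes)

# Keep the first occurrence of each row and renumber the index from 0.
df = df.drop_duplicates().reset_index(drop=True)
```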

Step 2a. (2 pts) Display basic statistical information for the dataframe's numeric attributes. Ensure that the numeric attributes are shown vertically on the left and the various statistical attributes are columns.
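
By default `describe()` puts the statistics in rows and the attributes in columns, so one way to get the requested layout is to transpose the result. A sketch on made-up values:

```python
import pandas as pd

# Toy frame (made-up values) with one text column and two numeric columns.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Size": [0.33, 0.5, 0.75],
    "ABV": [4.5, 5.0, 6.5],
})

# describe() summarizes the numeric columns; .T puts attributes in rows
# and statistics (count, mean, std, min, ...) in columns.
stats = df.describe().T
print(stats)
```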

Step 2b. (4 pts) Look through the min column to determine if there are any 0s. If there are, think about whether a 0 is a valid value for the attribute in question. If you think it's an error, first determine how many occurrences of 0 exist for the attribute in the dataframe (1 pt). Next, display only the rows that have a 0 for the attribute in question (1 pt). Assign NaN, using the numpy library, to all occurrences of 0 for the attribute in question (1 pt). Finally, display value counts for every unique value of the attribute in question, ensuring that NaN gets counted (1 pt).
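
The four sub-steps can be sketched as below. The column `ABV` is only a placeholder for whichever attribute turns out to contain suspicious zeros in the real data, and the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values); "ABV" is a placeholder for the attribute in question.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "ABV": [4.5, 0.0, 6.0, 0.0],
})

# 1. Count the zeros.
n_zeros = (df["ABV"] == 0).sum()
print(n_zeros)

# 2. Display only the rows with a zero.
zero_rows = df[df["ABV"] == 0]
print(zero_rows)

# 3. Replace the zeros with NaN.
df.loc[df["ABV"] == 0, "ABV"] = np.nan

# 4. Value counts, including NaN (dropna=False keeps the NaN bucket).
counts = df["ABV"].value_counts(dropna=False)
print(counts)
```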

Step 2c. (3 pts) Create box plots for Size and ABV.
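
A possible sketch using pandas' built-in `boxplot`, which relies on matplotlib (assumed to be installed). The values are made up, and the `Agg` backend line is only there so the sketch runs without a display; it is not needed inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy frame (made-up values).
df = pd.DataFrame({
    "Size": [0.33, 0.33, 0.5, 0.5, 0.75, 5.0],
    "ABV": [4.0, 4.5, 5.0, 5.5, 6.0, 20.0],
})

# One box plot per attribute, side by side, because the scales differ.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
df.boxplot(column="Size", ax=axes[0])
df.boxplot(column="ABV", ax=axes[1])
fig.tight_layout()
```

Separate axes are used because the two attributes are measured on different scales, and a shared axis would flatten the smaller one.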

Step 2d. (14 pts) Visually inspect each boxplot and determine if there are any statistical outliers for each of the two attributes. You can assume a statistical outlier is any data point that exceeds the upper or lower whisker. For each attribute that contains outlier(s):

  • calculate and display the IQR
  • calculate and display the necessary min and/or max values that are the thresholds for determining outliers (use the 75th percentile + 1.5 × IQR and 25th percentile - 1.5 × IQR formulas discussed in class)
  • use the above information to display the row(s) that contain the outlier
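
The three bullets above can be sketched as follows for a single attribute. The values are made up, `ABV` is used only as an example, and pandas' default linear interpolation is assumed for the quantiles.

```python
import pandas as pd

# Toy frame (made-up values); 20.0 is a deliberately extreme value.
df = pd.DataFrame({
    "Name": list("ABCDEF"),
    "ABV": [4.0, 4.5, 5.0, 5.5, 6.0, 20.0],
})

# Interquartile range: 75th percentile minus 25th percentile.
q1 = df["ABV"].quantile(0.25)
q3 = df["ABV"].quantile(0.75)
iqr = q3 - q1
print(iqr)

# Outlier thresholds (the whisker limits).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
print(lower, upper)

# Rows whose value falls outside the thresholds.
outliers = df[(df["ABV"] < lower) | (df["ABV"] > upper)]
print(outliers)
```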

Step 2e. (6 pts) For each of the attribute(s) that had outliers, would it make sense to impute values to replace the outliers? Why or why not? Answer in the cell below. Be sure to identify which attribute you are referring to in answering the questions.

Answer 2e here:

Step 3a. (3 pts) Determine how many missing values there are in each column and how many rows have missing values. Show all your steps.
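
A sketch of both counts on made-up data: `isna().sum()` counts per column, and `isna().any(axis=1)` flags rows with at least one missing value.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with several missing entries.
df = pd.DataFrame({
    "Name": ["A", None, "C"],
    "Size": [0.33, np.nan, 0.5],
    "ABV": [np.nan, np.nan, 5.0],
})

# Missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Number of rows with at least one missing value.
rows_with_missing = df.isna().any(axis=1).sum()
print(rows_with_missing)
```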

Step 3b. (2 pts) Using the .dropna() method, remove rows from the dataframe that have more than 3 missing values. Use as many cells as you need and ensure you save the results back to your dataframe df.
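
One way to express "more than 3 missing values" with `dropna` is its `thresh` parameter, which is the minimum number of non-missing values a row needs in order to be kept. The sketch below assumes a toy frame with five columns and made-up values.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values): row 0 is complete, row 1 has 3 missing values,
# row 2 has 4 missing values.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Retail_price": [5.5, 6.0, np.nan],
    "Country_of_Origin": ["CA", np.nan, np.nan],
    "Size": [0.33, np.nan, np.nan],
    "ABV": [4.5, np.nan, np.nan],
})

# A row with at most 3 missing values has at least (n_columns - 3) non-missing values.
min_non_null = df.shape[1] - 3
df = df.dropna(thresh=min_non_null)
print(df)
```

Here row 1 (exactly 3 missing values) is kept and row 2 (4 missing values) is removed.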
