COMP2420/COMP6420 - 2020 Sample - Question 4 - KNN Classification

Question 4: Classification [25 Marks]

import pandas as pd
q4_zap = pd.read_csv('./data/electricity.csv')

In this section, you will work on the dataframe provided above to implement a KNN Classifier, finding the best k value for your chosen features, and to implement a Decision Tree Classifier. The data and scenario are as follows:

Note: You may use Sklearn or any equivalent packages, provided they are within the standard Anaconda installation.

You have been provided with a dataset covering the period 1996-1998 that records the subtle adjustments of electricity prices in NSW over time. Electricity was transferred to and from Victoria based on NSW's supply and demand to alleviate fluctuations, so NSW prices can be affected by the supply and demand of both states. Each row represents a 30-minute period and shows the price and demand in NSW and Victoria. Your task is to predict whether the NSW price increased or decreased based on supply and demand; this outcome is denoted by the 'class' attribute.

The columns are as follows:

Name	Description
nswprice	The price of electricity in NSW
nswdemand	The demand of electricity in NSW
vicprice	The price of electricity in VIC
vicdemand	The demand of electricity in VIC
transfer	The amount of electricity to be transferred between the two states
class	The classification (outcome) of the time period {1: Increase in Price, 0: Decrease in Price}

Note: All values (except for class) have been normalised.

Your tasks are:

Perform the following actions (they do not have to be performed in order):
- Choose two columns (other than class) to be your features to predict the outcome
  - The two features with the highest correlation are to be chosen.
- Split the data into testing and training datasets
  - Your testing dataset should be 18% of the size of the original dataset. [5 marks]
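The feature-selection and splitting steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a model answer: it reads "highest correlation" as the largest absolute Pearson correlation with `class` (the wording could also be read as the pair of features most correlated with each other), and it runs on a hypothetical synthetic stand-in for `q4_zap` so that it executes without `electricity.csv`. The helper name `pick_top_two_features` and the `random_state` value are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def pick_top_two_features(df, target="class"):
    """Return the two non-target columns whose absolute Pearson
    correlation with the target column is largest (an assumed reading
    of 'highest correlation')."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).index[:2].tolist()


# Hypothetical stand-in for q4_zap: uniform values in [0, 1] (mimicking the
# normalised columns) with the same column names as the real dataset.
rng = np.random.default_rng(0)
n = 1000
demo = pd.DataFrame(
    rng.random((n, 5)),
    columns=["nswprice", "nswdemand", "vicprice", "vicdemand", "transfer"],
)
# Synthetic label that depends on two columns, so the ranking has signal.
demo["class"] = (
    demo["nswprice"] + 0.5 * demo["nswdemand"] + 0.1 * rng.standard_normal(n) > 0.75
).astype(int)

features = pick_top_two_features(demo)

# test_size=0.18 makes the test set 18% of the original rows.
X_train, X_test, y_train, y_test = train_test_split(
    demo[features], demo["class"], test_size=0.18, random_state=42
)
print(features, len(X_train), len(X_test))
```

On the real data, replacing `demo` with `q4_zap` should work as long as every column is numeric; fixing `random_state` keeps the split reproducible so the later k comparison is made on the same test set.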

Implement a program to determine the best k value for your chosen features based on the model's accuracy. Check k values from 15 to 45 inclusive. Report the highest and lowest accuracy scores and the k values that produced them. [10 marks]
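One way to approach the k search is to fit one `KNeighborsClassifier` per k and record its test accuracy. The sketch below is illustrative only: it uses hypothetical synthetic data, and the column names `nswprice` and `vicprice` are placeholders rather than the pair the correlation step would necessarily select. Note that `range(15, 46)` is needed so that k = 45 is included.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy_by_k(X_train, X_test, y_train, y_test, k_values=range(15, 46)):
    """Fit one KNN model per k and return a {k: test accuracy} dict."""
    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X_train, y_train)
        scores[k] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Hypothetical synthetic data standing in for the two chosen features.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame(rng.random((n, 2)), columns=["nswprice", "vicprice"])
y = ((X["nswprice"] + X["vicprice"] + 0.2 * rng.standard_normal(n)) > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.18, random_state=42
)

scores = knn_accuracy_by_k(X_train, X_test, y_train, y_test)
# On ties, max/min return the smallest such k (dict keeps insertion order).
best_k = max(scores, key=scores.get)
worst_k = min(scores, key=scores.get)
print(f"Highest accuracy: {scores[best_k]:.4f} at k={best_k}")
print(f"Lowest accuracy:  {scores[worst_k]:.4f} at k={worst_k}")
```

Because the question states that the feature columns are already normalised, this sketch skips any extra scaling step; on unnormalised data, KNN distances would normally call for one.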

Using the same features as you chose above, implement a Decision Tree Classifier that uses entropy as the criterion for decisions, and provide the accuracy score. [7 marks]
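In scikit-learn, the entropy criterion is selected with `criterion="entropy"` on `DecisionTreeClassifier`. The sketch below assumes the same hypothetical synthetic two-feature data as a stand-in for the real split; `random_state` is an assumed value, set only so that tie-breaking between equally good splits is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the two chosen features.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame(rng.random((n, 2)), columns=["nswprice", "vicprice"])
y = ((X["nswprice"] + X["vicprice"] + 0.2 * rng.standard_normal(n)) > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.18, random_state=42
)

# Splits are chosen by information gain (entropy) instead of the default Gini.
tree = DecisionTreeClassifier(criterion="entropy", random_state=42)
tree.fit(X_train, y_train)
tree_accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Decision tree (entropy) accuracy: {tree_accuracy:.4f}")
```

With no depth limit, the tree grows until its leaves are pure, which can overfit noisy data. Its test accuracy is therefore a useful point of comparison against the best-k KNN score.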

Compare the accuracy scores of your best k KNN model and your Decision Tree Classifier. Which performs better based on the data? Which would be better to use in a real world scenario?
