Question 4: Classification [25 Marks]
q4_zap = pd.read_csv('./data/electricity.csv')
In this section, you will work with the dataframe provided above to implement a KNN Classifier, finding the best k value for your choice of features, and to implement a Decision Tree Classifier. The data and scenario are as follows:
Note: You may use Sklearn or any equivalent packages, provided they are within the standard Anaconda installation.
You have been provided a dataset for the time period 1996-1998 of the subtle price adjustments of electricity in NSW over time. Electricity was transferred to and from Victoria based on NSW's supply and demand to alleviate fluctuations, and as such NSW prices can be affected by both states' supply and demand. Each row represents a period of 30 minutes, for which the price and demand in NSW and Victoria are shown. Your task is to predict whether the NSW price has increased or decreased based on supply and demand, which is denoted by the 'class' attribute.
The data is as follows:
Name | Description |
---|---|
nswprice | The price of electricity in NSW |
nswdemand | The demand of electricity in NSW |
vicprice | The price of electricity in VIC |
vicdemand | The demand of electricity in VIC |
transfer | The amount of electricity to be transferred between both states |
class | The classification (outcome) of the time period {1: Increase in Price, 0: Decrease in Price} |
Note: All values (except for class) have been normalised.
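As a rough illustration of how the columns in the table might be inspected, the sketch below ranks the feature columns by their absolute Pearson correlation with `class`. It runs on a small synthetic frame with the same column names; the values, the rule used to generate `class`, and the helper name `rank_features` are all made up for the example and are not taken from `electricity.csv`. Measuring correlation against `class` is one reading of "highest correlation" and is an assumption here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for electricity.csv: same column names, made-up values in [0, 1).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "nswprice": rng.random(n),
    "nswdemand": rng.random(n),
    "vicprice": rng.random(n),
    "vicdemand": rng.random(n),
    "transfer": rng.random(n),
})
# Hypothetical labelling rule so that `class` depends on two of the columns.
demo["class"] = (demo["nswprice"] + 0.5 * demo["nswdemand"] > 0.75).astype(int)


def rank_features(df, target="class"):
    """Return feature names sorted by absolute Pearson correlation with the target."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


ranking = rank_features(demo)
top_two = list(ranking.index[:2])
print(ranking)
print("Top two features:", top_two)
```

On the real data, the same helper could be applied to `q4_zap` in place of `demo`; which columns come out on top there is not something this sketch can tell you.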
Your tasks are:
Perform the following actions (they do not have to be performed in order):
- Choose two columns (other than `class`) to be your features to predict the outcome.
  - The two features with the highest correlation are to be chosen.
- Split the data into testing and training datasets.
  - Your testing dataset should be 18% of the size of the original dataset. [5 marks]
- Implement a program to determine the best k value for your chosen features based on the model's accuracy. Check the k values between 15 and 45 inclusive. Provide the highest and lowest accuracy scores and the k values at which they occur. [10 marks]
- Using the same features as you chose above, implement a Decision Tree Classifier that uses `entropy` as the criterion for decisions, and provide the accuracy score. [7 marks]
- Compare the accuracy scores of your best k KNN model and your Decision Tree Classifier. Which performs better based on the data? Which would be better to use in a real-world scenario?
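The steps listed above can be sketched with scikit-learn as follows. This is a minimal illustration, not a model answer: it uses synthetic data with two assumed feature names (`nswprice`, `nswdemand`), a made-up labelling rule, and an arbitrary `random_state`, so the accuracies it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for two chosen features and the class label (values are made up).
rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({"nswprice": rng.random(n), "nswdemand": rng.random(n)})
noise = rng.normal(0, 0.1, n)
y = (X["nswprice"] + 0.5 * X["nswdemand"] + noise > 0.75).astype(int)

# Hold out 18% of the rows for testing; random_state is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.18, random_state=0
)

# Fit one KNN model per k in 15..45 (inclusive) and record its test accuracy.
knn_scores = {}
for k in range(15, 46):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    knn_scores[k] = accuracy_score(y_test, knn.predict(X_test))

best_k = max(knn_scores, key=knn_scores.get)
worst_k = min(knn_scores, key=knn_scores.get)
print(f"Highest KNN accuracy: {knn_scores[best_k]:.4f} at k={best_k}")
print(f"Lowest KNN accuracy:  {knn_scores[worst_k]:.4f} at k={worst_k}")

# Decision tree on the same features, using entropy as the split criterion.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_train, y_train)
tree_acc = accuracy_score(y_test, tree.predict(X_test))
print(f"Decision tree accuracy: {tree_acc:.4f}")
```

Note that this sketch picks k by test-set accuracy, as the task wording suggests. In a real-world setting, selecting k on a separate validation split or with cross-validation on the training data would give a less optimistic estimate, which may be worth mentioning when comparing the two models.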