Using Machine Learning Tools Assignment 1
Overview
In this assignment, you will apply some popular machine learning techniques to the problem of predicting bike rental demand. A data set has been provided containing records of bike rentals in Seoul, collected during 2017-18.
General instructions
This assignment is divided into several tasks. Use the spaces provided in this notebook to answer the questions posed in each task. Note that some questions require writing a small amount of code and some require graphical results. It is your responsibility to make sure your responses are clearly labelled and your code has been fully executed (with the correct results displayed) before submission!
Do not manually edit the data set file we have provided! For marking purposes, it's important that your code is written to run correctly on the original data file.
When creating graphical output, label it clearly, with appropriate titles, x-axis labels and y-axis labels. Chapter 2 of the reference book is based on a similar workflow to this prac, so you may look there for further background and ideas. You can also use any other relevant general resources on the internet, but do not use ones that relate directly to these questions with this dataset (these would normally only be found in someone else's assignment answers). If you take a large portion of code or text from the internet then you should reference where it was taken from, but we do not expect references for small pieces of code, such as those from documentation, blogs or tutorials. Taking, and adapting, small portions of code is expected and is common practice when solving real problems.
The following code imports some of the essential libraries that you will need. You should not need to modify it, but you are expected to import other libraries as needed.
STEP1: Load the data set from the csv file (SeoulBikeData.csv) into a DataFrame, and summarise it with at least two appropriate pandas functions. Download the data set from MyUni using the link provided on the assignment page. A paper that describes a related version of this dataset is: Sathishkumar V E, Jangwoo Park, and Yongyun Cho, 'Using data mining techniques for bike sharing demand prediction in metropolitan city', Computer Communications, Vol. 153, pp. 353-366, March 2020. Feel free to look at this if you want more information about the dataset. A minimal loading sketch is given after the feature list below.
The data is stored in a CSV (comma-separated values) file and contains the following information:
- Date: year-month-day
- Rented Bike Count: Count of bikes rented at each hour
- Hour: Hour of the day
- Temperature: Temperature in Celsius
- Humidity: %
- Windspeed: m/s
- Visibility: in units of 10 m
- Dew point temperature: Celsius
- Solar radiation: MJ/m2
- Rainfall: mm
- Snowfall: cm
- Seasons: Winter, Spring, Summer, Autumn
- Holiday: Holiday/No holiday
- Functioning Day: NoFunc (Non-Functional Hours), Fun (Functional Hours)
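A minimal sketch of this step, assuming SeoulBikeData.csv is in the working directory (the encoding argument may need adjusting, since some exports of this file contain non-ASCII characters):

```python
import pandas as pd

# Load the CSV into a DataFrame; adjust the encoding if the default fails.
df = pd.read_csv("SeoulBikeData.csv", encoding="unicode_escape")

# Two appropriate pandas summaries: column types / non-null counts, and basic statistics.
df.info()
print(df.describe())
```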
STEP2: To get a feeling for the data it is a good idea to do some form of simple visualisation. Display a set of histograms for the features as they are right now, prior to any cleaning steps.
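For example, a quick look at the numeric columns (non-numeric columns are skipped by hist at this stage) could be:

```python
import matplotlib.pyplot as plt

# Histograms of every currently numeric feature, before any cleaning.
df.hist(bins=30, figsize=(15, 10))
plt.tight_layout()
plt.show()
```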
STEP3: The "Functioning Day" feature records whether the bike rental was open for business on that day. For this assignment we are only interested in predicting demand on days when the business is open, so remove rows from the DataFrame where the business is closed. After doing this, delete the Functioning Day feature from the DataFrame and verify that this worked.
The goal is to predict bike rental demand using historical data. To achieve this, you will use regression techniques with "Rented Bike Count" as the target feature, but for this it is important that all other features in the data are numerical. STEP4: Two of the features in the data, "Holiday" and "Seasons", need to be converted to numerical format. Write code to convert the "Holiday" feature to 0 or 1 from its current format. For the "Seasons" feature, a single numeric code would impose an artificial ordering, so instead add 4 new columns, labelled "Winter", "Spring", "Summer" and "Autumn". Each of these columns should store a 0 or 1, depending on the season recorded in each row.
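A sketch of one way to do this, assuming the Holiday values are "Holiday"/"No Holiday" and the season column is named "Seasons":

```python
# Holiday -> 1, No Holiday -> 0.
df["Holiday"] = (df["Holiday"] == "Holiday").astype(int)

# One 0/1 indicator column per season, then drop the original text column.
for season in ["Winter", "Spring", "Summer", "Autumn"]:
    df[season] = (df["Seasons"] == season).astype(int)
df = df.drop(columns=["Seasons"])
```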
STEP5: It is known that bike rentals depend strongly on whether it is a weekday or a weekend. Replace the Date feature with a Weekday feature that stores 1 for a weekday and 0 for a weekend. To do this, use the function date_is_weekday below, which returns 1 if the date is a weekday and 0 if it is a weekend. Apply the function to the Date column in your DataFrame (you can use DataFrame.transform to apply it).
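The assignment supplies its own date_is_weekday function; a sketch of what it might look like, assuming dates are day/month/year strings such as "01/12/2017", is:

```python
import pandas as pd

# Return 1 for a weekday (Mon-Fri) and 0 for a weekend (Sat/Sun).
def date_is_weekday(date_str):
    day = pd.to_datetime(date_str, dayfirst=True)
    return 1 if day.weekday() < 5 else 0

# Replace Date with a Weekday column.
df["Weekday"] = df["Date"].transform(date_is_weekday)
df = df.drop(columns=["Date"])
```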
STEP6: Convert all the remaining data to numerical format, with any non-numerical entries set to NaN.
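A minimal sketch, coercing every remaining column to a numeric dtype:

```python
import pandas as pd

# Entries that cannot be parsed as numbers become NaN.
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
print(df.isna().sum())
```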
STEP7: Use graphical methods to display your data and identify problematic entries. Set any problematic values in the numerical data to np.nan and check that this has worked. Once this is done, specify a sklearn pipeline that will perform imputation to replace problematic entries (NaN values) with an appropriate median value, plus any other pre-processing that you think should be used. Just specify the pipeline - do not run it now.
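One way the pipeline could be specified (median imputation plus scaling, which is optional here but generally helps the kernel-based models used later); which values count as "problematic" depends on what your plots reveal:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Example of flagging a problematic value (hypothetical condition only):
# df.loc[df["Humidity"] < 0, "Humidity"] = np.nan

# Pre-processing pipeline: specify only, do not fit it yet.
preprocess = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
```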
STEP8: Generate a pre-processed version of the entire dataset by applying the pipeline defined in STEP7. Then create separate scatter plots of each feature against the target variable "Rented Bike Count" to visualise the strength of the relationships. Additionally, calculate the correlation of each feature with the target using either the pandas function corr() or the numpy function corrcoef(), and find the 3 attributes that are most correlated with bike rentals.
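A sketch of this step, assuming the target column is named "Rented Bike Count" and reusing the preprocess pipeline from STEP7 (the scaler standardises the values, which does not affect the shape of the scatter plots or the correlations):

```python
import matplotlib.pyplot as plt

target = "Rented Bike Count"

# Pre-processed copy of the full dataset.
X_clean = pd.DataFrame(preprocess.fit_transform(df), columns=df.columns, index=df.index)

# Scatter plot of each feature against the target.
for col in X_clean.columns:
    if col == target:
        continue
    plt.figure()
    plt.scatter(X_clean[col], X_clean[target], s=2)
    plt.xlabel(col)
    plt.ylabel(target)
    plt.title(f"{col} vs {target}")
    plt.show()

# Correlation with the target; the three largest magnitudes answer the question.
corr = X_clean.corr()[target].drop(target)
print(corr.abs().sort_values(ascending=False).head(3))
```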
STEP9: Divide the data into training and test sets using an appropriate splitting method such that 20% of the data is kept for testing. Create a pipeline that adds a linear regression model to the pipeline defined in STEP7. Fit the pipeline to the training set and calculate the RMSE of the fit to evaluate its performance. As a comparison, compute the RMSE that would be obtained by predicting the mean value of bike rentals for all training examples.
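A sketch of the split, model pipeline and baseline comparison, reusing the target name and preprocess pipeline from the earlier sketches:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline

X = df.drop(columns=[target])
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# STEP7 pre-processing followed by linear regression.
lin_pipe = Pipeline([
    ("preprocess", preprocess),
    ("linreg", LinearRegression()),
])
lin_pipe.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_train, lin_pipe.predict(X_train)))
print("Linear regression training RMSE:", rmse)

# Baseline: always predict the mean of the training targets.
baseline = np.full(len(y_train), y_train.mean())
print("Mean-prediction RMSE:", np.sqrt(mean_squared_error(y_train, baseline)))
```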
STEP10: Fit a Kernel Ridge regression model (imported from sklearn.kernel_ridge) to the X_train data from STEP9. To do this, build a new pipeline that adds the Kernel Ridge regression model to the pipeline defined in STEP7, and fit it to the training data using the default settings. Generate a scatter plot of the predicted values against the actual values for the training data, and calculate the RMSE of the fit to the training data.
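Continuing from the STEP9 sketch, one way this might look:

```python
from sklearn.kernel_ridge import KernelRidge

# STEP7 pre-processing plus Kernel Ridge regression with default settings.
kr_pipe = Pipeline([
    ("preprocess", preprocess),
    ("kernelridge", KernelRidge()),
])
kr_pipe.fit(X_train, y_train)

# Predicted vs actual on the training data.
y_pred = kr_pipe.predict(X_train)
plt.figure()
plt.scatter(y_train, y_pred, s=2)
plt.xlabel("Actual rented bike count")
plt.ylabel("Predicted rented bike count")
plt.title("Kernel Ridge: training predictions vs actual")
plt.show()

print("Kernel Ridge training RMSE:", np.sqrt(mean_squared_error(y_train, y_pred)))
```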
STEP11: Fit a Support Vector Regression model (from sklearn.svm import SVR). As you did for STEP10, create a new pipeline combining the pipeline from STEP7 with this model, and fit it to your training data using the default settings. Again, generate a scatter plot of the predicted values against the actual values for the training data, and calculate the RMSE of the fit to the training data.
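This mirrors the STEP10 sketch, swapping in SVR (the scatter plot code is identical and omitted here):

```python
from sklearn.svm import SVR

# STEP7 pre-processing plus SVR with default settings.
svr_pipe = Pipeline([
    ("preprocess", preprocess),
    ("svr", SVR()),
])
svr_pipe.fit(X_train, y_train)
print("SVR training RMSE:",
      np.sqrt(mean_squared_error(y_train, svr_pipe.predict(X_train))))
```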
STEP12: Perform a 10-fold cross-validation for each of the three models (LinearRegression, KernelRidge, SVR). This splits the training set (as used above) into 10 equal-sized subsets and uses each in turn as the validation set while training a model on the other 9. You should therefore have 10 RMSE values for each cross-validation run. Find the mean and standard deviation of the RMSE values obtained for each model on the validation splits.
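A sketch using cross_val_score with the three pipelines built above (scores come back as negative RMSE, so they are negated):

```python
from sklearn.model_selection import cross_val_score

for name, model in [("LinearRegression", lin_pipe),
                    ("KernelRidge", kr_pipe),
                    ("SVR", svr_pipe)]:
    scores = -cross_val_score(model, X_train, y_train, cv=10,
                              scoring="neg_root_mean_squared_error")
    print(f"{name}: mean RMSE = {scores.mean():.1f}, std = {scores.std():.1f}")
```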
STEP13: Both Kernel Ridge Regression and Support Vector Regression have hyperparameters that can be adjusted to suit the problem. Use grid search to systematically compare the generalisation performance (RMSE) obtained with different hyperparameter settings (still with 10-fold CV). Use the sklearn class GridSearchCV to do this.
For KernelRidge, vary the hyperparameter alpha. (Note: if KernelRidge is the last step in a pipeline, alpha is referred to as kernelridge__alpha, where the prefix is the name given to that pipeline step.)
For SVR, vary the hyperparameter C. (Note: similarly, C is referred to as svr__C when the SVR step is named "svr".)
Find the best hyperparameter setting for each model. Finally, train both models with their best hyperparameter settings, apply them to the test set, and report the performance as RMSE.
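A sketch of the grid search, reusing the pipelines from STEP10 and STEP11; the hyperparameter grids below are illustrative only and sensible ranges depend on the data:

```python
from sklearn.model_selection import GridSearchCV

searches = {
    "KernelRidge": GridSearchCV(kr_pipe,
                                {"kernelridge__alpha": [0.01, 0.1, 1.0, 10.0]},
                                cv=10, scoring="neg_root_mean_squared_error"),
    "SVR": GridSearchCV(svr_pipe,
                        {"svr__C": [0.1, 1.0, 10.0, 100.0]},
                        cv=10, scoring="neg_root_mean_squared_error"),
}

for name, search in searches.items():
    search.fit(X_train, y_train)   # refits on the full training set with the best setting
    print(name, "best params:", search.best_params_)
    test_rmse = np.sqrt(mean_squared_error(y_test, search.predict(X_test)))
    print(name, "test RMSE:", test_rmse)
```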