ROB313: Introduction to Learning from Data - Assignment 1: KNN Algorithm

ROB313: Introduction to Learning from Data

Assignment 1 (12.5 pts)

Due February 9, 2023, 23:59 EST

Read the PythonSetup.pdf document (posted on Quercus) before beginning this assignment.

Q1) 4pts Implement the k-NN algorithm for regression with two different distance metrics (ℓ2 and ℓ1). Use 5-fold cross-validation to estimate k and the preferred distance metric, using a root-mean-square error (RMSE) loss. Briefly describe your search procedure for estimating k and the distance metric, making sure you use your training, validation, and test datasets correctly. Compute nearest neighbours using a brute-force approach. Apply your algorithm to all regression datasets (use n_train = 1000, d = 2 for rosenbrock). For each dataset, report the estimated value of k and the preferred distance metric, as well as the cross-validation RMSE and test RMSE with these settings. Format these results in a table.
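A minimal sketch of a brute-force k-NN regressor and a 5-fold search over k and the distance metric, using only NumPy. The function names (`pairwise_distances`, `knn_regress`, `rmse`, `cross_validate`), the seed, and the choice to average the per-fold RMSE are illustrative assumptions, not part of the handout. One common reading of the instructions is to run the cross-validation on the concatenated training and validation sets and to touch the test set only once, with the selected k and metric.

```python
import numpy as np


def pairwise_distances(x_query, x_ref, metric="l2"):
    """Brute-force distance matrix of shape (n_query, n_ref)."""
    diff = x_query[:, None, :] - x_ref[None, :, :]  # (n_query, n_ref, d)
    if metric == "l2":
        return np.sqrt(np.sum(diff ** 2, axis=-1))
    if metric == "l1":
        return np.sum(np.abs(diff), axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def knn_regress(x_train, y_train, x_query, k, metric="l2"):
    """Predict each query's target as the mean target of its k nearest training points."""
    dist = pairwise_distances(x_query, x_train, metric)
    idx = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours
    return y_train[idx].mean(axis=1)       # works for y of shape (n,) or (n, 1)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cross_validate(x, y, k_values, metrics=("l2", "l1"), n_folds=5, seed=0):
    """Return (best_k, best_metric, best_cv_rmse, all_losses) from n_folds-fold CV."""
    rng = np.random.default_rng(seed)  # seeded so the folds are reproducible
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    losses = {}
    for metric in metrics:
        for k in k_values:
            fold_rmse = []
            for i, val_idx in enumerate(folds):
                trn_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
                pred = knn_regress(x[trn_idx], y[trn_idx], x[val_idx], k, metric)
                fold_rmse.append(rmse(y[val_idx], pred))
            losses[(k, metric)] = float(np.mean(fold_rmse))  # average RMSE over folds
    best = min(losses, key=losses.get)
    return best[0], best[1], losses[best], losses
```

For example, `cross_validate(x, y, k_values=range(1, 31))` scans k = 1, ..., 30 for both metrics. The (n_query, n_ref, d) difference tensor keeps the code short, but its memory grows with n_query × n_ref × d, so larger datasets may need the query points processed in batches.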

Plot the cross-validation prediction curves (merging the predictions from all splits) for the one-dimensional regression dataset mauna_loa at several values of k, using the ℓ2 distance metric. In separate figures, plot the predictions on the test set and the cross-validation loss across k for this model. Discuss your results.
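"Merging the predictions from all splits" can be read as building out-of-fold predictions: each training point is predicted exactly once, by the model fitted on the four folds that exclude it. A sketch under that reading, with hypothetical function names and the ℓ2 metric fixed as the question specifies:

```python
import numpy as np


def knn_regress_l2(x_train, y_train, x_query, k):
    # Brute-force l2 k-NN regression; squared distances give the same neighbour ranking.
    d2 = np.sum((x_query[:, None, :] - x_train[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)


def out_of_fold_predictions(x, y, k, n_folds=5, seed=0):
    """Merge every split's validation-fold predictions into one array aligned with x."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    merged = np.empty_like(y, dtype=float)
    for i, val_idx in enumerate(folds):
        trn_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        merged[val_idx] = knn_regress_l2(x[trn_idx], y[trn_idx], x[val_idx], k)
    return merged
```

To draw a curve, sort by the input, e.g. `order = np.argsort(x[:, 0])`, and plot `x[order, 0]` against `merged[order]` (for instance with matplotlib) for each k of interest. Reusing the seed from the cross-validation search keeps the folds identical to the ones used to compute the loss.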

Q2) 2pts Test the performance of your k-NN regression algorithm when a k-d tree data structure is used to compute the nearest neighbours for multiple test points simultaneously. Conduct performance studies by making predictions on the test set of the rosenbrock regression dataset with n_train = 5000. Report the run-time for varying values of d in a single plot. Use the ℓ2 distance metric and k = 5. Comment on the relative performance of the k-d tree algorithm versus the brute-force approach implemented in your answer to Q1. Use the time.time function to measure elapsed wall-clock time for your studies.
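The footnote attached to this question in the original handout is not reproduced here and may point to a permitted library implementation. Purely to illustrate how a k-d tree prunes the search, here is a from-scratch sketch: the tree splits on the median along axis (depth mod d), and a query skips a subtree whenever the splitting plane is farther away than the current k-th nearest neighbour. All names are hypothetical, and queries are answered one at a time in a Python loop rather than as a batch.

```python
import heapq
import time

import numpy as np


class KDNode:
    """One tree node: a training-point index plus its splitting axis and children."""
    __slots__ = ("idx", "axis", "left", "right")

    def __init__(self, idx, axis, left, right):
        self.idx, self.axis, self.left, self.right = idx, axis, left, right


def build_kdtree(x, indices=None, depth=0):
    """Recursively split the points on the median along axis (depth mod d)."""
    if indices is None:
        indices = np.arange(len(x))
    if len(indices) == 0:
        return None
    axis = depth % x.shape[1]
    order = indices[np.argsort(x[indices, axis])]
    mid = len(order) // 2
    return KDNode(int(order[mid]), axis,
                  build_kdtree(x, order[:mid], depth + 1),
                  build_kdtree(x, order[mid + 1:], depth + 1))


def kdtree_query(root, x, q, k):
    """Indices of the k nearest (l2) training points to q, nearest first."""
    heap = []  # max-heap of the best k so far, stored as (-squared_distance, index)

    def visit(node):
        if node is None:
            return
        p = x[node.idx]
        d2 = float(np.sum((p - q) ** 2))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.idx))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.idx))
        diff = q[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the k-th best.
        if len(heap) < k or diff ** 2 < -heap[0][0]:
            visit(far)

    visit(root)
    return [i for _, i in sorted(heap, reverse=True)]


def time_queries(x_train, x_test, k=5):
    """Wall-clock seconds (time.time) to build the tree and query every test point."""
    start = time.time()
    root = build_kdtree(x_train)
    for q in x_test:
        kdtree_query(root, x_train, q, k)
    return time.time() - start
```

Timing `time_queries` on rosenbrock inputs for increasing d, next to the brute-force predictor from Q1, gives the kind of comparison the question asks for. Pure-Python recursion adds overhead of its own, and k-d tree pruning generally becomes less effective as d grows, so the ranking of the two approaches may change with dimension.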

Q3) 2pts Implement the k-NN algorithm for classification with two different distance metrics (ℓ2 and ℓ1). Estimate k and the preferred distance metric by maximizing the accuracy (fraction of correct predictions) on the validation split. Briefly describe your search procedure for estimating k and the distance metric. Compute nearest neighbours using a k-d tree data structure, as in Q2. Apply your algorithm to all classification datasets. For each dataset, report the estimated value of k and the preferred distance metric, as well as the validation accuracy and test accuracy with these settings. Format these results in a table.
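A sketch of majority-vote k-NN classification and the validation-accuracy search, assuming integer class labels (if the labels are stored one-hot, `np.argmax(y, axis=1)` converts them). For brevity the neighbour search here is brute force, whereas the question asks for a k-d tree; swapping in a tree-based query changes only how `idx` is computed. Function names are hypothetical.

```python
import numpy as np


def knn_classify(x_train, labels_train, x_query, k, metric="l2"):
    """Majority vote among the k nearest training points (brute-force search)."""
    diff = x_query[:, None, :] - x_train[None, :, :]
    if metric == "l2":
        dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    else:  # "l1"
        dist = np.sum(np.abs(diff), axis=-1)
    idx = np.argsort(dist, axis=1)[:, :k]      # (n_query, k) neighbour indices
    votes = labels_train[idx]                  # (n_query, k) integer labels
    n_classes = int(labels_train.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return np.argmax(counts, axis=1)           # ties go to the smallest label


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return float(np.mean(y_true == y_pred))


def select_by_validation(x_train, y_train, x_valid, y_valid, k_values, metrics=("l2", "l1")):
    """Pick the (k, metric) pair with the highest validation accuracy."""
    scores = {(k, m): accuracy(y_valid, knn_classify(x_train, y_train, x_valid, k, m))
              for m in metrics for k in k_values}
    best = max(scores, key=scores.get)
    return best, scores[best]
```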

Q4) 4.5pts Implement a linear regression algorithm that minimizes the least-squares loss function using the singular value decomposition (SVD). Apply it to all datasets (regression and classification), using n_train = 1000, d = 2 for rosenbrock. Use both the training and validation sets to fit the model before predicting on the test set, and format your results in a table (test RMSE for regression, test accuracy for classification). Compare the performance of this method to that of the k-NN algorithm.
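A sketch of least squares via the thin SVD. Writing the design matrix with a prepended bias column as X = U diag(s) Vᵀ, the minimum-norm minimiser of ‖Xw - y‖² is w = V diag(1/s) Uᵀ y, with the reciprocals of (numerically) zero singular values set to zero. Treating classification as regression onto one-hot targets and predicting the arg-max output is an assumption about how "apply to classification datasets" is meant; function names and the tolerance `rtol` are illustrative.

```python
import numpy as np


def design_matrix(x):
    """Prepend a column of ones so the model has a bias term."""
    return np.hstack([np.ones((x.shape[0], 1)), x])


def fit_least_squares_svd(x, y, rtol=1e-12):
    """Minimum-norm minimiser of ||X w - Y||^2 via the thin SVD X = U diag(s) V^T."""
    X = design_matrix(x)
    Y = y.reshape(x.shape[0], -1)               # targets as an (n, m) matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rtol * s.max()                   # drop numerically zero singular values
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv[:, None] * (U.T @ Y))  # w = V diag(1/s) U^T Y, shape (d+1, m)


def predict(w, x):
    return design_matrix(x) @ w


def classify(w, x):
    """Assumed rule for one-hot targets: predict the column with the largest output."""
    return np.argmax(predict(w, x), axis=1)
```

Fitting on `np.vstack([x_train, x_valid])` with the matching stacked targets follows the instruction to use both sets before predicting on the test set.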

Submission guidelines: Submit an electronic copy of your report (maximum 10 pages, at least 10 pt font) in PDF format, together with documented Python scripts. Include a file named “README” outlining how the scripts should be run. Upload both your PDF report and a single tar or zip file containing your code and README to Quercus. You are expected to verify the integrity of your tar/zip file before uploading. Do not include (or modify) the supplied *.npz data files or the data_utils.py module in your submission.

The report must contain:
• Objectives of the assignment
• A brief description of the structure of your code, and strategies employed
• Relevant figures, tables, and discussion

Do not use scikit-learn for this assignment, except where explicitly specified. Also, do not use the scipy.spatial module in this assignment. The intention is that you implement the simple algorithms required from scratch. Also, for reproducibility, always set a seed for any random number generator used in your code. For example, you can set the seed in numpy using numpy.random.seed.
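A small illustration of the reproducibility requirement. `numpy.random.seed` seeds NumPy's legacy global generator, so re-seeding with the same value replays the same random stream; the value 313 is arbitrary. Generators created with `numpy.random.default_rng(seed)` are independent of that global state and need their own seed.

```python
import numpy as np

np.random.seed(313)                  # seed NumPy's legacy global generator (313 is arbitrary)
perm_a = np.random.permutation(10)
np.random.seed(313)                  # re-seeding replays the same random stream
perm_b = np.random.permutation(10)

# Generators from default_rng ignore the global seed and take their own.
rng_a = np.random.default_rng(313)
rng_b = np.random.default_rng(313)
same_stream = np.array_equal(rng_a.permutation(10), rng_b.permutation(10))
```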