
CSE 158, CSE 258, DSC 256, MGTA 461 Web Mining and Recommender Systems, Fall 2023 : Homework 3 - Play prediction


CSE 158/258, DSC 256, MGTA 461, Fall 2023: Homework 3 Instructions

Please submit your solution by Monday, Nov 13. Submissions should be made on gradescope. Please complete homework individually.

These homework exercises are intended to help you get started on potential solutions to Assignment 1. We’ll work directly with the Assignment 1 dataset to complete them, which is available from:

You’ll probably want to implement your solution by modifying the baseline code provided in the assignment directory.

You should submit two files:

answers_hw3.txt should contain a Python dictionary containing your answers to each question. Its format should be like the following:

           { "Q1": 1.5, "Q2": [3,5,17,8], "Q3": "b", (etc.) }

The provided code stub demonstrates how to prepare your answers and includes an answer template for each question.
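As a minimal sketch of the answer-file format (this is not the provided stub, whose contents are not shown here), the dictionary can be written out as a Python literal and read back with `ast.literal_eval` to confirm it parses. The values are the placeholders from the example above; the file name `answers_hw3.txt` and the helper names `save_answers` and `load_answers` are assumptions made for illustration.

```python
import ast

# Placeholder values copied from the format example; replace with your own results.
answers = {
    "Q1": 1.5,
    "Q2": [3, 5, 17, 8],
    "Q3": "b",
}


def save_answers(answers, path="answers_hw3.txt"):
    """Write the dictionary to `path` as a Python literal (hypothetical helper)."""
    with open(path, "w") as f:
        f.write(str(answers) + "\n")


def load_answers(path="answers_hw3.txt"):
    """Read the dictionary back; literal_eval only accepts plain literals."""
    with open(path) as f:
        return ast.literal_eval(f.read())
```

Reading the file back before submitting is a cheap way to catch values that do not survive the round trip, such as NumPy scalars or custom objects.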

homework3.py should be a Python file containing working code for your solutions. The autograder will not execute your code; this file is required so that we can assign partial grades in the event of incorrect solutions, check for plagiarism, etc. Your solution should clearly document which sections correspond to each question and answer. We may occasionally run code to confirm that your outputs match submitted answers, so please ensure that your code generates the submitted answers.

You may build your solution on top of the provided stub:
Homework 3 stub : https://cseweb.ucsd.edu/classes/fa23/cse258-a/stubs/

Each question is worth 1 mark.

Play prediction

Since we don’t have access to the test labels, we’ll need to simulate validation/test sets of our own. So, let’s split the training data (‘train.json.gz’) as follows:

(1) Reviews 1-165,000 for training
(2) Reviews 165,001-175,000 for validation
(3) Upload to gradescope for testing only when you have a good model on the validation set.
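The split above can be sketched as two list slices. Because the review ranges are 1-indexed and inclusive, reviews 1-165,000 correspond to positions `[0:165000]` and reviews 165,001-175,000 to `[165000:175000]`. The function name `split_train_valid` is hypothetical, and loading the records from ‘train.json.gz’ is left to the baseline code.

```python
def split_train_valid(records, n_train=165_000, n_valid=10_000):
    """Split records by position: the first n_train for training,
    the next n_valid for validation (hypothetical helper)."""
    records = list(records)
    train = records[:n_train]
    valid = records[n_train:n_train + n_valid]
    return train, valid
```

Splitting by position rather than at random keeps the sketch aligned with the ranges stated above.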

  1. Although we have built a validation set, it only consists of positive samples. For this task we also need examples of user/item pairs that weren’t played. For each entry (user,game) in the validation set, sample a negative entry by randomly choosing a game that user hasn’t played.¹ Evaluate the performance (accuracy) of the baseline model on the validation set you have built (1 mark).

  2. The existing ‘played prediction’ baseline just returns True if the item in question is ‘popular,’ using a threshold of the 50th percentile of popularity (totalPlayed/2). Assuming that the ‘non-played’ test examples are a random sample of user-game pairs, this threshold may not be the best one. See if you can find a better threshold and report its performance on your validation set (1 mark).

  3. A stronger baseline than the one provided might make use of the Jaccard similarity (or another similarity metric).² Given a pair (u,g) in the validation set, consider all training items g′ that user u has played. For each, compute the Jaccard similarity between g and g′, i.e., between the set of users (in the training set) who have played g and the set of users who have played g′. Predict as ‘played’ if the maximum of these Jaccard similarities exceeds a threshold (you may choose the threshold that works best). Report the performance on your validation set (1 mark).
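The negative sampling and the popularity baseline described above might be sketched as follows. This is an illustration under assumptions, not the provided baseline: interactions are taken to be lists of (user, game) tuples, `popular_games` follows the "most-played games covering a fraction of all plays" rule implied by totalPlayed/2, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def build_validation_pairs(train, valid, seed=0):
    """Label each validation pair 1 and add one sampled non-played pair labeled 0."""
    rng = random.Random(seed)
    everything = list(train) + list(valid)
    played = defaultdict(set)
    for u, g in everything:
        played[u].add(g)
    all_games = sorted({g for _, g in everything})
    labeled = []
    for u, g in valid:
        labeled.append((u, g, 1))
        # Resample until we hit a game this user has not played
        # (assumes no user has played every game).
        neg = rng.choice(all_games)
        while neg in played[u]:
            neg = rng.choice(all_games)
        labeled.append((u, neg, 0))
    return labeled


def popular_games(train, fraction=0.5):
    """Most-played games that together cover more than `fraction` of all plays."""
    counts = defaultdict(int)
    for _, g in train:
        counts[g] += 1
    chosen, covered = set(), 0
    for g, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        chosen.add(g)
        covered += c
        if covered > len(train) * fraction:
            break
    return chosen


def accuracy(predict, labeled):
    """Fraction of labeled pairs on which predict(u, g) matches the label."""
    return sum(predict(u, g) == bool(y) for u, g, y in labeled) / len(labeled)


def best_popularity_fraction(train, labeled, fractions):
    """Return (accuracy, fraction) for the best-scoring candidate fraction."""
    scored = []
    for frac in fractions:
        pop = popular_games(train, frac)
        scored.append((accuracy(lambda u, g: g in pop, labeled), frac))
    return max(scored)
```

Fixing the random seed keeps the sampled negatives, and therefore the reported accuracy, reproducible between runs.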

¹ This is how I constructed the test set; a good solution should mimic this procedure as closely as possible so that your gradescope performance is close to your validation performance.

² Depending on the dataset, in practice this baseline is not always stronger than the one from Question 2.

  4. Improve the above predictor by incorporating both a Jaccard-based threshold and a popularity-based threshold. Report the performance on your validation set.³

  5. To run our model on the test set, we’ll have to use the file ‘pairs_Played.txt’ to find the reviewerID/itemID pairs about which we have to make predictions. Using that data, run the above model and upload your solution to the Assignment 1 gradescope. If you’ve already uploaded a better solution to gradescope, that’s fine too!
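A minimal sketch of the Jaccard-based predictor and its combination with popularity is shown below. It assumes the training interactions are (user, game) tuples and that `popular` is a set of popular games computed elsewhere. Combining the two signals with a logical OR is one possible choice made for illustration, not a rule stated in the assignment, and all names are hypothetical.

```python
from collections import defaultdict


def jaccard(s1, s2):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = len(s1 | s2)
    return len(s1 & s2) / union if union else 0.0


def build_indexes(train):
    """Map each game to its set of users and each user to their set of games."""
    users_per_game = defaultdict(set)
    games_per_user = defaultdict(set)
    for u, g in train:
        users_per_game[g].add(u)
        games_per_user[u].add(g)
    return users_per_game, games_per_user


def max_jaccard(u, g, users_per_game, games_per_user):
    """Largest similarity between g and any other training game played by u."""
    target = users_per_game.get(g, set())
    best = 0.0
    for g2 in games_per_user.get(u, set()):
        if g2 == g:
            continue  # skip the query game itself
        best = max(best, jaccard(target, users_per_game[g2]))
    return best


def predict_played(u, g, users_per_game, games_per_user, popular, jaccard_threshold):
    """Predict 'played' if either signal fires (the OR rule is an assumption)."""
    sim = max_jaccard(u, g, users_per_game, games_per_user)
    return sim > jaccard_threshold or g in popular
```

Requiring both signals (AND) instead of either one trades recall for precision; which rule works better is something to check on the validation set.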

Time played prediction

Let’s start by building our training/validation sets much as we did for the first task. This time building a validation set is more straightforward: you can simply use part of the data for validation, and do not need to randomly sample non-played users/games.

Note that you should use the time transformed field, which is computed as log2(time played + 1). This is the quantity we are trying to predict.

  6. Fit a predictor of the form

    $$\text{time}(\text{user}, \text{item}) \simeq \alpha + \beta_{\text{user}} + \beta_{\text{item}}$$

    by fitting the mean and the two bias terms as described in the lecture notes. Use a regularization parameter of λ = 1. Report the MSE on the validation set.

  7. Report the user and game IDs that have the largest and smallest values of β.

  8. Find a better value of λ using your validation set. Report the value you chose, its MSE, and upload your solution to the Assignment 1 gradescope.
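One way to fit the mean and the two bias terms is coordinate descent on the regularized squared error, updating α, each β_user, and each β_item in turn with their closed-form minimizers. The sketch below assumes this is the procedure meant by "as described in the lecture notes", assumes the data is a list of (user, item, value) tuples holding the transformed time, and leaves α unregularized; all function names are hypothetical.

```python
from collections import defaultdict


def fit_bias_model(train, lam=1.0, n_iters=50):
    """Fit alpha, beta_user, beta_item by alternating closed-form updates."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in train:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    alpha = sum(r for _, _, r in train) / len(train)
    beta_u = {u: 0.0 for u in by_user}
    beta_i = {i: 0.0 for i in by_item}
    for _ in range(n_iters):
        alpha = sum(r - beta_u[u] - beta_i[i] for u, i, r in train) / len(train)
        for u, rows in by_user.items():
            beta_u[u] = sum(r - alpha - beta_i[i] for i, r in rows) / (lam + len(rows))
        for i, rows in by_item.items():
            beta_i[i] = sum(r - alpha - beta_u[u] for u, r in rows) / (lam + len(rows))
    return alpha, beta_u, beta_i


def predict_time(u, i, alpha, beta_u, beta_i):
    """Unseen users or items fall back to a bias of 0."""
    return alpha + beta_u.get(u, 0.0) + beta_i.get(i, 0.0)


def mse(data, alpha, beta_u, beta_i):
    """Mean squared error of the bias model on (user, item, value) tuples."""
    return sum((predict_time(u, i, alpha, beta_u, beta_i) - r) ** 2
               for u, i, r in data) / len(data)


def extremes(beta):
    """Return (ID with the largest bias, ID with the smallest bias)."""
    return max(beta, key=beta.get), min(beta, key=beta.get)


def best_lambda(train, valid, candidates):
    """Return (validation MSE, lambda) for the best candidate value."""
    return min((mse(valid, *fit_bias_model(train, lam)), lam) for lam in candidates)
```

A fixed number of iterations keeps the sketch short; on the real data you may prefer to stop once the training objective stops improving.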

³ This could be further improved by treating the two values as features in a classifier; the classifier would then determine the thresholds for you!
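The idea in this footnote could be sketched with a small logistic regression trained by gradient descent on two features per (user, game) pair, for example the maximum Jaccard similarity and a popularity score scaled to [0, 1]. The choice of logistic regression, the feature scaling, the hyperparameters, and the function names are all assumptions made for illustration; a library classifier would work equally well.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, n_iters=2000):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(n_iters):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_label(x, w, b):
    """Predict 'played' when the estimated probability exceeds 0.5."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5
```

The learned weights and bias play the role of the two hand-tuned thresholds: the decision boundary is a line in the (similarity, popularity) plane rather than two independent cutoffs.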
