HW4: implementing item-based CF with cosine
First, run recommenderDemo.ipynb and become familiar with the code and data. Second, implement item-based CF with cosine similarity.
import gzip
from collections import defaultdict
import scipy
import scipy.optimize
import numpy
import random
- Load the data, converting integer-valued fields as we go. Note that here we use the same "Musical Instruments" dataset. The dataset contains 20K user-item reviews; download it from here: https://web.cs.wpi.edu/~kmlee/cs547/amazon_reviews_us_Musical_Instruments_v1_00_small.tsv.gz
# From https://web.cs.wpi.edu/~kmlee/cs547/amazon_reviews_us_Musical_Instruments_v1_00_small.tsv.gz
#----------------------------------------------
# Your code starts here
# Please add comments or text cells in between to explain the general idea of each block of the code.
# Please feel free to add more cells below this cell if necessary
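One possible way to approach the loading step is sketched below. It is not the reference solution. It assumes the file is a tab-separated file with a header row, and that the integer-valued columns are named `star_rating`, `helpful_votes`, and `total_votes` (assumed names; check them against the actual header). The helper names `parse_reviews` and `load_reviews` are hypothetical.

```python
import gzip

# Assumed names of the integer-valued columns; adjust to match the real header.
INT_FIELDS = ('star_rating', 'helpful_votes', 'total_votes')

def parse_reviews(lines):
    """Yield one dict per review from an iterable of tab-separated text lines.

    The first line is treated as the header row.
    """
    lines = iter(lines)
    header = next(lines).rstrip('\r\n').split('\t')
    for line in lines:
        fields = line.rstrip('\r\n').split('\t')
        d = dict(zip(header, fields))
        # Convert integer-valued fields as we go.
        for field in INT_FIELDS:
            if field in d:
                d[field] = int(d[field])
        yield d

def load_reviews(path):
    """Read a gzipped TSV file from disk and return a list of review dicts."""
    with gzip.open(path, 'rt', encoding='utf8') as f:
        return list(parse_reviews(f))
```

With the file downloaded locally, `dataset = load_reviews('amazon_reviews_us_Musical_Instruments_v1_00_small.tsv.gz')` would produce the list of dicts that the MSE cell later iterates over.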
- Now store the loaded data in a matrix. You may use a NumPy array/matrix to store the utility matrix, or use a sparse matrix (advanced approach).
#----------------------------------------------
# Your code starts here
# Please add comments or text cells in between to explain the general idea of each block of the code.
# Please feel free to add more cells below this cell if necessary
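As a rough illustration of the dense option, the sketch below builds an items-by-users utility matrix with NumPy. It assumes each review dict has `customer_id`, `product_id`, and `star_rating` keys (the last one is an assumed field name), and that a zero entry means "not rated". The function name `build_utility_matrix` is hypothetical. A dense matrix may be memory-hungry on the real data, which is one reason the sparse option exists.

```python
import numpy as np

def build_utility_matrix(dataset):
    """Return (R, user_index, item_index) for a list of review dicts.

    R has one row per item and one column per user; 0 means "no rating".
    """
    users = sorted({d['customer_id'] for d in dataset})
    items = sorted({d['product_id'] for d in dataset})
    # Map each ID to a row/column position in the matrix.
    user_index = {u: i for i, u in enumerate(users)}
    item_index = {p: i for i, p in enumerate(items)}
    R = np.zeros((len(items), len(users)), dtype=np.float32)
    for d in dataset:
        # If a (user, item) pair repeats, the last rating wins.
        R[item_index[d['product_id']], user_index[d['customer_id']]] = d['star_rating']
    return R, user_index, item_index
```

Rows are items here because item-based CF compares item vectors; storing items as rows makes each comparison a row-to-row operation.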
- Implement a cosine similarity function and a rating prediction function that uses it. If a user hasn't rated any similar items before, then return ratingMean (i.e., the global rating mean). Refer to predictRating() in hw4jaccard.ipynb.
#----------------------------------------------
# Your code starts here
# Please add comments or text cells in between to explain the general idea of each block of the code.
# Please feel free to add more cells below this cell if necessary
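A minimal sketch of one way to structure this step follows. It is not the reference solution and does not reproduce hw4jaccard.ipynb; it only assumes the usual pattern of a similarity-weighted average over the other items the user has rated. Item vectors are stored as dicts (user to rating), unrated entries count as zero, and `star_rating` is an assumed field name. The factory function `make_predictor` is a hypothetical helper used to keep the example self-contained.

```python
import math
from collections import defaultdict

def cosine(ratings_a, ratings_b):
    """Cosine similarity between two item vectors given as {user: rating} dicts."""
    common = set(ratings_a) & set(ratings_b)
    numer = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    denom = norm_a * norm_b
    return numer / denom if denom > 0 else 0.0

def make_predictor(dataset):
    """Build lookup tables from the reviews and return predictRatingCosine."""
    ratingsPerItem = defaultdict(dict)   # item -> {user: rating}
    ratingsPerUser = defaultdict(dict)   # user -> {item: rating}
    for d in dataset:
        ratingsPerItem[d['product_id']][d['customer_id']] = d['star_rating']
        ratingsPerUser[d['customer_id']][d['product_id']] = d['star_rating']
    ratingMean = sum(d['star_rating'] for d in dataset) / len(dataset)

    def predictRatingCosine(user, item):
        ratings, similarities = [], []
        for item2, rating in ratingsPerUser.get(user, {}).items():
            if item2 == item:
                continue  # do not use the rating we are trying to predict
            ratings.append(rating)
            similarities.append(
                cosine(ratingsPerItem.get(item, {}), ratingsPerItem[item2]))
        if sum(similarities) > 0:
            weighted = sum(r * s for r, s in zip(ratings, similarities))
            return weighted / sum(similarities)
        # No similar items rated by this user: fall back to the global mean.
        return ratingMean

    return predictRatingCosine
```

Calling `predictRatingCosine = make_predictor(dataset)` once would make the name available to the MSE cell.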
- Measure and report MSE (you don't need to change the code below)
def MSE(predictions, labels):
    differences = [(x-y)**2 for x,y in zip(predictions,labels)]
    return sum(differences) / len(differences)
cfPredictions = [predictRatingCosine(d['customer_id'], d['product_id']) for d in dataset]
print(MSE(cfPredictions, labels))
(Optional/bonus task: you will get an additional 25 points.) Download the large dataset, which contains over 900K user-item reviews, from https://web.cs.wpi.edu/~kmlee/cs547/amazon_reviews_us_Musical_Instruments_v1_00_large.tsv.gz and repeat the above process (i.e., measuring MSE with cosine). Report the MSE and compare it with the MSE of alwaysPredictMean. This optional task requires better data structures and a more efficient implementation.
#----------------------------------------------
# Your code starts here
# Please add comments or text cells in between to explain the general idea of each block of the code.
# Please feel free to add more cells below this cell if necessary
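For the larger dataset, one option worth considering is a sparse utility matrix. The sketch below uses `scipy.sparse.csr_matrix` and precomputes the row norms once so that each cosine computation only needs a sparse dot product. It assumes the reviews have already been reduced to `(user, item, rating)` triples and that each user-item pair appears at most once (duplicate entries would be summed by SciPy). The names `build_sparse_utility` and `sparse_cosine` are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_sparse_utility(triples):
    """Build an items-by-users sparse matrix from (user, item, rating) triples.

    Returns (R, norms, user_index, item_index), where norms[i] is the
    Euclidean norm of item row i.
    """
    user_index, item_index = {}, {}
    rows, cols, vals = [], [], []
    for user, item, rating in triples:
        # Assign the next free index the first time an ID is seen.
        rows.append(item_index.setdefault(item, len(item_index)))
        cols.append(user_index.setdefault(user, len(user_index)))
        vals.append(float(rating))
    R = csr_matrix((vals, (rows, cols)),
                   shape=(len(item_index), len(user_index)))
    # Row norms are computed once and reused for every similarity.
    norms = np.sqrt(np.asarray(R.multiply(R).sum(axis=1)).ravel())
    return R, norms, user_index, item_index

def sparse_cosine(R, norms, i, j):
    """Cosine similarity between item rows i and j of the sparse matrix R."""
    denom = norms[i] * norms[j]
    if denom == 0:
        return 0.0
    return float(R[i].multiply(R[j]).sum() / denom)
```

Only nonzero ratings are stored, so memory grows with the number of reviews rather than with the number of users times the number of items.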
*-----------------
Done
All set!
What do you need to submit?
- hw4.ipynb Notebook File: Save this Jupyter notebook with all output, and find the notebook file in your folder (for example, "filename.ipynb"). This is the file you need to submit.
How to submit: Please submit through canvas.wpi.edu