Midterm: Recommender System for Movies

(Note: This midterm assignment will have hidden test cases.)

In this project, you will implement a recommender system for your classmates, professor and TAs based on the movie survey we have conducted. The movie preference file is at ./data/preference.csv

Recommender System

The objective of a recommender system is to recommend relevant items to users based on their preferences. Recommender systems are prevalent in the digital space. For example, when you go shopping on Amazon, you notice that Amazon recommends products on the front page before you even type anything in the search box. Similarly, when you go on YouTube, the top bar of YouTube is typically "videos recommended to you." All these features are based on recommender systems.

Which item to recommend to which user is arguably the most important business decision on many digital platforms. For instance, YouTube cannot control the videos its users upload to it. It cannot control which videos users like to watch either. Moreover, since watching videos is free, YouTube cannot control the behavior of its users by changing the price of its items. It does not have inventory either, since each video can be viewed any number of times. In this case, what could YouTube control? Or in other words, what differentiates a good video streaming service from a bad one? The answer is its recommender system.

Types of Recommender Systems

There are three types of recommender systems.

Popularity-based Recommendation

The most obvious system is popularity-based recommendation. In this case, we recommend to a user the most popular items that the user has not previously consumed. In the movie setting, we will recommend the movie that most users have watched and liked. In other words, this system utilizes the "wisdom of the crowd." It usually provides good recommendations for most people. Since it is easy to implement, the popularity-based recommendation system is used as a baseline. Note: this system is not personalized. If two consumers have not watched Movie A, and Movie A is the most popular one, both of them will be recommended Movie A, no matter how different these two consumers are.

Content-based Recommendation

This recommender system leverages the data on a customer's historical actions. It first uses available data to identify a set of features that describes an item (for example, for movies, we can use the movie's director, main actor, main actress, genre, etc. to describe the movie). When a user comes in, the system will recommend the movie that is closest, in terms of these features, to the movies that the user has watched and liked. For instance, if a user likes action movies from Nolan the most, this system will recommend another action movie from Nolan that this user has not watched. Note: we will not implement this system in this project since it requires knowledge about supervised learning. We may come back to this topic at the end of this semester.

Collaborative Filtering Recommendation

The last type of recommender system is called collaborative filtering. This approach uses the record of previous users' interactions to compute users' similarities based on the items they have interacted with (user-based approach) or to compute items' similarities based on the users that have interacted with them (item-based approach).

A typical example of this approach is User Neighbourhood-based CF, in which the top-N similar users (usually computed using Pearson correlation) for a user are selected first. The items that are liked by these users are then used to identify the best candidate to recommend to the current user.

0. Read in the preference file

The first exercise is to read in the movie preference csv file (you need to use a relative path).

You must return two things:

  1. A dictionary where the key is the username and the value is a vector of (-1, 0, 1) that indicates the user's preference across movies (in the order of the csv file). Note that 1 encodes a "like" and -1 encodes a "dislike". A zero means that the user has not watched that movie yet.

  2. A list of strings that contains movie names. (The order of movie names should be the same as the order in the original csv file.)

Note 1: Your result should exactly match the results from the assert statements. This means you should pay attention to extra spaces, newlines, etc.

Note 2: If there are two records with the same name, use the first record from the person.
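The two notes above can be illustrated on a tiny in-memory file. This is only a sketch: the sample rows, the `parse_preferences` helper, and the assumed column layout (a timestamp, then the name, then one column per movie) are hypothetical and not taken from the real survey file.

```python
import csv
import io

# Hypothetical sample in the assumed layout: timestamp, name, one column per movie.
# Note the trailing space after "Inception" and the duplicate record for Jake.
SAMPLE_CSV = (
    "Timestamp,Name,Inception ,Coco\n"
    "t1,Jake,1,-1\n"
    "t2,Naveed,-1,0\n"
    "t3,Jake,0,0\n"
)

def parse_preferences(csvfile):
    """Return (movies, preference) from an open csv file object."""
    reader = csv.reader(csvfile)
    header = next(reader)
    # Strip stray whitespace so titles compare equal to the expected names.
    movies = [title.strip() for title in header[2:]]
    preference = {}
    for row in reader:
        name = row[1]
        # Keep only the first record when a name appears more than once.
        if name not in preference:
            preference[name] = [int(x) for x in row[2:]]
    return movies, preference

movies, preference = parse_preferences(io.StringIO(SAMPLE_CSV))
print(movies)       # ['Inception', 'Coco']
print(preference)   # {'Jake': [1, -1], 'Naveed': [-1, 0]}
```

Passing a file object rather than a path keeps the parsing logic testable without touching the disk.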

import csv

def read_in_movie_preference():
    """Read the movie data and return the movie list and a
    preference dictionary."""
    preference = {}
    movies = []
    # The assignment places the survey file at ./data/preference.csv (relative path).
    with open("./data/preference.csv", newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        for i, row in enumerate(reader):
            if i == 0:
                # Movie titles start at the third column; strip stray spaces
                # so the names match the assert statements exactly.
                movies = [title.strip() for title in row[2:]]
            else:
                user_name = row[1]
                # Keep only the first record for each name.
                if user_name not in preference:
                    preference[user_name] = [int(x) for x in row[2:]]

    return [movies, preference]
[movies, preference] = read_in_movie_preference()
print(movies)
['The Shawshank Redemption', 'The Godfather', 'The Dark Knight', 'Star Wars: The Force Awakens', 'The Lord of the Rings: The Return of the King', 'Inception', 'The Matrix', 'Avengers: Infinity War', 'Interstellar', 'Spirited Away', 'Coco', 'The Dark Knight Rises', 'Braveheart', 'The Wolf of Wall Street', 'Gone Girl', 'La La Land', 'Shutter Island', 'Ex Machina', 'The Martian', 'Kingsman: The Secret Service']
assert len(movies) == 20
[movies, preference] = read_in_movie_preference()
assert movies == ['The Shawshank Redemption', 'The Godfather',
                       'The Dark Knight', 'Star Wars: The Force Awakens',
                       'The Lord of the Rings: The Return of the King',
                       'Inception', 'The Matrix', 'Avengers: Infinity War',
                       'Interstellar', 'Spirited Away', 'Coco', 'The Dark Knight Rises',
                       'Braveheart', 'The Wolf of Wall Street', 'Gone Girl', 'La La Land',
                       'Shutter Island', 'Ex Machina', 'The Martian', 'Kingsman: The Secret Service']
[movies, preference] = read_in_movie_preference()
assert preference["Jacob Scheinman"] == [1, 1, 1, 1, 1, 1, 1, 1, -1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1]
assert preference["Ziqing Ouyang"] == [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

1. Popularity-based Ranking

1.1 Compute the ranking of most popular movies

Your next task is to use the data stored in the preference variable to compute the popularity scores of the movies. To compute a movie's popularity score, you should first compute the number of times people have liked movies in the entire dataset across all movies (i.e., total likes). You should then compute the number of times people have disliked movies in the entire dataset across all movies (i.e., total dislikes).

Let's assume that people have liked movies A times in the entire dataset and disliked movies B times in the entire dataset. The popularity score of a movie is then defined as Num_of_People_Like_the_Movie - (A / B) * Num_of_People_Dislike_the_Movie

(We use A/B to normalize the weights of likes and dislikes because if one type of reaction is rare, it deserves more weight. For example, if a typical movie gets on average 100 likes and no dislikes, a dislike conveys a much stronger message about a movie's quality than a like.)
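As a small worked example of the score, the sketch below applies the definition to a made-up dataset. The helper name `popularity_scores` and the toy ratings are hypothetical, and the code assumes at least one dislike exists so that A / B is defined.

```python
def popularity_scores(movies, preference):
    """Return (scores, A, B) where A = total likes and B = total dislikes."""
    all_ratings = [r for ratings in preference.values() for r in ratings]
    total_likes = all_ratings.count(1)       # A
    total_dislikes = all_ratings.count(-1)   # B
    weight = total_likes / total_dislikes    # assumes B > 0
    scores = {}
    for i, movie in enumerate(movies):
        # Ratings every user gave to movie i.
        column = [ratings[i] for ratings in preference.values()]
        scores[movie] = column.count(1) - weight * column.count(-1)
    return scores, total_likes, total_dislikes

# Toy data (hypothetical): three users, two movies.
toy_movies = ["Inception", "Coco"]
toy_preference = {
    "Jake": [1, -1],
    "Naveed": [1, 0],
    "Q": [1, 1],
}
scores, A, B = popularity_scores(toy_movies, toy_preference)
print(A, B)     # 4 1
print(scores)   # {'Inception': 3.0, 'Coco': -3.0}
```

Here A / B = 4, so the single dislike on Coco outweighs its single like: 1 - 4 * 1 = -3.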

Your function should return:

  1. A dictionary where the keys are movie names and the values are the corresponding movie popularity scores.
  2. A list of movie names sorted in descending order of popularity. For example, if 'The Shawshank Redemption' is the second most popular movie, the second element in the list should be 'The Shawshank Redemption'.
  3. A and B as defined above.

Note: You may want to use prior functions to help you read data inside this function.

def movies_popularity_ranking():
    # Reuse the reader from Section 0 instead of relying on global variables.
    movies, preference = read_in_movie_preference()
    movie_popularity = {}

    # A = total likes and B = total dislikes across all users and all movies.
    total_likes = sum(ratings.count(1) for ratings in preference.values())
    total_dislikes = sum(ratings.count(-1) for ratings in preference.values())

    # Popularity score of a movie: likes - (A / B) * dislikes.
    for i, movie in enumerate(movies):
        likes = 0
        dislikes = 0
        for ratings in preference.values():
            if ratings[i] == 1:
                likes += 1
            elif ratings[i] == -1:
                dislikes += 1
        movie_popularity[movie] = likes - total_likes / total_dislikes * dislikes

    # Sort movie names by popularity score, highest first, and return a list.
    movie_popularity_rank = sorted(movies, key=lambda x: movie_popularity[x], reverse=True)
    return movie_popularity, movie_popularity_rank, total_likes, total_dislikes
movie_popularity, movie_popularity_rank, total_likes, total_dislikes = movies_popularity_ranking()
print('\n'.join(movie_popularity_rank))
The Shawshank Redemption
Inception
Kingsman: The Secret Service
The Wolf of Wall Street
The Matrix
Coco
Avengers: Infinity War
The Dark Knight Rises
Interstellar
The Dark Knight
The Martian
Spirited Away
The Godfather
Braveheart
La La Land
Shutter Island
Gone Girl
The Lord of the Rings: The Return of the King
Ex Machina
Star Wars: The Force Awakens
assert total_likes == 1300
assert total_dislikes == 236
movie_popularity, movie_popularity_rank, total_likes, total_dislikes = movies_popularity_ranking()
assert round(movie_popularity["The Shawshank Redemption"], 2) == 66.98
assert round(movie_popularity["Avengers: Infinity War"], 2) == 14.86
movie_popularity, movie_popularity_rank, total_likes, total_dislikes = movies_popularity_ranking()
assert movie_popularity_rank == ['The Shawshank Redemption',
 'Inception',
 'Kingsman: The Secret Service',
 'The Wolf of Wall Street',
 'The Matrix',
 'Coco',
 'Avengers: Infinity War',
 'The Dark Knight Rises',
 'Interstellar',
 'The Dark Knight',
 'The Martian',
 'Spirited Away',
 'The Godfather',
 'Braveheart',
 'La La Land',
 'Shutter Island',
 'Gone Girl',
 'The Lord of the Rings: The Return of the King',
 'Ex Machina',
 'Star Wars: The Force Awakens']

1.2 Recommendation

You will now implement a popularity-based recommendation function. This function takes in a user's name. It returns a string representing the name of a movie that satisfies the following three conditions:

  1. The user has not watched this movie.
  2. This movie has the best popularity score (among those that are not watched by the user).
  3. This movie has a higher popularity score than the average of the popularity scores of the movies that this user has watched (the average is computed over all movies watched by the user, regardless of whether they were liked by the user or not).

If the user name does not exist, this function should return "Invalid user."

If the user has watched all movies, this function should return "Unfortunately, no new movies for you."

If the unwatched movies all have lower popularity scores than the average score of the movies watched by this user, this function should return "Unfortunately, no new movies for you."

Note: Again, you may want to use prior functions to help you read data and rank movies inside this function.
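The decision rule above can be sketched on toy data. The function name `recommend_by_popularity`, the movie names, and the scores are hypothetical. Two choices here are assumptions rather than something the assignment states: a best score exactly equal to the watched average is treated as "no new movies", and a user who has watched nothing simply receives the most popular movie.

```python
NO_NEW = "Unfortunately, no new movies for you."

def recommend_by_popularity(name, movies, preference, scores):
    """Pick the most popular unwatched movie if it beats the watched average."""
    if name not in preference:
        return "Invalid user."
    ratings = preference[name]
    watched = [m for m, r in zip(movies, ratings) if r != 0]
    unwatched = [m for m, r in zip(movies, ratings) if r == 0]
    if not unwatched:
        return NO_NEW
    best = max(unwatched, key=lambda m: scores[m])
    if watched:  # with no watched movies there is no average to compare against
        average = sum(scores[m] for m in watched) / len(watched)
        if scores[best] <= average:
            return NO_NEW
    return best

# Toy data (hypothetical names and scores).
toy_movies = ["A", "B", "C"]
toy_scores = {"A": 5.0, "B": 1.0, "C": 3.0}
toy_preference = {
    "Ann": [0, 1, -1],  # watched B and C (average 2.0); unwatched A scores 5.0
    "Bob": [1, 0, 0],   # watched A (average 5.0); best unwatched C scores 3.0
    "Cy": [1, 1, -1],   # watched everything
}
print(recommend_by_popularity("Ann", toy_movies, toy_preference, toy_scores))  # A
print(recommend_by_popularity("Bob", toy_movies, toy_preference, toy_scores))  # no new movies
```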

def Recommendation(name):
    # Reuse the earlier functions to load the survey and score the movies.
    movies, preference = read_in_movie_preference()
    movie_popularity = movies_popularity_ranking()[0]

    if name not in preference:
        return "Invalid user."

    # A rating of 0 means the user has not watched the movie.
    ratings = preference[name]
    watched_movies = [movie for movie, rating in zip(movies, ratings) if rating != 0]
    unwatched_movies = [movie for movie, rating in zip(movies, ratings) if rating == 0]

    if not unwatched_movies:
        return "Unfortunately, no new movies for you."

    # Best candidate: the unwatched movie with the highest popularity score.
    recommended_movie = max(unwatched_movies, key=lambda movie: movie_popularity[movie])

    # Recommend only if the candidate beats the average score of the watched movies.
    # A user who has watched nothing has no average, so the check is skipped.
    if watched_movies:
        watched_movies_avg_popularity = (
            sum(movie_popularity[movie] for movie in watched_movies) / len(watched_movies)
        )
        if movie_popularity[recommended_movie] <= watched_movies_avg_popularity:
            return "Unfortunately, no new movies for you."

    return recommended_movie
assert Recommendation("Jiaxu Rong") == 'Inception'
assert Recommendation("Nobody") == 'The Shawshank Redemption'
assert Recommendation("Dennis Zhang") == 'Kingsman: The Secret Service'
assert Recommendation("Test Student 2") == 'Invalid user.'

2.1 Cosine Similarity

Let us now use collaborative filtering to find a good recommendation.

In order to do so, we need to get the cosine similarity between movies and users. Again, we can use the preference file we used in Section 0. The file represents each person by a preference vector that consists of (0, 1, -1). Cosine similarity in our case is the dot product of the two preference vectors divided by the product of the magnitudes of the two preference vectors. In other words, if person A has preference vector A, and person B has preference vector B, their cosine similarity is equal to

$$ \frac{A \cdot B}{||A||\,||B||} = \frac{\sum_{i=1}^n A_iB_i}{\sqrt{\sum_{i=1}^nA_i^2}\sqrt{\sum_{i=1}^nB_i^2}}$$

If a person has not watched any movies, then the cosine similarity between this person and any other person is defined as 0. For more information on cosine similarity, you can read this wiki page.

As an example, let the following two vectors represent Naveed's and Jake's preferences over 3 movies.

          Inception   Coco   The Dark Knight
Jake          1        -1           0
Naveed       -1         0           1

In this case, Naveed and Jake's cosine similarity is equal to

$$ \frac{1(-1)+(-1)0+0(1)}{\sqrt{1^2+(-1)^2+0^2}\sqrt{(-1)^2+0^2+1^2}} = \frac{-1}{2} = -0.5$$

Your task is to write a similarity function that takes in two names and returns the cosine similarity between these two users. If one or both names do not exist in the database, return 0.
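The formula can be checked against the Jake and Naveed example with a few lines of code. This is a minimal sketch that works on raw vectors; the helper name `cosine_similarity` is hypothetical, and looking up names in the preference dictionary is left out.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A person who has not watched any movie has a zero vector.
    if norm_a == 0 or norm_b == 0:
        return 0
    return dot / (norm_a * norm_b)

jake = [1, -1, 0]     # Inception, Coco, The Dark Knight
naveed = [-1, 0, 1]
print(round(cosine_similarity(jake, naveed), 2))  # -0.5
```

The zero-norm guard implements the rule that a person with no watched movies has similarity 0 with everyone, and it also avoids a division by zero.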

def Similarity(name_1, name_2):
    """Given two names and preference, get the similarity 
    between two people"""
    cosine = 0

    # YOUR CODE HERE
    raise NotImplementedError()

    return cosine
assert round(Similarity("Test Student", "Nobody"), 2) == 0.17
assert round(Similarity("Test Student", "DJZ2"), 2) == -0.27
assert round(Similarity("Test Student", "Test Student 2"), 2) == 0

2.2 Movie Soulmate

Your next task is to find the movie soulmate of a person. In order to find a person's movie soulmate, you will compute the cosine similarity between this person and every other person in the dataset. You will then return the person who has the highest cosine similarity with the focal person. If two people have the same cosine similarity with the focal person, break the tie by the length of their names (the person with the shorter name will be the soulmate). If the focal person does not exist in the database, return an empty string as the soulmate name.

Your function will return two things:
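The selection and tie-breaking rule can be sketched as follows. The names `find_soulmate` and `cosine_similarity` and the toy data are hypothetical. Returning -100 as the similarity for an unknown person is an assumption that mirrors the default value in the function template, and ties are detected by exact float equality.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def find_soulmate(name, preference):
    """Return (soulmate, similarity); ties go to the shorter name."""
    if name not in preference:
        return "", -100
    others = [other for other in preference if other != name]
    if not others:
        return "", -100
    similarity = {o: cosine_similarity(preference[name], preference[o]) for o in others}
    # Sort key: highest similarity first, then shortest name.
    soulmate = min(others, key=lambda o: (-similarity[o], len(o)))
    return soulmate, similarity[soulmate]

# Toy data (hypothetical): Cleo and Bo are equally similar to Ann.
toy_preference = {
    "Ann": [1, 1, 0],
    "Cleo": [1, 1, 0],
    "Bo": [1, 1, 0],
    "Dan": [-1, -1, 0],
}
soulmate, score = find_soulmate("Ann", toy_preference)
print(soulmate)  # Bo
```

Using a tuple key keeps the comparison and the tie-break in one place, so the result does not depend on the order of the records.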

  1. the name of the soulmate
  2. the largest cosine similarity

def Movie_Soul_Mate(name):
    """Given a name, get the person that has the highest cosine
    similarity with this person."""
    soulmate = ""
    cosine_similarity = -100

    # YOUR CODE HERE
    raise NotImplementedError()

    return soulmate, cosine_similarity
soulmate, cosine_similarity = Movie_Soul_Mate("Q")
assert soulmate == 'Yunong Tian'
assert round(cosine_similarity, 2) == 0.75
soulmate, cosine_similarity = Movie_Soul_Mate("Test Student")
assert soulmate == 'Michael Treiber'
assert round(cosine_similarity, 2) == 0.80
soulmate, cosine_similarity = Movie_Soul_Mate("Yunong Tian")
assert soulmate == 'Yuchen'
assert round(cosine_similarity, 2) == 0.81

2.3 Memory-based Collaborative Filtering Recommendation

After finding a person's movie soulmate, we can construct a (very preliminary) collaborative filtering recommendation. In our recommendation system, for a focal person, we first find his or her soulmate. We then find all the movies that he/she has not watched but the soulmate has watched and liked. Among all of these movies, we recommend the movie with the highest popularity score as defined in Section 1.1.

Again, if the user name does not exist, this function should return "Invalid user."

If the person has watched all the movies, return "Unfortunately, no new movies for you."

If there are no movies that are watched and liked by the soulmate but not watched by the focal person, then return the movie (or string) that should be returned in Section 1.2.
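The filtering step can be sketched on toy data. To keep the example short, the soulmate and the Section 1.2 fallback are passed in as arguments instead of being computed; `recommend_from_soulmate`, the `fallback` callable, and all names and scores are hypothetical.

```python
def recommend_from_soulmate(name, movies, preference, scores, soulmate, fallback):
    """Recommend the most popular movie the soulmate liked and `name` has not seen."""
    if name not in preference:
        return "Invalid user."
    mine = preference[name]
    if 0 not in mine:
        return "Unfortunately, no new movies for you."
    theirs = preference[soulmate]
    # Candidates: unwatched by the focal person, watched and liked by the soulmate.
    candidates = [m for m, r_me, r_them in zip(movies, mine, theirs)
                  if r_me == 0 and r_them == 1]
    if not candidates:
        # Fall back to whatever the popularity-based recommender would return.
        return fallback(name)
    return max(candidates, key=lambda m: scores[m])

# Toy data (hypothetical names and scores).
toy_movies = ["A", "B", "C", "D"]
toy_scores = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 9.0}
toy_preference = {
    "Ann": [1, 0, 0, 0],
    "Bo": [1, 1, 1, -1],   # Ann's soulmate: likes B and C, dislikes D
    "Cy": [1, 0, 0, 0],
    "Dee": [1, -1, 0, 0],  # Cy's soulmate: likes nothing that Cy has not seen
}
fallback = lambda name: "popularity fallback"
print(recommend_from_soulmate("Ann", toy_movies, toy_preference, toy_scores, "Bo", fallback))   # C
print(recommend_from_soulmate("Cy", toy_movies, toy_preference, toy_scores, "Dee", fallback))   # popularity fallback
```

Ann is recommended C even though D has a higher popularity score, because only movies the soulmate liked are eligible.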

def Recommendation2(name):
    recommended_movie = ""

    # YOUR CODE HERE
    raise NotImplementedError()

    return recommended_movie
assert Recommendation2("Test Student") == 'Inception'
assert Recommendation2("Test Student Long Name") == 'The Shawshank Redemption'
