
CS 135 Intro to Machine Learning - Project A: Classifying Sentiment


Updates/Corrections:

  • Partner-finding due date changed from 10/3 to 10/6

Turn-in files (Gradescope links to be added):

  • PDF report
  • ZIP file of test-set predictions for Problem 1’s Bag-of-Words Leaderboard
  • ZIP file of test-set predictions for Problem 2’s Open-Ended Leaderboard

Overview

This is a multi-week project with lots of open-ended programming. Get started right away!

  • Released on Thu 9/26
  • Form partners by Sun 10/06 (complete signup form linked below)
  • Due on Thu 10/17

Suggested intermediate deadlines:

  • by Tue 10/08: Complete Problem 1 code/experimentation + leaderboard submission
  • by Thu 10/10: Complete Problem 1 writeup
  • by Tue 10/15: Complete Problem 2 code/experimentation + leaderboard submission
  • by Thu 10/17: Complete Problem 2 writeup

Team Formation

By the end of Sunday 10/6, you should have identified your partner and signed up here:

We extended the deadline from 10/3 to 10/6 to give folks a little more time, but we still encourage you to find your partner and start the project as soon as possible.

Even if you decide to work alone, you should fill out this form to acknowledge that.

In this project, you are encouraged to work as a team of 2 people. If you prefer, you can work individually; individuals still need to complete all the parts below and will be evaluated no differently than teams. We strongly recommend working in pairs to keep your workload manageable.

If you need help finding a teammate, please post to the “Finding a Partner for Project A” thread on Piazza.

Work to Complete

As a team, you will work on one semi-open problem, and then a completely open problem.

The 2 problems look at different representations of text for a common task.

  • Problem 1 looks at using bag-of-words feature representations
  • Problem 2 is an open-ended problem, where any feature representation is allowed

Throughout Problems 1 and 2, you will practice the development cycle of an ML practitioner:

  • Propose a reasonable ML pipeline (feature extraction + classifier)
  • Train that pipeline on available data
  • Evaluate results carefully on available data
  • Revise the pipeline and repeat

For all problems, we will maintain a leaderboard on Gradescope. You should periodically submit the predictions of your best model on the test set (we do not release the true labels of the test set to you in advance).

What to Turn In

Each team will prepare one PDF report covering all problems.

  • Suggested length 4 pages (upper limit is 6 pages)
  • This document will be manually graded as described in the Grading section below
  • You can use your favorite report-writing tool (Word, Google Docs, LaTeX, ...)
  • Should be human-readable. Do not include code. Do NOT just export a Jupyter notebook to PDF.
  • Should have each subproblem marked via the in-browser Gradescope annotation tool

Each team will prepare a ZIP file of test-set predictions for each of Problem 1 and Problem 2.

  • Each submission ZIP will contain just one plain-text file: yproba1_test.txt
    • Each line contains a float probability that the corresponding test example should be classified as positive, given its features
    • Should be loadable into NumPy as a 1D array via this snippet: np.loadtxt('yproba1_test.txt')
    • Will be thresholded to produce hard binary predicted labels (either 0 or 1); see the self-check sketch below
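
Before uploading, you can sanity-check your predictions file with a short snippet like this (the 0.5 threshold below is our illustrative assumption; the grading threshold is not specified here):

import numpy as np

yproba = np.loadtxt('yproba1_test.txt')
assert yproba.ndim == 1                           # loads as a 1D array
assert np.all((yproba >= 0.0) & (yproba <= 1.0))  # entries are probabilities

# Hard binary labels, as the autograder will produce them (0.5 assumed here)
yhat = (yproba >= 0.5).astype(int)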

Each individual will turn in a reflection form after completing the report (link to be added).

Starter Code and Code Restrictions

For all required data and code, see the projectA folder of the public assignments repo for this class:

https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/projectA

Our starter code repo provides a few scripts to help you load the data for each problem, but otherwise no other code. This is meant to simulate the lack of starter code you’d have in the “real world”, trying to build a text sentiment classifier from scratch using your machine learning skills.

For this assignment, you can use any Python package you like (sklearn, nltk, etc.). You are welcome to consult the sklearn documentation website or other external web resources for snippets of code to guide your usage of different classifiers. However, you should understand every line of the code you use and not simply copy-paste without thinking carefully. You should also cite and acknowledge third-party code that made a significant impact on your work in your report.

Remember to keep the course collaboration policy in mind: do your own work!

Background

We have given you a dataset of several thousand single-sentence reviews, collected from three domains: imdb.com, amazon.com, and yelp.com. Each review consists of a sentence and a binary label indicating the emotional sentiment of the sentence (1 for reviews expressing positive feelings; 0 for reviews expressing negative feelings). All the provided reviews in the training and test sets were scraped from websites whose assumed audience is primarily English speakers; the reviews may of course contain slang, misspellings, some foreign characters, and many other properties that make working with natural language data challenging (and fun!).

Your goal is to develop a binary classifier that can correctly identify the sentiment of a new sentence.

Here are some example positive sentences:

imdb          The writers were "smack on" and I think the best actors and actresses were a bonus to the show.These characters were so real.
imdb          The Songs Were The Best And The Muppets Were So Hilarious.  
yelp          Food was so gooodd.
yelp          I could eat their bruschetta all day it is devine.

Here are some example negative sentences:

amazon        It always cuts out and makes a beep beep beep sound then says signal failed.
amazon        the only VERY DISAPPOINTING thing was there was NO SPEAKERPHONE!!!!
yelp          It sure does beat the nachos at the movies but I would expect a little bit more coming from a restaurant.
yelp          I'm not sure how long we stood there but it was long enough for me to begin to feel awkwardly out of place.

Dataset acknowledgment

This dataset comes from research work by D. Kotzias, M. Denil, N. De Freitas, and P. Smyth described in the KDD 2015 paper ‘From Group to Individual Labels using Deep Features’. We are grateful to these authors for making the dataset available.

Datasets

You are given the data in CSV file format, with 2400 input/output pairs in the training set and 600 inputs in the test set.

Training set of 2400 examples

x_train.csv : input data, as text

  • Column 1: ‘website_name’ : one of [‘imdb’, ‘amazon’, ‘yelp’]
  • Column 2: ‘text’ : string sentence which represents the raw review

y_train.csv : binary labels to predict

  • Column 1: ‘is_positive_sentiment’ : 1 = positive sentiment, 0 = negative
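
For example, the labels can be loaded into a 1D NumPy array like this (column name as listed above):

import pandas as pd

y_train_df = pd.read_csv('y_train.csv')
y_train = y_train_df['is_positive_sentiment'].values  # array of 0/1 labels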

Test set of 600 examples

x_test.csv : input data, as text

  • Column 1: ‘website_name’: as above
  • Column 2: ‘text’: as above

Performance metric

We will use Area under the ROC curve (AUROC) to judge your classifier’s quality.
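
As a quick illustration of the metric with sklearn (toy values, not from this dataset):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]            # true binary labels
y_proba = [0.1, 0.4, 0.35, 0.8]  # predicted probabilities of the positive class
print(roc_auc_score(y_true, y_proba))  # 0.75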

Suggested Way to Load Data into Python

We suggest loading the sentence data using the read_csv method in Pandas:

import pandas as pd

x_train_df = pd.read_csv('x_train.csv')
tr_list_of_sentences = x_train_df['text'].values.tolist()

You can see a short example working Python script here: https://github.com/tufts-ml-courses/cs135-24f-assignments/blob/main/projectA/load_train_data.py

We’ll often refer to each review or sentence as a single “document”. Our goal is to classify each document into either the positive or negative sentiment class.

Preprocessing

As discussed in class, there are many possible approaches to feature representation: the process of transforming any possible natural language document (often represented as an ordered list of words, which can be of variable length) into a feature vector x_n of fixed length.

In this project, we will explore several approaches, including bag-of-words vectors (explored in Problem 1). Later, you’ll be allowed to try any feature representation approach you want (Problem 2).

In most cases, we suggest that you consider removing punctuation and converting upper case to lower case.
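
One simple (and by no means required) way to do both, sketched in plain Python:

import string

def clean_text(sentence):
    # Lowercase, then delete all ASCII punctuation characters
    sentence = sentence.lower()
    return sentence.translate(str.maketrans('', '', string.punctuation))

print(clean_text("Food was so gooodd."))  # food was so gooodd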

Problem 1: Bag-of-Words Feature Representation

Background on Bag-of-Words Representations

As discussed in class on day 10, the “Bag-of-Words” (BoW) representation assumes a fixed, finite-size vocabulary of V possible words is known in advance, with a defined index order (e.g. the first word is “stegosaurus”, the second word is “dinosaur”, etc.).

Each document is represented as a count vector of length V, where the entry at index v gives the number of times that the vocabulary word with index v appears in the document.

The key constraint with BoW representations is that each input feature must directly correspond to one human-readable unigram in a finite vocabulary.

That said, you have many design decisions to make when applying a BoW representation:

  • How big is your vocabulary?
  • Do you exclude rare words (e.g. appearing in fewer than 10 documents)?
  • Do you exclude common words (like ‘the’ or ‘a’, or appearing in more than 50% of documents)?
  • Do you keep the count values, or only store present/absent binary values?

You are strongly encouraged to take advantage of the many tools that sklearn provides for BoW representations, such as sklearn.feature_extraction.text.CountVectorizer (sketched below).
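
For instance, here is a minimal sketch mapping each design decision above to a CountVectorizer option (the specific values are illustrations, not recommendations; assumes tr_list_of_sentences is loaded as shown earlier):

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    lowercase=True,   # fold upper case to lower case
    min_df=10,        # exclude rare words appearing in fewer than 10 documents
    max_df=0.5,       # exclude common words appearing in more than 50% of documents
    binary=False,     # keep count values (True would store present/absent instead)
)
x_train_bow = vectorizer.fit_transform(tr_list_of_sentences)
print(len(vectorizer.vocabulary_))  # final vocabulary size V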

Goals and Tasks for Problem 1

For Problem 1, you will develop an effective BoW representation plus binary classifier pipeline, aiming to produce the best possible performance on heldout data.

You should experiment with several possible ways of performing BoW preprocessing.

You should use only a LogisticRegression classifier for this problem.

You should follow best practices in hyperparameter selection to avoid overfitting and generalize well to new data. Within your hyperparameter selection, you should use cross-validation over multiple folds to assess the range of performance you might observe on new data.

Your report should contain the following sections:

1A : Bag-of-Words Design Decision Description

Well-written paragraph describing your chosen BoW feature representation pipeline, with sufficient detail that another student in this class could reproduce it. You are encouraged to use just plain English prose, but you might include a brief, well-written pseudocode block if you think it is helpful.

You should describe and justify all major decisions, such as:

  • how did you “clean” the data? (handle punctuation, upper vs. lower case, numbers, etc.)
  • how did you determine the final vocabulary set? did you exclude words, and if so, how?
  • what was your final vocabulary size (or a rough estimate, if the size varies across folds because it depends on the training set)?
  • did you use counts, binary values, or something else?
  • how does your approach handle out-of-vocabulary words in the test set? (it’s fine to just ignore them, but you should be aware of this; see the small demo below)
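
A tiny demo of the out-of-vocabulary point, using toy sentences of our own: CountVectorizer simply ignores unseen words at transform time.

from sklearn.feature_extraction.text import CountVectorizer

demo = CountVectorizer().fit(["the food was great"])
print(demo.get_feature_names_out())  # ['food' 'great' 'the' 'was']
print(demo.transform(["the stegosaurus was great"]).toarray())  # [[0 1 1 1]]; 'stegosaurus' is ignored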

1B : Cross Validation Design Description

Well-written paragraph describing how you use cross-validation to perform both classifier training and any hyperparameter selection needed for the classifier pipeline.

For Problem 1, you must use cross-validation with at least 3 folds, searching over at least 5 possible hyperparameter configurations to avoid overfitting.

You should describe and justify all major decisions, such as:

  • What performance metric will your search try to optimize on heldout data?
  • How did you execute CV? How many folds? How big is each fold? How do you split the folds?
  • What off-the-shelf software did you use, if any?
  • After using CV to identify a selected hyperparameter configuration, how will you then build one “final” model to apply to the test set? (One possible workflow is sketched below.)
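
One possible way to wire this up with off-the-shelf sklearn tools is sketched below. This is an illustration under our own choice of folds and grid, not the required design; it assumes tr_list_of_sentences and y_train are loaded as shown earlier.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('clf', LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(
    pipeline,
    param_grid={'clf__C': [0.01, 0.1, 1.0, 10.0, 100.0]},  # 5 candidate values
    scoring='roc_auc',        # optimize the same metric used on the leaderboard
    cv=5,                     # 5 folds; each validation fold has 480 of the 2400 examples
    return_train_score=True,  # needed later to plot train vs. validation trends
)
search.fit(tr_list_of_sentences, y_train)

# With refit=True (the default), GridSearchCV retrains one "final" model on
# all training data using the selected hyperparameters.
print(search.best_params_, search.best_score_)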

1C : Hyperparameter Selection for Logistic Regression Classifier

Using your BoW preprocessing, plus a logistic regression classifier, your goal is to train a model that achieves the best performance on heldout data.

Here, we ask you to use a LogisticRegression classifier, and identify a concrete hyperparameter search strategy. Which hyperparameters are you searching? What concrete grid of values will you try?

Your report should include a figure and paragraph summarizing the design of this search as well as the results. Please follow the Hyperparameter Selection Rubric below.

1D : Analysis of Predictions for the Best Classifier

In a figure, show some representative examples of false positives and false negatives for your chosen best classifier from 1C. Be sure to look at heldout examples (not examples used to train that model). It’s OK to analyze examples from just one fold (you don’t need to look at all K test sets in CV).

In a paragraph caption below the figure, try to characterize what kinds of mistakes the classifier makes. Do you notice anything about these sentences that you could use to improve performance? (You can apply these ideas later in Problem 2).

You could look at any of these questions (a code sketch for collecting heldout mistakes follows the list):

  • does it do better on longer sentences or shorter sentences?
  • does it do better on a particular kind of review (amazon or imdb or yelp)?
  • does it do better on sentences without negation words (“not”, “didn’t”, “shouldn’t”, etc.)?
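
Here is a sketch of one way to collect heldout mistakes for this figure. It uses a single train/validation split for simplicity (rather than a full CV fold) and assumes the pipeline, sentences, and labels from the earlier sketches.

from sklearn.model_selection import train_test_split

x_tr, x_va, y_tr, y_va = train_test_split(
    tr_list_of_sentences, y_train, test_size=0.2, random_state=0)

pipeline.fit(x_tr, y_tr)
yhat_va = pipeline.predict(x_va)

# Heldout sentences the model got wrong, split by error type
false_pos = [s for s, y, yh in zip(x_va, y_va, yhat_va) if y == 0 and yh == 1]
false_neg = [s for s, y, yh in zip(x_va, y_va, yhat_va) if y == 1 and yh == 0]
print(false_pos[:5])
print(false_neg[:5])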

1E : Report Performance on Test Set via Leaderboard

Create your “final” classifier using the selected hyperparameters from 1C. Apply it to each test sentence in x_test.csv. Store your probabilistic predictions in a single-column plain-text file yproba1_test.txt (remember, we’ll use AUROC as the metric to decide your rank on the leaderboard). Upload this file to our Bag-of-Words leaderboard. A sketch of this workflow follows.
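
A sketch of this save-and-submit step, assuming the fitted search object from the cross-validation sketch in 1B:

import numpy as np
import pandas as pd

x_test_df = pd.read_csv('x_test.csv')
te_list_of_sentences = x_test_df['text'].values.tolist()

# Second column of predict_proba holds P(y = 1 | x) for each test sentence
yproba1_test = search.best_estimator_.predict_proba(te_list_of_sentences)[:, 1]
np.savetxt('yproba1_test.txt', yproba1_test)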

In your report, include a summary paragraph stating your ultimate test-set performance, comparing it to your previous cross-validation estimates of heldout performance, and reflecting on any differences.

Problem 2: Open-ended challenge

Goals and Tasks for Problem 2

For this problem, your goal is to obtain the best performance on heldout data, using any feature representation you want, any classifier you want, and any hyperparameter selection procedure you want.

Here are some concrete examples of methods/ideas you could try:

  • Instead of only using single words (“unigrams”), as in Problem 1, can you consider some bigrams (e.g. ‘New York’ or ‘not bad’)?
  • Can you use smart reweighting techniques like term-frequency/inverse-document-frequency? See sklearn.feature_extraction.text.TfidfVectorizer (a sketch follows this list)
  • Try a different classifier in sklearn (nearest neighbor, random forest, MLP, etc.). Be sure you understand enough about this classifier to define a reasonable hyperparameter search strategy.
  • Would it help to build separate classifiers for amazon, imdb, and yelp reviews?
  • Would it help to build separate features for the first and second halves of sentences?
  • Can you use text compression methods to obtain features?
  • Can you use off-the-shelf neural representations of text, like word2vec or GloVe or BERT?
    • We’ve included BERT embeddings specifically for you, in addition to a notebook showing how to use the embeddings.
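
As one illustration of the first two ideas combined, a minimal sketch (the min_df value is an arbitrary illustration):

from sklearn.feature_extraction.text import TfidfVectorizer

# ngram_range=(1, 2) keeps unigrams and adds bigrams such as 'not bad'
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
x_train_tfidf = vectorizer.fit_transform(tr_list_of_sentences)
print(x_train_tfidf.shape)  # (2400, V) where V is the n-gram vocabulary size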

We expect to see the paragraphs and figures described below, which mirror the criteria above for 1A (overall design of feature representation), 1B (overall CV experimental strategy), 1C (hyperparameter search strategy for chosen classifier), and 1D (performance analysis).

For full credit, we expect that at least 2 parts out of 2A, 2B, and 2C explore substantially different methods than those used in Problems 1A, 1B, 1C. Each choice must be plausibly motivated by improving your classifier’s performance.

2A : Feature Representation Description

Include a paragraph describing and justifying how you transformed text into fixed-length feature vectors suitable for classification. Include enough detail that another student could roughly reproduce your work.

If this is substantially similar to 1A, it is OK to say so and keep this paragraph brief (you don’t need to repeat yourself).

2B : Cross-Validation (or Equivalent) Description

Include a paragraph describing and justifying how you set up your training and hyperparameter selection process, given only the provided training set. Include enough detail that another student could roughly reproduce your work.

If this is substantially similar to 1B, it is OK to say so and keep this paragraph brief.

2C : Hyperparameter Selection for Chosen Classifier

Include a paragraph and figure describing and justifying the hyperparameter search strategy for your chosen classifier, mirroring the requirements of 1C. If this is substantially similar to 1C, it is OK to say so and keep this part brief.

2D : Error Analysis

In a figure, show some representative examples of false positives and false negatives for your chosen best classifier from 2C. Be sure to look at heldout examples (not examples used to train that model).

In a paragraph caption below the figure, try to characterize what kinds of mistakes the classifier makes. Reflect on any key differences from the classifier in Problem 1.

2E : Report Performance on Test Set via Leaderboard

Apply your best pipeline from 2A - 2D above to the test sentences in x_test.csv. Store your probabilistic predictions into a single-column plain-text file yproba1_test.txt. Upload this file to our Open-Ended leaderboard.

In your report, include a summary paragraph stating your ultimate test set performance. Discuss whether your performance is better or worse than in Problem 1, and reflect on why you think that might be.

Grading

Overall Grade Breakdown

We’ll compute a final grade for this project as a weighted combination:

  • 87% : your PDF report, graded using the rubric below
  • 10% : your leaderboard submissions, graded using the rubric below
  • 3% : completion of your reflection on the project

Leaderboard Submissions

You’ll submit 2 sets of predictions to our leaderboard (one each for Problems 1 and 2).

For each one, we’ll give you a score between 0.0 and 1.0 where:

  • 85% of the points are awarded for achieving a “reasonable” score (e.g. one from a standard pipeline trained using good practices)
  • 15% of the points are awarded if you are within tolerance of the top 3 submissions in this class (partial credit possible, linearly interpolating between the “reasonable” score and the “top” score)

PDF Report

Earning full credit on this assignment requires a well-thought-out report that demonstrates you made reasonable design decisions for feature preprocessing and classifiers and followed machine learning best practices throughout, especially for hyperparameter selection. Achieving top scores on the leaderboard is far less important than understanding why some methods and choices outperform others.

Points will be allocated across the various parts as follows:

  • 60%: Problem 1
  • 40%: Problem 2

Within each problem, we break down the points as follows:

  • 30%: Paragraph A on Feature representation design decisions
  • 15%: Paragraph B on cross validation design decisions
  • 35%: Paragraph C on training and selection for your classifier
  • 15%: Paragraph D on analysis of classifier mistakes/successes
  • 5%: Paragraph E reflection on heldout performance

Hyperparameter Selection Rubric

Figure Requirements:

Your figure should show heldout performance across a range of at least 5 hyperparameter values controlling model complexity, covering both underfitting and overfitting. That is, if at all possible, at least one candidate value should show clear underfitting and at least one should show clear overfitting.

Your figure should:

  • Show both training set and validation set performance trends in the same plot. Make sure you add a legend, label your axes, and add a title.
  • Show the typical performance at each hyperparameter via the average over multiple CV folds

An ideal figure will also:

  • Communicate uncertainty around this typical value by exposing the variation across the multiple CV folds

    • A simple way to show uncertainty is to show the empirical range across all folds, or the empirical standard deviation
    • A better way is to show a separate dot for the direct performance of each fold (so 5 dots for 5 folds); a sketch of this style follows below

The big idea here is that your figure should help the reader understand if one hyperparameter is definitely better than another (e.g. performance is better on most or all folds) or if there isn’t much difference.
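
Here is a minimal matplotlib sketch of the per-fold-dot style, assuming the fitted 5-fold GridSearchCV object (with return_train_score=True) from the sketch in 1B:

import matplotlib.pyplot as plt

C_grid = search.cv_results_['param_clf__C'].data.astype(float)
for k in range(5):  # one dot per fold at each C value
    plt.plot(C_grid, search.cv_results_['split%d_train_score' % k], 'b.')
    plt.plot(C_grid, search.cv_results_['split%d_test_score' % k], 'r.')
plt.plot(C_grid, search.cv_results_['mean_train_score'], 'b-', label='train (mean)')
plt.plot(C_grid, search.cv_results_['mean_test_score'], 'r-', label='validation (mean)')
plt.xscale('log')
plt.xlabel('C (inverse regularization strength)')
plt.ylabel('AUROC')
plt.title('Train vs. heldout AUROC across 5 CV folds')
plt.legend()
plt.show()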

Paragraph requirements:

In each paragraph where you describe training a classifier and selecting its hyperparameters to avoid overfitting, you should include:

  • 1-2 sentences: describe the potential advantages of the chosen classifier for the task at hand.
  • 1-3 sentences: describe any necessary details about the training process (e.g. are there convergence issues? step-size selection issues? should you stop early to avoid overfitting?)
  • 1-2 sentences: describe which model complexity hyperparameter(s) were explored, how these values control model complexity, and why the chosen candidate value grids (or random distributions) reasonably cover the transition between underfitting and overfitting to find the “sweet spot” in between.
  • 1-2 sentences: describe the results of the experiment: which hyperparameter value is preferred? Is the evidence decisive, or uncertain?

For better readability, please bold the key factual takeaway for each of the 4 items above. For example:

“Our 10-fold CV search for logistic regression searched the C hyperparameter, which controls the inverse strength of the L2 penalty on the weights, across 20 log-spaced values between 10^-6 and 10^6.”

General tips for Figures

Please do your best to keep figures close to the related paragraph, ideally on the same page.

If a figure contains multiple elements, such as multiple lines or multiple sets of bars, please make sure they are on the same scale. If different scales are truly necessary, adjust them so that trends remain reasonable and easy to interpret. For instance, it is inappropriate to plot a line fluctuating between 0 and 1 on the same axis as a line oscillating between 100 and 1000, as this distorts the representation and interpretation of the data.
