CS 135 Intro to Machine Learning - Homework 2: Evaluating Binary Classifiers and Implementing Logistic Regression

    Files to Turn In:

    ZIP file of source code submitted to the autograder should contain:

    • binary_metrics.py : will be autograded
    • hw2_notebook.ipynb : will not be autograded, but may be manually assessed to verify authorship/correctness/partial credit

    PDF report (manually graded):

    • Export a PDF of your completed Jupyter notebook.
    • You can export a notebook as a PDF in two steps (or one, if you want to install Pandoc and XeLaTeX, but that’s not required).
      1. In Jupyter, go to File -> Save and Export Notebook As -> HTML. Note that without installing additional software, directly exporting as a PDF will not work.
      2. Open the saved HTML file, then print the page using your web browser’s built-in print functionality.
    • When uploading to Gradescope, mark the pages containing each subproblem via the in-browser Gradescope annotation tool.

    Evaluation Rubric:

    • 80% will be based on the report
    • 20% will be based on the autograder score of your code

    See the PDF submission portal on Gradescope for the point values of each problem. Generally, tasks with more coding/effort will earn more potential points.

    Background

    In this HW, you’ll complete two problems related to binary classifiers.

    In Problem 1, you’ll implement common metrics for evaluating binary classifiers.

    In Problem 2, you’ll learn how to decide whether a new feature can help classify cancer risk better than a previous model.

    As much as possible, we have tried to decouple these parts, so you may successfully complete the report even if some of your code doesn’t work. Much of your analysis will use library code in sklearn with similar functionality to what you implement yourself.

    This homework specifically deals with:

    • Classifier Basics
    • Evaluation of Binary Classifiers

    Starter Code

    See the hw2 folder of the public assignments repo for this class: https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/hw2

    This starter code includes several files.

    For Problem 1:

    • You need to edit code in binary_metrics.py

    For Problem 2:

    • You need to edit hw2_notebook.ipynb, which will help you organize your analysis for the report.

    • Helper functions in threshold_selection.py and confusion_matrix.py are implemented for you. You should understand the provided code, but you do NOT need to edit these files.

    Problem 1: Implement performance metrics for binary predictions

    Here, you’ll implement several metrics for comparing provided “true” binary labels with “predicted” binary decisions.

    See the starter code file: binary_metrics.py

    Task 1(a) : Implement calc_TP_TN_FP_FN

    Task 1(b) : Implement calc_ACC

    Task 1(c) : Implement calc_TPR

    Task 1(d) : Implement calc_PPV

    See the starter code for example inputs and the expected output.
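
    For orientation, here is a minimal numpy sketch of one way to implement these four metrics. The argument names and the zero-division handling are assumptions on our part; follow the exact signatures, docstrings, and example inputs given in binary_metrics.py.

        import numpy as np

        def calc_TP_TN_FP_FN(ytrue_N, yhat_N):
            ''' Count true/false positives and negatives from binary label arrays. '''
            ytrue_N = np.asarray(ytrue_N)
            yhat_N = np.asarray(yhat_N)
            TP = int(np.sum(np.logical_and(ytrue_N == 1, yhat_N == 1)))
            TN = int(np.sum(np.logical_and(ytrue_N == 0, yhat_N == 0)))
            FP = int(np.sum(np.logical_and(ytrue_N == 0, yhat_N == 1)))
            FN = int(np.sum(np.logical_and(ytrue_N == 1, yhat_N == 0)))
            return TP, TN, FP, FN

        def calc_ACC(ytrue_N, yhat_N):
            ''' Accuracy: fraction of all predictions that are correct. '''
            TP, TN, FP, FN = calc_TP_TN_FP_FN(ytrue_N, yhat_N)
            return (TP + TN) / float(TP + TN + FP + FN)

        def calc_TPR(ytrue_N, yhat_N):
            ''' True positive rate (recall): TP / (TP + FN). '''
            TP, TN, FP, FN = calc_TP_TN_FP_FN(ytrue_N, yhat_N)
            return TP / float(TP + FN) if (TP + FN) > 0 else 0.0

        def calc_PPV(ytrue_N, yhat_N):
            ''' Positive predictive value (precision): TP / (TP + FP). '''
            TP, TN, FP, FN = calc_TP_TN_FP_FN(ytrue_N, yhat_N)
            return TP / float(TP + FP) if (TP + FP) > 0 else 0.0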

    Problem 2: Binary Classifier for Cancer-Risk Screening

    Dataset: Predicting Cancer Risk from Easy-to-measure Facts

    We are building classifiers that decide if patients are low-risk or high-risk for some kind of cancer. If our classifier is reliable enough at identifying low-risk patients, we could use the classifier’s assigned label to perform screening:

    • if ŷ = 1: do a follow-up biopsy
    • if ŷ = 0: no follow-up necessary

    Currently, all patients have the biopsy. Can we reduce the number of patients that need to go through the biopsy, while still catching almost all the cases that have cancer?

    Setup: You have been given a dataset containing some medical history information for 750 patients who might be at risk of cancer. Dataset credit: A. Vickers, Memorial Sloan Kettering Cancer Center [original link].

    Each patient in our dataset has had a biopsy, a short surgical procedure to extract a tumor sample that is a bit painful but has virtually no lasting harmful effects. After the biopsy, lab techs can test the tissue sample to obtain a direct “ground truth” label, so we know each patient’s actual cancer status (a binary variable: 1 means “has cancer”, 0 means “does not”; the column name is cancer in the y data files).

    We want to build classifiers to predict whether a patient likely has cancer from easier-to-get information, so we could avoid painful biopsies unless they are necessary. Of course, if we skip the biopsy, a patient with cancer would be left undiagnosed and therefore untreated. We’re told by the doctors this outcome would be life-threatening.

    Easiest features: It is known that older patients with a family history of cancer have a higher probability of harboring cancer. So we can use the age and famhistory variables in the x dataset files as inputs to a simple predictor.

    Possible new feature: A clinical chemist has recently discovered a real-valued marker (called marker in the x dataset files) that she believes can distinguish between patients with and without cancer. We wish to assess whether this new marker does indeed identify patients with and without cancer well.

    To summarize, there are two versions of the features x we’d like you to examine:

    • 2-feature dataset: ‘age’ and ‘famhistory’
    • 3-feature dataset: ‘marker’, ‘age’ and ‘famhistory’

    In the starter code, we have provided an existing train/validation/test split of this dataset, stored on disk in comma-separated-value (CSV) files: x_train.csv, y_train.csv, x_valid.csv, y_valid.csv, x_test.csv, and y_test.csv.

    Data: https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/hw2/data_cancer

    Understanding the Dataset

    Implementation Step 1A

    Given the provided datasets (as CSV files), load them and compute the relevant counts needed for Table 1 below.

    Table 1 in Report

    Provide a table summarizing some basic properties of the provided training set, validation set, and test set:

    • Row 1 ‘total count’: how many total examples are in each set?
    • Row 2 ‘positive label count’: how many examples have a positive label (meaning cancer)?
    • Row 3 ‘fraction positive’: what fraction (between 0 and 1) of the examples have cancer?
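
    A minimal pandas sketch of this step appears below; the relative file paths are an assumption based on the data_cancer folder of the starter repo.

        import pandas as pd

        for split in ['train', 'valid', 'test']:
            # Each y file has a single binary column named 'cancer'
            y_N = pd.read_csv('data_cancer/y_%s.csv' % split)['cancer'].values
            total_count = y_N.size                      # Row 1: total count
            pos_count = int(y_N.sum())                  # Row 2: positive label count
            frac_pos = pos_count / float(total_count)   # Row 3: fraction positive
            print('%5s set: total=%d positive=%d fraction=%.3f' % (
                split, total_count, pos_count, frac_pos))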

    Establishing Baseline Prediction Quality

    Implementation Step 1B

    Given a training set of labels {y_n}, n = 1, ..., N, we can always consider a simple baseline for prediction that returns the same constant predicted label regardless of the input feature vector x_i:

    • predict-0-always : ŷ(x_i) = 0 for all i

    Short Answer 1a in Report

    What accuracy does the “predict-0-always” classifier get on the test set (report to 3 decimal places)? (You should see a pretty high number.) Why isn’t this classifier “good enough” to use in our screening task?
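
    A quick sketch of this check, reusing your Problem 1 accuracy function (assumed signature) on the test labels:

        import numpy as np
        import pandas as pd
        from binary_metrics import calc_ACC  # your Problem 1 implementation

        ytest_N = pd.read_csv('data_cancer/y_test.csv')['cancer'].values
        yhat_N = np.zeros_like(ytest_N)  # predict-0-always
        print('predict-0-always test accuracy: %.3f' % calc_ACC(ytest_N, yhat_N))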

    Trying Logistic Regression: Training and Hyperparameter Selection

    Implementation Step 1C

    Consider the 2-feature dataset. Fit a logistic regression model using sklearn’s LogisticRegression implementation (see the sklearn.linear_model.LogisticRegression docs).

    When you construct your LogisticRegression classifier, please be sure that you:

    • Set solver='lbfgs' (ensures consistent performance and a coherent penalty)
    • Provide a positive value for hyperparameter C, an “inverse strength value” for the L2 penalty on coefficient weights
      • Small C (e.g. 10^-6) means the weights should be near zero (equivalent to large α)
      • Large C (e.g. 10^+6) means the weights should be unpenalized (equivalent to small α)

    To avoid overfitting, you should explore a range of C values, using a regularly-spaced grid: C_grid = np.logspace(-9, 6, 31). Among these possible values, select the value that minimizes the mean cross-entropy loss on the validation set. The starter code contains a function from sklearn for computing this loss.
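
    A sketch of this selection loop appears below, assuming sklearn.metrics.log_loss is the provided loss function and that x_train_NF, ytrain_N, x_valid_NF, and yvalid_N are the arrays you loaded earlier (the names are ours, not the starter code’s):

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import log_loss  # mean cross-entropy loss

        C_grid = np.logspace(-9, 6, 31)
        best_C, best_loss = None, np.inf
        for C in C_grid:
            lr = LogisticRegression(solver='lbfgs', C=C)
            lr.fit(x_train_NF, ytrain_N)
            # Column 1 of predict_proba is P(y=1 | x)
            yproba_valid_N = lr.predict_proba(x_valid_NF)[:, 1]
            loss = log_loss(yvalid_N, yproba_valid_N)
            if loss < best_loss:
                best_C, best_loss = C, loss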

    Implementation Step 1D

    Repeat 1C for the 3-feature dataset.

    Comparing Models with ROC Analysis

    We have trained two possible LR models, one using the 2-feature dataset (F=2) and the other using the 3-feature dataset (F=3). Which is better?

    Receiver Operating Characteristic (“ROC”) curves allow us to compare classifiers across many possible decision thresholds. Each curve shows the tradeoffs a classifier makes between true positive rate (TPR) and false positive rate (FPR) as you vary the decision threshold. Remember that FPR = 1 - TNR.

    Implementation Step 1E

    Compare the F=2 and F=3 models’ performance on the validation set, using ROC curves.

    You can use sklearn.metrics.roc_curve to compute such curves. To understand how to use this function, consult its User Guide and documentation.

    Create a single plot showing two lines (a plotting sketch follows the list):

    • one line is the validation-set ROC curve for the F=2 model from 1C (use color BLUE (‘b’) and style ‘.-’)
    • one line is the validation-set ROC curve for the F=3 model from 1D (use color RED (‘r’) and style ‘.-’)
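
    A minimal matplotlib sketch, assuming lr2 and lr3 are the fitted models from 1C and 1D and x_valid_N2 / x_valid_N3 hold the corresponding validation features (names are ours):

        import matplotlib.pyplot as plt
        from sklearn.metrics import roc_curve

        fpr2, tpr2, _ = roc_curve(yvalid_N, lr2.predict_proba(x_valid_N2)[:, 1])
        fpr3, tpr3, _ = roc_curve(yvalid_N, lr3.predict_proba(x_valid_N3)[:, 1])

        plt.plot(fpr2, tpr2, 'b.-', label='F=2 (age, famhistory)')
        plt.plot(fpr3, tpr3, 'r.-', label='F=3 (marker, age, famhistory)')
        plt.xlabel('false positive rate')
        plt.ylabel('true positive rate')
        plt.legend(loc='lower right')
        plt.show()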

    Figure 1 in Report

    In your report, show the plot you created in step 1E. No caption is necessary.

    Short Answer 1b in Report

    Compare the two models in terms of their ROC curves from Figure 1. Does one dominate the other in terms of overall performance across all thresholds, or are there some threshold regimes where the 2-feature model is preferred and other regimes where the 3-feature model is preferred? Which model do you recommend for the task at hand?

    Selecting the Decision Threshold

    Remember that even after we train an LR model to make probabilistic predictions, if we intend the classifier to ultimately make a yes/no binary decision (e.g. should we give a biopsy or not), we need to select the threshold used to turn probabilities into a binary decision.

    Of course, we could just use a threshold of 0.5 (y_pred = 0 if y_proba < 0.5, else 1), which is what sklearn and most implementations do by default. Below, we’ll compare that approach against several potentially smarter strategies for selecting this threshold.

    To get candidate threshold values, use the helper function compute_perf_metrics_across_thresholds in the starter code file threshold_selection.py.

    Implementation Step 1F

    For the classifier from 1D above (LR for 3 features), calculate performance metrics using the default threshold of 0.5 (predict 1 whenever y_proba >= 0.5). Produce the confusion matrix and calculate the TPR and PPV on the test set. Tip: Remember that we have implemented helper functions for you in confusion_matrix.py.
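
    A sketch of this evaluation, using sklearn’s confusion matrix in place of the provided confusion_matrix.py helper (either should work; lr3, x_test_N3, and ytest_N are assumed names):

        import numpy as np
        from sklearn.metrics import confusion_matrix
        from binary_metrics import calc_TPR, calc_PPV  # your Problem 1 functions

        yproba_test_N = lr3.predict_proba(x_test_N3)[:, 1]
        yhat_test_N = np.asarray(yproba_test_N >= 0.5, dtype=np.int32)

        print(confusion_matrix(ytest_N, yhat_test_N))
        print('TPR: %.3f' % calc_TPR(ytest_N, yhat_test_N))
        print('PPV: %.3f' % calc_PPV(ytest_N, yhat_test_N))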

    Implementation Step 1G

    For the classifier from 1D above (LR for 3 features), compute performance metrics across all candidate thresholds on the validation set (use compute_perf_metrics_across_thresholds). Then, pick the threshold that maximizes TPR while satisfying PPV >= 0.98 on the validation set. If there’s a tie for the maximum TPR, choose the threshold corresponding to a higher PPV. (A sketch of this selection logic follows below.)

    Remember, you pick this threshold based on the validation set; only later will you evaluate it on the test set.
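
    A sketch of the constrained selection for 1G; it assumes compute_perf_metrics_across_thresholds returns parallel arrays of candidate thresholds, TPR values, and PPV values (check the real return signature in threshold_selection.py). The 1H variant below is symmetric: swap the roles of TPR and PPV.

        import numpy as np
        from threshold_selection import compute_perf_metrics_across_thresholds

        # Hypothetical return signature; consult the starter code for the real one.
        thresh_grid, tpr_grid, ppv_grid = compute_perf_metrics_across_thresholds(
            yvalid_N, lr3.predict_proba(x_valid_N3)[:, 1])

        # Assumes at least one candidate threshold satisfies the constraint
        ok = ppv_grid >= 0.98                  # thresholds meeting the PPV constraint
        best_tpr = tpr_grid[ok].max()          # best achievable TPR among those
        tied = ok & (tpr_grid == best_tpr)     # break TPR ties by higher PPV
        best_thresh = thresh_grid[tied][np.argmax(ppv_grid[tied])]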

    Implementation Step 1H

    For the classifier from 1D above (LR for 3 features), compute performance metrics across all candidate thresholds on the validation set (use compute_perf_metrics_across_thresholds), and pick the threshold that maximizes PPV while satisfying TPR >= 0.98 on the validation set. If there’s a tie for the maximum PPV, choose the threshold corresponding to a higher TPR.

    Remember, you pick this threshold based on the validation set; only later will you evaluate it on the test set.

    Short Answer 1c in Report

    By carefully reading the confusion matrices, report for each of the 3 thresholding strategies in parts 1F - 1H how many subjects in the test set are saved from unnecessary biopsies that would be done under current practice.

    Hint: You can assume that currently, the hospital would have done a biopsy on every patient in the test set. Your goal is to build a classifier that improves on this practice.

    Short Answer 1d in Report

    Among the 3 possible thresholding strategies, which strategy best meets the stated goals of stakeholders in this screening task: avoid life-threatening mistakes whenever possible, while also eliminating unnecessary biopsies? What fraction of current biopsies might be avoided if this strategy were adopted by the hospital?

    Hint: You can also assume the test set is a reasonable representation of the true population of patients.
