
CS152 L3D Learning from Limited Labeled Data - HW1: Transfer Learning for the Birds


    Your ZIP file should include:

    • All starter code files (.py and .ipynb) with your edits (in the top-level directory)

    Your PDF should include (in order):

    • Your full name
    • Collaboration statement
    • Problem 1 figure 1a with caption, figure 1b with caption
    • Problem 2 figure 2a with caption, short answer
    • Problem 3 short answer

    Questions?

    • First look at the HW1 FAQ post on Piazza
    • Then, post a new question to Piazza, using the hw1 topic

    Jump to: Starter Code   Problem 1   Problem 2   Problem 3

    Updates to Instructions:

    • 2024-09-16 02:25 : Clarified the intended explanation of Prob 1 Task CODE 1(v) for eval_acc
    • 2024-09-17 05:00 : Fixed a bug in data_utils.py; please get the latest version. Only impacts the test set.
    • 2024-09-18 10:00 : Updated the last line of eval_acc in hw1.ipynb to correctly divide by the number of test examples.

    Goals

    We spent Week 2 learning about Transfer Learning, including both its positive aspects (the "Astounding Baseline" paper) and its limitations ("Does ImageNet transfer to real-world data?").

    In this HW1, you'll apply Transfer Learning to a real dataset and wrestle with several questions:

    • Problem 1: For a specific target classification task of interest, would we rather have a source model trained on a "generic" dataset like ImageNet1k, or on a smaller dataset related to our target task?

    • Problem 2: What are the tradeoffs between fine-tuning just the last layer (aka "linear probing") and fine-tuning a few more layers? Can we compose these to do better?

    Starter Code and Provided Data

    You can find the starter code and our provided "BirdSnap-10" dataset in the course GitHub repository here:

    Two ways to run the experiments

    Option 1: Google Colab cloud-based notebook environment

    To get started quickly on Colab:

    • Follow this link:
    • Use "Ctrl-S" or similar to copy that notebook into your Google Drive to save progress

    Here's a quick video demo of how to set up Colab.

    Option 2: Your own environment

    You can find .yml files specifying the required Python packages in the repo.

    You'll be responsible for getting things working yourself.

    Background

    A wildlife conservation organization has reached out for help to develop an automated bird species classifier. Their goal is to classify 10 specific bird species critical to biodiversity conservation. Unfortunately, large datasets for these birds are difficult to acquire.

    You have been provided:

    • a 'train' dataset, to be used for all model development (training and validation)
    • a 'test' dataset, to be used only to evaluate model generalization

    You can obtain these images by unpacking 'birdsnap10_224x224only.zip'.

    These images come from the BirdSnap dataset, a public dataset released by Berg et al. in 2014 (see the paper and release page).

    You will explore 4 possible pretrained models provided by pytorchcv, which differ across two key axes:

    • Model architecture
      • ResNet-10, with ~5M parameters
      • ResNet-26, with ~17M parameters
    • Source Dataset
      • ImageNet-1k, a large and diverse dataset containing >1 million images from 1000 classes
      • CUB-200-2011, specialized to birds, containing about 11,000 images of 200 bird species

    We have tried our best to construct this task so your target BirdSnap-10 dataset has no class overlap with the species in CUB-200-2011, and also no overlap with classes in ImageNet-1k (which does contain several bird classes).

    Problem 1: Should Source Models Specialize or Generalize?

    Your goal in Problem 1 is to compare the 4 possible source models (each combination of 2 datasets x 2 architectures) at last-layer fine-tuning.

    Tasks for Code Implementation

    Open models.py, and examine the PTNetForBirdSnap10 class, which defines a PyTorch neural net module that uses a pretrained backbone combined with a simple linear classification head for the 10-class BirdSnap data.

    CODE 1(i) Edit the setup_trainable_params method so that, given a desired number of layers n as an int, the last n layers are set to trainable (accepting gradient updates in the PyTorch computation graph) and all other parameters are frozen (not accepting gradient updates). Hint: you'll need to edit the boolean property requires_grad of each parameter tensor.
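
    A minimal sketch of how the freezing logic could look, assuming the model's top-level children are ordered from input to output (the real starter-code layout may differ, so treat this as a shape of the solution, not the solution itself):

        import torch.nn as nn

        def setup_trainable_params(model: nn.Module, n_trainable_layers: int):
            # Freeze every parameter first
            for param in model.parameters():
                param.requires_grad = False
            # Then unfreeze only the parameters in the last n top-level layers
            layers = list(model.children())
            for layer in layers[-n_trainable_layers:]:
                for param in layer.parameters():
                    param.requires_grad = True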

    Now open train.py, which defines a function for performing training on our target BirdSnap data. This function works whether we are updating just the last layer or more layers.

    CODE 1(ii) Edit the train_model method to compute the cross-entropy loss, in two places. First, for the current batch of train data, inside the tr_loader loop. Second, for the current batch of validation data, inside the va_loader loop.

    The way you take averages differs a bit between the two (see the sketch after this list):

    • Train: you want a per-example average from the current batch, as a fast yet unbiased estimator for computing loss/gradients.
    • Valid: you want the true per-example average over the full validation dataset.
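
    Here is a hedged sketch of the two conventions; tr_loader and va_loader come from the text, while the model, optimizer, and device handling are assumed:

        import torch
        import torch.nn.functional as F

        # Train: per-example mean over the current batch
        for x, y in tr_loader:
            logits = model(x)
            loss = F.cross_entropy(logits, y)  # reduction='mean' by default
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Valid: true per-example mean over the full validation set
        total_loss, n_seen = 0.0, 0
        with torch.no_grad():
            for x, y in va_loader:
                logits = model(x)
                # Sum within each batch, divide once at the end
                total_loss += F.cross_entropy(logits, y, reduction='sum').item()
                n_seen += y.shape[0]
        va_loss = total_loss / n_seen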

    Next, implement strategies to avoid overfitting, like L2 penalization of weight magnitudes for the last layer.

    CODE 1(iii) Edit train_model to add an L2 penalty loss on the weights only (not biases) of the last layer ("classification head"). For our provided model, you can use the dict model.trainable_params to access a tensor using its name as the key. The last-layer weights are named 'output.weight'. (Hint: to compute the L2 magnitude, think sum of squares.)
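
    A small sketch of the penalty term; model.trainable_params and the 'output.weight' key follow the text above, while l2_strength is an assumed hyperparameter name:

        import torch

        def l2_penalized_loss(xent_loss, model, l2_strength):
            # Penalize only the last-layer weights (biases excluded)
            w = model.trainable_params['output.weight']
            return xent_loss + l2_strength * torch.sum(w ** 2)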

    CODE 1(iv) Edit train_model to add early-stopping functionality. Track the number of consecutive epochs that the validation cross-entropy loss gets worse. Once this exceeds a threshold, revert the model to its previous state (the one that gave the best val-set cross-entropy) and return that model.
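
    One possible shape for this logic, sketched with hypothetical run_epoch / eval_va_loss callables standing in for the real training and validation steps in train.py:

        import copy

        def train_with_early_stopping(model, run_epoch, eval_va_loss,
                                      n_epochs=100, patience=5):
            best_va, best_state, n_worse = float('inf'), None, 0
            for epoch in range(n_epochs):
                run_epoch(model)
                va_loss = eval_va_loss(model)
                if va_loss < best_va:
                    # Improvement: snapshot the weights
                    best_va = va_loss
                    best_state = copy.deepcopy(model.state_dict())
                    n_worse = 0
                else:
                    n_worse += 1
                    if n_worse > patience:  # too many bad epochs in a row
                        break
            if best_state is not None:
                model.load_state_dict(best_state)  # revert to the best model
            return model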

    Finally, write code to evaluate test set accuracy.

    CODE 1(v) Edit eval_acc (defined in the body of hw1.ipynb) to measure a model's accuracy (defined as the fraction of examples that are correctly predicted, from 0.0 to 1.0, higher is better) on the provided test set.
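
    A minimal sketch of what eval_acc might look like, assuming a standard (x, y) test loader like the ones provided:

        import torch

        def eval_acc(model, test_loader):
            # Fraction of test examples predicted correctly, in [0.0, 1.0]
            n_correct, n_total = 0, 0
            model.eval()
            with torch.no_grad():
                for x, y in test_loader:
                    preds = model(x).argmax(dim=1)  # predicted class per example
                    n_correct += (preds == y).sum().item()
                    n_total += y.shape[0]
            return n_correct / n_total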

    Tasks for Experiment Execution

    Now, step through the provided notebook hw1.ipynb to achieve the following:

    EXPERIMENT 1(i): First, do last-layer fine-tuning of ResNet10 using the ImageNet1k pretrained model, on the available training/validation sets. Use the provided train/valid data loaders (don't mess with batch_size or other settings). Monitor train and validation metrics, and make your own plots of these metrics as needed. Find a suitable learning rate and L2-regularization strength to minimize over/under-fitting.

    EXPERIMENT 1(ii): Repeat step 1(i) above for the other combinations of architecture and source dataset. Using the code's computed train/validation metrics tracked over epochs, find a setting of hyperparameters (n_epochs, lr, l2 penalty, seed) that seems to deliver reasonable heldout performance without too much over/under-fitting.

    We recommend saving intermediate results to a .pkl file or similar (see hw1.ipynb), so it is easy to plot later without redoing experiments.
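
    For instance, a quick caching pattern with pickle (the dict contents and filename here are illustrative, not prescribed by the starter code):

        import pickle

        results = {'arch': 'ResNet10', 'src': 'ImageNet1k',
                   'va_loss_per_epoch': va_loss_history}  # from your training run
        with open('results_resnet10_imagenet1k.pkl', 'wb') as f:
            pickle.dump(results, f)

        # Later, reload for plotting without rerunning training
        with open('results_resnet10_imagenet1k.pkl', 'rb') as f:
            results = pickle.load(f)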

    Tasks for Report Writing

    In your submitted report, include the following:

    FIGURE 1(a): Plot loss-vs-epoch for your best runs of (ResNet10, ImageNet1k) and (ResNet10, CUB). Use the style provided in the Figure1a block in hw1.ipynb. Include this figure in your report, with a caption that summarizes the major takeaway messages (did you see major overfitting? was strong L2-penalization needed? was early stopping beneficial?).

    FIGURE 1(b): Make a target-task-accuracy vs. source-task-accuracy plot, like the main figure in the Fang et al. paper from day04. Use the style provided in the Figure1b block in hw1.ipynb. Include this figure in your report, with a caption that summarizes the major takeaway messages of your results (which source dataset is better for our target task? which architecture is better?). Try to reason about why, given your knowledge from the readings.

    Problem 2: LP then FT

    To keep things simple, we'll fix ('ResNet10', 'ImageNet1k') for the arch and source dataset throughout Problem 2. Be sure you're only using this configuration.

    Your goal in Problem 2 is to implement the LP-then-FT method of Kumar et al. (from our day04 readings). That is, you'll do:

    • First stage of LP (call train.train_model with n_trainable_layers=1).

      • You can reuse the best hyperparameters from Problem 1 above.
    • Second stage of FT (call train.train_model with n_trainable_layers=3). Be sure to initialize from the model produced by stage one. You'll have 3 trainable layers (not just 1), so lots more flexibility but also potential to overfit.

      • You'll need to tune lr / l2penalty / n_epochs to be sure you are fitting reasonably.

    Tasks for Code Implementation

    CODE 2(i): Edit your hw1.ipynb notebook to implement two-phase training. In the first phase, again set n_trainable_layers=1 and use exactly the lr/l2penalty/seed combinations that worked well in Problem 1. In the second phase, you'll want to consider different settings of lr/l2penalty.

    To make things quick, you can run the first phase just once, yielding a good LPmodel, and then tune the hyperparameters of the second phase using copy.deepcopy(LPmodel) to get a fresh copy of the model to train for each hyperparameter config, while leaving the original LPmodel unchanged.
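
    A sketch of how the two phases could be wired together; train.train_model's exact signature is assumed from the text, and the hyperparameter grids are illustrative only:

        import copy
        import train

        # Phase 1 (LP): run once with your best Problem 1 hyperparameters
        # (best_lr / best_l2pen are placeholder names for those choices)
        LPmodel = train.train_model(model, n_trainable_layers=1,
                                    lr=best_lr, l2pen=best_l2pen)

        # Phase 2 (FT): try several configs, each starting from the LP weights
        ft_models = {}
        for lr in [1e-4, 3e-4, 1e-3]:
            for l2pen in [0.0, 0.01, 0.1]:
                candidate = copy.deepcopy(LPmodel)  # leave LPmodel unchanged
                ft_models[(lr, l2pen)] = train.train_model(
                    candidate, n_trainable_layers=3, lr=lr, l2pen=l2pen)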

    Tasks for Experiment Execution

    EXPERIMENT 2(i): Run a small-scale hyperparameter search (either manually or systematically), aiming to find a configuration of lr/l2penalty for the second phase that delivers the best possible validation performance. Don't spend more than about an hour.

    EXPERIMENT 2(ii): Compute the test-set accuracy for both the phase 1 and phase 2 "best" models, using eval_acc.

    Tasks for Report Writing

    In your submitted report, include the following:

    FIGURE 2(a): Plot loss/error-vs-epoch curves in two panels (left = LP phase, right = FT phase), using the style provided in the Figure2a block in hw1.ipynb. Aim to show the best run from experiment 2(i), where ideally, reading across the plot, there is obvious continuity between the LP and FT phases (e.g., the validation loss doesn't immediately jump away from the values seen at the end of the LP phase). Include this figure in your report, with a caption that summarizes the major takeaway messages: was your implementation successful?

    SHORT ANSWER 2(b) Report the ultimate test-set accuracy for both LP and LP-then-FT. Reflect on any differences.

    Problem 3: Conceptual Questions

    Short Answer 3a

    Provide a math formula for computing the complete loss used to train models here, including the cross-entropy and the L2-penalty terms.

    Notation:

    • B : total number of examples in the current batch, indexed by i
    • C : total number of classes for the target task
    • y_i : int indicator of the class label for example i
    • z_i : vector of logits for example i
    • w : matrix of weight parameters of the last layer
    • b : vector of bias parameters of the last layer

    You may only use basic math functions (log, sum, exp). Be sure to clearly define the assumed size of each vector/matrix, using the actual values in the code you implemented in Problem 1.
