CS 615 - Deep Learning

Assignment 4 - Exploring Hyperparameters, Spring 2024

Introduction

In this assignment we will explore the effect of different hyperparameter choices and apply a multi-class classifier to a dataset.

Programming Language/Environment

As per the syllabus, we are working in Python 3.x, and you must constrain yourself to using the numpy, matplotlib, pillow, and opencv-python add-on libraries.

Allowable Libraries/Functions

In addition, you cannot use any ML functions to do the training or evaluation for you. Using basic statistical and linear algebra functions like mean, std, cov, etc. is fine, but using ones like train, confusion, etc. is not. Using any ML-related functions may result in a zero for the programming component. In general, use the "spirit of the assignment" (where we're implementing things from scratch) as your guide, but if you want clarification on whether you can use a particular function, DM the professor on Discord.

Grading

Table 1: Grading Rubric

Part 1 (Theory)                                  20pts
Part 2 (Visualizing an Objective Function)       10pts
Part 3 (Exploring Model Initialization Effects)  20pts
Part 4 (Exploring Learning Rate Effects)         20pts
Part 5 (Adaptive Learning Rate)                  20pts
Part 6 (Multi-class Classification)              10pts

Datasets

MNIST Database. The MNIST Database is a dataset of hand-written digits from 0 to 9. The original dataset contains 60,000 training samples and 10,000 testing samples, each of which is a 28 × 28 image.

To keep processing time reasonable, we have extracted 100 observations of each class from the training dataset and 10 observations of each class from the validation/testing set to create a new dataset in the files mnist_train_100.csv and mnist_valid_10.csv, respectively.

The files are arranged so that each row pertains to an observation; in each row, the first column is the target class ∈ {0, 1, ..., 9}. The remaining 784 columns are the features of that observation, in this case the pixel values.

For more information about the original dataset, you can visit: http://yann.lecun.com/exdb/mnist/

1 Theory

Whenever possible, please leave your answers as fractions so the question of rounding and loss of precision therein does not come up.

1. What would the one-hot encoding be for the following set of multi-class labels (5pts)?

   [4 1 3]

2. Given inputs

   X = [1 2 1
        0 4 4]

   and the fully connected layer having weights W and biases b = [1 0 2], what is the output of the following architecture? Show intermediate computations. For simplicity, do not z-score your inputs. (5pts)

   Input → Fully Connected → Softmax

3. Using the same setup as the previous question, what are the gradients to update the fully connected layer's weights (both W and b) if we're using a cross-entropy objective function and we have three (3) total classes, where the observations' targets are

   Y = [0
        1]

   Make sure to show the intermediate gradients being passed backwards to make these computations. (5pts)

4. Given the objective function J = (1/4)(x1 w1)^4 - (4/3)(x1 w1)^3 + (3/2)(x1 w1)^2 (I know you already did this in HW3, but it will be relevant for HW4 as well):

   (a) What is the gradient ∂J/∂w1 (1pt)?
   (b) What are the locations of the extrema points for your objective function if x1 = 1? Recall that to find these you set the derivative to zero and solve for, in this case, w1. (3pts)
   (c) What does J evaluate to at each of your extrema points, again when x1 = 1 (1pt)?

2 Visualizing an Objective Function

For the next few parts we'll use the objective function J = (1/4)(x1 w1)^4 - (4/3)(x1 w1)^3 + (3/2)(x1 w1)^2 from the theory section. First, let's get a look at this objective function. Using x1 = 1, plot w1 vs. J, varying w1 from -2 to +5 in increments of 0.1. You will put this figure in your report.

3 Exploring Model Initialization Effects

Let's explore the effects of choosing different initializations for our parameter(s). In the theory part you derived the partial of J = (1/4)(x1 w1)^4 - (4/3)(x1 w1)^3 + (3/2)(x1 w1)^2 with respect to the parameter w1. Now you will run gradient descent on this for four different initial values of w1 to see the effect of weight initialization and local solutions.

Perform gradient descent as follows:

• Run through 100 epochs.
• Use a learning rate of η = 0.1.
• Evaluate J at each epoch so we can see how/if it converges.
• Assume our only data point is x = 1.

Do this for the initialization choices:

• w1 = 1
• w1 = 0.2
• w1 = 0.9
• w1 = 4

In your report provide the four plots of epoch vs. J, superimposing on your plots the final value of w1 and J once 100 epochs have been reached. In addition, based on your visualization of the objective function in Section 2, describe why you think w1 converged to its final place in each case.
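The procedure above amounts to a plain gradient-descent loop. A sketch, where the derivative follows by the chain rule from my reading of the objective J = (1/4)(x1 w1)^4 - (4/3)(x1 w1)^3 + (3/2)(x1 w1)^2, and the function names are mine:

```python
def J(w1, x1=1.0):
    u = x1 * w1
    return 0.25 * u**4 - (4.0 / 3.0) * u**3 + 1.5 * u**2

def dJ_dw1(w1, x1=1.0):
    # chain rule: dJ/du with u = x1 * w1, times du/dw1 = x1
    u = x1 * w1
    return (u**3 - 4.0 * u**2 + 3.0 * u) * x1

def gradient_descent(w1, eta=0.1, epochs=100, x1=1.0):
    history = []                      # J at each epoch, for the epoch-vs-J plot
    for _ in range(epochs):
        history.append(J(w1, x1))
        w1 -= eta * dJ_dw1(w1, x1)
    return w1, history

# one run per initialization choice
results = {w0: gradient_descent(w0) for w0 in (1.0, 0.2, 0.9, 4.0)}
```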

4 Exploring Learning Rate Effects

Next we're going to look at how your choice of learning rate can affect things. We'll use the same objective function as the previous sections, namely J = (1/4)(x1 w1)^4 - (4/3)(x1 w1)^3 + (3/2)(x1 w1)^2.

For each experiment, initialize w1 = 0.2, use x = 1 as your only data point, and once again run each experiment for 100 epochs.

The learning rates for the experiments are:

• η = 0.001
• η = 0.01
• η = 1.0
• η = 5.0

And once again, create plots of epoch vs. J for each experiment and superimpose the final values of w1 and J.

NOTE: Due to the potential of overflow, you will likely want to have the evaluation of your J function in a try/except block where you break out of the gradient descent loop if an exception happens.
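One way to realize that try/except guard: with plain Python floats, `**` raises OverflowError once the iterate diverges, so the loop can simply stop recording. This is a sketch under my reading of the objective, and the helper names are mine:

```python
def J(w1, x1=1.0):
    u = x1 * w1
    return 0.25 * u**4 - (4.0 / 3.0) * u**3 + 1.5 * u**2

def dJ_dw1(w1, x1=1.0):
    u = x1 * w1
    return (u**3 - 4.0 * u**2 + 3.0 * u) * x1

def run_experiment(eta, w1=0.2, epochs=100, x1=1.0):
    history = []
    for _ in range(epochs):
        try:
            history.append(J(w1, x1))    # may overflow once w1 diverges
            w1 -= eta * dJ_dw1(w1, x1)
        except OverflowError:
            break                        # learning rate too large: stop early
    return w1, history

results = {eta: run_experiment(eta) for eta in (0.001, 0.01, 1.0, 5.0)}
```

A truncated `history` (fewer than 100 entries) is itself a useful signal on the plot that the learning rate diverged.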

5 Adaptive Learning Rate

Finally, let's look at using an adaptive learning rate, à la the Adam algorithm.

For this part of your homework assignment we'll once again look to learn the w1 that minimizes J = (1/4)(x1 w1)^4 - (4/3)(x1 w1)^3 + (3/2)(x1 w1)^2, given the data point x = 1. Run gradient descent with Adam adaptive learning on this objective function for 100 epochs and produce a graph of epoch vs. J. Ultimately, you are implementing Adam from scratch here.

Your hyperparameter initializations are:

• w1 = 0.2
• η = 5
• ρ1 = 0.9
• ρ2 = 0.999
• δ = 10^-8

In your report provide a plot of epoch vs. J.

6 Multi-Class Classification

Finally, in preparation for our next assignment, let's do multi-class classification. For this we'll use the architecture:

Input → Fully Connected → Softmax → Output w/ Cross-Entropy Objective Function

Download the MNIST dataset from BBlearn and read in the training data. Train your system using the training data, keeping track of the value of your objective function on the training set as you go. In addition, we'll compute the cross-entropy loss on the validation set as well, to watch for overfitting.

Here are some additional implementation details/specifications:

Implementation Details

• Make sure to remember to one-hot-encode your targets!
• Use Xavier initialization to initialize your weights and biases.
• Use Adam learning.
• Run your iterations until near-convergence appears (things are mostly flattening out).
• You can decide on your own about things like hyperparameters, batch sizes, z-scoring, etc. Just report those design decisions in your report and state why you made them.

In your final report provide:

• A graph of epoch vs. J for the training data and the validation data. Both plots should be on the same graph with legends indicating which is which.
• Your final training and validation accuracy. Make sure to predict the enumerated class using the argmax of the output and compare against the original target enumerated classes.
• Your hyperparameter design decisions and why you made them.
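The building blocks named above (one-hot targets, Xavier initialization, softmax, and the cross-entropy gradient) can be sketched as follows. This is my own sketch, not the required structure: the uniform Xavier bound and the (P − Y)/N gradient are the standard forms, the 784 → 10 sizes come from the dataset description, and all function names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(y, num_classes=10):
    Y = np.zeros((y.size, num_classes))
    Y[np.arange(y.size), y] = 1.0
    return Y

def xavier(n_in, n_out):
    # Xavier/Glorot uniform: bound = sqrt(6 / (fan_in + fan_out))
    bound = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-bound, bound, size=(n_in, n_out))

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift rows for numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(P, Y, eps=1e-12):
    return -np.mean(np.sum(Y * np.log(P + eps), axis=1))

def forward_backward(X, Y, W, b):
    """One pass through Input -> Fully Connected -> Softmax w/ cross-entropy."""
    P = softmax(X @ W + b)
    dZ = (P - Y) / X.shape[0]  # combined softmax + cross-entropy gradient
    dW, db = X.T @ dZ, dZ.sum(axis=0)
    return cross_entropy(P, Y), dW, db
```

An Adam update applied to W and b using dW and db then plugs in exactly as in Part 5, and predictions come from the argmax over each row of P.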

Submission

For your submission, upload to Blackboard a single zip file containing:

1. PDF writeup
2. Source code
3. readme.txt file

The readme.txt file should contain information on how to run your code to reproduce the results for each part of the assignment.

The PDF document should contain the following:

1. Part 1:
   (a) Your solutions to the theory questions.

2. Part 2:
   (a) Your plot.

3. Part 3:
   (a) Your four plots of epoch vs. J with the terminal values of w1 and J superimposed on each.
   (b) A description of why you think w1 converged to its final place in each case, justified by the visualization of the objective function.

4. Part 4:
   (a) Your four plots of epoch vs. J with the terminal values of w1 and J superimposed on each.

5. Part 5:
   (a) Your plot of epoch vs. J.

6. Part 6:
   (a) A graph of epoch vs. J for the training data and the validation data. Both plots should be on the same graph with legends indicating which is which.
   (b) Your final training and validation accuracy. Make sure to predict the enumerated class using the argmax of the output and compare against the original target enumerated classes.
   (c) Any additional design/hyperparameter decisions, and why.
