CS 615 - Deep Learning
Assignment 2 - Objectives, Gradients, and Backpropagation Spring 2024
Introduction
In this assignment we’ll implement our output/objective modules and add computing the gradients to each of our modules.
Allowable Libraries/Functions
Recall that you cannot use any ML functions to do the training or evaluation for you. Using basic statistical and linear algebra functions like mean, std, cov, etc. is fine, but using ones like train is not. Using any ML-related functions may result in a zero for the programming component. In general, use the “spirit of the assignment” (where we’re implementing things from scratch) as your guide, but if you want clarification on whether you can use a particular function, DM the professor on Slack.
Grading
Do not modify the public interfaces of any code skeleton given to you. Class and variable names should be exactly the same as the skeleton code provided, and no default parameters should be added or removed.
Theory                                                             20pts
Testing fully-connected and activation layers' gradient methods   30pts
Testing objective layers' loss computations and gradients         30pts
Forwards-Backwards Propagate Dataset                               20pts
TOTAL                                                             100pts

Table 1: Grading Rubric
1 Theory

1. (10 points) Given $H = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ as an input, compute the gradients of the output with respect to this input for the following activation layers.
   (a) A ReLU layer
   (b) A Softmax layer
   (c) A Logistic Sigmoid layer
   (d) A Tanh layer
   (e) A Linear layer

2. (2 points) Given $H = \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}$ as an input, compute the gradient of the output of a fully connected layer with respect to this input, if the fully connected layer has weights $W = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$ and biases $b = \begin{bmatrix} -1 & 2 \end{bmatrix}$.

3. (2 points) Given target values of $Y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and estimated values of $\hat{Y} = \begin{bmatrix} 0.2 \\ 0.3 \end{bmatrix}$, compute the loss for:
   (a) A squared error objective function
   (b) A log loss (negative log likelihood) objective function

4. (1 point) Given target distributions of $Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and estimated distributions of $\hat{Y} = \begin{bmatrix} 0.2 & 0.2 & 0.6 \\ 0.2 & 0.7 & 0.1 \end{bmatrix}$, compute the cross entropy loss.

5. (4 points) Given target values of $Y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and estimated values of $\hat{Y} = \begin{bmatrix} 0.2 \\ 0.3 \end{bmatrix}$, compute the gradient of the following objective functions with respect to their input, $\hat{Y}$:
   (a) A squared error objective function
   (b) A log loss (negative log likelihood) objective function

6. (1 point) Given target distributions of $Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and estimated distributions of $\hat{Y} = \begin{bmatrix} 0.2 & 0.2 & 0.6 \\ 0.2 & 0.7 & 0.1 \end{bmatrix}$, compute the gradient of the cross entropy loss function with respect to the input distributions $\hat{Y}$.
1.1 answer

1. (a) $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$
   (b) $\begin{bmatrix} 0.09003 & 0.02237 & 0.06766 \\ 0.02237 & 0.09003 & 0.06766 \end{bmatrix}$
   (c) $\begin{bmatrix} 0.19661 & 0.10499 & 0.04518 \\ 0.01767 & 0.00665 & 0.00247 \end{bmatrix}$
   (d) $\begin{bmatrix} 0.41997 & 0.07065 & 0.00987 \\ 0.00067 & 0.00019 & 0.00001 \end{bmatrix}$
   (e) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$

2. $\begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$

3. (a) 0.265   (b) −0.7136

4. 0.9831
5. (a) $\begin{bmatrix} 0.2 \\ -0.4 \end{bmatrix}$   (b) $\begin{bmatrix} -0.625 \\ -1.6667 \end{bmatrix}$

6. $\begin{bmatrix} -1.875 & 0.625 & 1.25 \\ 0.625 & -0.7143 & 0.625 \end{bmatrix}$
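As a quick sanity check, the answers for 3(a) and 4 can be reproduced directly, assuming each loss is averaged over the $N = 2$ observations:

$$J_{\text{squared error}} = \frac{1}{2}\left[(0 - 0.2)^2 + (1 - 0.3)^2\right] = \frac{0.04 + 0.49}{2} = 0.265$$

$$J_{\text{cross entropy}} = -\frac{1}{2}\left[\ln(0.2) + \ln(0.7)\right] \approx \frac{1.6094 + 0.3567}{2} \approx 0.9831$$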
Datasets
Kid Creative We will use this dataset for binary classification. This dataset consists of data for
673 people in a CSV file. The data for each person includes:
1. Observation Number (we’ll want to omit this)
2. Buy (binary target value, Y )
3. Income
4. Is Female
5. Is Married
6. Has College
7. Is Professional
8. Is Retired
9. Unemployed
10. Residence Length
11. Dual Income
12. Minors
13. Own
14. House
15. White
16. English
17. Prev Child Mag
18. Prev Parent Mag
We’ll omit the first column and use the second column for our binary target Y . The remaining 16 columns provide our feature data for our observation matrix X.
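For example, loading the CSV with NumPy might look like the sketch below (the filename and the presence of a single header row are assumptions; adjust to however your copy of the file is stored):

import numpy as np

# assumes a comma-delimited file with one header row
data = np.genfromtxt('KidCreative.csv', delimiter=',', skip_header=1)

Y = data[:, 1].reshape(-1, 1)   # column 2: the binary "Buy" target
X = data[:, 2:]                 # remaining 16 columns: the observation matrix
# column 1 (the observation number) is dropped by simply not selecting it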
2 Update Your Codebase
In this assignment you’ll add gradient and backwards methods to your existing fully-connected layer and activation functions, and implement your objective functions. Again, make sure these work for a single observation and multiple observations (both stored as matrices). We will be unit testing these.
Adding Gradient Methods
Implement gradient methods for your fully connected layer, and all of your activation layers. The prototype of these methods should be:
# Input: None
# Output: An N by (D by D) tensor
def gradient(self): # TODO
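As an illustration only (not the required implementation), here is a minimal sketch of what gradient might look like for a ReLU layer, assuming the layer caches its most recent 2-D input during forward and that a diagonal Jacobian per observation satisfies the N by (D by D) specification:

import numpy as np

class ReluLayer:
    # hypothetical minimal layer; adapt to your own Layer base class and skeleton
    def forward(self, dataIn):
        self.prevIn = dataIn            # cache the N by D input for gradient()
        return np.maximum(0, dataIn)

    # Input: None
    # Output: An N by (D by D) tensor of per-observation Jacobians
    def gradient(self):
        deriv = (self.prevIn > 0).astype(float)           # N by D elementwise derivative
        return np.array([np.diag(row) for row in deriv])  # stack into N by (D by D)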
Adding Backwards Methods
Add the backward method to our activation and fully-connected layers! You might want to consider having a default version in the abstract class Layer, although we’ll leave those design decisions to you. In general, the backward method should take as input the gradient coming backwards from the next layer and return the updated gradient to be backpropagated further. The method’s prototype should look like:
def backward(self, gradIn): # TODO
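One possible default is sketched below, under the assumption that gradient() returns an N by (K by D) tensor of per-observation Jacobians and that gradIn is N by K; if you orient your Jacobians differently, the multiplication order changes accordingly:

import numpy as np

class Layer:
    # abstract base class sketch; concrete layers override gradient()
    def gradient(self):
        raise NotImplementedError

    def backward(self, gradIn):
        # multiply each observation's incoming gradient (1 by K) by that
        # observation's Jacobian (K by D), yielding the N by D gradient to pass back
        grad = self.gradient()                        # N by (K by D)
        return np.einsum('nk,nkd->nd', gradIn, grad)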
Adding Objective Layers
Now let’s implement a module for each of our objective functions. These modules should again each be in their own file with the same filename as the class/module, and implement (at least) two methods:
• eval - This method takes two explicit parameters, the target values and the incoming/estimated values, and computes and returns the loss (as a single float value) according to the module’s objective function. This should work both for a single observation and for a set of observations.
• gradient - This method takes the same two explicit parameters as the eval method and computes and returns the gradient of the objective function using those parameters.
Implement these for the following objective functions:
• Squared Error as SquaredError
• Log Loss (negative log likelihood) as LogLoss
• Cross Entropy as CrossEntropy
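For reference, the standard mean-over-observations forms of these objectives are sketched below; the averaging is an assumption here, so defer to the course slides for the exact scaling you are expected to use:

$$J_{\text{SquaredError}} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

$$J_{\text{LogLoss}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\ln\hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i)\right]$$

$$J_{\text{CrossEntropy}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\ln\hat{y}_{ik}$$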
Your public interface is:
class XXX():
    # Input: Y is an N by K matrix of target values.
    # Input: Yhat is an N by K matrix of estimated values.
    #        Where N can be any integer >= 1.
    # Output: A single floating point value.
    def eval(self, Y, Yhat): # TODO

    # Input: Y is an N by K matrix of target values.
    # Input: Yhat is an N by K matrix of estimated values.
    # Output: An N by K matrix.
    def gradient(self, Y, Yhat): # TODO
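As an illustration of this interface, a squared-error module might look roughly like the sketch below (whether the gradient should also be scaled by 1/N is a course convention, so treat the exact scaling as an assumption):

import numpy as np

class SquaredError():
    # Input: Y is an N by K matrix of target values.
    # Input: Yhat is an N by K matrix of estimated values.
    # Output: A single floating point value.
    def eval(self, Y, Yhat):
        return float(np.mean((Y - Yhat) ** 2))

    # Input: Y is an N by K matrix of target values.
    # Input: Yhat is an N by K matrix of estimated values.
    # Output: An N by K matrix.
    def gradient(self, Y, Yhat):
        return 2 * (Yhat - Y)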
3 Forwards-Backwards Propagate a Dataset
In HW1 you implemented forwards propagation for the Kid Creative dataset with the following architecture (note that I have added on a LogLoss layer):
Input→FC (1 output)→Logistic Sigmoid→LogLoss
Now let’s do forwards-backwards propagation. Using the code shown in the Objectives and Gradients slides, perform one forwards-backwards pass. In your report provide the gradient due to the first observation coming backwards out of:
1. Log Loss
2. Logistic Sigmoid Layer
3. Fully-Connected Layer
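A minimal sketch of one such forwards-backwards pass is below; the class names and constructor signature are placeholders for whatever your own HW1/HW2 implementations use, and X and Y are the observation matrix and targets loaded from the Kid Creative data:

# placeholder class names and constructor signature; substitute your own
L1 = FullyConnectedLayer(X.shape[1], 1)   # FC layer with one output
L2 = LogisticSigmoidLayer()
L3 = LogLoss()

# forward pass
h = L1.forward(X)
yhat = L2.forward(h)
loss = L3.eval(Y, yhat)

# backward pass, reporting the gradient for the first observation at each step
grad = L3.gradient(Y, yhat)
print('Out of LogLoss:', grad[0])
grad = L2.backward(grad)
print('Out of Logistic Sigmoid:', grad[0])
grad = L1.backward(grad)
print('Out of Fully Connected:', grad[0])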
3.1 answer
Gradient out of Log Loss for the first observation: [1.99983166]
Gradient out of Logistic Sigmoid Layer for the first observation: [0.49995791]
Gradient out of Fully Connected Layer for the first observation:
[-4.98777029e-05  3.93557355e-06  3.61814772e-05 -8.14937026e-06
 -4.95566822e-05 -4.03753562e-05  8.27928752e-06 -2.90450515e-05
 -9.80522957e-06 -6.82904674e-06  2.57671326e-05  3.63388693e-05
 -2.55929394e-05  1.16773737e-05  4.75451430e-05  9.68959219e-06]
Submission
For your submission, upload to Blackboard a single zip file containing:
1. PDF Writeup
2. Source Code
3. readme.txt file
The readme.txt file should contain information on how to run your code to reproduce results for each part of the assignment.
The PDF document should contain the following:
• Part 1: Your solutions to the theory questions.
• Part 2: Nothing. We will unit test these, but again we encourage you to do so yourself, particularly using the examples from the theory questions.
• Part 3: The gradient pertaining to the first observation as it comes backwards out of the three modules.