CS 615 - Deep Learning
Assignment 3 - Learning and Basic Architectures Spring 2024
Introduction
In this assignment we will implement backpropagation and train/validate a few simple architectures using real datasets.
Allowable Libraries/Functions
Recall that you cannot use any ML functions to do the training or evaluation for you. Using basic statistical and linear algebra functions like mean, std, cov, etc. is fine, but using ones like train is not. Using any ML-related functions may result in a zero for the programming component. In general, use the “spirit of the assignment” (we’re implementing things from scratch) as your guide, but if you want clarification on whether you can use a particular function, DM the professor on Slack.
Grading
Do not modify the public interfaces of any code skeleton given to you. Class and variable names should be exactly the same as the skeleton code provided, and no default parameters should be added or removed.
Part 1 (Theory): 20pts
Part 2 (Visualizing Gradient Descent): 10pts
Part 3 (Update Weights method): 10pts
Part 4 (Linear Regression): 25pts
Part 5 (Logistic Regression): 25pts
TOTAL: 100pts
Table 1: Grading Rubric
Datasets
Medical Cost Personal Dataset For our regression task we’ll once again use the medical cost dataset, which consists of data for 1338 people in a CSV file. The data for each person includes:
1. age
2. sex
3. bmi
4. children
5. smoker
6. region
7. charges (target value, Y )
This time I have preprocessed the data for you, again converting the sex and smoker features into binary features and the region into a set of binary features (i.e., one-hot encoding it). In addition, the charges information is now included, since we will want to predict it.
For more information, see https://www.kaggle.com/mirichoi0218/insurance
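The one-hot treatment of the region feature can be sketched with plain NumPy (the category names below follow the Kaggle dataset; this is just an illustration of the preprocessing already done for you):

```python
import numpy as np

# The four region categories in the insurance dataset.
regions = np.array(["southwest", "southeast", "northwest", "northeast"])
sample = np.array(["southeast", "northwest", "southeast"])

# One-hot encode: each row gets a 1 in the column of its category.
onehot = (sample[:, None] == regions[None, :]).astype(float)
```

Each resulting row contains exactly one 1, so the categorical feature becomes four binary features.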
Kid Creative We will use this dataset for binary classification. This dataset consists of data for 673 people in a CSV file. This data for each person includes:
1. Observation Number (we’ll want to omit this)
2. Buy (binary target value, Y )
3. Income
4. Is Female
5. Is Married
6. Has College
7. Is Professional
8. Is Retired
9. Unemployed
10. Residence Length
11. Dual Income
12. Minors
13. Own
14. House
15. White
16. English
17. Prev Child Mag
18. Prev Parent Mag
We’ll omit the first column and use the second column for our binary target Y . The remaining 16 columns provide our feature data for our observation matrix X.
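Assembling X and Y from the CSV might look like the following sketch (a toy three-row stand-in file is written first so the snippet is self-contained; the real filename, header, and column count will differ):

```python
import numpy as np

# Toy stand-in for the Kid Creative CSV: a header row, then
# observation number, the Buy target, and feature columns.
rows = ["Obs,Buy,Income,IsFemale",
        "1,0,24000,1",
        "2,1,75000,0",
        "3,1,46000,1"]
with open("toy_kid_creative.csv", "w") as f:
    f.write("\n".join(rows))

data = np.genfromtxt("toy_kid_creative.csv", delimiter=",", skip_header=1)
Y = data[:, 1].reshape(-1, 1)   # second column is the binary target Buy
X = data[:, 2:]                 # omit the observation number; keep the features
```

For the real dataset the same slicing leaves the 16 feature columns in X and the Buy column in Y.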
1 Theory
1. For the function J = (x1w1 − 5x2w2 − 2)^2, where w = [w1, w2]^T are our weights to learn:
(a) What are the partial gradients, ∂J/∂w1 and ∂J/∂w2? Show work to support your answer (6pts).
(b) What are the values of the partial gradients, given current values of w = [0, 0]^T, x = [1, 1] (4pts)?
2. Given the objective function J = (1/4)(x1w1)^4 − (4/3)(x1w1)^3 + (3/2)(x1w1)^2:
(a) What is the gradient ∂J/∂w1 (2pts)?
(b) What are the locations of the extrema points for this objective function J if x1 = 1? Recall that to find these you take the derivative of the objective function with respect to the unknown, set that equal to zero, and solve for said unknown (in this case, w1). (5pts)
(c) What does J evaluate to at each of your extrema points, again when x1 = 1 (3pts)?
1.1 answer
1.(a) With J = (x1w1 − 5x2w2 − 2)^2, let u = x1w1 − 5x2w2 − 2, so that J = u^2 and, by the chain rule, ∂J/∂w1 = 2u · ∂u/∂w1, where ∂u/∂w1 = x1.
So ∂J/∂w1 = 2(x1w1 − 5x2w2 − 2)x1.
We also have ∂u/∂w2 = −5x2, so ∂J/∂w2 = 2(x1w1 − 5x2w2 − 2)(−5x2).
1.(b) Substituting w = [0, 0]^T and x = [1, 1]: ∂J/∂w1 = 2(0 − 0 − 2)(1) = −4 and ∂J/∂w2 = 2(0 − 0 − 2)(−5) = 20.
2.(a) ∂J/∂w1 = x1^2 w1 (x1^2 w1^2 − 4x1w1 + 3).
(b) Setting x1 = 1 gives ∂J/∂w1 = w1(w1^2 − 4w1 + 3). Solving w1(w1^2 − 4w1 + 3) = w1(w1 − 1)(w1 − 3) = 0 yields w1 = 0, 1, 3.
(c) For w1 = 0, J = 0. For w1 = 1, J = 5/12. For w1 = 3, J = −2.25.
2 Visualizing Gradient Descent
In this section we want to visualize the gradient descent process for the following function (which was part of one of the theory questions):
J = (x1w1 − 5x2w2 − 2)2
Note that this is more of a toy problem to explore the idea of gradient-based learning than it is a
deep learning architecture.
Hyperparameter choices will be as follows:
• Initialize your weights to zero.
• Set the learning rate to η = 0.01.
• Terminate after 100 epochs.
Using the partial gradients you computed in the theory question, perform gradient descent with x = [1, 1]. After each training epoch, evaluate J so that you can plot w1 vs w2 vs J as a 3D line plot. Put this figure in your report.
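A minimal sketch of this descent loop (the actual 3D figure would be drawn afterward from the recorded history, e.g. with matplotlib's mplot3d):

```python
import numpy as np

# Gradient descent on J = (x1*w1 - 5*x2*w2 - 2)^2 with x = [1, 1],
# using the partial gradients derived in the theory section.
x1, x2 = 1.0, 1.0
w = np.zeros(2)          # initialize weights to zero
eta = 0.01               # learning rate
history = []             # (w1, w2, J) per epoch, for the 3D line plot

for epoch in range(100):
    u = x1 * w[0] - 5.0 * x2 * w[1] - 2.0
    grad = np.array([2.0 * u * x1, 2.0 * u * (-5.0 * x2)])
    w -= eta * grad      # gradient descent step
    J = (x1 * w[0] - 5.0 * x2 * w[1] - 2.0) ** 2
    history.append((w[0], w[1], J))
```

The recorded triples can then be passed to a 3D line plot; J shrinks toward zero as the weights move along the direction of the negative gradient.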
2.1 answer
Figure 1: Gradient descent path of (w1, w2, J) over 100 epochs
3 Updating Fully Connected Layer’s Weights and Biases
We also need to add an updateWeights method to our Fully Connected layer. This method takes the incoming (backpropagated) gradient and a learning rate as parameters, and updates the layer's weights and biases according to the formulas in lecture. The method's prototype should look like:
def updateWeights(self, gradIn, eta=0.0001): # TODO
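One possible sketch of the update, assuming the layer caches its most recent input during the forward pass and averages the weight gradient over the batch (the attribute names prevIn, weights, and biases and the class shown here are illustrative assumptions, not part of the required skeleton):

```python
import numpy as np

class FullyConnectedLayer:
    def __init__(self, sizeIn, sizeOut):
        # weights initialized to small random values in +/- 1e-4
        self.weights = np.random.uniform(-1e-4, 1e-4, (sizeIn, sizeOut))
        self.biases = np.random.uniform(-1e-4, 1e-4, (1, sizeOut))
        self.prevIn = None

    def forward(self, dataIn):
        self.prevIn = dataIn              # cache input for the update step
        return dataIn @ self.weights + self.biases

    def updateWeights(self, gradIn, eta=0.0001):
        N = gradIn.shape[0]
        dJdW = (self.prevIn.T @ gradIn) / N            # average weight gradient
        dJdb = np.sum(gradIn, axis=0, keepdims=True) / N  # average bias gradient
        self.weights -= eta * dJdW
        self.biases -= eta * dJdb
```

Whether the gradient is averaged or summed over the batch should follow the lecture formulas; the averaging shown here is one common convention.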
4 Linear Regression
In this section you’ll use your modules to train a linear regression model for the medical cost dataset. The architecture of your linear regression should be as follows:
Input → Fully-Connected → Squared-Error-Objective

Your code should do the following:
- Read in the dataset to assemble X and Y (recall that our target Y is the charges column for this dataset).
- Shuffle the rows of the dataset (both X and Y, together) and use approximately 2/3 for training and 1/3 for validating.
- Train, via gradient learning, your linear regression system using the training data. Refer to the pseudocode in the lecture slides for how this training loop should look. Initialize your weights to be random values in the range of ±10^-4. Play with your learning rate such that you get to (near) convergence in a reasonable amount of time with stability. Terminate the learning process when the absolute change in the mean squared error on the training data is less than 10^-10 or you pass 100,000 epochs. During training, keep track of the mean squared error (MSE) for both the training and the validation sets so that we can plot these as a function of the epoch.
In your report provide:
1. Your plots of training and validation MSE vs epoch.
2. Your final RMSE for the training and validation data.
3. Your final SMAPE for the training and validation data.
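On synthetic stand-in data, the training loop described above might be sketched as follows. The SMAPE definition used here, the mean of |y − ŷ| / (|y| + |ŷ|) times 100, is one common variant; use whichever form was given in lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (X, Y): a noisy linear relationship.
X = rng.uniform(0, 1, (300, 3))
Y = X @ np.array([[3.0], [-2.0], [1.0]]) + 5.0 + rng.normal(0, 0.01, (300, 1))

# Shuffle rows, then split roughly 2/3 train / 1/3 validation.
idx = rng.permutation(len(X))
split = (2 * len(X)) // 3
Xtr, Xva = X[idx[:split]], X[idx[split:]]
Ytr, Yva = Y[idx[:split]], Y[idx[split:]]

# Linear model: weights and bias initialized in +/- 1e-4.
W = rng.uniform(-1e-4, 1e-4, (3, 1))
b = rng.uniform(-1e-4, 1e-4, (1, 1))
eta = 0.1
prev_mse, train_mse, val_mse = np.inf, [], []

for epoch in range(100_000):
    Yhat = Xtr @ W + b
    grad = 2.0 * (Yhat - Ytr) / len(Xtr)       # d(MSE)/d(Yhat)
    W -= eta * (Xtr.T @ grad)
    b -= eta * grad.sum(axis=0, keepdims=True)
    mse = np.mean((Xtr @ W + b - Ytr) ** 2)
    train_mse.append(mse)
    val_mse.append(np.mean((Xva @ W + b - Yva) ** 2))
    if abs(prev_mse - mse) < 1e-10:            # convergence criterion
        break
    prev_mse = mse

rmse = np.sqrt(val_mse[-1])                    # validation RMSE
smape = 100 * np.mean(np.abs(Xva @ W + b - Yva)
                      / (np.abs(Xva @ W + b) + np.abs(Yva)))
```

The train_mse and val_mse lists are what get plotted against the epoch number; for the real dataset the learning rate and convergence speed will differ.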
4.1 answer
(a).
Figure 2: MSE vs epoch
(b),(c):
Training RMSE: 8544.019074654007
Validation RMSE: 8693.1881630646
Training SMAPE: 51.3827515030682
Validation SMAPE: 54.37187011111296
5 Logistic Regression
Next we’ll use a logistic regression model on the kid creative dataset to predict if a user will purchase a product. The architecture of this model should be:
Input → Fully-Connected → Sigmoid-Activation → Log-Loss-Objective

Your code should do the following:
- Read in the dataset to assemble X and Y (recall that our target Y is the Buy column for this dataset).
- Shuffle the rows of the dataset (both X and Y, together) and use approximately 2/3 for training and 1/3 for validating.
- Train, via gradient learning, your logistic regression system using the training data. Initialize your weights to be random values in the range of ±10^-4. Play with your learning rate such that you get to (near) convergence in a reasonable amount of time with stability. Terminate the learning process when the absolute change in the log loss is less than 10^-10 or you pass 100,000 epochs. During training, keep track of the log loss for both the training and the validation sets so that we can plot these as a function of the epoch.
In your report provide:
1. Your plots of training and validation log loss vs epoch.
2. Assigning an observation to class 1 if the model outputs a value greater than 0.5, report the training and validation accuracy.
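The logistic-regression loop and the 0.5-threshold accuracy can be sketched on synthetic, linearly separable stand-in data (epoch count and learning rate here are placeholders; the real run should use the convergence criterion above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary-classification stand-in for the Kid Creative data.
n = 300
X = rng.normal(0, 1, (n, 2))
Y = (X[:, :1] + X[:, 1:] > 0).astype(float)   # linearly separable labels

idx = rng.permutation(n)
split = (2 * n) // 3
Xtr, Xva = X[idx[:split]], X[idx[split:]]
Ytr, Yva = Y[idx[:split]], Y[idx[split:]]

W = rng.uniform(-1e-4, 1e-4, (2, 1))
b = rng.uniform(-1e-4, 1e-4, (1, 1))
eta = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5_000):
    P = sigmoid(Xtr @ W + b)
    grad = (P - Ytr) / len(Xtr)   # gradient of mean log loss w.r.t. pre-activation
    W -= eta * (Xtr.T @ grad)
    b -= eta * grad.sum(axis=0, keepdims=True)

# Assign class 1 when the sigmoid output exceeds 0.5.
train_acc = np.mean((sigmoid(Xtr @ W + b) > 0.5) == Ytr)
val_acc = np.mean((sigmoid(Xva @ W + b) > 0.5) == Yva)
```

The per-epoch log loss for the training and validation sets would be recorded inside the loop, exactly as the MSE was in the regression case.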
5.1 answer
(a).
Figure 3: log loss vs epoch
(b).
Training Accuracy: 0.9333333333333333
Validation Accuracy: 0.8878923766816144
Submission
For your submission, upload to Blackboard a single zip file containing:
1. PDF Writeup
2. Source Code
3. readme.txt file
The readme.txt file should contain information on how to run your code to reproduce results for each part of the assignment.
The PDF document should contain the following:
1. Part 1: Your solutions to the theory questions.
2. Part 2: Your plot.
3. Part 3: Nothing.
4. Part 4: Your plot and requested statistics.
5. Part 5: Your plot and requested accuracies.