CS 615 - Deep Learning
Assignment 3 - Learning and Basic Architectures Spring 2024
Introduction
In this assignment we will implement backpropagation and train/validate a few simple architectures using real datasets.
Allowable Libraries/Functions
Recall that you cannot use any ML functions to do the training or evaluation for you. Using basic statistical and linear algebra functions like mean, std, cov, etc. is fine, but using ones like train is not. Using any ML-related functions may result in a zero for the programming component. In general, use the “spirit of the assignment” (we’re implementing things from scratch) as your guide, but if you want clarification on whether you can use a particular function, DM the professor on Slack.
Grading
Do not modify the public interfaces of any code skeleton given to you. Class and variable names should be exactly the same as the skeleton code provided, and no default parameters should be added or removed.
Part 1 (Theory): 20pts
Part 2 (Visualizing Gradient Descent): 10pts
Part 3 (Update Weights method): 10pts
Part 4 (Linear Regression): 25pts
Part 5 (Logistic Regression): 25pts
TOTAL: 100pts
Table 1: Grading Rubric
Datasets
Medical Cost Personal Dataset For our regression task we’ll once again use the medical cost dataset, which consists of data for 1338 people in a CSV file. The data for each person includes:
1. age
2. sex
3. bmi
4. children
5. smoker
6. region
7. charges (target value, Y )
This time I have preprocessed the data for you, again converting the sex and smoker features into binary features and the region into a set of binary features (i.e., one-hot encoding it). In addition, the charges information is now included, since we will want to predict it.
For more information, see https://www.kaggle.com/mirichoi0218/insurance
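The one-hot treatment of the region feature can be sketched with plain NumPy (the category names below follow the Kaggle dataset; this is just an illustration of the preprocessing already done for you):

```python
import numpy as np

# The four region categories in the insurance dataset.
regions = np.array(["southwest", "southeast", "northwest", "northeast"])
sample = np.array(["southeast", "northwest", "southeast"])

# One-hot encode: each row gets a 1 in the column of its category.
onehot = (sample[:, None] == regions[None, :]).astype(float)
```

Each resulting row contains exactly one 1, so the categorical feature becomes four binary features.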
Kid Creative We will use this dataset for binary classification. This dataset consists of data for 673 people in a CSV file. This data for each person includes:
1. Observation Number (we’ll want to omit this)
2. Buy (binary target value, Y )
3. Income
4. Is Female
5. Is Married
6. Has College
7. Is Professional
8. Is Retired
9. Unemployed
10. Residence Length
11. Dual Income
12. Minors
13. Own
14. House
15. White
16. English
17. Prev Child Mag
18. Prev Parent Mag
We’ll omit the first column and use the second column for our binary target Y . The remaining 16 columns provide our feature data for our observation matrix X.
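Assembling X and Y from the CSV might look like the following sketch (a toy three-row stand-in file is written first so the snippet is self-contained; the real filename, header, and column count will differ):

```python
import numpy as np

# Toy stand-in for the Kid Creative CSV: a header row, then
# observation number, the Buy target, and feature columns.
rows = ["Obs,Buy,Income,IsFemale",
        "1,0,24000,1",
        "2,1,75000,0",
        "3,1,46000,1"]
with open("toy_kid_creative.csv", "w") as f:
    f.write("\n".join(rows))

data = np.genfromtxt("toy_kid_creative.csv", delimiter=",", skip_header=1)
Y = data[:, 1].reshape(-1, 1)   # second column is the binary target Buy
X = data[:, 2:]                 # omit the observation number; keep the features
```

For the real dataset the same slicing leaves the 16 feature columns in X and the Buy column in Y.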
1 Theory
1. For the function J = (x1w1 − 5x2w2 − 2)^2, where w = [w1, w2]^T are our weights to learn:
(a) What are the partial gradients, ∂J/∂w1 and ∂J/∂w2? Show work to support your answer (6pts).
(b) What are the values of the partial gradients, given current values of w = [0, 0]^T, x = [1, 1] (4pts)?
2. Given the objective function J = (1/4)(x1w1)^4 − (4/3)(x1w1)^3 + (3/2)(x1w1)^2:
(a) What is the gradient ∂J/∂w1 (2pts)?
(b) What are the locations of the extrema points for this objective function J if x1 = 1? Recall that to find these you take the derivative of the objective function with respect to the unknown, set that equal to zero, and solve for said unknown (in this case, w1). (5pts)
(c) What does J evaluate to at each of your extrema points, again when x1 = 1 (3pts)?
1.1 answer
1.(a) With J = (x1w1 − 5x2w2 − 2)^2, let u = x1w1 − 5x2w2 − 2, so that J = u^2 and, by the chain rule, ∂J/∂w1 = 2u · ∂u/∂w1, where ∂u/∂w1 = x1.
So ∂J/∂w1 = 2(x1w1 − 5x2w2 − 2)x1.
We also have ∂u/∂w2 = −5x2, so ∂J/∂w2 = 2(x1w1 − 5x2w2 − 2)(−5x2).
1.(b) Substituting w = [0, 0]^T and x = [1, 1]: ∂J/∂w1 = 2(0 − 0 − 2)(1) = −4 and ∂J/∂w2 = 2(0 − 0 − 2)(−5) = 20.
2.(a) ∂J/∂w1 = x1^2 w1 (x1^2 w1^2 − 4x1w1 + 3).
(b) Setting x1 = 1 gives ∂J/∂w1 = w1(w1^2 − 4w1 + 3). Solving w1(w1^2 − 4w1 + 3) = w1(w1 − 1)(w1 − 3) = 0 yields w1 = 0, 1, 3.
(c) For w1 = 0, J = 0. For w1 = 1, J = 5/12. For w1 = 3, J = −2.25.
2 Visualizing Gradient Descent
In this section we want to visualize the gradient descent process for the following function (which was part of one of the theory questions):
J = (x1w1 − 5x2w2 − 2)2
Note that this is more of a toy problem to explore the idea of gradient-based learning than it is a
deep learning architecture.
Hyperparameter choices will be as follows:
• Initialize your weights to zero.
• Set the learning rate to η = 0.01.
• Terminate after 100 epochs.
Using the partial gradients you computed in the theory question, perform gradient descent with x = [1, 1]. After each training epoch, evaluate J so that you can plot w1 vs w2 vs J as a 3D line plot. Put this figure in your report.
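A minimal sketch of this descent loop (the actual 3D figure would be drawn afterward from the recorded history, e.g. with matplotlib's mplot3d):

```python
import numpy as np

# Gradient descent on J = (x1*w1 - 5*x2*w2 - 2)^2 with x = [1, 1],
# using the partial gradients derived in the theory section.
x1, x2 = 1.0, 1.0
w = np.zeros(2)          # initialize weights to zero
eta = 0.01               # learning rate
history = []             # (w1, w2, J) per epoch, for the 3D line plot

for epoch in range(100):
    u = x1 * w[0] - 5.0 * x2 * w[1] - 2.0
    grad = np.array([2.0 * u * x1, 2.0 * u * (-5.0 * x2)])
    w -= eta * grad      # gradient descent step
    J = (x1 * w[0] - 5.0 * x2 * w[1] - 2.0) ** 2
    history.append((w[0], w[1], J))
```

The recorded triples can then be passed to a 3D line plot; J shrinks toward zero as the weights move along the direction of the negative gradient.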
2.1 answer
Figure 1: Gradient descent path of (w1, w2, J) over 100 epochs
3 Updating Fully Connected Layer’s Weights and Biases
We also need to add an updateWeights method to our Fully Connected layer. This method takes the incoming (backpropagated) gradient and a learning rate as parameters, and updates the layer's weights and biases according to the formulas in lecture. The method's prototype should look like:
def updateWeights(self, gradIn, eta=0.0001): # TODO
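One possible sketch of the update, assuming the layer caches its most recent input during the forward pass and averages the weight gradient over the batch (the attribute names prevIn, weights, and biases and the class shown here are illustrative assumptions, not part of the required skeleton):

```python
import numpy as np

class FullyConnectedLayer:
    def __init__(self, sizeIn, sizeOut):
        # weights initialized to small random values in +/- 1e-4
        self.weights = np.random.uniform(-1e-4, 1e-4, (sizeIn, sizeOut))
        self.biases = np.random.uniform(-1e-4, 1e-4, (1, sizeOut))
        self.prevIn = None

    def forward(self, dataIn):
        self.prevIn = dataIn              # cache input for the update step
        return dataIn @ self.weights + self.biases

    def updateWeights(self, gradIn, eta=0.0001):
        N = gradIn.shape[0]
        dJdW = (self.prevIn.T @ gradIn) / N            # average weight gradient
        dJdb = np.sum(gradIn, axis=0, keepdims=True) / N  # average bias gradient
        self.weights -= eta * dJdW
        self.biases -= eta * dJdb
```

Whether the gradient is averaged or summed over the batch should follow the lecture formulas; the averaging shown here is one common convention.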
4 Linear Regression
In this section you’ll use your modules to train a linear regression model for the medical cost dataset. The architecture of your linear regression should be as follows:
Input → Fully-Connected → Squared-Error-Objective

Your code should do the following:
- Read in the dataset to assemble X and Y (recall that our target Y is the charges column for this dataset).
- Shuffle the rows of the dataset (both X and Y, together) and use approximately 2/3 for training and 1/3 for validating.
- Train, via gradient learning, your linear regression system using the training data. Refer to the pseudocode in the lecture slides for how this training loop should look. Initialize your weights to be random values in the range of ±10^-4. Play with your learning rate such that you get to (near) convergence in a reasonable amount of time with stability. Terminate the learning process when the absolute change in the mean squared error on the training data is less than 10^-10 or you pass 100,000 epochs. During training, keep track of the mean squared error (MSE) for both the training and the validation sets so that we can plot these as a function of the epoch.
In your report provide:
1. Your plots of training and validation MSE vs epoch.
2. Your final RMSE for the training and validation data.
3. Your final SMAPE for the training and validation data.
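On synthetic stand-in data, the training loop described above might be sketched as follows. The SMAPE definition used here, the mean of |y − ŷ| / (|y| + |ŷ|) times 100, is one common variant; use whichever form was given in lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (X, Y): a noisy linear relationship.
X = rng.uniform(0, 1, (300, 3))
Y = X @ np.array([[3.0], [-2.0], [1.0]]) + 5.0 + rng.normal(0, 0.01, (300, 1))

# Shuffle rows, then split roughly 2/3 train / 1/3 validation.
idx = rng.permutation(len(X))
split = (2 * len(X)) // 3
Xtr, Xva = X[idx[:split]], X[idx[split:]]
Ytr, Yva = Y[idx[:split]], Y[idx[split:]]

# Linear model: weights and bias initialized in +/- 1e-4.
W = rng.uniform(-1e-4, 1e-4, (3, 1))
b = rng.uniform(-1e-4, 1e-4, (1, 1))
eta = 0.1
prev_mse, train_mse, val_mse = np.inf, [], []

for epoch in range(100_000):
    Yhat = Xtr @ W + b
    grad = 2.0 * (Yhat - Ytr) / len(Xtr)       # d(MSE)/d(Yhat)
    W -= eta * (Xtr.T @ grad)
    b -= eta * grad.sum(axis=0, keepdims=True)
    mse = np.mean((Xtr @ W + b - Ytr) ** 2)
    train_mse.append(mse)
    val_mse.append(np.mean((Xva @ W + b - Yva) ** 2))
    if abs(prev_mse - mse) < 1e-10:            # convergence criterion
        break
    prev_mse = mse

rmse = np.sqrt(val_mse[-1])                    # validation RMSE
smape = 100 * np.mean(np.abs(Xva @ W + b - Yva)
                      / (np.abs(Xva @ W + b) + np.abs(Yva)))
```

The train_mse and val_mse lists are what get plotted against the epoch number; for the real dataset the learning rate and convergence speed will differ.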
4.1 answer
(a).
Figure 2: MSE vs epoch
(b),(c):
Training RMSE: 8544.019074654007
Validation RMSE: 8693.1881630646
Training SMAPE: 51.3827515030682
Validation SMAPE: 54.37187011111296
5 Logistic Regression
Next we’ll use a logistic regression model on the kid creative dataset to predict if a user will purchase a product. The architecture of this model should be:
Input → Fully-Connected → Sigmoid-Activation → Log-Loss-Objective

Your code should do the following:
- Read in the dataset to assemble X and Y (recall that our target Y is the Buy column for this dataset).
- Shuffle the rows of the dataset (both X and Y, together) and use approximately 2/3 for training and 1/3 for validating.
- Train, via gradient learning, your logistic regression system using the training data. Initialize your weights to be random values in the range of ±10^-4. Play with your learning rate such that you get to (near) convergence in a reasonable amount of time with stability. Terminate the learning process when the absolute change in the log loss is less than 10^-10 or you pass 100,000 epochs. During training, keep track of the log loss for both the training and the validation sets so that we can plot these as a function of the epoch.
In your report provide:
1. Your plots of training and validation log loss vs epoch.
2. Assigning an observation to class 1 if the model outputs a value greater than 0.5, report the training and validation accuracy.
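The logistic-regression loop and the 0.5-threshold accuracy can be sketched on synthetic, linearly separable stand-in data (epoch count and learning rate here are placeholders; the real run should use the convergence criterion above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary-classification stand-in for the Kid Creative data.
n = 300
X = rng.normal(0, 1, (n, 2))
Y = (X[:, :1] + X[:, 1:] > 0).astype(float)   # linearly separable labels

idx = rng.permutation(n)
split = (2 * n) // 3
Xtr, Xva = X[idx[:split]], X[idx[split:]]
Ytr, Yva = Y[idx[:split]], Y[idx[split:]]

W = rng.uniform(-1e-4, 1e-4, (2, 1))
b = rng.uniform(-1e-4, 1e-4, (1, 1))
eta = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5_000):
    P = sigmoid(Xtr @ W + b)
    grad = (P - Ytr) / len(Xtr)   # gradient of mean log loss w.r.t. pre-activation
    W -= eta * (Xtr.T @ grad)
    b -= eta * grad.sum(axis=0, keepdims=True)

# Assign class 1 when the sigmoid output exceeds 0.5.
train_acc = np.mean((sigmoid(Xtr @ W + b) > 0.5) == Ytr)
val_acc = np.mean((sigmoid(Xva @ W + b) > 0.5) == Yva)
```

The per-epoch log loss for the training and validation sets would be recorded inside the loop, exactly as the MSE was in the regression case.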
5.1 answer
(a).
Figure 3: log loss vs epoch
(b).
Training Accuracy: 0.9333333333333333
Validation Accuracy: 0.8878923766816144
Submission
For your submission, upload to Blackboard a single zip file containing:
1. PDF Writeup
2. Source Code
3. readme.txt file
The readme.txt file should contain information on how to run your code to reproduce results for each part of the assignment.
The PDF document should contain the following:
1. Part 1: Your solutions to the theory questions.
2. Part 2: Your plot.
3. Part 3: Nothing.
4. Part 4: Your plot and requested statistics.
5. Part 5: Your plot and requested accuracies.