ELEC 576 / COMP 576 – Fall 2024 Assignment 1
Due: Oct 10, 2024 11 a.m. via Canvas
Submission Instructions
Every student must submit their work as a single zip file named netid-assignment1.zip. Include intermediate and final results as well as any necessary code. Submit your zip file on Canvas.
GPU Resource from AWS
To accelerate training with a GPU, you can optionally use an Amazon Web Services (AWS) GPU instance with AWS Education credits. You can also get additional AWS credits from the GitHub Student Developer Pack.
Once you have an AWS account, you can either create a fresh Ubuntu instance and install the software dependencies yourself, or use an off-the-shelf TensorFlow-ready image from the AWS Marketplace.
1 Backpropagation in a Simple Neural Network
In this problem, you will learn how to implement the backpropagation algorithm for a simple neural network. To make your job easier, we provide you with starter code in three_layer_neural_network.py. You will fill in this starter code to build a 3-layer neural network (see Fig. 1) and train it using backpropagation.
a) Dataset
We will use the Make-Moons dataset available in Scikit-learn. Data points in this dataset form two interleaving half circles corresponding to two classes (e.g. “female” and “male”). In the main() function of three_layer_neural_network.py, uncomment the “generate and visualize Make-Moons dataset” section (see below) and run the code. Include the generated figure in your report.
# generate and visualize Make-Moons dataset
X, y = generate_data()
plt.scatter(X[:, 0], X[:, 1], s=40, c=y, cmap=plt.cm.Spectral)
b) Activation Function
Tanh, Sigmoid and ReLU are popular activation functions used in neural networks. You will implement them and their derivatives.
- Implement the function actFun(self, z, type) in three_layer_neural_network.py. This function computes the activation function, where z is the net input and type ∈ {‘Tanh’, ‘Sigmoid’, ‘ReLU’}.
- Derive the derivatives of Tanh, Sigmoid and ReLU.
- Implement the function diff_actFun(self, z, type) in three_layer_neural_network.py. This function computes the derivatives of Tanh, Sigmoid and ReLU (a reference sketch of both functions follows this list).
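As a reference point, one possible shape for these two functions is sketched below (a minimal sketch, assuming z is a NumPy array and that the type strings match the starter code exactly; adapt as needed):

import numpy as np

def actFun(self, z, type):
    # element-wise activation of the net input z
    if type == 'Tanh':
        return np.tanh(z)
    elif type == 'Sigmoid':
        return 1.0 / (1.0 + np.exp(-z))
    elif type == 'ReLU':
        return np.maximum(0, z)
    raise ValueError('Unknown activation type: %s' % type)

def diff_actFun(self, z, type):
    # element-wise derivative of the activation with respect to z
    if type == 'Tanh':
        return 1.0 - np.tanh(z) ** 2
    elif type == 'Sigmoid':
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)
    elif type == 'ReLU':
        return (z > 0).astype(float)
    raise ValueError('Unknown activation type: %s' % type)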
c) Build the Neural Network
Let’s now build a 3-layer neural network with one input layer, one hidden layer, and one output layer. The number of nodes in the input layer is determined by the dimensionality of our data, 2. The number of nodes in the output layer is determined by the number of classes we have, also 2. The input to the network will be the x- and y-coordinates of a point, and its output will be two probabilities, one for class 0 (“female”) and one for class 1 (“male”). The network is shown in Fig. 1.
Mathematically, the network is defined as follows.
\begin{align}
z_1 &= W_1 x + b_1 \tag{1}\\
a_1 &= \mathrm{actFun}(z_1) \tag{2}\\
z_2 &= W_2 a_1 + b_2 \tag{3}\\
a_2 &= \hat{y} = \mathrm{softmax}(z_2) \tag{4}
\end{align}
where $z_i$ is the net input to layer $i$ and $a_i$ is the output of layer $i$ after applying the activation function. $\theta \equiv \{W_1, b_1, W_2, b_2\}$ are the parameters of this network, which we need to learn from the training data.
If we have $N$ training examples and $C$ classes, then the loss for the prediction $\hat{y}$ with respect to the true labels $y$ is given by:
Figure 1: A three-layer neural network
\begin{equation}
L(y, \hat{y}) = -\frac{1}{N} \sum_{n \in N} \sum_{i \in C} y_{n,i} \log \hat{y}_{n,i} \tag{5}
\end{equation}
Note that the labels $y$ are one-hot encoded vectors and $\hat{y}$ are vectors of probabilities.
1. In three_layer_neural_network.py, implement the function feedforward(self, X, actFun). This function builds the 3-layer neural network and computes the two probabilities (self.probs in the code, or $a_2$ in Eq. 4), one for class 0 and one for class 1. X is the input data, and actFun is the activation function. You will pass the function actFun you implemented in part (b) into feedforward(self, X, actFun).
2. In three_layer_neural_network.py, fill in the function calculate_loss(self, X, y). This function computes the loss (Eq. 5) of the network’s predictions. Here X is the input data, and y is the given labels. (A reference sketch of both functions follows this list.)
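A minimal sketch of one way these two methods could look, assuming the row-vector convention z1 = X.dot(W1) + b1 common in NumPy implementations, that numpy is imported as np, and that the starter code stores its parameters as self.W1, self.b1, self.W2, self.b2 and the chosen activation name as self.actFun_type (these names are assumptions, not prescriptions):

def feedforward(self, X, actFun):
    # forward pass, Eqs. (1)-(4); stores intermediates for later use in backprop
    self.z1 = X.dot(self.W1) + self.b1
    self.a1 = actFun(self.z1)
    self.z2 = self.a1.dot(self.W2) + self.b2
    exp_scores = np.exp(self.z2 - np.max(self.z2, axis=1, keepdims=True))  # numerically stable softmax
    self.probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return None

def calculate_loss(self, X, y):
    # cross-entropy loss of Eq. (5); y holds integer class labels, which is
    # equivalent to the one-hot form when indexing the predicted probabilities
    num_examples = len(X)
    self.feedforward(X, lambda z: self.actFun(z, type=self.actFun_type))
    correct_logprobs = -np.log(self.probs[range(num_examples), y])
    return np.sum(correct_logprobs) / num_examples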
d) Backward Pass - Backpropagation
It’s time to implement backpropagation, finally!
1. Derive the following gradients mathematically: $\frac{\partial L}{\partial W_2}$, $\frac{\partial L}{\partial b_2}$, $\frac{\partial L}{\partial W_1}$, and $\frac{\partial L}{\partial b_1}$.
2. In three_layer_neural_network.py, implement the function backprop(self, X, y). Again, X is the input data, and y is the given labels. This function implements backpropagation (i.e., computing the gradients above). A reference sketch follows.
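For orientation, with the softmax-cross-entropy loss of Eq. (5) the output-layer error is $\hat{y} - y$ averaged over the batch. Below is a sketch of backprop under the same row-vector convention and attribute names assumed in the feedforward sketch above (again assumptions, not the prescribed solution):

def backprop(self, X, y):
    # assumes feedforward(...) has been run so that self.z1, self.a1, self.probs exist,
    # and that y contains integer class labels
    num_examples = len(X)
    delta2 = self.probs.copy()
    delta2[range(num_examples), y] -= 1.0    # dL/dz2 = (y_hat - y) per example
    delta2 /= num_examples                   # average over the N examples, as in Eq. (5)
    dW2 = self.a1.T.dot(delta2)
    db2 = np.sum(delta2, axis=0, keepdims=True)
    delta1 = delta2.dot(self.W2.T) * self.diff_actFun(self.z1, type=self.actFun_type)
    dW1 = X.T.dot(delta1)
    db1 = np.sum(delta1, axis=0, keepdims=True)
    return dW1, dW2, db1, db2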
e) Time to Have Fun - Training!
You already have all the components needed to run the training. In three_layer_neural_network.py, we also provide the function visualize_decision_boundary(self, X, y) to visualize the decision boundary. Let’s have fun with your network now.
- Train the network using different activation functions (Tanh, Sigmoid and ReLU). Describe and explain the differences that you observe. Include the generated figures in your report. In order to train the network, uncomment the main() function in three_layer_neural_network.py, take out the following lines, and run three_layer_neural_network.py.
plt.scatter(X[:, 0], X[:, 1], s=40, c=y, cmap=plt.cm.Spectral)
plt.show()
- Increase the number of hidden units (nn_hidden_dim) and retrain the network using Tanh as the activation function. Describe and explain the differences that you observe. Include the generated figures in your report. (A short driver sketch for these experiments follows this list.)
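For reference, the experiments above could be driven from main() roughly as follows. This is a sketch only: the constructor arguments nn_input_dim, nn_hidden_dim, nn_output_dim, actFun_type and the fit_model/visualize_decision_boundary signatures are assumptions based on the starter code’s naming and may differ.

X, y = generate_data()
# compare activation functions at a fixed hidden-layer size
for act in ['Tanh', 'Sigmoid', 'ReLU']:
    model = NeuralNetwork(nn_input_dim=2, nn_hidden_dim=3, nn_output_dim=2, actFun_type=act)
    model.fit_model(X, y)
    model.visualize_decision_boundary(X, y)   # include the resulting figure in the report

# larger hidden layer with Tanh
model = NeuralNetwork(nn_input_dim=2, nn_hidden_dim=10, nn_output_dim=2, actFun_type='Tanh')
model.fit_model(X, y)
model.visualize_decision_boundary(X, y)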
f) Even More Fun - Training a Deeper Network!!!
Let’s have some more fun and be more creative now. Write your own n_layer_neural_network.py that builds and trains a neural network of n layers. Your code must be able to accept as parameters (1) the number of layers and (2) the layer size. We provide hints below to help you organize and implement the code, but if you have better ideas, please feel free to implement them and ignore our hints. In your report, please tell us why you made the choice(s) you did.
Hints:
- Create a new class, e.g. DeepNeuralNetwork, that inherits NeuralNetwork in three_layer_neural_network.py.
- In DeepNeuralNetwork, change the functions feedforward, backprop, calculate_loss and fit_model.
- Create a new class, e.g. Layer(), that implements the feedforward and backprop steps for a single layer in the network.
- Use Layer.feedforward to implement DeepNeuralNetwork.feedforward.
- Use Layer.backprop to implement DeepNeuralNetwork.backprop.
- Notice that we have L2 weight regularization in the final loss function in addition to the cross entropy. Make sure you add those regularization terms in DeepNeuralNetwork.calculate_loss and their derivatives in DeepNeuralNetwork.fit_model. (A skeleton illustrating one possible organization follows this list.)
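One possible organization following these hints is sketched below. It assumes the underscore module name, that the base class exposes actFun/diff_actFun as in part (b), and that the output (softmax) layer is handled separately as in the 3-layer network; all names and signatures here are illustrative, not prescribed.

import numpy as np
from three_layer_neural_network import NeuralNetwork

class Layer(object):
    # one fully connected hidden layer: a = actFun(a_prev . W + b)
    def __init__(self, input_dim, output_dim, actFun, diff_actFun):
        self.W = np.random.randn(input_dim, output_dim) / np.sqrt(input_dim)
        self.b = np.zeros((1, output_dim))
        self.actFun, self.diff_actFun = actFun, diff_actFun

    def feedforward(self, a_prev):
        self.a_prev = a_prev
        self.z = a_prev.dot(self.W) + self.b
        self.a = self.actFun(self.z)
        return self.a

    def backprop(self, da):
        # da = dL/da for this layer's output; returns dL/da of the previous layer
        dz = da * self.diff_actFun(self.z)
        self.dW = self.a_prev.T.dot(dz)        # add the L2 term reg_lambda * W here
        self.db = np.sum(dz, axis=0, keepdims=True)
        return dz.dot(self.W.T)

class DeepNeuralNetwork(NeuralNetwork):
    # n-layer network: a chain of Layer objects followed by a softmax output layer
    def __init__(self, layer_sizes, actFun_type='Tanh', reg_lambda=0.01, seed=0):
        np.random.seed(seed)
        act = lambda z: self.actFun(z, type=actFun_type)
        diff = lambda z: self.diff_actFun(z, type=actFun_type)
        self.reg_lambda = reg_lambda
        self.layers = [Layer(n_in, n_out, act, diff)
                       for n_in, n_out in zip(layer_sizes[:-2], layer_sizes[1:-1])]
        # plus the softmax output-layer parameters; feedforward/backprop/calculate_loss/fit_model
        # are then overridden to loop over self.layers (with the L2 terms from the hint above)

    def feedforward(self, X, actFun=None):
        a = X
        for layer in self.layers:
            a = layer.feedforward(a)
        # finish with the affine + softmax output layer to set self.probs, as in the 3-layer net
        return a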
Train your network on the Make-Moons dataset using different numbers of layers, different layer sizes, different activation functions and, in general, different network configurations. In your report, include the generated images and describe what you observe and what you find interesting (e.g. the decision boundary of deep vs. shallow neural networks).
Next, train your network on another dataset different from Make Moons. You can choose datasets provided by Scikit-learn (more details here) or any dataset of your interest. Make sure that you have the correct number of input and output nodes. Again, play with different network configurations. In your report, describe the dataset you choose and tell us what you find interesting.
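For example, Scikit-learn’s other synthetic generators drop in with almost no changes. A sketch (make_circles is just one option; the noise and factor values are arbitrary):

from sklearn import datasets

# two concentric circles: still 2 input features and 2 classes, so the network
# shape from the Make-Moons experiments carries over unchanged
X, y = datasets.make_circles(n_samples=200, noise=0.1, factor=0.5)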
Be curious and creative!!! You are exploring Deep Learning. :)
2 Training a Simple Deep Convolutional Network on MNIST
Deep Convolutional Networks (DCNs) have been state-of-the-art in many perceptual tasks including object recognition, image segmentation, and speech recognition. In this problem, you will build and train a simple 5-layer DCN on the MNIST dataset. We provide you with starter code in the attached .py file on the Canvas assignment page. You will fill in this starter code to complete tasks (a), (b), and (c) below. Also, since one of the purposes of this assignment is to get you familiar with PyTorch, please review this online tutorial. You are encouraged (but not required) to re-organize the starter code, but be sure to explain your code in the report.
MNIST is a dataset of handwritten digits (from 0 to 9). This dataset is one of the most popular benchmarks in machine learning and deep learning. If you develop an algorithm to learn from static images for tasks such as object recognition, most likely you will want to debug your algorithm on MNIST first before testing it on more complicated datasets such as CIFAR10 and SVHN. There are also modified versions of MNIST, such as permutation-invariant MNIST, which will come in handy for benchmarking at times.
In more detail, the MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation). The digits have been size-normalized and centered in a fixed-size image. MNIST images are of size 28 x 28. When loaded in TensorFlow, each image is flattened into a vector of 28x28 = 784 numbers. Each MNIST image has a corresponding label, a number between 0 and 9 corresponding to the digit drawn in that image.
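If your starter code does not already do so, the data can be loaded the same way the Deep MNIST for Experts tutorial does. A sketch assuming TensorFlow 1.x (the 'MNIST_data' directory name is arbitrary):

from tensorflow.examples.tutorials.mnist import input_data

# one_hot=True gives 10-dimensional one-hot label vectors, matching a softmax output
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
print(mnist.train.images.shape)                                  # (55000, 784)
print(mnist.validation.num_examples, mnist.test.num_examples)    # 5000 10000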
a) Build and Train a 4-layer DCN
The architecture of the DCN that you will implement is as follows.
conv1(5-5-1-32) - ReLU - maxpool(2-2) - conv2(5-5-32-64) - ReLU - maxpool(2-2) - fc(1024) - ReLU - DropOut(0.5) - Softmax(10)
More details on the architecture can be found in the tutorial Deep MNIST for Experts. Follow that tutorial to fill in dcn_mnist.py. In particular,
- Read the tutorial Deep MNIST for Experts to learn how to use TensorFlow.
- Complete the functions weight_variable(shape), bias_variable(shape), conv2d(x, W), and max_pool_2x2(x) in dcn_mnist.py. The first two functions initialize the weights and biases in the network, and the last two implement the convolution and max-pooling operators, respectively. (A reference sketch of these four helpers follows this part.)
- Build your network: In dcn_mnist.py, you will see “FILL IN THE CODE BELOW TO BUILD YOUR NETWORK”. Complete the following sections in dcn_mnist.py: placeholders for input data and input labels, first convolutional layer, second convolutional layer, densely connected layer, dropout, softmax.
- Set up Training: In dcn_mnist.py, you will see “FILL IN THE FOLLOWING CODE TO SET UP THE TRAINING”. Complete the setup-training section in dcn_mnist.py.
- Run Training: Study the rest of dcn_mnist.py. Notice that, unlike the tutorial Deep MNIST for Experts, I use summary operations (e.g. summary_op, summary_writer, ...) to monitor the training. Here, I only monitor the training loss value. Now, run dcn_mnist.py. What is the final test accuracy of your network? Note that I set the batch size to 50 and, to save time, I set max_step to only 5500. The batch size is the number of MNIST images that are sent to the DCN at each iteration, and max_step is the maximum number of training iterations; max_step = 5500 means the training will stop after 5500 iterations no matter what. When the batch size is 50, 5500 iterations is equivalent to 5 epochs. Recall that, in each epoch, the DCN sees the whole training set once. In this case, since there are 55K training images, each epoch consists of 55K/50 = 1100 iterations.
- Visualize Training: In your terminal, type tensorboard --logdir=path/to/results, where path/to/results is result_dir in dcn_mnist.py. Follow the instructions in your terminal to visualize the training loss. You will be asked to navigate to a website to see the results, e.g. http://172.28.29.81:6006. Include the figures generated by TensorBoard in your report.
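As a reference point, the four helper functions closely follow the Deep MNIST for Experts tutorial. A minimal sketch, assuming TensorFlow 1.x and the stride/padding choices of that tutorial:

import tensorflow as tf

def weight_variable(shape):
    # small truncated-normal initialization, as in the tutorial
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # small positive bias so that ReLU units start active
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # stride 1 with 'SAME' padding keeps the spatial size unchanged
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 pooling with stride 2 halves the spatial size
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

With these in place, the conv1(5-5-1-32) block is built from weight_variable([5, 5, 1, 32]) and bias_variable([32]), and similarly for the other layers.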
b) More on Visualizing Your Training
In part (a) of this problem, you only monitored the training loss during training. Now, let’s visualize your training more! Study dcn_mnist.py and the tutorial TensorBoard: Visualizing Learning to learn how to monitor a set of variables during training. Then, modify dcn_mnist.py so that you can monitor the statistics (min, max, mean, standard deviation, histogram) of the following terms every 100 iterations: weights, biases, net inputs at each layer, activations after ReLU at each layer, and activations after Max-Pooling at each layer. Also monitor the test and validation error every 1100 iterations (equivalently, after each epoch). Run the training again and visualize the monitored terms in TensorBoard. Include the resulting figures in your report.
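The TensorBoard: Visualizing Learning tutorial wraps these per-variable statistics in a small helper attached to each tensor of interest. A sketch along those lines, using TensorFlow 1.x summary ops (the helper name and the example tensor names in the last line are illustrative):

import tensorflow as tf

def variable_summaries(var, name):
    # attach min/max/mean/stddev/histogram summaries to a tensor so that
    # tensorboard --logdir=<result_dir> can plot them over training
    with tf.name_scope(name + '_summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

# e.g. variable_summaries(W_conv1, 'W_conv1'); variable_summaries(h_pool1, 'h_pool1')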
c) Time for More Fun!!!
As you have noticed, I use the ReLU non-linearity, random initialization, and the Adam training algorithm in dcn_mnist.py. In this section, run the network training with different non-linearities (tanh, sigmoid, leaky-ReLU, MaxOut, ...), initialization techniques (Xavier, ...) and training algorithms (SGD, momentum-based methods, Adagrad, ...). Make sure you still monitor the terms specified in part (b). Include the figures generated by TensorBoard and describe what you observe. Again, be curious and creative! You are encouraged to work in groups, but you need to submit separate reports.
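Most of these swaps are one- or two-line changes if the graph is built with TensorFlow 1.x ops. The self-contained sketch below only illustrates the kinds of substitutions meant here (a single affine layer is used just to show the calls; all variable names are illustrative, not the starter code’s):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Xavier/Glorot initialization instead of a truncated normal
W = tf.get_variable('W_demo', shape=[784, 10], initializer=tf.glorot_uniform_initializer())
b = tf.Variable(tf.zeros([10]))

# leaky ReLU instead of ReLU
h = tf.nn.leaky_relu(tf.matmul(x, W) + b, alpha=0.1)

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=h))

# momentum SGD instead of Adam; plain SGD and Adagrad are analogous one-line swaps
train_step = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9).minimize(cross_entropy)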
Collaboration Policy
Collaboration both inside and outside class is encouraged. You may talk to other students for general ideas and concepts, but individual write-ups must be done independently.
Plagiarism
Plagiarism of any form will not be tolerated. You are expected to credit all sources explicitly.