Introduction to Deep Learning, Fall 2022
Please answer the questions using the Jupyter notebook script.
For submission, please include your code, code output, and answers in the script and submit it on Sakai. Please don't modify the existing cells in the script, but you may add cells between the exercise statements.
To write Markdown, switch the cell type from Code to Markdown (press 'm' while in command mode) and use Markdown syntax. For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax.
1 Problem 1. (10 points)
Prove whether the following functions are convex or not.
(a) (5 points) $f(x_1, x_2) = (x_1 x_2 - 1)^2$, where $x_1, x_2 \in \mathbb{R}$.
(b) (5 points) $f(w_1, w_2) = \|w_1 - w_2\|_2^2$, where $w_1, w_2 \in \mathbb{R}^2$.
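Before writing the proofs, the two functions can be probed numerically with a midpoint (Jensen) test: if $f\bigl(\tfrac{u+v}{2}\bigr) > \tfrac{1}{2}\bigl(f(u) + f(v)\bigr)$ for some pair of points, the function cannot be convex. The sketch below is only for building intuition, not a substitute for a proof, and it assumes the function forms as written above.

```python
import numpy as np

# Part (a): f(x1, x2) = (x1 * x2 - 1)^2
def f_a(p):
    x1, x2 = p
    return (x1 * x2 - 1) ** 2

# Part (b): f(w1, w2) = ||w1 - w2||_2^2 with w1, w2 in R^2
def f_b(p):
    w1, w2 = p[:2], p[2:]
    return np.sum((w1 - w2) ** 2)

def violates_convexity(f, u, v):
    """Return True if the midpoint of u, v violates Jensen's inequality."""
    mid = 0.5 * (u + v)
    return f(mid) > 0.5 * (f(u) + f(v)) + 1e-12

rng = np.random.default_rng(0)
for name, f, dim in [("(a)", f_a, 2), ("(b)", f_b, 4)]:
    found = any(
        violates_convexity(f, rng.normal(size=dim), rng.normal(size=dim))
        for _ in range(10000)
    )
    print(name, "midpoint violation found:", found)
```

Finding a violation disproves convexity; finding none is only suggestive, so the written answer still needs an argument (e.g., via the Hessian or a restriction to a line).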
2 Problem 2. (10 points)
Identify the stationary points of $f(x) = 2x_1 + 12x_2 + x_1^2 - 3x_2^2$. Are they local minima/maxima, global minima/maxima, or saddle points? Why?
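For orientation, the stationary points and their Hessian classification can be checked symbolically, for example with SymPy. The expression below uses the function as reconstructed above, which is an assumption about the original formula; the written answer should argue directly from the gradient and the Hessian.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
# Function as reconstructed above (assumed form of the original problem).
f = 2*x1 + 12*x2 + x1**2 - 3*x2**2

grad = [sp.diff(f, v) for v in (x1, x2)]
stationary = sp.solve(grad, (x1, x2), dict=True)   # points where the gradient vanishes
H = sp.hessian(f, (x1, x2))

for pt in stationary:
    eigs = H.subs(pt).eigenvals()                  # eigenvalue signs classify the point
    print(pt, dict(eigs))
```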
3 Problem 3. (80 points)
Given training data $\{x_i, y_i\}_{i=1}^{n}$, with each $x_i \in \mathbb{R}^d$ and $y_i \in \{+1, -1\}$, we solve the following logistic regression problem by gradient descent:
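A standard form of the objective for this setup is the average logistic loss; the exact formula in the original handout may differ (for example, it may carry an $\ell_2$ regularization term), so the version below is an assumption:

$$\min_{w \in \mathbb{R}^d} f(w), \qquad f(w) = \frac{1}{n}\sum_{i=1}^{n}\log\bigl(1 + \exp(-y_i\, w^\top x_i)\bigr).$$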
Test the algorithm using the “heart scale” dataset with $n = 270$ and $d = 13$: the matrix $X$ is stored in the file “X heart”, and the vector $y$ is stored in the file “y heart”. (“X heart” contains $n$ lines; each line stores a vector $x_i$ with $d$ real numbers. “y heart” contains the $y$ vector.)
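A minimal loading sketch, assuming the files are plain whitespace-separated text; the underscored file names are a guess at how “X heart” and “y heart” are actually spelled on disk, so adjust the paths and parser as needed.

```python
import numpy as np

# Assumed file names and format: plain text, one sample per line (adjust as needed).
X = np.loadtxt("X_heart")      # shape (n, d) = (270, 13)
y = np.loadtxt("y_heart")      # shape (n,), labels in {+1, -1}

assert X.shape == (270, 13) and y.shape == (270,)
```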
(a) (5 points) Compute the gradient of $f(w)$ with respect to $w$.
(b) (30 points) Implement the gradient descent algorithm with a fixed step size $\eta$. Find a small $\eta_1$ for which the algorithm converges, then increase the step size to an $\eta_2$ for which the algorithm fails to converge. Run 50 iterations and plot iteration versus $\log(f(w_k) - f(w^*))$ for both $\eta_1$ and $\eta_2$. In practice it is impossible to obtain the exact optimal solution $w^*$, so use the minimum value you computed as $f(w^*)$ when you plot the figure. Report the $f(w^*)$ value you used for generating the plots. (A minimal implementation sketch is given after this list.)
(c) (5 points) Write down the pseudocode of gradient descent with backtracking line search ($\sigma = 0.01$).
(d) (20 points) Implement the gradient descent algorithm with backtracking line search ($\sigma = 0.01$). Plot the same iteration versus $\log(f(w_k) - f(w^*))$ curve.
(e) (20 points) Test your implementation (gradient descent with backtracking line search) on the larger dataset “epsilonsubset”. Produce the same iteration versus error plot.
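A minimal sketch of parts (b) and (d), assuming the unregularized average logistic loss written above and an Armijo-style backtracking rule with parameter $\sigma$; the variable names, initial step size, and shrinking factor are illustrative choices, not requirements of the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(w, X, y):
    """Average logistic loss (assumed objective; see the note above)."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def grad_f(w, X, y):
    """Gradient of the average logistic loss."""
    s = -y / (1.0 + np.exp(y * (X @ w)))   # d/dz log(1 + exp(-y z)) evaluated at z = x^T w
    return (X.T @ s) / X.shape[0]

def gd_fixed(X, y, eta, iters=50):
    """Part (b): gradient descent with a fixed step size."""
    w = np.zeros(X.shape[1])
    vals = [f(w, X, y)]
    for _ in range(iters):
        w -= eta * grad_f(w, X, y)
        vals.append(f(w, X, y))
    return np.array(vals)

def gd_backtracking(X, y, sigma=0.01, beta=0.5, eta0=1.0, iters=50):
    """Part (d): gradient descent with backtracking (Armijo) line search."""
    w = np.zeros(X.shape[1])
    vals = [f(w, X, y)]
    for _ in range(iters):
        g = grad_f(w, X, y)
        eta = eta0
        # Shrink eta until the sufficient-decrease condition holds.
        while f(w - eta * g, X, y) > f(w, X, y) - sigma * eta * np.dot(g, g):
            eta *= beta
        w -= eta * g
        vals.append(f(w, X, y))
    return np.array(vals)

# Example usage (X, y loaded as in the snippet above):
# vals = gd_backtracking(X, y)
# f_star = vals.min()                      # surrogate for f(w*), as suggested in part (b)
# plt.plot(np.log(vals[:-1] - f_star + 1e-12))
# plt.xlabel("iteration"); plt.ylabel("log(f(w_k) - f(w*))"); plt.show()
```

The same two routines apply unchanged to part (e); only the data-loading step differs for the larger dataset.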