
DDA4210 Advanced Machine Learning - Assignment 1: Bias-variance decomposition, gradient boosting and recommendation systems


Assignment 1 of DDA4210

• The deadline is 23:59, October 31, 2023.

• The weight of this assignment in the final grade is 10%.

• Electronic submission: Turn in your solutions via Blackboard. Be sure to submit your homework as a single file, named "A1 studentID name".

• Note that late submissions will receive discounted scores: 0–24 hours late, 80%; 24–72 hours, 50%; 72 hours or more, 0%.

• Answer the questions in English. Otherwise, you’ll lose half of the points.

• Collaboration policy: You must solve all questions independently; collaboration between students is not allowed.

1. Derive the bias-variance decomposition for the squared error loss function. That is, show that for a model $f_S$ trained on a dataset $S$ to predict a target $y(x)$ for each $x$,
$$\mathbb{E}_S[E_{\text{out}}(f_S)] = \mathbb{E}_x[\mathrm{Bias}(x) + \mathrm{Var}(x)],$$
given the following definitions:
$$F(x) = \mathbb{E}_S[f_S(x)], \qquad E_{\text{out}}(f_S) = \mathbb{E}_x\big[(f_S(x) - y(x))^2\big],$$
$$\mathrm{Bias}(x) = (F(x) - y(x))^2, \qquad \mathrm{Var}(x) = \mathbb{E}_S\big[(f_S(x) - F(x))^2\big].$$
(10 points)
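As a sanity check (not part of the required derivation), the identity can be verified numerically by Monte Carlo: train many models on independently resampled datasets and compare the averaged out-of-sample error with the averaged bias plus variance. A minimal sketch, assuming numpy and a simple polynomial-regression setup with a noise-free target, as in the definitions above:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = lambda x: np.sin(2 * np.pi * x)   # noiseless target y(x)

def fit_poly(S_x, S_y, deg=3):
    """Least-squares polynomial fit; returns the prediction function f_S."""
    coef = np.polyfit(S_x, S_y, deg)
    return lambda x: np.polyval(coef, x)

x_test = rng.uniform(0, 1, 500)            # points for the E_x average
preds = []                                 # f_S(x_test) for each dataset S
for _ in range(2000):                      # 2000 independent datasets S
    S_x = rng.uniform(0, 1, 20)
    preds.append(fit_poly(S_x, y_true(S_x))(x_test))
preds = np.array(preds)                    # shape (2000, 500)

F = preds.mean(axis=0)                     # F(x) = E_S[f_S(x)]
e_out = ((preds - y_true(x_test)) ** 2).mean()    # E_S[E_out(f_S)]
bias = ((F - y_true(x_test)) ** 2).mean()         # E_x[Bias(x)]
var = ((preds - F) ** 2).mean()                   # E_x[Var(x)]
print(e_out, bias + var)                   # the two numbers should match
```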

2. Find the closed-form solutions of the following optimization problems ($W \in \mathbb{R}^{K \times D}$, $b \in \mathbb{R}^K$, $N \ge D > K$):

(a) $\displaystyle \min_{W,b} \sum_{i=1}^{N} \|y_i - Wx_i - b\|^2$ (4 points)

(b) $\displaystyle \min_{W,b} \frac{1}{2}\sum_{i=1}^{N} \|y_i - Wx_i - b\|^2 + \frac{\lambda}{2}\|W\|_F^2$ (6 points)
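For reference, a minimal numerical sketch of the kind of solution part (b) has, assuming numpy; the centering trick below is one standard way to eliminate the offset $b$ from the normal equations, not necessarily the form the grader expects:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N, lam = 5, 2, 100, 0.1
X = rng.normal(size=(D, N))                # columns are the x_i
Y = rng.normal(size=(K, N))                # columns are the y_i

# Centering removes b from the stationarity conditions.
x_bar = X.mean(axis=1, keepdims=True)
y_bar = Y.mean(axis=1, keepdims=True)
Xc, Yc = X - x_bar, Y - y_bar

# Setting the gradient of (1/2)sum||y_i - W x_i - b||^2 + (lam/2)||W||_F^2
# to zero gives  W (Xc Xc^T + lam I) = Yc Xc^T.
W = Yc @ Xc.T @ np.linalg.inv(Xc @ Xc.T + lam * np.eye(D))
b = y_bar - W @ x_bar                      # offset recovered from the means
```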

3. Consider the following problem:
$$\min_{W}\ \frac{1}{2}\|W\Phi - Y\|_F^2 + \frac{\lambda}{2}\|W\|_F^2,$$
where $\|\cdot\|_F$ denotes the Frobenius norm; $Y \in \mathbb{R}^{K \times N}$; $\Phi = [\varphi(x_1), \varphi(x_2), \ldots, \varphi(x_N)]$, $x_i \in \mathbb{R}^D$, $i = 1, 2, \ldots, N$; and $\varphi$ is the feature map induced by a kernel function $k(\cdot,\cdot)$. Prove that for any $x \in \mathbb{R}^D$, we can make the prediction
$$\hat{y} = W\varphi(x) = Y(K + \lambda I)^{-1}k(x),$$
where $K = \Phi^\top\Phi$ and $k(x) = [k(x_1,x), k(x_2,x), \ldots, k(x_N,x)]^\top$. (15 points)
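A quick numerical illustration of the claimed identity (not a proof), assuming numpy and the linear kernel $k(x, x') = x^\top x'$, so that $\varphi$ is the identity map and the primal solution can be formed explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K_dim, N, lam = 4, 3, 50, 0.5
X = rng.normal(size=(D, N))                    # Phi, under the linear kernel
Y = rng.normal(size=(K_dim, N))
x_new = rng.normal(size=(D,))

# Primal solution: W = Y Phi^T (Phi Phi^T + lam I)^{-1}
W = Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(D))
primal_pred = W @ x_new

# Dual/kernel form: y_hat = Y (K + lam I)^{-1} k(x), with K = Phi^T Phi
K = X.T @ X
k_x = X.T @ x_new
dual_pred = Y @ np.linalg.inv(K + lam * np.eye(N)) @ k_x

print(np.allclose(primal_pred, dual_pred))     # True: the two forms agree
```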

4. Compute the space and time complexities (in big-O form; consider only the training stage) of the following algorithms: (9 points)

(1) Ridge regression (Question 2(b)) with the closed-form solution

(2) PCA ($N$ data points of dimension $D$; choose $d$ principal components)

(3) A neural network with architecture $D$–$H_1$–$H_2$–$K$ on a mini-batch of size $B$ (consider only the forward pass and neglect the computational costs of activation functions)

Hint: the time complexity of multiplying $A \in \mathbb{R}^{m \times n}$ by $B \in \mathbb{R}^{n \times l}$ is $O(mnl)$; the time complexities of the eigenvalue decomposition and the inverse of an $n \times n$ matrix are both $O(n^3)$ (see the cost tally sketched below).
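One possible way to organize the time tally under the hint's cost model, using sympy for symbolic bookkeeping. The groupings below are assumptions about how each algorithm is implemented, not an official answer, and only time is tallied; space complexities still need to be reasoned out separately:

```python
# Hypothetical cost tallies under the hint's model: a matmul of shapes
# (m, n) x (n, l) costs m*n*l; inversion/eigendecomposition of an n x n
# matrix costs n^3.
from sympy import symbols

N, D, K, d, B, H1, H2 = symbols("N D K d B H1 H2", positive=True)

# (1) Ridge closed form W = Y X^T (X X^T + lam I)^{-1}:
#     X X^T costs D*N*D, the inverse D^3, Y X^T costs K*N*D,
#     and the final product K*D*D.
ridge_time = D**2 * N + D**3 + K * N * D + K * D**2

# (2) PCA: covariance matrix D*N*D, eigendecomposition D^3,
#     projection onto d components N*D*d.
pca_time = D**2 * N + D**3 + N * D * d

# (3) Forward pass on a batch of B inputs through three dense layers.
nn_time = B * D * H1 + B * H1 * H2 + B * H2 * K

print(ridge_time, pca_time, nn_time, sep="\n")
```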

5. Prove the convergence of the generic gradient boosting algorithm (AnyBoost). Specifically, suppose that in the AnyBoost algorithm (page 14 of Lecture 02) the gradient of the objective function $L$ is $L$-Lipschitz continuous, i.e., there exists $L > 0$ such that
$$\|\nabla L(H) - \nabla L(H')\| \le L\|H - H'\|$$
holds for any $H$ and $H'$. Suppose in the algorithm $\alpha$ is computed as
$$\alpha_{t+1} = -\frac{\langle \nabla L(H_t), h_{t+1}\rangle}{L\|h_{t+1}\|^2},$$
and the ensemble model is updated as $H_{t+1} = H_t + \alpha_{t+1}h_{t+1}$. Prove that the algorithm either terminates at round $T$ with $\langle \nabla L(H_T), h_{T+1}\rangle \ge 0$, or $L(H_t)$ converges to a finite value, in which case
$$\lim_{t\to\infty}\langle \nabla L(H_t), h_{t+1}\rangle = 0.$$
(14 points)

Hint: use the following result: suppose $L: \mathcal{H} \to \mathbb{R}$ and $\|\nabla L(F) - \nabla L(G)\| \le L\|F - G\|$ holds for any $F$ and $G$ in $\mathcal{H}$; then
$$L(F + wG) - L(F) \le w\langle \nabla L(F), G\rangle + \frac{Lw^2}{2}\|G\|^2$$
holds for any $w > 0$.
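A toy numerical check of the descent property (not the requested proof), under the assumptions that functions are represented by their values on $n$ samples, $L(H) = \frac{1}{2}\|H - y\|^2$ (so $\nabla L$ is 1-Lipschitz), and weak learners come from a small fixed pool:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
y = rng.normal(size=n)                         # regression targets
pool = rng.choice([-1.0, 1.0], size=(20, n))   # pool of weak learners h
H = np.zeros(n)                                # initial ensemble H_0
L_const = 1.0                                  # Lipschitz constant of grad L

losses = []
for t in range(200):
    grad = H - y                       # gradient of L(H) = 0.5*||H - y||^2
    scores = pool @ grad               # <grad, h> for every h in the pool
    h = pool[np.argmin(scores)]        # learner most aligned with -grad
    if grad @ h >= 0:                  # termination condition from Q5
        break
    alpha = -(grad @ h) / (L_const * (h @ h))
    H = H + alpha * h
    losses.append(0.5 * np.sum((H - y) ** 2))

# losses is non-increasing, illustrating the descent step in the hint.
print(losses[:5], losses[-1])
```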

6. True or false? If false, explain briefly. (10 points)

(1) The inequality $G(\mathcal{F}, n) \le 2^n$ holds for any model class $\mathcal{F}$.

(2) The VC dimension of axis-aligned rectangles in a 2D space is 4 (see the sketch after this list).

(3) The VC dimension of circles in a 2D space is 4.

(4) The VC dimension of the 1-nearest-neighbor classifier in $d$-dimensional space is $d + 1$.

(5) Let $d$ be the VC dimension of $\mathcal{F}$. Then the inequality $G(\mathcal{F}, n) \le \left(\frac{en}{d}\right)^d$ always holds.
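For intuition on statement (2), here is a brute-force sketch (assuming numpy) that checks whether axis-aligned rectangles shatter one particular set of 4 points. Shattering these points only shows the VC dimension is at least 4; the matching upper bound must be argued separately:

```python
import itertools
import numpy as np

# Four points in a "diamond"; a candidate set to be shattered.
pts = np.array([[0.0, 1.0], [0.0, -1.0], [1.0, 0.0], [-1.0, 0.0]])

def rectangle_can_realize(labels):
    """Can some axis-aligned rectangle contain exactly the +1 points?"""
    pos = pts[np.array(labels) == 1]
    if len(pos) == 0:
        return True                    # a rectangle away from all points
    # The tightest box enclosing the positives is the best candidate.
    lo, hi = pos.min(axis=0), pos.max(axis=0)
    inside = np.all((pts >= lo) & (pts <= hi), axis=1)
    return np.array_equal(inside, np.array(labels) == 1)

# All 16 labelings must be realizable for the 4 points to be shattered.
print(all(rectangle_can_realize(l)
          for l in itertools.product([0, 1], repeat=4)))
```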

7. In LASSO, the model class is defined as $\mathcal{F} = \{x \mapsto \langle w, x\rangle : \|w\|_1 \le \alpha\}$. Suppose $x \in \mathbb{R}^d$, $y \in \{-1, +1\}$, the training data are $S = \{(x_i, y_i)\}_{i=1}^n$, and $\max_{1\le i\le n}\|x_i\|_\infty \le \beta$, where $\|\cdot\|_\infty$ denotes the largest absolute element of a vector.

(1) Find an upper bound on the empirical Rademacher complexity
$$\hat{R}_S(\mathcal{F}) = \mathbb{E}_\sigma\Big[\sup_{f\in\mathcal{F}}\frac{1}{n}\sum_{i=1}^n \sigma_i f(x_i)\Big],$$
where the $\sigma_i$ are Rademacher variables (a Monte Carlo illustration follows the hint below). (16 points)

(2) Suppose the loss function is the absolute loss. Use the inequality (highlighted in blue) on page 30 and Lemma 5 on page 35 (i.e., $R(\ell \circ \mathcal{F}) \le \eta R(\mathcal{F})$) of Lecture 03 to derive a generalization error bound for LASSO. (6 points)

Hint: for question (1), use the inequality $\langle a, b\rangle \le \|a\|_1\|b\|_\infty$ and the following lemma:

Lemma 1. Let $A \subset \mathbb{R}^n$ be a finite set of points with $r = \max_{x\in A}\|x\|_2$, and write $x = (x_1, x_2, \ldots, x_n)$. Then
$$\mathbb{E}_\sigma\Big[\max_{x\in A}\sum_{i=1}^n x_i\sigma_i\Big] \le r\sqrt{2\log|A|},$$
where $|A|$ denotes the cardinality of the set $A$ and the $\sigma_i$ are Rademacher variables.
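To build intuition for (1), note that $\sup_{\|w\|_1\le\alpha}\frac{1}{n}\sum_i \sigma_i\langle w, x_i\rangle = \frac{\alpha}{n}\big\|\sum_i \sigma_i x_i\big\|_\infty$ by $\ell_1$/$\ell_\infty$ duality, so the empirical Rademacher complexity can be estimated by Monte Carlo. A minimal sketch assuming numpy, compared against a bound of the general form one derives in (1) (not a substitute for the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 100, 20, 1.0
X = rng.uniform(-1, 1, size=(n, d))    # rows x_i, so beta <= 1 here
beta = np.abs(X).max()

# sup over ||w||_1 <= alpha of (1/n) sum_i sigma_i <w, x_i>
# equals (alpha/n) * || sum_i sigma_i x_i ||_inf.
vals = []
for _ in range(20000):
    sigma = rng.choice([-1.0, 1.0], size=n)
    vals.append(alpha / n * np.abs(sigma @ X).max())
print("MC estimate of R_S(F):", np.mean(vals))

# Applying Lemma 1 with |A| = 2d and r <= beta*sqrt(n) gives a bound of
# this shape; verify the constant against your own derivation.
bound = alpha * beta * np.sqrt(2 * np.log(2 * d) / n)
print("alpha*beta*sqrt(2 log(2d)/n) =", bound)
```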

8. Answer the following questions about recommendation systems.

(1) What are the differences between collaborative filtering and content-based methods? (4 points)

(2) Suppose we have $m$ items and $n$ users. Let $\Omega$ be the set of indices of observed ratings given by the users on the items. Provide the objective function of content-based recommendation for users, where the model class is a neural network with two hidden layers. You need to define the notations clearly. In addition, explain how to make recommendations for a new user using your model.
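One common shape for such an objective (a sketch under assumptions, not necessarily the expected answer): with item features $x_i \in \mathbb{R}^p$, a two-hidden-layer network $g$ producing item embeddings, and one weight vector $u_j$ per user, minimize $\sum_{(i,j)\in\Omega}\big(u_j^\top g(x_i) - R_{ij}\big)^2$. A minimal numpy forward pass for this masked squared loss; all array names and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, h1, h2 = 40, 30, 10, 16, 8        # items, users, feature/hidden dims
X = rng.normal(size=(m, p))                # item feature vectors x_i
R = rng.normal(size=(m, n))                # ratings; only observed ones used
mask = rng.random(size=(m, n)) < 0.2       # Omega: observed (item, user) pairs

# Two-hidden-layer network g mapping item features to an h2-dim embedding.
W1, b1 = rng.normal(size=(p, h1)) * 0.1, np.zeros(h1)
W2, b2 = rng.normal(size=(h1, h2)) * 0.1, np.zeros(h2)
U = rng.normal(size=(h2, n)) * 0.1         # one weight vector u_j per user

def relu(z):
    return np.maximum(z, 0.0)

def objective():
    """sum over (i,j) in Omega of (u_j^T g(x_i) - R_ij)^2."""
    G = relu(relu(X @ W1 + b1) @ W2 + b2)  # item embeddings g(x_i)
    preds = G @ U                          # predicted ratings, shape (m, n)
    return np.sum((preds - R)[mask] ** 2)

print(objective())
# A new user needs no retraining of g: fit only a fresh column u from the
# user's few observed ratings, then rank unseen items by u^T g(x_i).
```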
