
CMPSC 448: Machine Learning and AI Homework 3: Perceptron, Logistic Regression, Decision Tree, Hard Support Vector Machines and Soft Support Vector Machines


Homework 3 (Due Sunday, February 26th, 11:59 PM)

This homework includes only theory problems. Please submit your solutions as a single PDF file via Gradescope.

Perceptron

Problem 1. (10 points) As we discussed in the lecture, the Perceptron algorithm will only converge if the data is linearly separable, i.e., there exists a linear classifier that can perfectly classify the training data.



If the training data S = {(x1, y1), (x2, y2), ..., (xn, yn)}, where xi ∈ R^d and yi ∈ {−1, +1}, is not linearly separable, then there is a simple trick to force the data to be linearly separable and then apply the Perceptron algorithm, as follows. If you have n data points in d dimensions, map data point xi to the (d + n)-dimensional point [xi, ei] ∈ R^(d+n), where ei ∈ {0, 1}^n is an n-dimensional vector of zeros, except for the ith position, which is 1 (e.g., e4 = [0, 0, 0, 1, 0, ..., 0]).

Show that if you apply this mapping, the data becomes linearly separable (you may wish to do so by providing a weight vector w in (d + n)-dimensional space that successfully separates the data).
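As a sanity check of the construction (separate from the proof the problem asks for), the sketch below applies the mapping to a small XOR-style dataset, which is not linearly separable in R^2, and verifies both that the weight vector suggested by the construction (zeros on the original d coordinates, label yi on the ith appended coordinate) separates the augmented points and that the Perceptron converges on them. The dataset and the iteration cap are assumptions for illustration.

```python
import numpy as np

# XOR-style data: not linearly separable in the original 2-D space.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
n, d = X.shape

# Augment: append the i-th standard basis vector e_i to x_i.
Z = np.hstack([X, np.eye(n)])           # shape (n, d + n)

# Weight vector from the construction: zeros on the original coordinates,
# label y_i on the i-th appended coordinate, so w . z_i = y_i.
w = np.concatenate([np.zeros(d), y])
assert np.all(y * (Z @ w) > 0)          # every augmented point classified correctly

# Perceptron on the augmented data converges, since it is now separable.
w_p = np.zeros(d + n)
for _ in range(100):                    # generous cap on passes for the sketch
    mistakes = 0
    for i in range(n):
        if y[i] * (Z[i] @ w_p) <= 0:
            w_p += y[i] * Z[i]
            mistakes += 1
    if mistakes == 0:
        break
assert np.all(y * (Z @ w_p) > 0)
```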





Logistic Regression

Problem 2. (15 points) Recall that in lectures we showed that Logistic Regression for binary classification boils down to solving the following optimization problem (training error) over n training samples:

f(w) = Σ_{i=1}^n log(1 + exp(−yi w⊤xi))

a) Compute the gradient of f(w).
b) Write pseudocode for using gradient descent (GD) to optimize f(w).
c) Argue that if the data is linearly separable (i.e., it can be perfectly classified by a linear model), then in the solution obtained by GD some of the coefficients diverge to infinity.
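A runnable sketch of the GD loop of part (b) is below; the step size, iteration count, and toy dataset are assumptions for illustration, and the gradient formula in the docstring is the one part (a) asks you to derive.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Gradient of f(w) = sum_i log(1 + exp(-y_i w^T x_i)):
    grad f(w) = -sum_i y_i x_i / (1 + exp(y_i w^T x_i))."""
    margins = y * (X @ w)                      # y_i w^T x_i, one per sample
    weights = 1.0 / (1.0 + np.exp(margins))
    return -(X.T @ (y * weights))

def gradient_descent(X, y, eta=0.1, iters=500):
    w = np.zeros(X.shape[1])
    for _ in range(iters):                     # fixed step size for the sketch
        w -= eta * logistic_loss_grad(w, X, y)
    return w

# Assumed 1-D toy data: positives at x > 0, negatives at x < 0 (separable,
# so the learned coefficient keeps growing, as part (c) asks you to argue).
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = gradient_descent(X, y)
assert np.all(np.sign(X @ w) == y)             # separable data fit perfectly
```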

Problem 3. (20 points) In this problem we aim to generalize the Logistic Regression algorithm to solve the multi-class classification problem, the setting where the label space includes three or more classes, i.e., Y = {1, 2, ..., K} for K ≥ 3. To do so, consider training data S = {(x1, y1), (x2, y2), ..., (xn, yn)} where the feature vectors are xi ∈ R^d and yi ∈ Y, i = 1, 2, ..., n. Here we assume that we have K different classes and that each input xi is a d-dimensional vector. We wish to fit a linear model in a similar spirit to logistic regression, and we will use the softmax function, instead of the logistic function, to link the linear inputs to the categorical output.

One way to generalize the binary model to the multiclass case is to have K sets of parameters wk for k = 1, 2, ..., K, and compute the probability distribution of the output as follows (for simplicity we ignore the intercept parameters bk),

P[y = k | x; w1, ..., wK] = exp(wk⊤x) / Σ_{j=1}^K exp(wj⊤x),   for k = 1, ..., K

Note that this is indeed a probability distribution, as Σ_{k=1}^K P[y = k | x; w1, ..., wK] = 1. To make the model identifiable, we will fix wK to 0, which means we have K − 1 sets of parameters {w1, ..., w_{K−1}}, each a d-dimensional vector, to learn. To see this, note that the parameter vector for class K can be inferred from the rest, as the probabilities over all labels sum to one (similar to binary classification, where we only had a single parameter vector). To make a prediction, we extend the binary prediction rule to the multiclass setting by letting the prediction function be:

f_multiclass(x) = argmax_{k ∈ {1, 2, ..., K}} P[y = k | x]

As in logistic regression, we will assume that the labels yi are drawn i.i.d. given the inputs, which leads to the following likelihood function,

P[y1, y2, ..., yn | x1, x2, ..., xn; w1, ..., wK] = Π_{i=1}^n P[yi | xi; w1, ..., wK]

a) Write down the log-likelihood function explicitly and simplify it as much as you can to derive the training loss that needs to be optimized.
b) Compute the gradient of the log-likelihood with respect to each wk and simplify it.
c) Derive the stochastic gradient descent (SGD) update for multiclass logistic regression.
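The kind of update part (c) asks for can be sketched in code; this is an illustrative sketch under assumed hyperparameters and toy data, not the required derivation. It uses the standard fact that, with wK fixed at 0, the per-sample gradient of the negative log-likelihood with respect to wk is (P[k | x] − 1{y = k}) x.

```python
import numpy as np

def softmax_probs(W, x):
    """P[y = k | x] with the last class's weights fixed at zero (identifiability)."""
    scores = np.concatenate([W @ x, [0.0]])     # w_K = 0
    scores -= scores.max()                      # numerical stability
    e = np.exp(scores)
    return e / e.sum()

def sgd_step(W, x, y, eta):
    """One SGD step on the negative log-likelihood of a single sample (x, y).

    Gradient w.r.t. w_k of -log P[y | x] is (P[k | x] - 1{y == k}) x.
    """
    p = softmax_probs(W, x)
    for k in range(W.shape[0]):                 # only the K - 1 learned vectors
        W[k] -= eta * (p[k] - (k == y)) * x
    return W

rng = np.random.default_rng(0)
K, d = 3, 2
# Assumed toy data: each class clustered tightly around a distinct center.
centers = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
X = np.vstack([centers[k] + 0.1 * rng.standard_normal((20, d)) for k in range(K)])
y = np.repeat(np.arange(K), 20)

W = np.zeros((K - 1, d))
for _ in range(50):                             # epochs over shuffled samples
    for i in rng.permutation(len(y)):
        W = sgd_step(W, X[i], y[i], eta=0.1)

preds = np.array([np.argmax(softmax_probs(W, x)) for x in X])
assert (preds == y).mean() > 0.9                # well-separated clusters
```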





Decision Trees

Problem 4. (10 points) In this problem, you will investigate building a decision tree for a binary classification problem. The training data is given in Table 1, with 16 instances that will be used to learn a decision tree for predicting whether a mushroom is edible or not based on its attributes (Color, Size, and Shape). Please note the label set is the binary set {Yes, No}.

Which attribute would the algorithm choose for the root of the tree? Show the details of your calculations.

Recall from lectures that if we let S denote the data set at the current node, A denote a feature with values v ∈ V, H denote the entropy function, and Sv denote the subset of S for which feature A takes the value v, then the gain of a split along feature A, denoted InfoGain(S, A), is computed as:

Instance  Color   Size   Shape      Edible?
D1        Yellow  Small  Round      Yes
D2        Yellow  Small  Round      No
D3        Green   Small  Irregular  Yes
D4        Green   Large  Irregular  No
D5        Yellow  Large  Round      Yes
D6        Yellow  Small  Round      Yes
D7        Yellow  Small  Round      Yes
D8        Yellow  Small  Round      Yes
D9        Green   Small  Round      No
D10       Yellow  Large  Round      No
D11       Yellow  Large  Round      Yes
D12       Yellow  Large  Round      No
D13       Yellow  Large  Round      No
D14       Yellow  Large  Round      No
D15       Yellow  Small  Irregular  Yes
D16       Yellow  Large  Irregular  Yes

Table 1: Mushroom data with 16 instances, three categorical features, and binary labels.

InfoGain(S, A) = H(S) − Σ_{v ∈ V} (|Sv| / |S|) H(Sv).

That is, we take the entropy before the split and subtract the entropies of the new nodes after splitting, each weighted by the fraction of training examples that reaches that node.
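The hand calculation the problem asks for can be cross-checked mechanically. The sketch below transcribes Table 1 and evaluates InfoGain(S, A) for each attribute; it is a checking aid, not a substitute for showing your work.

```python
import math
from collections import Counter

# Table 1, transcribed as (Color, Size, Shape, Edible?) rows D1..D16.
data = [
    ("Yellow", "Small", "Round",     "Yes"), ("Yellow", "Small", "Round",     "No"),
    ("Green",  "Small", "Irregular", "Yes"), ("Green",  "Large", "Irregular", "No"),
    ("Yellow", "Large", "Round",     "Yes"), ("Yellow", "Small", "Round",     "Yes"),
    ("Yellow", "Small", "Round",     "Yes"), ("Yellow", "Small", "Round",     "Yes"),
    ("Green",  "Small", "Round",     "No"),  ("Yellow", "Large", "Round",     "No"),
    ("Yellow", "Large", "Round",     "Yes"), ("Yellow", "Large", "Round",     "No"),
    ("Yellow", "Large", "Round",     "No"),  ("Yellow", "Large", "Round",     "No"),
    ("Yellow", "Small", "Irregular", "Yes"), ("Yellow", "Large", "Irregular", "Yes"),
]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(rows, col):
    h_s = entropy([r[-1] for r in rows])        # entropy before the split
    weighted = 0.0
    for v in set(r[col] for r in rows):
        sub = [r[-1] for r in rows if r[col] == v]
        weighted += len(sub) / len(rows) * entropy(sub)
    return h_s - weighted

for name, col in [("Color", 0), ("Size", 1), ("Shape", 2)]:
    print(f"InfoGain(S, {name}) = {info_gain(data, col):.4f}")
```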

Problem 5. (5 points) Handling real-valued (numerical) features when splitting nodes is quite different from handling categorical features. This problem discusses a simple way to choose good thresholds for splitting on numerical features. Specifically, when the data contains a numerical feature, one option is to treat all of its numeric values as discrete categories, i.e., to proceed exactly as we do with categorical data. What problems may arise when we use a tree derived this way to classify an unseen example?
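One standard remedy (used by C4.5-style learners) is to derive binary split candidates of the form "feature ≤ t?" from midpoints between consecutive sorted distinct values; the feature values below are hypothetical, for illustration only.

```python
# Candidate binary-split thresholds for a numeric feature: midpoints
# between consecutive distinct values in sorted order.
def candidate_thresholds(values):
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

# Hypothetical numeric feature (e.g., a cap diameter in cm).
assert candidate_thresholds([2.0, 3.0, 3.0, 5.0]) == [2.5, 4.0]
```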

Hard Support Vector Machines

Problem 6. (10 points) Consider a data set with three data points in R^2 and their corresponding labels:

    ⎡  0   0 ⎤         ⎡ −1 ⎤
X = ⎢  0   1 ⎥ ,   y = ⎢ −1 ⎥
    ⎣ −2   0 ⎦         ⎣ +1 ⎦

Note that the ith row of X corresponds to the ith training sample with two features, and its label is yi.

Manually solve the following optimization problem for hard-margin SVM, stated as

min_{w,b}  (1/2) ‖w‖²                                  (1)
s.t.  yi (w⊤xi + b) ≥ 1,   i = 1, 2, ..., n,

to get the optimal hyperplane (w, b) and its margin.

Hint: first write down the constraints (correct classification conditions) for the three training examples to get three constraints in the three parameters (w = [w1, w2] and b). Then derive the possible values for the unknown parameters b, w1, w2 and see which choice of values gives the minimum norm for w.
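While working through the hint, a small checker can confirm whether a candidate (w, b) is feasible and report its geometric margin 1/‖w‖. The signs in X and y are an assumption read off the problem's layout, and the candidate shown is feasible but deliberately not claimed to be optimal.

```python
import numpy as np

# Problem 6 data (signs assumed from the layout of the problem statement).
X = np.array([[0.0, 0.0], [0.0, 1.0], [-2.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0])

def feasible_and_margin(w, b):
    """Check the hard-margin constraints y_i (w^T x_i + b) >= 1 for a
    candidate (w, b) and return its geometric margin 1 / ||w||."""
    ok = bool(np.all(y * (X @ w + b) >= 1 - 1e-12))
    return ok, 1.0 / np.linalg.norm(w)

# A feasible candidate; whether it is optimal is what the problem asks.
ok, margin = feasible_and_margin(np.array([-2.0, 0.0]), -1.0)
print(ok, margin)   # feasible, with geometric margin 0.5
```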



Soft Support Vector Machines

Problem 7. (15 points) In this problem we would like to compare the solutions of hard and soft SVMs on a linearly separable dataset.

Let n > 1 be a fixed number. Is it true that there exists C > 0 such that for every sample S of n linearly separable training examples with binary labels, the hard-SVM and the soft-SVM (with parameter C) solutions will return exactly the same weight vector? Justify your answer. (Hint: consider n = 2, d = 1, and S = {(x1, y1), (x2, y2)}. Let a > 0 and consider x1 = a, y1 = 1, x2 = −a, y2 = −1. Derive the optimal solutions for hard and soft SVM and compare the results.)
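A numeric sketch of the hint's setting can suggest how the two solutions differ; it assumes the signs x2 = −a, y2 = −1 as in the hint, and the grid resolution is arbitrary.

```python
import numpy as np

# Hint's 1-D setting: x1 = a with label +1, x2 = -a with label -1.
# Hard-SVM: min (1/2) w^2 subject to a * w >= 1, hence w_hard = 1 / a.
# Soft-SVM: min (1/2) w^2 + C * sum_i max(0, 1 - y_i * w * x_i).
def soft_svm_1d(a, C):
    grid = np.linspace(0.0, 20.0, 200_001)      # grid search, step 1e-4
    hinge = np.maximum(0.0, 1.0 - a * grid)     # both points share this slack
    obj = 0.5 * grid**2 + 2.0 * C * hinge
    return grid[np.argmin(obj)]

a, C = 0.1, 1.0
w_hard = 1.0 / a                                # 10.0
w_soft = soft_svm_1d(a, C)
print(w_hard, w_soft)                           # the two need not coincide
```

For this a and C the soft solution settles near 2Ca = 0.2, far from 1/a = 10; shrinking a makes the hard-SVM weight arbitrarily large while the soft solution stays bounded for any fixed C, which is the tension the hint points at.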
