Problem Set #2
September 20, 2022
Feel free to collaborate with other classmates on the homework. Please list your collaborators along with their student IDs. You should, however, write up your solution yourself. Please keep your answers brief and clear.
Whenever you need clarification, please post the related questions on Piazza under the corresponding homework folder.
1 Maximum likelihood estimation
Question 1.1 (30 points): Suppose we throw a coin n times. Let D = {y1, ..., yn} denote the dataset we obtain, where yi ∈ {0, 1} is the outcome of the i-th throw (i.e., yi = 1 if the coin comes up heads and yi = 0 if it comes up tails). We are interested in finding the maximum likelihood estimate of θ ∈ [0, 1], the probability that the coin comes up heads, given this dataset D.
a) (10 pts) Write out Pθ(D), i.e., the probability of observing the dataset D under the probability distribution parameterized by θ. Hint: this probability distribution is NOT a binomial distribution, since we assume that you have already observed the outcome of every throw.
b) (10 pts) Write out the log-likelihood, log Pθ(D).
c) (10 pts) Obtain the maximum likelihood estimate of θ given the dataset D.
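As a numerical sanity check for parts (a) to (c), the minimal sketch below evaluates the log-likelihood of a small, made-up sequence of throws on a grid of θ values and reports the grid point with the largest value. Your closed-form answer to part (c) should agree with it up to the grid resolution. The data and the helper name `bernoulli_log_likelihood` are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def bernoulli_log_likelihood(theta, y):
    """Log-probability of the observed 0/1 sequence y when each throw is an
    independent draw that comes up heads with probability theta (hypothetical helper)."""
    y = np.asarray(y)
    heads = y.sum()
    tails = y.size - heads
    return heads * np.log(theta) + tails * np.log(1.0 - theta)

# Made-up dataset: 1 = head, 0 = tail.
y = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

# Grid strictly inside (0, 1) so that log(0) is never evaluated.
thetas = np.linspace(0.001, 0.999, 999)
log_liks = np.array([bernoulli_log_likelihood(t, y) for t in thetas])
theta_grid_best = thetas[np.argmax(log_liks)]
print(f"grid maximizer of the log-likelihood: {theta_grid_best:.3f}")
```

A finer grid narrows the gap between the grid maximizer and the exact maximizer. It cannot replace the derivation asked for in part (c).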
2 Gaussian discriminant analysis
Question 2.1 (30 points): Suppose we are given a dataset D = {(x1, y1), ..., (xn, yn)}, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$. We will model the joint distribution of (x, y) according to:
$$p(y) = \phi^{y}(1-\phi)^{1-y}$$
$$p(x \mid y=0) = \frac{1}{(2\pi)^{d/2}|\Gamma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_0)^\top \Gamma^{-1}(x-\mu_0)\right)$$
$$p(x \mid y=1) = \frac{1}{(2\pi)^{d/2}|\Gamma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_1)^\top \Gamma^{-1}(x-\mu_1)\right)$$
Here, the parameters of our model are φ, Γ, μ0 and μ1. Note that while there are two different mean vectors μ0 and μ1, there is only one covariance matrix Γ.
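The sketch below draws samples from this model to show how it generates data. It first draws y from a Bernoulli(φ) distribution, then draws x from a Gaussian with mean μ_y and the shared covariance Γ. The parameter values are made up for illustration, and `sample_gda` is a hypothetical helper name.

```python
import numpy as np

def sample_gda(n, phi, mu0, mu1, gamma, rng):
    """Draw n pairs (x, y) with y ~ Bernoulli(phi) and x | y ~ N(mu_y, gamma)."""
    y = (rng.random(n) < phi).astype(int)
    # Pick the class mean for each example; result has shape (n, d).
    means = np.where(y[:, None] == 1, mu1, mu0)
    # Shared covariance: the same gamma is used for both classes.
    noise = rng.multivariate_normal(np.zeros(len(mu0)), gamma, size=n)
    return means + noise, y

rng = np.random.default_rng(0)
phi = 0.3
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
gamma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

X, y = sample_gda(5000, phi, mu0, mu1, gamma, rng)
print("fraction of y = 1:", y.mean())
print("class-1 sample mean:", X[y == 1].mean(axis=0))
```

With many samples, the fraction of positive labels should be close to φ. The class-wise sample means should be close to μ0 and μ1.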
a) (5 pts) Suppose we have already fit φ, Γ, μ0 and μ1, and now want to make a prediction at some new query point x. Show that the posterior distribution of the label at x can be written as
$$p(y=1 \mid x; \phi, \Gamma, \mu_0, \mu_1) = \frac{1}{1 + \exp\left(-(\theta^\top x + \theta_0)\right)},$$
where the vector θ and scalar θ0 are some appropriate functions of φ, Γ, μ0 and μ1 that you need to specify.
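You can check the θ and θ0 you derive against the posterior computed directly from Bayes' rule, p(y=1|x) = p(x|y=1)p(y=1) / [p(x|y=0)p(y=0) + p(x|y=1)p(y=1)]. The sketch below does that direct computation with a NumPy-only Gaussian density. The parameter values and the helper names `gaussian_pdf` and `posterior_bayes` are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, gamma):
    """Multivariate normal density N(x; mu, gamma), matching p(x | y) above."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(gamma, diff)    # (x - mu)^T Gamma^{-1} (x - mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(gamma))
    return np.exp(-0.5 * quad) / norm

def posterior_bayes(x, phi, mu0, mu1, gamma):
    """p(y = 1 | x) computed directly from Bayes' rule (hypothetical helper)."""
    joint1 = gaussian_pdf(x, mu1, gamma) * phi
    joint0 = gaussian_pdf(x, mu0, gamma) * (1 - phi)
    return joint1 / (joint0 + joint1)

phi = 0.3
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
gamma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

# Midpoint of mu0 and mu1: both class densities are equal here,
# so the posterior reduces to the prior phi.
x_query = np.array([1.0, 0.5])
print("p(y=1 | x) =", posterior_bayes(x_query, phi, mu0, mu1, gamma))
```

Evaluate your sigmoid at the same points using your θ and θ0. It should reproduce these values up to floating-point error. At the midpoint shown, both should equal φ = 0.3.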
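b) (25 pts) For this part of the problem only, you may assume d (the dimension of x) is 1, so that Γ = σ² is just a real number, and likewise the determinant of Γ is given by |Γ| = σ². Given the dataset, we claim that the maximum likelihood estimates of the parameters are given by
$$\phi = \frac{1}{n}\sum_{i=1}^{n} 1(y_i = 1)$$
$$\mu_0 = \frac{\sum_{i=1}^{n} 1(y_i = 0)\, x_i}{\sum_{i=1}^{n} 1(y_i = 0)}$$
$$\mu_1 = \frac{\sum_{i=1}^{n} 1(y_i = 1)\, x_i}{\sum_{i=1}^{n} 1(y_i = 1)}$$
$$\Gamma = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_{y_i})(x_i - \mu_{y_i})^\top$$
where 1(·) is the indicator function we have seen in class. The log-likelihood of the data is
$$\log P_\Theta(D) = \log \prod_{i=1}^{n} p(x_i, y_i) = \log \prod_{i=1}^{n} p(x_i \mid y_i)\, p(y_i).$$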
By maximizing log PΘ(D) with respect to the four parameters, prove that the maximum likelihood estimates of φ, Γ, μ0 and μ1 are indeed as given in the formulas above. (You may assume that there is at least one positive and one negative example, so that the denominators in the definitions of μ0 and μ1 above are non-zero.)
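The claimed estimates can be checked numerically in the d = 1 case. The sketch below computes them on synthetic data and confirms that perturbing any single parameter lowers the log-likelihood. This complements the proof; it does not replace it. The synthetic data, the perturbation size, and the helper names `gda_mle_1d` and `log_likelihood_1d` are assumptions for illustration.

```python
import numpy as np

def gda_mle_1d(x, y):
    """Closed-form estimates from part (b) for scalar x (d = 1)."""
    phi = np.mean(y == 1)
    mu0 = x[y == 0].mean()
    mu1 = x[y == 1].mean()
    mu_y = np.where(y == 1, mu1, mu0)        # mu_{y_i} for each example
    sigma2 = np.mean((x - mu_y) ** 2)        # Gamma = sigma^2
    return phi, mu0, mu1, sigma2

def log_likelihood_1d(params, x, y):
    """log P(D) = sum_i [log p(x_i | y_i) + log p(y_i)] for the 1-D model."""
    phi, mu0, mu1, sigma2 = params
    mu_y = np.where(y == 1, mu1, mu0)
    log_px = -0.5 * np.log(2 * np.pi * sigma2) - (x - mu_y) ** 2 / (2 * sigma2)
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)
    return np.sum(log_px + log_py)

rng = np.random.default_rng(1)
y = (rng.random(200) < 0.4).astype(int)
x = np.where(y == 1, 1.5, -0.5) + rng.normal(0.0, 0.8, size=200)

best = gda_mle_1d(x, y)
ll_best = log_likelihood_1d(best, x, y)

# Nudge each of (phi, mu0, mu1, sigma2) up and down; each nudge should lower log P(D).
checks = []
for k in range(4):
    for step in (-1e-3, 1e-3):
        perturbed = list(best)
        perturbed[k] += step
        checks.append(log_likelihood_1d(perturbed, x, y) < ll_best)
print("MLE beats every perturbation:", all(checks))
```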
3 Programming assignment: Linear Regression (40 pts)
For the following programming assignment, please download the datasets and iPython notebooks from Canvas and submit the following:
• Completed and ready-to-run iPython notebooks. Note: we will inspect the code and run your notebook if needed. If we cannot run any section of your notebook, you will not receive any points for the task related to that section.
• Responses (text, code, and/or figures) to the following problems/tasks.
In this programming exercise, you will build a linear regression model and apply it to a sample COVID-19 dataset.
Task P1 (6 pts): Complete the code that generates the three visualization graphs showing the trend of the epidemic's progression ("People tested", "Deaths", and "New positive cases"). Copy them to the solution file.
Task P2 (4 pts): Complete the function predict output. Copy the outputs of the code to the solution file.
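In linear regression the model output is the inner product of each feature vector with the weights, ŷ_i = wᵀx_i. For a feature matrix whose rows are the x_i, all predictions therefore come from a single matrix-vector product. The sketch below assumes that layout (one row per example) and NumPy arrays. The function name `predict_output_sketch` is hypothetical, and the notebook's actual signature may differ.

```python
import numpy as np

def predict_output_sketch(feature_matrix, weights):
    """Return y_hat with y_hat[i] = w^T x_i, where x_i is row i of feature_matrix."""
    return feature_matrix @ weights

X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])   # first column of ones acts as an intercept feature
w = np.array([0.5, 2.0])
print(predict_output_sketch(X, w))   # predictions: 4.5, 6.5, 10.5
```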
Task P3 (6 pts): Let $L_D(w)$ denote the regression cost function, where $x_i \in \mathbb{R}^d$ is the input feature vector of dimension d, $y_i \in \mathbb{R}$ is the output response, and $w \in \mathbb{R}^d$ is the vector of regression weights. Complete the function weight derivative to calculate the derivative of the cost function with respect to the regression weights w, i.e., $\partial L_D(w) / \partial w$. Note that this should be a d-dimensional vector. Also copy the output of the code for the test example to the solution file.
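The sketch below assumes the cost is the mean squared error, L_D(w) = (1/n) Σ_i (wᵀx_i − y_i)². Under that assumption the gradient is (2/n) Σ_i (wᵀx_i − y_i) x_i, a d-dimensional vector. If the notebook's cost uses a different scaling, such as 1/(2n) or no 1/n at all, the constant in front changes accordingly. The code implements the assumed gradient and checks it against a central finite difference. The function names `cost_mse` and `weight_derivative_sketch` are hypothetical.

```python
import numpy as np

def cost_mse(X, y, w):
    """Assumed cost: L_D(w) = (1/n) * sum_i (w^T x_i - y_i)^2."""
    residual = X @ w - y
    return np.mean(residual ** 2)

def weight_derivative_sketch(X, y, w):
    """Gradient of cost_mse with respect to w: (2/n) * X^T (Xw - y), shape (d,)."""
    n = X.shape[0]
    return 2.0 / n * X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w = rng.normal(size=3)

# Central finite-difference check, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (cost_mse(X, y, w + eps * e) - cost_mse(X, y, w - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print("analytic:", weight_derivative_sketch(X, y, w))
print("numeric: ", numeric)
```

A finite-difference check like this works with any differentiable cost, so it can also test your notebook function once you substitute the notebook's own cost.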
Task P4 (5 pts): Complete the code section that performs gradient descent in the function regression gradient descent. Copy the code to the solution file.
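One common structure for the loop has three steps. Compute the gradient at the current weights. Stop once its Euclidean norm falls below the tolerance. Otherwise, step against the gradient. The sketch below follows that structure and assumes the cost is the mean squared error (1/n) Σ_i (wᵀx_i − y_i)². The stopping rule, the `max_iter` safeguard, and the function name `regression_gradient_descent_sketch` are assumptions. The notebook's template may use a different convergence test.

```python
import numpy as np

def regression_gradient_descent_sketch(X, y, initial_weights, step_size,
                                       tolerance, max_iter=100_000):
    """Gradient descent on the assumed cost (1/n) * ||Xw - y||^2.

    Stops when the gradient norm drops below `tolerance`, or after `max_iter`
    iterations as a safeguard against a step size that is too large.
    """
    w = np.array(initial_weights, dtype=float)
    n = X.shape[0]
    for _ in range(max_iter):
        gradient = 2.0 / n * X.T @ (X @ w - y)
        if np.linalg.norm(gradient) < tolerance:
            break
        w -= step_size * gradient
    return w

# Synthetic data with known weights [1, -2] plus a little noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, -2.0]) + 0.01 * rng.normal(size=200)

w_hat = regression_gradient_descent_sketch(X, y, initial_weights=[0.0, 0.0],
                                           step_size=0.1, tolerance=1e-6)
print(w_hat)   # should be close to [1, -2]
```

These initial weights, step size, and tolerance suit this small synthetic problem. Values for the COVID-19 features have to be chosen for that data. Features on very different scales typically need a much smaller step size, or normalization, to avoid divergence.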
Task P5 (3 pts): Specify the initial weights, step size and tolerance for the function regression gradient descent. Print the outputs of the code.
Task P6 (3 pts): Use the learned weights to predict "People tested" over the last three weeks of the dataset. Copy the predictions to the solution file, and calculate the test error over the n test data points by comparing each true label y with the corresponding predicted label ŷ.
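The sketch below assumes the test error is the root-mean-square error, RMSE = sqrt((1/n) Σ_i (y_i − ŷ_i)²). If the assignment defines it as the plain mean squared error, omit the square root. The function name `rmse` and the numbers are illustrative assumptions, not values from the COVID-19 dataset.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error over the n test points (assumed error metric)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative values only.
y_true = [100.0, 110.0, 120.0]
y_pred = [98.0, 113.0, 120.0]
print(rmse(y_true, y_pred))   # sqrt((4 + 9 + 0) / 3), about 2.08
```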
Task P7 (3 pts): Specify the initial weights, step size and tolerance for the function regression gradient descent. Print the outputs of the code.
Task P8 (4 pts): Use the learned weights to predict "People tested" over the last three weeks of the dataset. Find the value of the model's prediction on the 10th day of the forecasting period, and also print the actual number of people tested on that day. Copy the predictions to the solution file, and calculate the test error. Note: here we are asking you to report the number before normalization, so you need to convert the prediction back to units of people.
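The conversion back to people depends on how the notebook normalizes the target. The sketch below assumes "People tested" was standardized with the training-set mean and standard deviation, z = (y − mean) / std. In that case a prediction is mapped back by y = z · std + mean. For min-max scaling the inverse would instead be y = z · (max − min) + min. The training values and the helper names `normalize` and `denormalize` are hypothetical.

```python
import numpy as np

# Hypothetical training targets (people tested per day), for illustration only.
y_train = np.array([1200.0, 1500.0, 1800.0, 2100.0])
mean, std = y_train.mean(), y_train.std()

def normalize(y):
    """z-score scaling using the training mean and standard deviation."""
    return (y - mean) / std

def denormalize(z):
    """Inverse of normalize: converts a normalized prediction back to people."""
    return z * std + mean

z_pred = np.array([0.5, 1.0, 1.5])     # model outputs on the normalized scale
people_pred = denormalize(z_pred)
print(people_pred)
```

Whatever scheme the notebook uses, reuse the same statistics computed on the training data when inverting. Do not recompute them on the test period.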
Task P9 (6 pts): Explore on your own. Report the question you investigated, as well as your results and interpretation, in the solution file.