1. Homepage
  2. Exam
  3. COMP2420/COMP6420 - 2022 Sample - Q3 Constant Classifier

COMP2420/COMP6420 - 2022 Sample - Q3 Constant Classifier

This question has been solved
Engage in a Conversation

Q3) (16 marks)

Part 1 (6 marks)

You want to predict whether a final survey response comes from a first-year student (class 1) or not (class 0) based on responses to two questions. The average responses from a random sample of 200 surveys are below. CourseNana.COM

ClassCountLecture AverageText Average
1:First-year8075%64%
0: Other12067%68%

Survey questions: CourseNana.COM

  • What fraction of lectures did you attend?
  • What fraction of the textbook did you read?

a) (2 marks) A constant classifier is one that always guesses the same class label, regardless of the attributes used. What’s the accuracy of the best constant classifier for predicting the class on this sample data? CourseNana.COM

b) (2 marks) Among the following, what is the best reason to expect that a nearest-neighbor classifier that uses this sample as a training set will have higher accuracy on a test set than any constant classifier? CourseNana.COM

  • (i) The test set may have a different distribution of classes than this sample.
  • (ii) A nearest-neighbor classifier is designed to generalize to unseen examples.
  • (iii) A nearest-neighbor classifier can predict different classes for different examples.
  • (iv) The attributes (lecture and text) are associated with each other.
  • (v) The attributes (lecture and text) are both associated with the class

c) (1 mark) Two roommates always attended exactly the same lectures. One read 120 pages of the textbook, and the other read 240 pages of the textbook. What is the distance between these two roommates used by a nearest-neighbor classifier that includes as attributes both the fraction of lectures attended and the fraction of textbook read? Justify your answer CourseNana.COM

d) (1 mark) Two roommates have read same number of pages from the textbook. One attended 1/2 of the lectures, and the other attended 9/10 of the lectures. What is the distance between these two roommates used by a nearest-neighbor classifier that includes as attributes both the fraction of lectures attended and the fraction of the textbook read? Justify your answer CourseNana.COM

Part 2 (10 marks)

a) (2 marks) Suppose that we take a data set, divide it into equally-sized training and test sets, and then try out two different classification procedures. First we use linear classification and get an error rate of 20 % on the training data and 30 % on the test data. Next we use 1-nearest neighbors (i.e. K = 1) and get an average error rate (averaged over both test and training data sets) of 18 %. Based on these results, which method should we prefer to use for classification of new observations? Why CourseNana.COM

b) (3 marks) Explain the problem of over-fitting in the context of *decision trees as classifiers, with an example. You can assume that you have only categorical features in your dataset. Explain in detail how a validation set can help you to solve the problem of over-fitting. CourseNana.COM

c) (3 marks)  Suppose we have a set of data (n = 100 observations) containing a single predictor (feature X) and a quantitative response (predicted value Y^). We fit a simple linear regression model to the data, as well as a separate cubic regression, i.e., Yl^=βlo+βl1X and Yc^=βc0+βc1X+βc2X2+βc3X3, respectively. CourseNana.COM

Suppose that the true relationship between X and Y is linear, i.e. Y=β0+β1X. Consider the training error for both models. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Also comment on the test error for both models. Justify your answer. CourseNana.COM

Get the Solution to This Question

WeChat WeChat
Whatsapp WhatsApp