
Machine Learning and Intelligent Data Analysis



Learning Outcomes

(a)  Demonstrate knowledge and understanding of core ideas and foundations of unsupervised and supervised learning on vectorial data
(b)  Explain principles and techniques for mining textual data
(c)  Demonstrate understanding of the principles of efficient web-mining algorithms
(d)  Demonstrate understanding of broader issues of learning and generalisation in machine learning and data analysis systems

Question 1 Dimensionality Reduction

(a)  Explain what is meant by “dimensionality reduction” and why it is sometimes necessary. [4 marks]

(b)  Consider the following dataset of four sample points $\{x^{(i)}\}_{i=1}^{4}$ with $x^{(i)} \in \mathbb{R}^2$. Explain how to calculate the principal components of this dataset, outlining each step and performing all calculations up to (but not including) the computation of eigenvectors and eigenvalues. [6 marks]

(c)  What does principal component analysis (PCA) tell you about the nature of a multivariate dataset? Explain how it can be used for dimensionality reduction. [4 marks]

(d)  What are the limitations of PCA and what other dimensionality reduction techniques may be used instead? [2 marks]

(e)  You are given a dataset consisting of 100 measurements, each of which has 10 variables. The eigenvalues of the covariance matrix are shown in the following table:

What can you say about the underlying nature of this dataset? [4 marks]
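As an illustration of the PCA steps referred to in parts (b), (c) and (e), the following is a minimal sketch on a hypothetical 4 x 2 dataset. The sample points are invented for the example and are not the exam's data, and the 1/(n-1) covariance convention is an assumption (some courses use 1/n).

```python
import numpy as np

# Hypothetical 4 x 2 dataset; NOT the sample points from the exam.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 8.0],
              [7.0, 6.0]])

# Step 1: centre the data by subtracting the mean of each variable.
mean = X.mean(axis=0)
Xc = X - mean

# Step 2: covariance matrix (assumed 1/(n-1) convention).
n = X.shape[0]
cov = Xc.T @ Xc / (n - 1)

# Step 3: eigendecomposition of the symmetric covariance matrix,
# sorted so that the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: fraction of total variance captured by each principal component.
explained = eigvals / eigvals.sum()

# Step 5: reduce to one dimension by projecting onto the first component.
Z = Xc @ eigvecs[:, :1]

print(cov)
print(explained)
```

The same `explained` ratio is what one would inspect for a table of eigenvalues such as the one in part (e): a few large eigenvalues followed by many small ones suggests the data lie close to a lower-dimensional linear subspace.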

Question 2 Classification

(a)  Consider the Soft Margin Support Vector Machine learnt in Lecture 4e. Consider also that C = 100 and that we are adopting a linear kernel, i.e., $k(x^{(i)}, x^{(j)}) = x^{(i)T} x^{(j)}$.

Assume an illustrative binary classification problem with the following training examples:

Which of the Lagrange multipliers below is(are) a plausible solution(s) for this problem? Justify your answer.
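One way to screen candidate Lagrange multipliers is to check the feasibility constraints of the standard soft-margin dual: $0 \le \alpha_i \le C$ for every example, and $\sum_i \alpha_i y^{(i)} = 0$. The sketch below assumes the lecture uses this standard formulation; the labels and candidate vectors are hypothetical, and `is_dual_feasible` is an illustrative helper, not part of any library. Feasibility is necessary but not sufficient for a vector to be the optimal solution.

```python
import numpy as np

def is_dual_feasible(alpha, y, C, tol=1e-9):
    """Check the box and equality constraints of the soft-margin SVM dual."""
    alpha = np.asarray(alpha, dtype=float)
    y = np.asarray(y, dtype=float)
    # Box constraint: 0 <= alpha_i <= C for every training example.
    in_box = bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol))
    # Equality constraint: sum_i alpha_i * y_i = 0.
    balanced = bool(abs(alpha @ y) <= tol)
    return in_box and balanced

# Hypothetical labels and candidates; NOT the values from the exam.
y = [+1, +1, -1, -1]
C = 100

print(is_dual_feasible([0, 50, 50, 0], y, C))    # satisfies both constraints
print(is_dual_feasible([0, 150, 150, 0], y, C))  # violates alpha_i <= C
print(is_dual_feasible([10, 0, 0, 5], y, C))     # violates sum alpha_i y_i = 0
```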

(b)  Consider a binary classification problem where around 5% of the training examples are likely to have their labels incorrectly assigned (i.e., assigned as -1 when the true label was +1, and vice-versa). Which value of k for k-Nearest Neighbours is likely to be better suited for this problem: k = 1 or k = 3? Justify your answer.

[6 marks]
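The effect of a single mislabelled neighbour on k-Nearest Neighbours can be seen in a small sketch. The one-dimensional training set below is hypothetical, with one point deliberately given the wrong label, and `knn_predict` is an illustrative helper written for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Majority vote over the k nearest 1-D training points (labels are +1/-1, k odd)."""
    dists = np.abs(X_train - x)
    nearest = np.argsort(dists, kind="stable")[:k]
    return 1 if y_train[nearest].sum() > 0 else -1

# Hypothetical data: points near 0-3 belong to class +1, points near 10-12 to class -1.
X_train = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
# The point at 2.0 carries a noisy (incorrect) label of -1.
y_train = np.array([+1, +1, -1, +1, -1, -1, -1])

query = 2.1
print(knn_predict(X_train, y_train, query, k=1))  # follows the noisy neighbour
print(knn_predict(X_train, y_train, query, k=3))  # noisy label is outvoted
```

With k = 1 the prediction copies the single nearest label, so one noisy example decides the outcome; with k = 3 the two correctly labelled neighbours outvote it in this example.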

(c)  Consider a binary classification problem where you wish to predict whether a piece of machinery is likely to contain a defect. For this problem, 0.5% of the training examples belong to the defective class, whereas 99.5% belong to the non-defective class. When adopting Naïve Bayes for this problem, the non-defective class may almost always be the predicted class, even when the true class is the defective class. Explain why and propose a method to alleviate this issue. [8 marks]
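The influence of the class prior on the Naïve Bayes posterior can be sketched numerically. The priors below come from the question (0.5% defective); the likelihood values are hypothetical and chosen only to show the effect, and `posterior_defective` is an illustrative helper. Replacing the prior with a balanced one is shown as one possible remedy among several (resampling or adjusting the decision threshold are others), not necessarily the one the course intends.

```python
def posterior_defective(prior_def, lik_def, lik_ok):
    """Posterior P(defective | x) from Bayes' rule for a two-class problem."""
    joint_def = prior_def * lik_def
    joint_ok = (1.0 - prior_def) * lik_ok
    return joint_def / (joint_def + joint_ok)

# Hypothetical likelihoods: the observation is 30 times more likely under "defective".
lik_def, lik_ok = 0.30, 0.01

# Prior taken from the class frequencies in the question (0.5% defective).
p_imbalanced = posterior_defective(0.005, lik_def, lik_ok)

# Same likelihoods with a balanced prior, as one possible adjustment.
p_balanced = posterior_defective(0.5, lik_def, lik_ok)

print(round(p_imbalanced, 3))
print(round(p_balanced, 3))
```

Even though the evidence strongly favours the defective class, the small prior keeps the posterior below 0.5 in the first case, so the non-defective class is predicted.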

Question 3 Document Analysis

(a)  In a small universe of five web pages, one page has a PageRank of 0.4. What does this tell us about this page? [2 marks]

(b)  Compare and contrast the TF-IDF and word2vec approaches to document vectorisation. You should explain the essential principles of each method, and highlight their respective advantages and disadvantages. [8 marks]

(c)  One possible approach to searching a large linked set of documents is to combine a measure of document similarity such as TF-IDF similarity with a measure of a page’s importance such as that provided by PageRank. Suggest three ways in which this could be done and discuss the advantages and disadvantages of each of them.
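To make the idea of combining the two scores concrete, the sketch below shows three candidate schemes: a weighted sum, a product, and a similarity filter followed by PageRank ordering. These are illustrative options only, not necessarily the ones the question has in mind; the scores, the weight `w`, the `threshold`, and the function names are all hypothetical.

```python
import numpy as np

# Hypothetical scores for three documents against one query.
sim = np.array([0.90, 0.60, 0.10])  # TF-IDF cosine similarity to the query
pr = np.array([0.05, 0.30, 0.65])   # PageRank of each document

def linear_score(sim, pr, w=0.5):
    """Weighted sum: w controls the trade-off between relevance and importance."""
    return w * sim + (1.0 - w) * pr

def product_score(sim, pr):
    """Product: a document needs to do reasonably well on both measures."""
    return sim * pr

def filter_then_rank(sim, pr, threshold=0.5):
    """Keep documents similar enough to the query, then order them by PageRank."""
    keep = np.flatnonzero(sim >= threshold)
    return keep[np.argsort(-pr[keep], kind="stable")]

# Document indices in ranked order under each scheme.
print(np.argsort(-linear_score(sim, pr)))
print(np.argsort(-product_score(sim, pr)))
print(filter_then_rank(sim, pr))
```

The three schemes produce different orderings on the same inputs, which is the kind of trade-off the question asks to be discussed.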
