Assignment 3
Cutoff date: 1 Mar 2023
Instructions:
● Please be reminded that you must turn in all assignments on time. Assignment extensions will be given only for unforeseen circumstances, such as illness, when supporting documents are provided. Late assignments will still be marked for your benefit, but the scores will NOT be recorded.
● There are 4 assignments in the course. Only the top 3 assignments are counted towards the final result. The passing score for the assignment average is 40.
● Please prepare your answers in an answer file, in .docx or .doc format, and name the file like s12345678a1.docx where ‘12345678’ is your 8-digit student number and ‘1’ is the assignment number. For the programming questions, please also include the source code and execution outputs in one Jupyter Notebook file, and name the file like s12345678a1.ipynb. Do NOT submit multiple Jupyter Notebook files.
● Please remember to write your name and student number in your answer file and your Jupyter Notebook file (and any related files, if applicable). Put all the files in a zip file, name it like s12345678a1.zip, and submit only that zip file to OLE.
● This assignment contains 4 questions. All questions are compulsory.
Question 1 – Unsupervised learning concepts [25 marks]
a. Two main types of unsupervised learning are association analysis and cluster analysis. For each of the two types, give an example application in the education domain; describe the application and explain how unsupervised learning techniques are employed in it.
b. Principal component analysis (PCA) is applied to a dataset of 4 features to produce 4 principal components called PC0, PC1, PC2, and PC3.
i. What is the total amount of explained variance ratios of all the principal components?
ii. Compare the amounts of explained variance ratios of the 4 principal components.
iii. A student calculates some cumulative explained variance ratios by hand, and gets the following results. Comment on the correctness of the calculated results with justification.
Principal component(s) | Calculated cumulative explained variance ratio
---|---
PC0 | 0.40
PC0 + PC1 | 0.60
PC0 + PC1 + PC2 | 0.90
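For part (iii), the calculated ratios can be cross-checked in code. The sketch below is a minimal illustration assuming scikit-learn; the matrix X is hypothetical random data standing in for any 4-feature dataset.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 4))  # hypothetical 4-feature data

pca = PCA(n_components=4).fit(X)
print(pca.explained_variance_ratio_)             # ratio explained by each component
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative ratios, as in the table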
c. The following dataset contains 7 transactions of 6 items: apple, bread, carrot, donut, egg, and fish. Calculate (i) support({apple}), (ii) support({donut}), (iii) support({apple} => {donut}), and (iv) confidence({apple} => {donut}). A counting sketch for checking your calculations is given after the transaction list. (Note that this dataset is also used in some later questions.) [5]
● T0: apple, bread, egg, fish
● T1: apple, bread, carrot, donut, egg
● T2: bread, egg
● T3: bread, donut, egg
● T4: apple, bread, egg
● T5: apple, bread, egg, fish
● T6: apple, carrot, donut, egg
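If you wish to verify your hand counts for part (c), a minimal sketch in plain Python is given below; the helper names support and confidence are illustrative, not required.

transactions = [
    {"apple", "bread", "egg", "fish"},             # T0
    {"apple", "bread", "carrot", "donut", "egg"},  # T1
    {"bread", "egg"},                              # T2
    {"bread", "donut", "egg"},                     # T3
    {"apple", "bread", "egg"},                     # T4
    {"apple", "bread", "egg", "fish"},             # T5
    {"apple", "carrot", "donut", "egg"},           # T6
]

def support(itemset):
    # fraction of transactions that contain every item in itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # support of the combined itemset divided by support of the antecedent
    return support(antecedent | consequent) / support(antecedent)

print(support({"apple"}), support({"donut"}),
      support({"apple", "donut"}), confidence({"apple"}, {"donut"}))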
d. Recommender systems use different types of techniques, which employ different data.
i. Which type of technique mainly uses product description data?
ii. Which type of technique mainly uses recent product sales data?
iii. Which type of technique mainly uses historical user purchase data?
iv. Which type of technique raises the most privacy concerns? Briefly justify your answer.
e. Four vectors are given below.
ID | Vector
---|---
#0 | (1, 2, 3)
#1 | (2, 2, 2)
#2 | (2, 2, 3)
#3 | (2, 2, 4)
i. Compute the cosine similarity among the four vectors, and show the resulting cosine similarity values to 4 decimal places in a 4×4 table. You may do it either by hand or by code (a starter sketch follows part (iii)), and need not show the intermediate work.
ii. From the table, determine the vector that is most similar to vector #1.
iii. From the table, determine the vector that is most similar to vector #2.
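If you take the code route for part (i), scikit-learn computes the whole table in one call; the rounding shown is just one way to obtain 4 decimal places.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

vectors = np.array([[1, 2, 3],   # #0
                    [2, 2, 2],   # #1
                    [2, 2, 3],   # #2
                    [2, 2, 4]])  # #3

# 4x4 matrix of pairwise cosine similarities, rounded to 4 decimal places
print(np.round(cosine_similarity(vectors), 4))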
Question 2 – Unsupervised learning algorithms [25 marks]
a. Apply Apriori to the dataset in Q1-c to find frequent itemsets for the minimum support value of 0.4. Show your work of identifying the candidate and frequent 1-itemsets, 2-itemsets, and so on.
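Note that with 7 transactions, a minimum support of 0.4 corresponds to an absolute count of 0.4 × 7 = 2.8, so an itemset is frequent only if it appears in at least 3 transactions.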
b. Apply k-means to cluster the 6 data points (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), and (4, 3) into 2 clusters. Use the initial centroids (1, 0) and (2, 2). Show your work of computing the distances and centroid updates in each iteration of k-means, and present the values to at most 2 decimal places.
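As a reminder, standard k-means uses the Euclidean distance d((x₁, y₁), (x₂, y₂)) = √((x₁ − x₂)² + (y₁ − y₂)²); for example, the distance from the point (1, 1) to the initial centroid (1, 0) is √((1 − 1)² + (1 − 0)²) = 1.00.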
Question 3 – Programming PCA and Apriori [25 marks]
a. Given a dataset created from the code fragment below, write code to apply PCA and linear regression on the dataset for various numbers of principal components, and plot the test scores and total explained variance ratios versus the number of principal components used. Do not use cross-validation. (A possible skeleton follows the code fragment.)
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=500, n_features=30, n_informative=20,
effective_rank=10, noise=4, tail_strength=0.1,
random_state=42)
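One possible skeleton for part (a), continuing from the fragment above, is sketched below. Fitting PCA on the training split only, and the default train/test split ratio, are assumptions rather than requirements.

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scores, variances = [], []
components = range(1, X.shape[1] + 1)
for n in components:
    pca = PCA(n_components=n).fit(X_train)
    model = LinearRegression().fit(pca.transform(X_train), y_train)
    scores.append(model.score(pca.transform(X_test), y_test))  # R^2 test score
    variances.append(pca.explained_variance_ratio_.sum())      # total ratio explained

plt.plot(components, scores, label="test score")
plt.plot(components, variances, label="total explained variance ratio")
plt.xlabel("number of principal components")
plt.legend()
plt.show()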
b. Write code to visualise the dataset in part (a) in a scatter plot using the first two principal components. Show the first principal component on the x-axis, the second principal component on the y-axis, and the target values in colours with the “bwr” Matplotlib colormap. [5]
c. Write code to apply Apriori to the dataset in Q1-c for a minimum support value of 0.4. Display the support values and the contents of the frequent itemsets. (A library-based sketch is given after this part.) [8]
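For part (c), one common route (an assumption; any equivalent library is acceptable) is mlxtend's TransactionEncoder and apriori:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

transactions = [
    ["apple", "bread", "egg", "fish"],
    ["apple", "bread", "carrot", "donut", "egg"],
    ["bread", "egg"],
    ["bread", "donut", "egg"],
    ["apple", "bread", "egg"],
    ["apple", "bread", "egg", "fish"],
    ["apple", "carrot", "donut", "egg"],
]

# One-hot encode the transactions into a Boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Frequent itemsets with support >= 0.4, keeping the item names
print(apriori(onehot, min_support=0.4, use_colnames=True))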
Question 4 – Programming and evaluation of clustering [25 marks]
a. Write code to apply k-means to cluster the 6 data points (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), and (4, 3) into 2 clusters. Use the initial centroids (1, 0) and (2, 2). Display the resulting cluster labels of the data points, the cluster centroids, the number of iterations performed, and the inertia. (A starter sketch for this part and part (c) is given after part (d).) [10]
b. Repeat part (a), varying the initial centroids to obtain a different clustering scheme. That is, you need to find a set of initial centroids that produces cluster labels and cluster centroids different from those of part (a). Show the resulting cluster labels and cluster centroids. [3]
c. Write code to apply the silhouette method to determine the optimal number of clusters for the dataset in part (a). Use agglomerative hierarchical clustering, and plot the resulting silhouette coefficients for 2 to 5 clusters. [10]
d. Using the resulting graph of part (c), determine the optimal number of clusters for the dataset. [2]
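A starter sketch for parts (a) and (c) follows. The variable names are illustrative, setting n_init=1 so that k-means runs exactly once from the given centroids is an assumption about the intended setup, and the sketch prints rather than plots the silhouette coefficients.

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [4, 3]])

# Part (a): k-means started from the given centroids, run once
km = KMeans(n_clusters=2, init=np.array([[1, 0], [2, 2]]), n_init=1).fit(X)
print(km.labels_, km.cluster_centers_, km.n_iter_, km.inertia_)

# Part (c): silhouette coefficients for agglomerative clustering, k = 2 to 5
for k in range(2, 6):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    print(k, silhouette_score(X, labels))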
End of Assignment