1. Homepage
  2. Exam
  3. COMPSCI 753 Algorithms for Massive Data - Semester2 2021- Final Exam - Q1 Locality-Sensitive Hashing

COMPSCI 753 Algorithms for Massive Data - Semester2 2021- Final Exam - Q1 Locality-Sensitive Hashing

This question has been solved
Engage in a Conversation

1 Locality-Sensitive Hashing

Given three documents S1, S2, S3 and a customized query document q: CourseNana.COM

S1 = {3,4,5},S2 = {0,1,2},S3 = {0,1,3},q = {2,3,4,h(y)}; CourseNana.COM

h(y) = y mod 6. CourseNana.COM

where y is the last digit of your Student ID. For instance, suppose my Stu- dentID=xxxxx7, my query document would be q= {2, 3, 4, 1}. CourseNana.COM


CourseNana.COM

1.1 Computing MinHash Signatures CourseNana.COM

1.     Generate the bit-vector representation for {S1, S2, S3, q} in feature space {0, 1, 2, 3, 4, 5}. [1 mark] CourseNana.COM

2.     Generate the MinHash matrix for {S1, S2, S3, q} using the following four MinHash functions. [2 marks] CourseNana.COM

h1(x) = x mod 6 CourseNana.COM

h2(x)=(x+1) mod6 CourseNana.COM

h3(x)=(x+3) mod6 CourseNana.COM

h4(x)=(x+5) mod6 CourseNana.COM

3.     Consider the query q and estimate the signature-based Jaccard similari- ties: J(q, S1), J(q, S2), and J(q, S3). [1 mark] CourseNana.COM


CourseNana.COM


CourseNana.COM

1.2 Tuning Parameters for rNNS CourseNana.COM

In our lecture, we have learnt to formulate the collision probability (i.e., S- curve) given the number of bands b and the number of rows per band r as follows: P r(s) = 1 (1 s ) . CourseNana.COM

Consider three sets of parameters (r=2,b=10), (r=6,b=30), (r=10,b=50). The collision probabilities for similarity s in range of [0,1] for each (r,b) are pro- vided accordingly as follows: CourseNana.COM

  1. Which settings give at most 5% of false negatives for any 70%-similar pairs? Briefly explain the reason. [1 mark]
  2. Which settings give at most 15% of false positives for any 30%-similar pairs? Briefly explain the reason. [1 mark]


CourseNana.COM


CourseNana.COM

1.3 c-Approximate Randomized rNNS  [3 marks] CourseNana.COM

We have learnt that a family of functions H is called (d1, d2, p1, p2)-sensitive with collision probability p1 > p2 and c > 1 if the following conditions hold for any uniformly chosen h H and x,y U: CourseNana.COM

If d(x, y) r, P r[h(x) = h(y)] p1 for similar points, and CourseNana.COM

If d(x, y) cr, P r[h(x) = h(y)] p2 for dissimilar points. CourseNana.COM

Consider a family transformation from (d1, d2, p1, p2)-sensitive to (d1, d2, 1 (1pk1)L,1(1pk2)L)-sensitive, where k and L refer to the number of hash functions and the number of hash tables, respectively. Briefly describe steps to achieve such transformation. What is the expected impact on probability bounds after the transformation? CourseNana.COM


CourseNana.COM


CourseNana.COM

Get the Solution to This Question

WeChat WeChat
Whatsapp WhatsApp
COMPSCI 753代写,Algorithms for Massive Data代写,Auckland代写,澳洲代写,Locality-Sensitive Hashing代写,Minhash代写,rNNs代写,COMPSCI 753代编,Algorithms for Massive Data代编,Auckland代编,澳洲代编,Locality-Sensitive Hashing代编,Minhash代编,rNNs代编,COMPSCI 753代考,Algorithms for Massive Data代考,Auckland代考,澳洲代考,Locality-Sensitive Hashing代考,Minhash代考,rNNs代考,COMPSCI 753help,Algorithms for Massive Datahelp,Aucklandhelp,澳洲help,Locality-Sensitive Hashinghelp,Minhashhelp,rNNshelp,COMPSCI 753作业代写,Algorithms for Massive Data作业代写,Auckland作业代写,澳洲作业代写,Locality-Sensitive Hashing作业代写,Minhash作业代写,rNNs作业代写,COMPSCI 753编程代写,Algorithms for Massive Data编程代写,Auckland编程代写,澳洲编程代写,Locality-Sensitive Hashing编程代写,Minhash编程代写,rNNs编程代写,COMPSCI 753programming help,Algorithms for Massive Dataprogramming help,Aucklandprogramming help,澳洲programming help,Locality-Sensitive Hashingprogramming help,Minhashprogramming help,rNNsprogramming help,COMPSCI 753assignment help,Algorithms for Massive Dataassignment help,Aucklandassignment help,澳洲assignment help,Locality-Sensitive Hashingassignment help,Minhashassignment help,rNNsassignment help,COMPSCI 753solution,Algorithms for Massive Datasolution,Aucklandsolution,澳洲solution,Locality-Sensitive Hashingsolution,Minhashsolution,rNNssolution,