QUESTION 1
The “curse of dimensionality” is a well-known problem in high-dimensional spaces. Which of the following observations is incorrect?
2 points
a) When data is uniformly distributed, most of it lies in the boundary regions, leaving the central region nearly empty. Suppose we take the range [0.05, 0.95] in each dimension; then the interior region’s volume is 0.9^d. When d = 50, that is only about 0.005 of the entire volume.
b) The curse of dimensionality is that the performance of an index degrades rapidly as dimensionality increases, but it always outperforms linear scan.
c) When we search for the nearest neighbour in a high-dimensional space, it becomes hard to distinguish objects by distance. The distance to the nearest neighbour and the distance to the farthest neighbour become nearly the same.
d) The number of partitions, 2^d, grows exponentially as the number of dimensions (d) grows. When d becomes large enough, we have more partitions than data points.
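
The arithmetic quoted in options (a) and (d) can be checked with a short sketch. The function names `interior_fraction` and `num_partitions` are illustrative and not part of the question. The sketch assumes a unit hypercube with the same [0.05, 0.95] range in every dimension, and one binary split per dimension.

```python
def interior_fraction(d, low=0.05, high=0.95):
    """Fraction of the unit hypercube's volume inside [low, high]^d."""
    return (high - low) ** d


def num_partitions(d):
    """Number of cells when every dimension is split in two (assumed)."""
    return 2 ** d


if __name__ == "__main__":
    # Print both quantities for a few dimensionalities.
    for d in (2, 10, 50):
        print(f"d={d}: interior={interior_fraction(d):.6f}, "
              f"partitions={num_partitions(d)}")
```

For d = 50 the interior fraction comes out at roughly 0.005, and 2^d already exceeds a million cells at d = 20.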