Question 1. Short Answer Questions (25 points)
1. [4 points] Assume an erasure coding scheme LRC(12,2,2). The data unit is divided into 12 fragments. A local parity fragment px is created from the first 6 fragments x0-x5, and another local parity fragment py is created from the next 6 fragments y0-y5. Two global parity fragments p0 and p1 are created from all 12 data fragments. Show an example failure pattern with four failed fragments and analyze the reconstruction cost. The failure pattern must not be one of those presented in the original paper.
2. [6 points] Many cloud database services adopt a layered approach that separates the database layer from the storage layer. Identify THREE benefits of this approach. For each benefit, use a real-world system as an example to demonstrate it.
3. [4 points] In your own words, explain the difference between the chunk version number and the mutation serial number in GFS.
4. [5 points] Explain “shuffle” and how it happens in the MapReduce and Spark frameworks, respectively.
5. [6 points] Explain the following terms in YARN: “application”, “job”, “task”, and “task attempt”. Some terms are shared by the MapReduce and Spark frameworks, while others are used by only one of them. If a shared term has different meanings in the two frameworks, highlight the difference.