Question 2 GFS (15 points)
This question has two parts. Both parts concern a GFS cluster with a master node M and many chunk servers CS1, ..., CSn. A log file L is stored as three chunks L0, L1 and L2, each replicated on several chunk servers. The distribution of replicas is shown in the following table.
Chunk/Server | CS1 | CS2 | CS3 | CS4 | CS5 | CS6 |
L0 | x | x | x |  |  |  |
L1 |  | x | x | x |  |  |
L2 |  |  | x | x | x |  |
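The placement in the table can also be written as a small lookup structure. The following is a minimal Python sketch, assuming the reading of the table in which L0 is on CS1 to CS3, L1 on CS2 to CS4, and L2 on CS3 to CS5. The names `REPLICAS`, `servers_for` and `chunks_on` are illustrative only and are not part of GFS.

```python
# Replica placement as read from the table (assumed reading; names are hypothetical).
REPLICAS = {
    "L0": ["CS1", "CS2", "CS3"],
    "L1": ["CS2", "CS3", "CS4"],
    "L2": ["CS3", "CS4", "CS5"],
}


def servers_for(chunk):
    """Return the chunk servers that hold a replica of `chunk`."""
    return REPLICAS[chunk]


def chunks_on(server):
    """Return the chunks of L that have a replica on `server`."""
    return [chunk for chunk, servers in REPLICAS.items() if server in servers]
```

For example, `servers_for("L2")` lists the three servers holding L2, and `chunks_on("CS6")` returns an empty list because CS6 holds no chunk of L.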
- [5 points] Assume a MapReduce application takes L as input. Describe a possible scenario in which the application obtains its input data. Your description should identify the nodes being contacted and the possible control and data flow between them. No points will be awarded for describing the general GFS read process.
- [10 points] Assume two clients C1 and C2 concurrently append their latest entries to chunk L2. The replica on CS3 holds the current lease for this chunk. C1 is running on a machine closest to CS3; C2 is running on a machine closest to CS5. At time t0, C1 needs to append a log entry le1 and C2 needs to append a log entry le2. Describe a possible scenario in which the cluster handles the concurrent append requests. You may make assumptions about the relative distances between the nodes involved. No points will be awarded for describing the general GFS write process.