Question 3: Resource Allocation and Scheduling (25 points)
Assume that a YARN cluster consists of ten EC2 c4.xlarge instances: one master node and nine core nodes. A c4.xlarge instance has 4 vCPUs and 7.5 GB of memory. Assume that YARN can use 6 GB of memory on each node. YARN is configured to use the Fair Scheduler, and each user's applications are submitted to that user's own queue.
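A minimal sketch of the capacity arithmetic behind this setup. It assumes, as on Amazon EMR, that only the nine core nodes run a YARN NodeManager (the master node hosts the ResourceManager), that YARN exposes all 4 vCPUs of each core node, and that the Fair Scheduler splits resources equally among queues that have running applications (equal weights, i.e. the instantaneous fair share). All names are hypothetical.

```python
# Illustrative capacity arithmetic; all names are hypothetical.
# Assumption: only the nine core nodes run a NodeManager, so the master
# node contributes nothing to YARN's allocatable pool.
CORE_NODES = 9
VCPUS_PER_NODE = 4          # c4.xlarge; assumes YARN exposes every vCPU
YARN_MEM_GB_PER_NODE = 6    # given: YARN may use 6 GB on each node


def yarn_capacity(nodes: int = CORE_NODES) -> tuple[int, int]:
    """Return (total vCPUs, total memory in GB) that YARN can allocate."""
    return nodes * VCPUS_PER_NODE, nodes * YARN_MEM_GB_PER_NODE


def fair_share_gb(total_mem_gb: int, active_queues: int) -> float:
    """Instantaneous fair share per queue, assuming equal weights and that
    only queues with running applications take part in the split."""
    return total_mem_gb / active_queues


vcpus, mem_gb = yarn_capacity()
print(vcpus, mem_gb)               # 36 54
print(fair_share_gb(mem_gb, 1))    # 54.0 (one active queue)
print(fair_share_gb(mem_gb, 2))    # 27.0 (two active queues)
```

If the master node also ran a NodeManager, passing `nodes=10` would give the corresponding totals instead.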
The default resource configuration for a MapReduce application is:
· Application Master: 1 GB
· Mapper: 3 GB
· Reducer: 4 GB
For simplicity, we assume that reducer containers are requested only after all map tasks of the same job have completed.
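The per-node packing implied by these container sizes can be checked with a few lines of arithmetic. This sketch checks both memory and vCores, assuming each MapReduce container requests 1 vCore (the Hadoop defaults for `mapreduce.map.cpu.vcores`, `mapreduce.reduce.cpu.vcores` and `yarn.app.mapreduce.am.resource.cpu-vcores`) and that each node offers 6 GB and 4 vCores to YARN. The names `MR_CONTAINERS` and `max_fit` are hypothetical.

```python
# Per-node capacity offered to YARN (6 GB given; 4 vCores assumed).
NODE_MEM_GB, NODE_VCORES = 6, 4

# (memory GB, vCores) per container type; 1 vCore each is an assumption.
MR_CONTAINERS = {"am": (1, 1), "mapper": (3, 1), "reducer": (4, 1)}


def max_fit(mem_gb, vcores, free_mem_gb=NODE_MEM_GB, free_vcores=NODE_VCORES):
    """How many identical containers fit in the given free resources."""
    return min(free_mem_gb // mem_gb, free_vcores // vcores)


for kind, (mem, cores) in MR_CONTAINERS.items():
    print(kind, max_fit(mem, cores))
# am 4
# mapper 2
# reducer 1

# A node already hosting M's 1 GB / 1 vCore AM has room for one mapper.
print(max_fit(3, 1, NODE_MEM_GB - 1, NODE_VCORES - 1))  # 1
```

With these sizes memory is the binding limit for mappers and reducers, while a node packed only with AMs would run out of vCores first.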
The default resource configuration for a Spark application is:
· Application Master: 1 GB
· Executor Memory: 6 GB
· Executor Cores: 4
For simplicity, we assume that all executors are requested at the same time and remain allocated for the entire life cycle of the application.
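Under the question's simplified model, an executor container is exactly 6 GB and 4 vCores, which is the whole of one node's YARN share. The sketch below checks this, assuming the Spark AM container takes 1 GB and 1 vCore and ignoring `spark.executor.memoryOverhead`, which real Spark on YARN adds on top of the executor memory in each container request. The helper `fits` is hypothetical.

```python
NODE_MEM_GB, NODE_VCORES = 6, 4   # YARN's share of one core node
EXECUTOR = (6, 4)                 # (GB, vCores); overhead ignored, as in the question
SPARK_AM = (1, 1)                 # assumed AM request


def fits(request, free_mem_gb, free_vcores) -> bool:
    """True if a (memory GB, vCores) request fits in the free resources."""
    mem_gb, vcores = request
    return mem_gb <= free_mem_gb and vcores <= free_vcores


# An idle node can host one executor.
print(fits(EXECUTOR, NODE_MEM_GB, NODE_VCORES))                                # True
# A node already holding the Spark AM cannot.
print(fits(EXECUTOR, NODE_MEM_GB - SPARK_AM[0], NODE_VCORES - SPARK_AM[1]))  # False
# Neither can a node running a single 3 GB / 1 vCore mapper.
print(fits(EXECUTOR, NODE_MEM_GB - 3, NODE_VCORES - 1))                      # False
```

In this model each executor therefore needs a completely idle core node.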
Note that containers requested at the same time may not be allocated at the same time. The scheduler allocates containers according to the resources available in the cluster.
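To illustrate why requests made together can be satisfied at different times, here is a toy first-fit allocator over per-node free memory. It is a simplification, not the Fair Scheduler's algorithm: the real scheduler assigns containers as NodeManagers heartbeat and orders applications by fair share. The `Node` class and `allocate` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_gb: int


def allocate(nodes: list[Node], requests_gb: list[int]):
    """First-fit over memory: each request goes to the first node with enough
    free memory; requests that fit nowhere stay pending for a later round."""
    allocated: list[tuple[int, str]] = []
    pending: list[int] = []
    for size in requests_gb:
        node = next((n for n in nodes if n.free_gb >= size), None)
        if node is None:
            pending.append(size)
        else:
            node.free_gb -= size
            allocated.append((size, node.name))
    return allocated, pending


# Three 6 GB containers requested together, but only two nodes are idle.
nodes = [Node("core1", 6), Node("core2", 2), Node("core3", 6)]
allocated, pending = allocate(nodes, [6, 6, 6])
print(allocated)  # [(6, 'core1'), (6, 'core3')]
print(pending)    # [6]
```

The pending request would be retried once some node frees enough memory, for example when containers of another application finish.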
Consider the following scenario of application execution, where t0 < t1 < t2 < t3.
User A submitted a MapReduce application M at t0. No other application was running in the cluster when M was submitted. M contains two jobs: the first job has an input of 10 blocks and uses four reduce tasks; the second job uses three reduce tasks.
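Assuming one map task per input block (the usual case when the input split size equals the HDFS block size), the memory M asks for in each phase of its first job follows from the container sizes. The second job's number of map tasks is not given, so it is left out. The names below are hypothetical.

```python
AM_GB, MAPPER_GB, REDUCER_GB = 1, 3, 4   # default container sizes from the question


def phase_demand_gb(map_tasks: int, reduce_tasks: int) -> tuple[int, int]:
    """(memory during the map phase, memory during the reduce phase), each
    including the AM; reducers are requested only after all maps finish,
    so mapper containers are no longer held during the reduce phase."""
    return AM_GB + map_tasks * MAPPER_GB, AM_GB + reduce_tasks * REDUCER_GB


# First job: 10 input blocks -> 10 map tasks (assumed), 4 reduce tasks.
print(phase_demand_gb(10, 4))  # (31, 17)
```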
User B submitted a Spark application S at t1. S requires three executors and runs a single job with three stages of 10 tasks each.
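With three executors of 4 cores each and one core per task (`spark.task.cpus` defaults to 1), S has 12 task slots, so a 10-task stage can run in a single wave. The sketch below deals tasks round-robin across executors, assuming the three stages run one after another. Spark's own task scheduler instead matches tasks to free cores using data-locality preferences, so this is only one possible assignment; `assign_round_robin` is a hypothetical helper.

```python
def assign_round_robin(num_tasks: int, executors: list[str],
                       cores_per_executor: int, task_cpus: int = 1):
    """Split one stage's tasks into waves; within a wave, deal tasks to
    executors round-robin so no executor exceeds its free task slots."""
    slots = cores_per_executor // task_cpus   # concurrent tasks per executor
    capacity = len(executors) * slots         # concurrent tasks per wave
    waves = []
    for start in range(0, num_tasks, capacity):
        wave = {e: [] for e in executors}
        for i, task in enumerate(range(start, min(start + capacity, num_tasks))):
            wave[executors[i % len(executors)]].append(task)
        waves.append(wave)
    return waves


# One 10-task stage on three 4-core executors: a single wave.
print(assign_round_robin(10, ["exec1", "exec2", "exec3"], 4))
# [{'exec1': [0, 3, 6, 9], 'exec2': [1, 4, 7], 'exec3': [2, 5, 8]}]

# If only two executors were running, the stage would need two waves.
print(len(assign_round_robin(10, ["exec1", "exec2"], 4)))  # 2
```

Applying the same function to each of the three stages in turn gives a complete, if simplified, task schedule for S.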
At t2, five map tasks of M’s first job have completed. At t3, all map tasks of M’s first job have completed.
1. [2 points] What are the total number of vCPUs and the total amount of memory that YARN can allocate?
2. [2 points] Describe the queues and the resource capacity of each queue at t0 and at t1.
3. [5 points] Describe the containers that M requested and was allocated before t3, and the resources allocated to each.
4. [4 points] Describe the containers that M will request after t3 and the order in which it requests them.
5. [6 points] Describe the sequence in which containers are requested and allocated for application S. You may make assumptions about how containers are allocated across nodes.
6. [6 points] Describe one possible way of assigning tasks to the executors of application S.