COMP9334 Capacity Planning of Computer Systems and Networks

COMP9334 Project, Term 1, 2024: Computing clusters

Due Date: 5:00pm Friday 19 April 2024
Version 1.01

1 Introduction and learning objectives

You have learnt in Week 4A’s lecture that a high variability of inter-arrival times or service times can cause a high response time. Measurements from real computer clusters have found that the service times in these clusters have very high variability [1]. The reference paper [1] also has a number of suggestions to deal with this issue. One suggestion is to separate the jobs according to their service time requirements, and have one set of servers processing jobs with short service times and another set of servers for jobs with long service times. This arrangement is the same as supermarkets having express checkouts for customers buying not more than a certain number of items and other checkouts that do not have a limit on the number of items. You saw this theory in action in Week 4A’s revision Problem 1. We also highly recommend that you read the paper [1].

In this project, you will use simulation to study how to reduce the response time of a server farm that uses different servers to process jobs with different service time requirements.

In this project, you will learn:

1. To use discrete event simulation to simulate a computer system
2. To use simulation to solve a design problem
3. To use statistically sound methods to analyse simulation outputs

We mentioned a number of times in the lectures that simulation is not simply about writing simulation programs. While it is important to get your simulation code correct, it is also important that you use statistically sound methods to analyse simulation outputs. Therefore, roughly half of the marks for this project are allocated to the simulation program, and the other half to statistical analysis; see Section 7.2.

[Figure 1 diagram: new jobs submitted by users arrive at the dispatcher, which holds Queue 0 and Queue 1. Queue 0 feeds the Group 0 servers (Server 0, ..., Server n0 - 1) and Queue 1 feeds the Group 1 servers (Server n0, ..., Server n - 1). Jobs killed by servers in Group 0 are sent back to the dispatcher. Jobs that have completed their processing depart the system permanently.]

Figure 1: The multi-server system for this project.

2 Support provided and computing resources

If you have problems doing this project, you can post your question on the course forum. We strongly encourage you to do this as asking questions and trying to answer them is a great way to learn. Do not be afraid that your question may appear to be silly; the other students may very well have the same question! Please note that if your forum post shows part of your solution or code, you must mark that forum post private.

Another way to get help is to attend a consultation (see the Timetable section of the course website for dates and times).

If you need computing resources to run your simulation program, you can do it on the VLAB remote computing facility provided by the School. Information on VLAB is available here: https://taggi.cse.unsw.edu.au/Vlab/

3 Multi-server system configuration with job isolation

The configuration of the multi-server system that you will use in this project is shown in Figure 1. The system consists of a dispatcher and n servers where n ≥ 2. The n servers are partitioned into 2 disjoint groups, called Groups 0 and 1, with at least one server in each group. The numbers of servers in Groups 0 and 1 are, respectively, n0 and n1 where n0, n1 ≥ 1 and n0 + n1 = n.

The servers in Group 0 are used to process short jobs which require a processing time of no more than a time limit of Tlimit. The servers in Group 1 do not impose any limit on service time.

The dispatcher has two queues: Queue 0 and Queue 1. The jobs in Queue i (where i = 0, 1) are destined for servers in Group i. Both queues have infinite queueing spaces.

When a user submits a job to this multi-server system, the user needs to indicate whether the job is intended for the servers in Group 0 or Group 1. The following general processing steps are common to all incoming jobs:

• If a job intended for a server in Group i (where i = 0, 1) arrives at the dispatcher, the job will be sent to a server in Group i if one is available, otherwise the job will join Queue i.

• When a job departs from a server in Group i, the server will check whether there is a job at the head of Queue i. If yes, the job will be admitted to the available server for processing.

Recall that the servers in Group 0 have a service time limit. The intention is that the users make an estimate of the service time requirement of their submitted jobs. If a user thinks that their job should be able to complete within Tlimit, then they submit it to Group 0; otherwise, they should send it to Group 1.

Unfortunately, the service time estimated by the users is not always correct. It is possible that a user sends a job which cannot be completed within the time limit to Group 0. We will now explain how the multi-server system will process such a job. Since the user has indicated that the job is destined for Group 0, the job will be processed according to the general processing steps explained earlier. This means the job may have to wait in Queue 0 before it reaches a Group 0 server. Once the job has received Tlimit units of service at a Group 0 server, the server will kill the job, send it back to the dispatcher and tell the dispatcher that this is a killed job. The dispatcher will check whether a server in Group 1 is available. If yes, the job will be sent to an available server; otherwise, it will join Queue 1 to wait for a server to become available. When a server in Group 1 is available to work on this job, it will process the job from the beginning, i.e., all the previous processing in a Group 0 server is lost.

If a job has completed its processing at a Group 0 server, which means its service time is less than or equal to Tlimit, then the job leaves the multi-server system permanently. Similarly, a job that has completed its processing at a Group 1 server will leave the system permanently.

Remark 1 Some elements in the above description are realistic but some are not. Typically, users are required to specify a walltime as a service time limit when they submit their jobs to a computing cluster. If a server has already spent the specified walltime on the job, then the server will kill the job. All these are realistic.

The re-circulation of a killed job is normally not done. A user will typically have to resubmit a new job if it has been killed. If a killed job is re-circulated, then it may be given a lower priority, rather than joining the main queue which is the case here.

Some programming techniques (e.g., checkpointing) allow a killed job or crashed job to resurrect from the last state saved rather than from the beginning. However, that may require a sizeable memory space.

In order to make this project more do-able, we have simplified many of the settings. For example, we do not use lower priority for the re-circulated killed jobs.

4 Examples

We will now present three examples to illustrate the operation of the system that you will simulate in this project. In all these examples, we assume that the system is initially empty.

4.1 Example 0: n = 3, n0 = 1, n1 = 2 and Tlimit = 3

In this example, we assume that there are n = 3 servers in the farm with 1 (= n0) server in Group 0 and 2 (= n1) servers in Group 1. The time limit for Group 0 processing is Tlimit = 3.

Table 1 shows the attributes of the 8 jobs that we will use in this example. Each job is given an index (from 0 to 7). For each job, Table 1 shows its arrival time, service time and the server group that the user has indicated. For example, Job 1 arrives at time 10, requires 4 units of time for service and the user has indicated that this job needs to go to a Group 0 server. Since the service time requirement for this job exceeds the time limit Tlimit of 3, this job will be killed after 3 time units of service and will be sent to the dispatcher after that.

Note that a job which a user sends to a Group 0 server will be completed if its service time is less than or equal to the service time limit Tlimit being imposed. So, Job 6 in Table 1 will be completed in a Group 0 server and this job will not be killed.
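The kill rule above can be written as a small helper. This is only a sketch: the function name `group0_occupancy` and its return convention are illustrative choices, not part of the project specification.

```python
def group0_occupancy(service_time, t_limit):
    """Return (time spent on a Group 0 server, whether the job is killed).

    A job is killed only if its service time strictly exceeds t_limit;
    a job that needs exactly t_limit completes normally.
    """
    killed = service_time > t_limit
    time_on_server = t_limit if killed else service_time
    return time_on_server, killed


# Jobs 1 and 6 of Example 0 with Tlimit = 3
print(group0_occupancy(4, 3))  # (3, True): killed after 3 time units
print(group0_occupancy(3, 3))  # (3, False): completes within the limit
```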

Job index   Arrival time   Service time required   Server group indicated
0           2              5                       1
1           10             4                       0
2           11             9                       0
3           12             2                       0
4           14             8                       1
5           15             5                       0
6           19             3                       0
7           20             6                       1

Table 1: Jobs for Example 0.

Remark 2 We remark that the job indices are not necessary for carrying out the discrete event simulation. We have included the job index to make it easier to refer to a job in our description below.

The events in the system in Figure 1 are:

• The arrival of a new job to the dispatcher; and,
• The departure of a job from a server.

We remark that for a Group 1 server, a departed job has its service completed. However, for a Group 0 server, a departed job can be a killed job or a completed job. Note that we have not included the arrival of a re-circulated killed job to the dispatcher as an event. This is because the arrival of a re-circulated job at the dispatcher is at the same time as the departure of that job from a Group 0 server. So the simulation will handle these events together: the departure of a killed job and its handling by the dispatcher.

We will illustrate the simulation of the system in Figure 1 using “on-paper simulation”. The quantities that you need to keep track of include:

• Next arrival time is the time that the next new job (i.e., not a killed job) will arrive.
• For each server, we keep track of its server status, which can be busy or idle. We also keep track of the following information on the job that is being processed in the server:
  – Next departure time is the time at which the job will depart from the server. If the server is idle, the next departure time is set to ∞. Note that there is a next departure time for each server.
  – The time that this job arrived at the system. This is needed for calculating the response time of the job when it permanently departs from the system.
• The contents of Queues 0 and 1. Each job in the queue is identified by a 2-tuple of (arrival time, service time).

There are other additional quantities that you will need to keep track of and they will be mentioned later on.

The “on-paper simulation” is shown in Table 2. The notes in the last column explain what updates you need to do for each event. Recall that the two event types in this simulation are the arrival of a new job to the dispatcher and the departure from a server. We will simply refer to these two events as Arrival and Departure in the “Event type” column (i.e., second column) in Table 2.

Master  Event      Next     Server 0          Server 1        Server 2        Queue 0          Queue 1
clock   type       arrival  (Group 0)         (Group 1)       (Group 1)
0       –          2        Idle, ∞           Idle, ∞         Idle, ∞         –                –
2       Arrival    10       Idle, ∞           Busy, (2,7)     Idle, ∞         –                –
7       Departure  10       Idle, ∞           Idle, ∞         Idle, ∞         –                –
10      Arrival    11       Busy, (10,13,4)   Idle, ∞         Idle, ∞         –                –
11      Arrival    12       Busy, (10,13,4)   Idle, ∞         Idle, ∞         (11,9)           –
12      Arrival    14       Busy, (10,13,4)   Idle, ∞         Idle, ∞         (11,9), (12,2)   –
13      Departure  14       Busy, (11,16,9)   Busy, (10,17)   Idle, ∞         (12,2)           –
14      Arrival    15       Busy, (11,16,9)   Busy, (10,17)   Busy, (14,22)   (12,2)           –
15      Arrival    19       Busy, (11,16,9)   Busy, (10,17)   Busy, (14,22)   (12,2), (15,5)   –
16      Departure  19       Busy, (12,18)     Busy, (10,17)   Busy, (14,22)   (15,5)           (11,9)
17      Departure  19       Busy, (12,18)     Busy, (11,26)   Busy, (14,22)   (15,5)           –
18      Departure  19       Busy, (15,21,5)   Busy, (11,26)   Busy, (14,22)   –                –
19      Arrival    20       Busy, (15,21,5)   Busy, (11,26)   Busy, (14,22)   (19,3)           –
20      Arrival    ∞        Busy, (15,21,5)   Busy, (11,26)   Busy, (14,22)   (19,3)           (20,6)
21      Departure  ∞        Busy, (19,24)     Busy, (11,26)   Busy, (14,22)   –                (20,6), (15,5)
22      Departure  ∞        Busy, (19,24)     Busy, (11,26)   Busy, (20,28)   –                (15,5)
24      Departure  ∞        Idle, ∞           Busy, (11,26)   Busy, (20,28)   –                (15,5)
26      Departure  ∞        Idle, ∞           Busy, (15,31)   Busy, (20,28)   –                –
28      Departure  ∞        Idle, ∞           Busy, (15,31)   Idle, ∞         –                –
31      Departure  ∞        Idle, ∞           Idle, ∞         Idle, ∞         –                –

Notes (by master clock):

Time 0: We assume the servers are idle and queues are empty at the start of the simulation. The next departure times for all servers are ∞. The “–” indicates that the queues are empty.

Time 2: This event is the arrival of Job 0 for a Group 1 server. Since both Group 1 servers are idle before this arrival, the job can be sent to any one of the idle servers. We have chosen to send this job to Server 1. The job requires a service time of 5, so its completion time is 7. Note that the record of the job in the server is a 2-tuple consisting of (arrival time, scheduled departure time). Lastly, we need to update the arrival time of the next job, which is 10.

Time 7: This event is the departure of a job from Server 1. Since Queue 1 is empty, Server 1 becomes idle.

Time 10: This event is the arrival of Job 1 for a Group 0 server. Since Server 0 is idle, the job can be sent to the idle server. This job requires a service time of 4 which exceeds the service time limit of 3 for Group 0 servers, so the simulation needs to schedule this job to depart Server 0 at time 13 because this is the time that this job will be killed by the server. We use the 3-tuple consisting of (arrival time, scheduled departure time, service time), which for this job is (10, 13, 4), to indicate that this job arrives at time 10, is scheduled to depart at time 13 and its service time requirement is 4 time units. We need to include the service time of the job because we will need it later when the job is re-circulated to a Group 1 server. Note that if you see a 3-tuple job in a Group 0 server, it means that the job will be killed and re-circulated to a Group 1 server. Lastly, we need to update the arrival time of the next job, which is 11.

Time 11: This event is the arrival of Job 2 for a Group 0 server. Since Server 0 is busy, this job will join Queue 0. The queue stores the 2-tuple (arrival time, service time) which is (11,9) for this job. We also need to update the arrival time of the next job, which is 12.

Time 12: This event is the arrival of Job 3 for a Group 0 server. Since Server 0 is busy, this job will join Queue 0 with the job information (12,2). We also need to update the arrival time of the next job, which is 14.

Time 13: This event is the departure of a killed job from Server 0. This job will be re-circulated to the dispatcher. Since both Group 1 servers are idle, this job can go to any one of them. We have chosen to send it to Server 1. Since this job requires 4 time units of service, it is scheduled to depart Server 1 at time 17. The 2-tuple (10,17) indicates that this job arrives at 10 and will depart at time 17. Since this is a departure from a Group 0 server, we will also need to check Queue 0, which has 2 jobs. So the job at the head of the queue will advance to Server 0 which is becoming available. This job requires 9 units of service time which exceeds the service time limit. So, the job will be killed at time 13 + 3 = 16.

Time 14: This event is the arrival of Job 4 for a Group 1 server. Since there is a Group 1 server available, this job goes to Server 2 directly. This job requires 8 units of service, so the job is scheduled to depart at time 22. We also need to update the arrival time of the next job, which is 15.

Time 15: This event is the arrival of Job 5 for a Group 0 server. Since all Group 0 servers are busy, this job joins Queue 0. We also need to update the arrival time of the next job, which is 19.

Time 16: This event is the departure of a killed job from Server 0. This job will be re-circulated to the dispatcher. Since both Group 1 servers are busy, this job will join Queue 1. The job at the head of Queue 0 will advance to Server 0. This job requires only 2 units of service which is within the limit. We use a 2-tuple to remember this job because the job is within the time limit so it will not be killed.

Time 17: This event is the departure of a finished job at Server 1. Since there is a job in Queue 1, the job will move into Server 1.

Time 18: This event is the departure of a finished job at Server 0. This job will depart from the system permanently. We can tell that because it is a 2-tuple in the server rather than a 3-tuple. Since there is a job in Queue 0, the job will move into Server 0.

Time 19: This event is the arrival of Job 6 for a Group 0 server. Since all Group 0 servers are busy, this job joins Queue 0. We also need to update the arrival time of the next job, which is 20.

Time 20: This event is the arrival of Job 7 for a Group 1 server. Since all Group 1 servers are busy, this job joins Queue 1. Since there are no more jobs arriving, we update the next arrival time to ∞.

Time 21: This event is the departure of a killed job from Server 0. This job will be re-circulated to the dispatcher. Since both Group 1 servers are busy, this job will join Queue 1. The job at the head of Queue 0 will advance to Server 0. This job requires only 3 units of service which is within the limit. We only need a 2-tuple to remember that this job arrives at time 19 and will depart at time 24.

Time 22: This event is the departure of a finished job at Server 2. Since there is a job in Queue 1, the job will move into Server 2.

Time 24: This event is the departure of a finished job at Server 0. Since Queue 0 is empty, Server 0 is now idle.

Time 26: This event is the departure of a finished job at Server 1. The job at the head of Queue 1 advances to Server 1. The queue is now empty.

Table 2: “On paper simulation” illustrating the event updates of the system.

The above description has not explained what happens if an arrival event and a departure event are at the same time. We will leave it unspecified. If we ask you to simulate in trace driven mode, we will ensure that such a situation will not occur. If the inter-arrival time and service time are generated randomly, the chance of this situation occurring is practically zero so you do not have to worry about it.
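The event handling illustrated in Table 2 can be sketched as a short trace-driven simulation. This is a minimal illustration, not a reference solution: the function name `simulate_trace`, the data layout and the choice of the lowest-numbered idle server are all assumptions made for the sketch, and simultaneous arrival and departure events (left unspecified above) are arbitrarily resolved in favour of the departure.

```python
import math
from collections import deque


def simulate_trace(arrivals, services, groups, n, n0, t_limit):
    """Trace-driven simulation; returns {job index: final departure time}."""
    server_group = [0 if s < n0 else 1 for s in range(n)]
    # Per server: None if idle, else (departure_time, job_index, killed_flag)
    servers = [None] * n
    queues = (deque(), deque())  # Queue 0 and Queue 1 hold job indices
    departures = {}
    next_job = 0  # index of the next new arrival

    def start(job, server, now):
        # A Group 0 server kills a job once it has received t_limit of service
        killed = server_group[server] == 0 and services[job] > t_limit
        duration = t_limit if killed else services[job]
        servers[server] = (now + duration, job, killed)

    def dispatch(job, group, now):
        # Send the job to an idle server of the group, otherwise queue it
        for s in range(n):
            if server_group[s] == group and servers[s] is None:
                start(job, s, now)
                return
        queues[group].append(job)

    while True:
        t_arr = arrivals[next_job] if next_job < len(arrivals) else math.inf
        busy = [(rec[0], s) for s, rec in enumerate(servers) if rec is not None]
        t_dep, s_dep = min(busy, default=(math.inf, -1))
        if t_arr == math.inf and t_dep == math.inf:
            break  # no pending events: every job has left the system
        if t_arr < t_dep:
            dispatch(next_job, groups[next_job], t_arr)
            next_job += 1
        else:
            _, job, killed = servers[s_dep]
            servers[s_dep] = None
            if killed:
                dispatch(job, 1, t_dep)  # re-circulate to Group 1, restart from scratch
            else:
                departures[job] = t_dep  # permanent departure
            group = server_group[s_dep]
            if queues[group]:
                start(queues[group].popleft(), s_dep, t_dep)
    return departures


# Example 0 (Table 1): n = 3, n0 = 1, Tlimit = 3
departures_example0 = simulate_trace(
    arrivals=[2, 10, 11, 12, 14, 15, 19, 20],
    services=[5, 4, 9, 2, 8, 5, 3, 6],
    groups=[1, 0, 0, 0, 1, 0, 0, 1],
    n=3, n0=1, t_limit=3,
)
print(sorted(departures_example0.items()))
```

For Example 0 the sketch gives the departure times 7, 17, 26, 18, 22, 31, 24 and 28 for Jobs 0 to 7, in agreement with Table 3.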

Table 3 summarises the arrival, departure, job classification and response times of the jobs in this example. In the table, we classify the jobs into 3 types: 0 (Group 0 jobs completed within the time limit), r0 (Group 0 jobs that were killed and re-circulated to Group 1) and 1 (Group 1 jobs). A “–” marks a response time column that does not apply to the job.

Job   Arrival time   Departure time   Job classification   Response time               Response time
                                                           (Group 0 within limit)      (Group 1)
0     2              7                1                    –                           5
1     10             17               r0                   –                           –
2     11             26               r0                   –                           –
3     12             18               0                    6                           –
4     14             22               1                    –                           8
5     15             31               r0                   –                           –
6     19             24               0                    5                           –
7     20             28               1                    –                           8

Table 3: The arrival and departure times of the jobs in Example 0.

4.2 Example 1: n = 4, n0 = 2, n1 = 2 and Tlimit = 3.5

For this example, we assume that the system has n = 4 servers. Both Groups 0 and 1 have 2 servers each, i.e., n0 = n1 = 2. The service time limit for Group 0 servers is Tlimit = 3.5.

Table 4 shows the attributes of the jobs which will arrive at this system. Table 5 summarises the results of the simulation. The mean response time of the completed Group 0 jobs is 23.9/4 = 5.975 and the mean response time of the Group 1 jobs is 36.8/5 = 7.36.

Job index   Arrival time   Service time required   Server group indicated
0           2.1            5.2                     1
1           3.4            4.1                     1
2           4.1            3.1                     0
3           4.4            3.9                     0
4           4.5            3.4                     0
5           4.7            4.4                     1
6           5.5            4.7                     1
7           5.9            4.1                     0
8           6.0            2.5                     0
9           6.5            8.6                     1
10          7.6            4.1                     0
11          8.1            2.6                     0

Table 4: Jobs for Example 1.

Job   Arrival time   Departure time   Job classification   Response time               Response time
                                                           (Group 0 within limit)      (Group 1)
0     2.1            7.3              1                    –                           5.2
1     3.4            7.5              1                    –                           4.1
2     4.1            7.2              0                    3.1                         –
3     4.4            16.1             r0                   –                           –
4     4.5            10.6             0                    6.1                         –
5     4.7            11.7             1                    –                           7.0
6     5.5            12.2             1                    –                           6.7
7     5.9            20.2             r0                   –                           –
8     6.0            13.1             0                    7.1                         –
9     6.5            20.3             1                    –                           13.8
10    7.6            24.3             r0                   –                           –
11    8.1            15.7             0                    7.6                         –

Table 5: The arrival and departure times of the jobs in Example 1.
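The two mean response times quoted for Example 1 can be recomputed from the arrival times, departure times and job classifications. The sketch below uses a hypothetical helper `mean_response_times`; re-circulated jobs (r0) are excluded from both means, as in the example.

```python
def mean_response_times(arrivals, departures, classes):
    """Mean response time of completed Group 0 jobs ('0') and Group 1 jobs ('1')."""
    means = {}
    for label in ("0", "1"):
        times = [d - a for a, d, c in zip(arrivals, departures, classes) if c == label]
        means[label] = sum(times) / len(times)
    return means


# Data of Example 1 (arrival times, departure times and classifications)
arrivals = [2.1, 3.4, 4.1, 4.4, 4.5, 4.7, 5.5, 5.9, 6.0, 6.5, 7.6, 8.1]
departures = [7.3, 7.5, 7.2, 16.1, 10.6, 11.7, 12.2, 20.2, 13.1, 20.3, 24.3, 15.7]
classes = ["1", "1", "0", "r0", "0", "1", "1", "r0", "0", "1", "r0", "0"]

means = mean_response_times(arrivals, departures, classes)
print(round(means["0"], 3), round(means["1"], 3))  # 5.975 7.36
```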

4.3 Example 2: n = 4, n0 = 1, n1 = 3 and Tlimit = 3.5

This example is identical to Example 1 except that n0 = 1. Table 6 summarises the results of the simulation. The mean response time of the completed Group 0 jobs is 44.9/4 = 11.225 and the mean response time of the Group 1 jobs is 29.8/5 = 5.96. It is not surprising that the mean response time of the completed Group 0 jobs has gone up while that of Group 1 jobs has gone down. This is because in this example, there are fewer servers in Group 0.

Job   Arrival time   Departure time   Job classification   Response time               Response time
                                                           (Group 0 within limit)      (Group 1)
0     2.1            7.3              1                    –                           5.2
1     3.4            7.5              1                    –                           4.1
2     4.1            7.2              0                    3.1                         –
3     4.4            14.6             r0                   –                           –
4     4.5            14.1             0                    9.6                         –
5     4.7            9.1              1                    –                           4.4
6     5.5            12.0             1                    –                           6.5
7     5.9            21.7             r0                   –                           –
8     6.0            20.1             0                    14.1                        –
9     6.5            16.1             1                    –                           9.6
10    7.6            27.7             r0                   –                           –
11    8.1            26.2             0                    18.1                        –

Table 6: The arrival and departure times of the jobs in Example 2.

5 Project description

This project consists of two main parts. The first part is to develop a simulation program for the system in Figure 1. The system has already been described in Section 3 and illustrated in Section 4. In the second part, you will use the simulation program that you have developed to solve a design problem.

5.1 Simulation program

Note that your simulation program must be a general program which allows different parameter values to be used. When we test your program, we will vary the parameter values. You can assume that we will only use valid inputs for testing.

For the simulation, you can always assume that the system is empty initially.

Hint: Do not write two separate programs for the random and trace modes because they share a lot in common. A few if–else statements at the right places are what you need to have both modes in one program.

5.1.1 The random mode

When your simulation is working in the random mode, it will generate the inter-arrival times and the workload of a job in the following manner.

1. We use {a1, a2, ..., ak, ...} to denote the inter-arrival times of the jobs arriving at the dispatcher. These inter-arrival times have the following properties:

(a) Each ak is the product of two random numbers a1k and a2k, i.e., ak = a1k a2k ∀k = 1, 2, ...

(b) The sequence a1k is exponentially distributed with a mean arrival rate λ requests/s.

(c) The sequence a2k is uniformly distributed in the interval [a2l, a2u].

Note: The easiest way to generate the inter-arrival times is to multiply an exponentially distributed random number with the given rate and a uniformly distributed random number in the given range. It would be more difficult to use the inverse transform method in this case, though it is doable.

2. The workload of a job is characterised by two attributes: the server group (i.e., Group 0 or 1) that the job is to be sent to, and the service time of the job.

(a) The first step is to determine which server group to send the job to. This decision is made by a parameter p0 ∈ (0, 1):

• Prob[a job is indicated by the user for a Group 0 server] = p0

• Prob[a job is indicated by the user for a Group 1 server] = 1 − p0

For example, if p0 is 0.8, then there is a probability of 0.8 that a job is indicated for a Group 0 server and a probability of 0.2 for a Group 1 server. The server group for each job is independently generated.

(b) Once the server group for a job has been generated, the next step is to generate its service time. The service time distribution to be used depends on the server group.

i. If a job is indicated to go to a Group 0 server, its service time has the probability density function (PDF) g0(t):

$$ g_0(t) = \begin{cases} 0 & \text{for } t \le \alpha_0 \\ \dfrac{\eta_0}{\gamma_0 \, t^{\eta_0 + 1}} & \text{for } \alpha_0 < t < \beta_0 \\ 0 & \text{for } t \ge \beta_0 \end{cases} \qquad (1) $$

where $\gamma_0 = \alpha_0^{-\eta_0} - \beta_0^{-\eta_0}$.

Note that this probability density function has 3 parameters: α0, β0 and η0. You can assume that β0 > α0 > 0 and η0 > 1.

ii. If a job is indicated to go to a Group 1 server, its service time has PDF g1(t):

$$ g_1(t) = \begin{cases} 0 & \text{for } t \le \alpha_1 \\ \dfrac{\eta_1}{\gamma_1 \, t^{\eta_1 + 1}} & \text{for } t > \alpha_1 \end{cases} \qquad (2) $$

where $\gamma_1 = \alpha_1^{-\eta_1}$.

Note that this probability density function has 2 parameters: α1 and η1. You can assume that α1 > 0 and η1 > 1.

5.1.2 The trace mode

When your simulation is working in the trace mode, it will read the list of inter-arrival times, the list of service times and server groups from two separate ASCII files. We will explain the format of these files in Sections 6.1.3 and 6.1.4.

An important requirement for the trace mode is that your program is required to simulate until all jobs have departed from the system. You can refer to Table 2 for an illustration.

5.2 Determining the value of n0 that minimises a weighted mean response time

After writing your simulation program, your next step is to use your simulation program to determine the number of Group 0 servers n0 that minimises a weighted mean response time.

For this design problem, you will assume the following parameter values:

• Total number of servers: n = 10
• The service time limit Tlimit for Group 0 servers is 3.3.
• For inter-arrival times: λ = 3.1, a2l = 0.85, a2u = 1.21
• The probability p0 that a job is indicated for a Group 0 server is 0.74.
• The service time for a job which is indicated for Group 0: α0 = 0.5, β0 = 5.7, η0 = 1.9.
• The service time for a job which is indicated for Group 1: α1 = 2.7 and η1 = 2.5.

The aim of the design problem is to minimise the weighted response time:

w0 T0 + w1 T1    (3)

where T0 is the mean response time of the completed Group 0 jobs and T1 is the mean response time of Group 1 jobs. The values of the weights w0 and w1 are fixed for this design problem, and they are given by 0.83 and 0.059 respectively. As an example, if T0 = 1.86 and T1 = 56.7, then the weighted mean response time is 0.83 × 1.86 + 0.059 × 56.7. The rationale behind choosing these weights is explained in Remark 3.
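As a quick check of the arithmetic in the example, expression (3) can be evaluated directly. The function name `weighted_mean_response_time` is only an illustrative choice; the default weights are the fixed values stated for this design problem.

```python
def weighted_mean_response_time(t0, t1, w0=0.83, w1=0.059):
    """Evaluate w0*T0 + w1*T1 for mean response times T0 and T1."""
    return w0 * t0 + w1 * t1


# The example values T0 = 1.86 and T1 = 56.7 give approximately 4.8891
example_value = weighted_mean_response_time(1.86, 56.7)
print(round(example_value, 4))
```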

The aim of the design problem is to find the value of n0 to minimise this weighted response time. Note that we assume that there is at least a server in each group, therefore 1 ≤ n0 ≤ n − 1.

In solving this design problem, you need to ensure that you use statistically sound methods to compare systems. You will need to consider simulation controls such as length of simulation, number of replications, transient removals and so on. You will need to justify in your report how you determine the value of n0.
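One common way to compare two candidate values of n0 is a confidence interval on the paired differences of replication results. The sketch below is only an illustration of that idea under stated assumptions: the replication values are made up, the pairing assumes that replication k of both designs uses the same random number seed, and the Student-t critical value (2.776 for a 95% interval with 5 replications, i.e., 4 degrees of freedom) is supplied by the caller rather than computed.

```python
import math
import statistics


def confidence_interval(samples, t_critical):
    """Two-sided confidence interval for the mean of independent replications.

    t_critical is the Student-t quantile for len(samples) - 1 degrees of
    freedom, looked up by the caller.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = t_critical * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


# HYPOTHETICAL weighted mean response times from 5 replications of two designs
design_a = [4.91, 4.85, 5.02, 4.97, 4.88]
design_b = [4.62, 4.70, 4.66, 4.75, 4.59]

diffs = [a - b for a, b in zip(design_a, design_b)]
low, high = confidence_interval(diffs, t_critical=2.776)
# If the interval excludes 0, the two designs differ at the chosen confidence level
print(low, high)
```

Transient removal and the choice of simulation length would have to be applied to each replication before its result enters such a comparison.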

Remark 3 For the parameters above, out of all the jobs that are not re-circulated, 73.65% are Group 0 jobs within the time limit and 26.35% are Group 1 jobs. The average service time for Group 0 jobs within the time limit is 0.887 and that for Group 1 jobs is 4.5. The weights w0 and w1 are computed, respectively, from 0.7365/0.887 and 0.2635/4.5. So the weights take into account the frequency of a class of jobs. We also use the inverse service time as a weight so that we are not giving too much advantage to Group 1 jobs as they have large service time requirement.
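The weights can be recomputed from the figures quoted in Remark 3. This is plain arithmetic on those figures; the variable names are illustrative.

```python
# Fractions of non-re-circulated jobs and mean service times from Remark 3
frac_group0, mean_service_group0 = 0.7365, 0.887
frac_group1, mean_service_group1 = 0.2635, 4.5

# Weight = job frequency divided by mean service time
w0 = frac_group0 / mean_service_group0
w1 = frac_group1 / mean_service_group1
print(round(w0, 2), round(w1, 3))  # 0.83 0.059
```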

6 Testing your simulation program

In order for us to test the correctness of your simulation program, we will run your program using a number of test cases. The aim of this section is to describe the expected input/output file format and how the testing will be performed.

Each test is specified by 4 configuration files. We will index the tests from 0. If 12 tests are used, then the indices for the tests are 0, 1, 2, ..., 11. The names of the configuration files are:

• For Test 0, the configuration files are mode_0.txt, para_0.txt, interarrival_0.txt and service_0.txt. The files are similarly named for indices 1, 2, 3, ..., 9.
• For Test 10, the configuration files are mode_10.txt, para_10.txt, interarrival_10.txt and service_10.txt. The files are similarly named if the test index is a 2-digit number.

We will refer to these files using the generic names mode_*.txt, para_*.txt etc. We will describe the format of the configuration files in Section 6.1.
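The naming scheme can be captured in a one-line helper. The function name `config_file_names` is a hypothetical choice for this sketch; only the file name pattern comes from the description above.

```python
def config_file_names(test_index):
    """Return the 4 configuration file names for a given test index."""
    stems = ("mode", "para", "interarrival", "service")
    return [f"{stem}_{test_index}.txt" for stem in stems]


print(config_file_names(0))
print(config_file_names(10))
```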

Each test should produce 2 output files whose format will be described in Section 6.2. We will explain how testing will be conducted in Sections 6.3 and 6.5.

6.1 Configuration file format CourseNana.COM

Note that Test 0 is the same as Example 0 discussed in Section 4.1. We will use that test to illustrate the file format. CourseNana.COM

6.1.1 mode *.txt
This file is to indicate whether the simulation should run in the random or trace mode. The file CourseNana.COM

contains one string, which can either be random or trace. 6.1.2 para *.txt CourseNana.COM

If the simulation mode is trace, then this file has three lines. The first line is the value of n (= total number of servers), the second line has the value of n0 (= number of Group 0 servers) and the third line has the value of Tlimit. If the test is Example 0 in Section 4.1, then the contents of this file are: CourseNana.COM

3 1 3 CourseNana.COM

These values are in the sample file para_0.txt. CourseNana.COM

If the simulation mode is random, then the file has four lines. The meaning of the first three lines is the same as above. The last line contains the value of time_end, which is the end time of the simulation. The contents of the sample file para_4.txt are shown below where the last line indicates that the simulation should run until 200.

5
2
3.1
200

You can assume that we will only give you valid values. You can expect n to be a positive integer greater than 2, n0 ≥ 1 and Tlimit > 0. For time_end, it is a strictly positive integer or floating point number.
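A minimal sketch of reading the mode and parameter files is given below. The function names are hypothetical, and the sketch takes the file contents as strings so that it does not depend on where the files are stored. It returns None for time_end in trace mode, which is a design choice of this sketch rather than a requirement.

```python
def parse_mode(text):
    """Return the simulation mode, which must be 'random' or 'trace'."""
    mode = text.strip()
    if mode not in ("random", "trace"):
        raise ValueError(f"unexpected mode: {mode!r}")
    return mode


def parse_para(text, mode):
    """Return (n, n0, t_limit, time_end) from the contents of para_*.txt.

    time_end is None in trace mode because the file then has three values.
    """
    values = text.split()          # one value per line in the file
    n = int(values[0])             # total number of servers
    n0 = int(values[1])            # number of Group 0 servers
    t_limit = float(values[2])     # Tlimit
    time_end = float(values[3]) if mode == "random" else None
    return n, n0, t_limit, time_end
```

For example, parse_para("3\n1\n3\n", "trace") gives the values of Example 0, and parse_para("5\n2\n3.1\n200\n", "random") gives those of the sample file para_4.txt.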

6.1.3 interarrival_*.txt

The contents of the file interarrival_*.txt depend on the mode of the test. If mode is trace, then the file interarrival_*.txt contains the interarrival times of the jobs with one interarrival time occupying one line. You can assume that every interarrival time is positive. For Example 0 in Section 4.1, the arrival times are [2, 10, 11, 12, 14, 15, 19, 20] which means the inter-arrival times are [2, 8, 1, 1, 2, 1, 4, 1]. For this example, the inter-arrival times will be specified by a file (see sample file interarrival_0.txt) whose contents are:

2
8
1
1
2
1
4
1

In trace mode, each row of the file service_*.txt has 2 entries, which correspond to the service time (first entry) and the server group (second entry). For example, the first job has a service time of 5 and is indicated for a Group 1 server. You will find a one-to-one correspondence between the content of service_0.txt and the information in Table 1. You can assume that the first entry is a positive float, and the second entry in each row is either 0 or 1.

For random mode, the file service_*.txt contains three lines. For example, the contents of service_4.txt are:

0.7
1.2 3.6 2.1
2.8 4.1

The number in the first line is p0. The three numbers in the second line are α0, β0 and η0. Finally, the two numbers in the third line are α1 and η1. You can assume all these values are valid.
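The three-line layout described above could be read as in the following sketch. The function name and the dictionary keys are hypothetical names chosen for illustration; only the order of the numbers is taken from the description above.

```python
def parse_service_random(text):
    """Read the random-mode contents of service_*.txt into a dict."""
    # Ignore blank lines, then split each remaining line into numbers.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    p0 = float(rows[0][0])                              # first line
    alpha0, beta0, eta0 = (float(x) for x in rows[1])   # second line
    alpha1, eta1 = (float(x) for x in rows[2])          # third line
    return {
        "p0": p0,
        "alpha0": alpha0, "beta0": beta0, "eta0": eta0,
        "alpha1": alpha1, "eta1": eta1,
    }
```

With the contents of service_4.txt shown above, the result has p0 = 0.7, (α0, β0, η0) = (1.2, 3.6, 2.1) and (α1, η1) = (2.8, 4.1).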

You can assume that the data we provide for trace mode are consistent in the following way: the number of inter-arrival times and the number of lines of service times are equal.
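For trace mode, the two files can be combined into one job list as sketched below. The function names are hypothetical. The arrival times are the running sum of the inter-arrival times, and the length check mirrors the consistency assumption stated above.

```python
from itertools import accumulate


def arrival_times(interarrival_text):
    """Convert one inter-arrival time per line into cumulative arrival times."""
    gaps = [float(token) for token in interarrival_text.split()]
    return list(accumulate(gaps))


def parse_service_trace(service_text):
    """Return a list of (service_time, server_group), one pair per row."""
    jobs = []
    for line in service_text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        service_time, group = line.split()
        jobs.append((float(service_time), int(float(group))))
    return jobs


def load_trace(interarrival_text, service_text):
    """Return (arrival_time, service_time, server_group) for every job."""
    arrivals = arrival_times(interarrival_text)
    jobs = parse_service_trace(service_text)
    if len(arrivals) != len(jobs):
        raise ValueError("interarrival and service files differ in length")
    return [(a, s, g) for a, (s, g) in zip(arrivals, jobs)]
```

Applied to the inter-arrival times of Example 0, [2, 8, 1, 1, 2, 1, 4, 1], arrival_times returns the arrival times [2, 10, 11, 12, 14, 15, 19, 20] as floats.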

6.2 Output file format

In order to test your simulation program, we need two output files per test. One file contains two mean response times. The other file contains the arrival times, departure times and job classification information similar to Columns 2–4 in Table 3.

For random mode, the mean response time should be calculated using those jobs that have permanently departed the system by time_end. In other words, for those jobs which are still in the queue or are being processed in the server at time_end, you do not include these jobs when calculating the mean response time.
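The rule above can be sketched as follows for one class of jobs. The function name and the data layout are hypothetical: each job is an (arrival time, departure time) pair, with None as the departure time of a job that has not finished. The sketch also assumes that a departure exactly at time_end counts as completed.

```python
def mean_response_time(jobs, time_end=None):
    """Mean of (departure - arrival) over the jobs completed by time_end.

    Returns None when no job has completed.
    """
    completed = [
        (arrival, departure)
        for arrival, departure in jobs
        if departure is not None
        and (time_end is None or departure <= time_end)  # assumed: <= time_end
    ]
    if not completed:
        return None
    return sum(d - a for a, d in completed) / len(completed)
```

For the made-up jobs [(1.0, 4.0), (2.0, 8.0), (5.0, None)] and time_end = 6, only the first job is counted, so the mean response time is 3.0.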

Note that you do not have to consider transient removal for the mean response time before you write the result to the output file. However, you should consider transient removal when you do your design.

Two mean response times should be written to a file whose filename has the form mrt_*.txt. For Example 0 in Section 4.1, the expected contents of this file are:

5.5000 7.0000

where the two numbers correspond to the mean response times of, respectively, the completed Class 0 and Class 1 jobs.

The other file dep_*.txt contains the arrival time, departure time and classification of each job. For Example 1 in Section 4.2, the expected contents of this file are:

4.1000 7.2000 0
2.1000 7.3000 1
3.4000 7.5000 1
4.5000 10.6000 0
4.7000 11.7000 1
5.5000 12.2000 1
6.0000 13.1000 0
8.1000 15.7000 0
4.4000 16.1000 r0
5.9000 20.2000 r0
6.5000 20.3000 1
7.6000 24.3000 r0

Note the following requirements for the file:

- Each line contains 3 entries.
- For each line, the first entry is the arrival time of the job to the system (i.e., as a new job), the second entry is its permanent departure time from the system and the third entry is a classification of the job in the same way as Column 4 in Table 3. The possible classifications for a job are 0, r0 and 1. You should be able to reconcile the contents of the above file with Example 1 in Section 4.2.
- The jobs must be ordered according to ascending completion times.
- If the simulation is in the trace mode, we expect the simulation to finish after all jobs have been processed. Therefore, the number of lines in dep_*.txt should be equal to the number of jobs.
- If the simulation is in the random mode, the file should contain all the jobs that have been completed by time_end.

All mean response times, arrival times and completion times in mrt_*.txt and dep_*.txt should be printed as floating point numbers to exactly 4 decimal places. Note that your simulation should be performed in full floating point precision and you should only do the rounding when you are writing the output files.
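The ordering and formatting requirements above could be met as in the following sketch. The function names are hypothetical, and the single space between entries is an assumption based on the sample contents shown above. Rounding happens only inside the format strings, so the simulation values themselves stay in full precision.

```python
def format_dep_lines(jobs):
    """Format (arrival, departure, classification) records for dep_*.txt.

    Records are sorted by departure time and numbers are written with
    exactly 4 decimal places.
    """
    ordered = sorted(jobs, key=lambda job: job[1])
    return [f"{arrival:.4f} {departure:.4f} {label}"
            for arrival, departure, label in ordered]


def format_mrt_line(mrt_class0, mrt_class1):
    """Format the two mean response times for mrt_*.txt."""
    return f"{mrt_class0:.4f} {mrt_class1:.4f}"
```

For instance, format_mrt_line(5.5, 7.0) gives the line shown for Example 0, and format_dep_lines places the job departing at 7.2 before the job departing at 7.3 regardless of the order in which they were recorded.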

6.3 The testing framework

When you submit your project, you must include a Linux bash shell script with the name run_test.sh so that we can run your program on the CSE system. This shell script is required because you are allowed to use a computer language of your choice.

Let us first recall that each test is specified by four configuration files and should produce two output files. For example, test number 0 is specified by the configuration files mode_0.txt, interarrival_0.txt, service_0.txt and para_0.txt; and test number 0 is expected to produce the output files mrt_0.txt and dep_0.txt.

We will use the following directory structure when we do testing.

the directory containing run_test.sh
    config/
    output/

We will put all the configuration files for all the tests in the sub-directory config/. You should write all the output files to the sub-directory output/.

To run test number 0, we use the shell command:

./run_test.sh 0

The expected behaviour is that your simulation program will read in the configuration files for test number 0 from config/, carry out the simulation and create the output files in output/.

Similarly, to run test number 1, we use the shell command:

./run_test.sh 1

This means that the shell script run_test.sh has one input argument which is the test number to be used.

Let us for the time being assume that you use Python (Version 3) to write your simulation program and you call your simulation program main.py. If the file main.py is in the same directory as run_test.sh, then run_test.sh can be the following one-line shell script:

python3 main.py $1

The shell script will pass the test number (which is in the input argument $1) to your simulation program main.py. This also implies that your simulation program should accept one input argument which is the test number.
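On the Python side, accepting the test number could look like the sketch below. The function names are hypothetical and the simulation itself is left out. The sketch stops with a usage message when the argument is missing.

```python
import sys


def parse_test_number(argv):
    """Return the test number given as the single command-line argument."""
    if len(argv) != 2:
        raise SystemExit("usage: python3 main.py <test number>")
    return int(argv[1])


def main(argv):
    test_number = parse_test_number(argv)
    # Placeholder: read the configuration files for this test, run the
    # simulation and write the two output files here.
    print(f"running test {test_number}")
    return test_number
```

In main.py you would call main(sys.argv) at the bottom of the file, so that ./run_test.sh 0 ends up running test number 0.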

If you use C, C++ or Java, then your run_test.sh should first compile the source code and then run the executable. You should of course pass the test number to the executable as an input.

You can put your code in the same directory that contains run_test.sh or in a subdirectory below it. For example, you may have a subdirectory src/ for your code like the following:

the directory containing run_test.sh
    config/
    output/
    src/

6.4 Sample files

You should download the file sample_project_files.zip from the project page on the course website. The zip archive has the following directory structure:

Base directory containing cf_output_with_ref.py, run_test.sh and main.py
    config/
    output/
    ref/

Details on the zip archive are:

- The sub-directory config/ contains configuration files that you can use for testing.
  - The files mode_0.txt, mode_1.txt, ..., and mode_6.txt. Note that Tests 0–3 are for trace mode while Tests 4–6 are for random mode.
  - The files para_*.txt, interarrival_*.txt and service_*.txt for * from 0 to 6, as the input to the simulation.
  - Note that Tests 0–2 are the same as Examples 0–2 in Section 4.
- The sub-directory output/ is empty. Your simulation program should place the output files in this sub-directory.
- The sub-directory ref/ contains the expected simulation results.
  - The files mrt_*_ref.txt and dep_*_ref.txt for * from 0 to 6, as the reference files for the output. For Tests 0–3, you should be able to reproduce the results in mrt_*_ref.txt and dep_*_ref.txt. However, since Tests 4–6 are in random mode, you will not be able to reproduce the results in the output files. They have been provided so that you can check the expected format of the files.
- The Python file cf_output_with_ref.py, which illustrates how we will compare your output against the reference output. This file takes in one input argument, which is the test number. For example, if you want to check your simulation outputs for test 0, you use:
```
  python3 cf_output_with_ref.py 0
```
  Note the following:
  - The file cf_output_with_ref.py expects the directory structure shown earlier.
  - For trace mode, we will check your mean response times, the departure times and classifications. Note that we are not looking for an exact match but rather whether your results are within a valid tolerance. The tolerance for the trace mode is 10^-3, which is fairly generous for numbers with 4 decimal places.
  - For random mode, we will only check the mean response times. You can see from the sample file that we check whether the mean response time is within an interval. We obtain this interval using the following method: (i) we first simulate the system many times; (ii) we then use the simulation results to estimate the maximum and minimum mean response times; (iii) we use the estimated maximum and minimum values to form an interval; (iv) in order to provide some tolerance due to randomness, we enlarge this interval further.
  - Note that we use a very generous tolerance so if your mean response time does not pass the test, then it is highly likely that your simulation program is not correct.
- The files run_test.sh and main.py as mentioned in Section 6.3.
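
The two kinds of check described above can be sketched as follows. This is not the code of cf_output_with_ref.py: all function names are hypothetical, the use of "less than or equal to" for the tolerance is an assumption, and the margin parameter is an invented way to enlarge the interval, since the actual enlargement is not specified here.

```python
def within_tolerance(value, reference, tol=1e-3):
    """Trace mode: accept a value within tol of the reference (assumed <=)."""
    return abs(value - reference) <= tol


def dep_lines_match(output_lines, reference_lines, tol=1e-3):
    """Compare departure times (within tol) and classifications (exactly)."""
    if len(output_lines) != len(reference_lines):
        return False
    for out, ref in zip(output_lines, reference_lines):
        _, out_dep, out_cls = out.split()
        _, ref_dep, ref_cls = ref.split()
        if out_cls != ref_cls:
            return False
        if not within_tolerance(float(out_dep), float(ref_dep), tol):
            return False
    return True


def acceptance_interval(replicated_means, margin=0.1):
    """Random mode: interval from the min and max of many replications,
    enlarged on both sides by margin times its width (margin is assumed)."""
    low, high = min(replicated_means), max(replicated_means)
    slack = margin * (high - low)
    return low - slack, high + slack


def in_interval(value, interval):
    """Check whether a mean response time falls inside the interval."""
    low, high = interval
    return low <= value <= high
```

A value of 5.5004 passes against a reference of 5.5000 because the difference is below 10^-3, while 5.5020 does not.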
