DTS205TC High Performance Computing - Lab 2 Networks and Lab 3 Monte Carlo to calculate π.
DTS205TC High Performance Computing Lab 2
Overview
Chatting is a common web application. We simulate two chatters, A and B, with one or more programs. A reads the user input and sends it to B; B reverses the message, converts it to uppercase, and prints it to the screen. The basic process is as follows:
message = ''

def A():
    global message
    message = input()

def B():
    global message
    message = message[::-1].upper()

if __name__ == '__main__':
    A()
    B()
    print(message)
Here, a global variable is used to pass the message between A and B. You need to modify the program so that it still exhibits the same visible behavior to the user, although its internal implementation may differ:
- 1) Place A and B in different sub-processes, and use the SharedMemory class of Python's multiprocessing module to transmit the message; (5 marks)
- 2) Place A and B in different sub-processes, and use the Pipe of Python's multiprocessing module to transmit the message (a sketch of this variant is given after the note below); (5 marks)
- 3) Place A and B in one process on the client side, and implement a UDP server in another program to transmit the message (a sketch of this variant is also given after the note below); (5 marks)
- 4) Place A and B in one process on the client side, and implement a TCP server in another program to transmit the message. The TCP server also has two sub-processes, each of which listens on a unique port and interacts with either A or B. The TCP server uses a Pipe to forward the message between its two sub-processes. (5 marks)
Note: This practice is designed to deepen your understanding of networks. Do not call off-the-shelf libraries.
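For orientation, here is a minimal sketch of the Pipe variant (task 2). It assumes a POSIX system using the default 'fork' start method, because multiprocessing detaches sys.stdin in child processes, so the real stdin is handed to A through a duplicated file descriptor; the file name and variable names are illustrative only, not the required solution.

# pipe_chat.py -- sketch of task 2
import os
import sys
from multiprocessing import Pipe, Process

def A(conn, stdin_fd):
    sys.stdin = os.fdopen(stdin_fd)     # reattach the terminal so input() works in the child
    conn.send(input())                  # A reads the user input and sends it to B
    conn.close()

def B(conn):
    message = conn.recv()               # B receives the message from A
    print(message[::-1].upper())        # reverse, uppercase, print -- same visible behavior
    conn.close()

if __name__ == '__main__':
    stdin_fd = os.dup(sys.stdin.fileno())          # keep a handle on the real stdin
    a_conn, b_conn = Pipe()
    pa = Process(target=A, args=(a_conn, stdin_fd))
    pb = Process(target=B, args=(b_conn,))
    pa.start()
    pb.start()
    pa.join()
    pb.join()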
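Similarly, a minimal sketch of the UDP variant (task 3) is shown below. The server and the client are two separate programs run in two terminals; the address 127.0.0.1, the port 50007, the buffer size 4096, and the file names are arbitrary illustrative choices.

# udp_server.py -- relays the message back to the sender
import socket

HOST, PORT = '127.0.0.1', 50007

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind((HOST, PORT))
while True:
    data, addr = srv.recvfrom(4096)    # message sent by A
    srv.sendto(data, addr)             # forwarded so that B can receive it

# udp_client.py -- A and B live in one process and talk through the server
import socket

HOST, PORT = '127.0.0.1', 50007
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def A():
    sock.sendto(input().encode(), (HOST, PORT))    # A sends the user input to the server

def B():
    data, _ = sock.recvfrom(4096)                  # B receives the message from the server
    print(data.decode()[::-1].upper())             # same visible behavior as before

if __name__ == '__main__':
    A()
    B()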
DTS205TC High Performance Computing Lab 3
Overview
Using the Monte Carlo method to calculate π is a classic example of parallel computing. A serial version of the program is as follows:
import numpy as np

# try to hit the unit circle
def hit_circle(num):
    # sampling in square
    x = np.random.uniform(low=-1, high=1, size=(num,))
    y = np.random.uniform(low=-1, high=1, size=(num,))
    # hit or not
    h = (np.square(x) + np.square(y)) <= 1
    return h

# calculate pi with hit record
def calc_pi(sam):
    return 4 * np.sum(sam) / sam.shape[0]

M = 10 ** 4
T = 4

# do sampling in batch
hits = np.array([])
for i in range(T):
    hits = np.hstack((hits, hit_circle(M)))

print(f'pi={calc_pi(hits)}')
It is worth noting that the sampling in the above code is done in batches: it is divided into T steps, and M samples are drawn in each step, which can serve as the basis for our parallelization. (Since the unit circle has area π and the enclosing square has area 4, the fraction of points that land inside the circle estimates π/4, which is why calc_pi multiplies the hit ratio by 4.)
1) Based on the master-slave method, use mpi4py to implement an MPI version of Monte Carlo to calculate π. (5 marks)
TIPS: The scaffolding code is as follows (the functions calc_pi and hit_circle can be found in the serial version).
import numpy as np
from mpi4py import MPI

# environment info
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nproc = comm.Get_size()

# number of tasks
T = nproc - 1

# total num. of sampling
M = 10 ** 2

if rank == 0:  # master
    assert nproc > 1
    # ========================================================
    # ==== add your own code here ============================
    # ========================================================
else:  # slave
    # ========================================================
    # ==== add your own code here ============================
    # ========================================================
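One possible way to fill in the blanks above, for orientation only: the master simply collects one batch of hits from every slave with point-to-point messages, and hit_circle and calc_pi are taken from the serial version as the tip says. Your own protocol may differ.

if rank == 0:  # master
    assert nproc > 1
    hits = np.array([])
    for src in range(1, nproc):                           # one batch per slave
        hits = np.hstack((hits, comm.recv(source=src, tag=0)))
    print(f'pi={calc_pi(hits)}')
else:  # slave
    comm.send(hit_circle(M), dest=0, tag=0)               # draw M samples and report the hits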
The result of a run is:
2) Change the number of tasks, run the MPI program from 1) five times for each setting, record the running times, and then fill in the table below. The total number of samples across all tasks is recommended (not compulsory) to be set to N = 10^7. In this way, when there are K tasks, each task needs to perform M = N/K samples.
Num. of Processes | 1 |   |   |   |
Running Time (s)  |   |   |   |   |

Max number of hardware threads on your computer:
Analyze why the average runtimes decrease, increase, or remain unchanged as the number of processes increases. (5 marks)
Note:
• The total number of samples can be adjusted to make the phenomenon clear, depending on your machine configuration.
• If you use Linux, you can use the time command for timing; if you use a different OS, please find a similar command yourself.
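On Linux this could look like, for example, time mpiexec -n 4 python your_script.py (where your_script.py stands for your own file). Alternatively, timing can be done inside the program with mpi4py's wall clock; a minimal, self-contained sketch:

from mpi4py import MPI

comm = MPI.COMM_WORLD
t0 = MPI.Wtime()
# ... sampling and communication go here ...
comm.Barrier()                                   # make sure every rank has finished
if comm.Get_rank() == 0:
    print(f'elapsed = {MPI.Wtime() - t0:.3f} s')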
3) Based on the work pool method, use mpi4py to implement an MPI version of Monte Carlo to calculate π. (5 marks)
TIPS: The scaffolding code is as follows (the functions calc_pi and hit_circle can be found in the serial version). It takes the total number of tasks as an input parameter.
import sys
import numpy as np
from mpi4py import MPI

# environment info
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nproc = comm.Get_size()

# allocate window for completed tasks
datatype = MPI.INT
itemsize = datatype.Get_size()  # get size of datatype
num_tasks_done = np.array(0, dtype='i')  # buffer
N = num_tasks_done.size
win_size = N * itemsize if rank == 0 else 0
win = MPI.Win.Allocate(win_size, comm=comm)  # allocate window

# number of tasks
T = int(sys.argv[1])
M = 10 ** 2  # size of sampling

if rank == 0:  # manager
    assert nproc > 1
    # ========================================================
    # ==== add your own code here ============================
    # ========================================================
else:  # worker
    # ========================================================
    # ==== add your own code here ============================
    # ========================================================

win.Free()
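For orientation only, one possible way to fill in the blanks above is to treat the window as a shared fetch-and-add counter with which workers claim task indices. The protocol below is an illustration, not the required solution; it uses only names defined in the scaffold plus hit_circle and calc_pi from the serial version.

if rank == 0:  # manager
    assert nproc > 1
    # initialise the shared counter to zero before any worker reads it
    win.Lock(0)
    win.Put([num_tasks_done, 1, datatype], 0)
    win.Unlock(0)
    comm.Barrier()
    # combine the hit records sent back by the workers
    hits = np.array([])
    for _ in range(1, nproc):
        hits = np.hstack((hits, comm.recv(source=MPI.ANY_SOURCE)))
    print(f'pi={calc_pi(hits)}')
else:  # worker
    comm.Barrier()                       # wait until the counter is initialised
    one = np.array(1, dtype='i')
    claimed = np.array(0, dtype='i')
    hits = np.array([])
    while True:
        # atomically fetch the current counter value and add one (claim a task)
        win.Lock(0)
        win.Get_accumulate([one, 1, datatype], [claimed, 1, datatype], 0, op=MPI.SUM)
        win.Unlock(0)
        if claimed >= T:                 # all T tasks have already been claimed
            break
        hits = np.hstack((hits, hit_circle(M)))
    comm.send(hits, dest=0)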
The result of a run is:
It is important to note that the tasks performed by each worker are not fixed and may vary from run to run.
4) Fix the number of workers to the maximum number of hardware threads your machine supports. Change the workload (number of samples) per task, test the overall running time of the work-pool program from 3) for each workload size, and then fill in the table below. The total number of samples across all tasks is recommended (not compulsory) to be set to N = 10^7.
Workload Size     |   |   |   |   |
Running Time (s)  |   |   |   |   |
Analyze why the runtimes decrease, increase, or remain unchanged as the workload size increases. (5 marks)
Note: The total number of samples and the workload sizes in the above table can be adjusted to make the phenomenon clear, depending on your machine configuration.
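One practical convenience for this experiment (a suggestion, not part of the given scaffold) is to read the per-task workload from the command line as well, so that different workload sizes can be timed without editing the program:

M = int(sys.argv[2]) if len(sys.argv) > 2 else 10 ** 2   # samples per task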