QUESTION 4 (20 Marks)
Design an efficient parallel algorithm for LU decomposition (using Gaussian elimination without pivoting) of the following type of matrix on a distributed memory machine.
??00 ??01
??10 ??11 ??12
??20 ??21 ??22 ??23
??30 ??31 ??32 ??33 ??34
??40 ??41 ??42 ??43 ??44 ??45
??50 ??51 ??52 ??53 ??54 ??55 ??56
??60 ??61 ??62 ??63 ??64 ??65 ??66 ??67
??70 ??71 ??72 ??73 ??74 ??75 ??76 ??77 ??78
??80 ??81 ??82 ??83 ??84 ??85 ??86 ??87 ??88
In this type of matrix the elements in the upper triangle are all zeros except ????,??+_1_s which are just above the diagonal.
In the algorithm design you need to assume that the size of the matrix is N by N and the total number of processors is p, and you must seriously consider load balancing and communication minimization.
You must answer the following questions:
1) How many (addition and multiplication) operations are needed to solve the problem for this specific type of matrix?
2) How do you partition the matrix in your design and why? Must give the reasons!
3) Write a pseudo code for your parallel algorithm. When describing your parallel algorithm, you need to use a similar way to that on pages 7 and 8 in week 8 lecture notes lecture05-3-22.pdf on Canvas.