QUESTION 2 (20 Marks)
Apply loop unrolling to the following triply nested loop to improve its performance.
for (i=0; i<N; i++)
    for (j=i; j<N; j++)
        for (k=0; k<M; k++)
            C[i,j] = C[i,j] + A[i,k] * B[j,k];
where A and B are 2D matrices of size N by M and C is a 2D matrix of size N by N.
In your solution, (1) the unrolling factor must be 3, and (2) you must take computational intensity into account, i.e., once data items (especially input data items) have been loaded into registers, they should be used multiple times before being replaced.
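To illustrate the two requirements, here is a minimal sketch of one possible approach, not a definitive answer. It assumes row-major flat arrays of double (the question does not state the element type or storage layout), and the function names update_reference, update_unrolled3 and unrolled_matches_reference are hypothetical. The j loop is unrolled by 3 and the three copies of the k loop are fused, so each A[i,k] is loaded once and used three times, and the three C values are held in scalars for the whole k loop. A cleanup loop handles the leftover columns, which are needed because j starts at i and the trip count N - i is generally not a multiple of 3.

```c
#include <assert.h>
#include <stddef.h>

/* Reference: the original triple loop on row-major flat arrays (assumed layout).
   A and B are N x M, C is N x N; only entries with j >= i are updated. */
void update_reference(size_t N, size_t M, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = i; j < N; j++)
            for (size_t k = 0; k < M; k++)
                C[i * N + j] += A[i * M + k] * B[j * M + k];
}

/* Sketch: j loop unrolled by a factor of 3, with the three k loops fused. */
void update_unrolled3(size_t N, size_t M, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < N; i++) {
        const double *a_row = A + i * M;
        size_t j = i;

        /* Main part: three consecutive j values per pass. */
        for (; j + 3 <= N; j += 3) {
            const double *b0 = B + j * M;
            const double *b1 = B + (j + 1) * M;
            const double *b2 = B + (j + 2) * M;
            /* Accumulators are meant to stay in registers across the k loop. */
            double c0 = C[i * N + j];
            double c1 = C[i * N + j + 1];
            double c2 = C[i * N + j + 2];
            for (size_t k = 0; k < M; k++) {
                double a = a_row[k];   /* loaded once, used three times */
                c0 += a * b0[k];
                c1 += a * b1[k];
                c2 += a * b2[k];
            }
            C[i * N + j]     = c0;
            C[i * N + j + 1] = c1;
            C[i * N + j + 2] = c2;
        }

        /* Cleanup: at most two leftover columns when (N - i) is not a multiple of 3. */
        for (; j < N; j++) {
            double c0 = C[i * N + j];
            for (size_t k = 0; k < M; k++)
                c0 += a_row[k] * B[j * M + k];
            C[i * N + j] = c0;
        }
    }
}

/* Self-check on small integer-valued data, so the comparison is exact.
   Returns 1 if both versions produce the same C, 0 otherwise. */
int unrolled_matches_reference(size_t N, size_t M)
{
    enum { MAX_DIM = 16 };
    static double A[MAX_DIM * MAX_DIM], B[MAX_DIM * MAX_DIM];
    static double C1[MAX_DIM * MAX_DIM], C2[MAX_DIM * MAX_DIM];
    assert(N <= MAX_DIM && M <= MAX_DIM);

    for (size_t x = 0; x < N * M; x++) {
        A[x] = (double)(x % 7);
        B[x] = (double)(x % 5) - 2.0;
    }
    for (size_t x = 0; x < N * N; x++)
        C1[x] = C2[x] = (double)(x % 3);

    update_reference(N, M, A, B, C1);
    update_unrolled3(N, M, A, B, C2);

    for (size_t x = 0; x < N * N; x++)
        if (C1[x] != C2[x])
            return 0;
    return 1;
}
```

In the unrolled k loop, each iteration performs four loads (one element of A and three of B) for three multiply-add operations. As written, the original loop reads A[i,k], B[j,k] and C[i,j] and writes C[i,j] for every single multiply-add, unless the compiler keeps C[i,j] in a register. Each C[i,j] is still accumulated in the same order of k, so the result should match the original loop.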