QUESTION 5 (20 Marks)
Consider the CUDA kernel routine reduceOp() below. This routine performs a parallel reduction, i.e., a parallel summation of an array of n integers. The routine does not use shared memory on each SM (Streaming Multiprocessor), and the reduction is done in place, meaning that the values in global memory are replaced by partial sums at each step.
__global__ void reduceOp(int *g_idata, int *g_odata, unsigned int n)
{
    // set thread ID
    unsigned int tid = threadIdx.x;
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // convert global data pointer to the local pointer of this block
    int *idata = g_idata + blockIdx.x * blockDim.x;

    // boundary check
    if (idx >= n) return;

    // in-place reduction in global memory
    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        int index = 2 * stride * tid;
        if (index < blockDim.x) {
            idata[index] += idata[index + stride];
        }
        // synchronize within threadblock
        __syncthreads();
    }

    // write result for this block to global mem
    if (tid == 0) g_odata[blockIdx.x] = idata[0];
}
Analyze the routine and answer the following questions.
1) Describe how the parallel algorithm adopted by this routine works.
2) Discuss whether this algorithm has any other shortcomings besides not using shared memory on each SM.
3) Based on your discussion, modify the routine so that it performs the in-place reduction in global memory more efficiently, still without using shared memory on each SM.
4) Justify your modifications, i.e., discuss why they can enhance the performance of the original routine.