1. Homepage
  2. Programming
  3. [2021] PSU - CMPSC 473 Operating Systems - Project4 A Simple MapReduce-Style Wordcount Application

[2021] PSU - CMPSC 473 Operating Systems - Project4 A Simple MapReduce-Style Wordcount Application

Engage in a Conversation
PSUCMPSC 473Operating SystemsThe Pennsylvania State UniversityProgramming HelpMapReduce

Lab 4: A Simple MapReduce-Style Wordcount Application CMPSC 473, SPRING 2021 CourseNana.COM


CourseNana.COM

1 Purpose and Background CourseNana.COM

This project is designed to give you experience in writing multi-threaded programs by implementing a simplified MapReduce-style wordcount application. By working on this project: CourseNana.COM

• Youwilllearntowritemulti-threadedcodethatcorrectlydealswithraceconditions. CourseNana.COM

• You will carry out a simple performance evaluation to examine the performance impact of (i) the degree of parallelism in the mapper stage and (ii) the size of the shared buffer which the two stages of your application will use to communicate. CourseNana.COM

As covered briefly in Lecture 22, the wordcount application takes as input a text file and produces as output the counts for all uniquely occurring words in the input file ar- ranged in an alphabetically increasing order. We will assume that the words within our input files will only contain letters of the English alphabet and the digits 0-9 (i.e., no punc- tuation marks or other special characters). Our wordcount will consist of two stages. The CourseNana.COM

first stage, called ”mapper,” reads individual words and produces key-value pairs of the form (word, 1). The second stage, called the ”reducer” consumes these, groups them by key, and sums up the counts in each group to produce the final output. Notice how the first stage can be parallelized. That is, the mapper stage can consist of multiple mapper threads, each working on its own separate portion of the input file. The reducer stage only contains a single thread which runs concurrently with the mapper threads. The communication between the mapper threads and the reducer thread occurs via a shared in-memory FIFO buffer. Figure 1 summarizes these ideas. CourseNana.COM

2 Starter code: unsynchronized word-count CourseNana.COM

We are providing you starter code that implements a preliminary version of wordcount that works correctly in the absence of multi-threading. To appreciate this, you should confirm that this code is able to pass the ”serialize” test in which accesses to the buffer occur in a strictly serial manner (i.e., an operation is issued only upon the completion of the previous operation). However, our implementation is not thread-safe. That is, it has race conditions. You will need to enhance this code to create a thread-safe version of wordcount. CourseNana.COM

3 What you need to implement CourseNana.COM

Take the time to skim all the starter source code files. All your implementation will be confined to the files helper.c and helper.h. Specifically, you need to consider enhancing the following functions (recall that the provided versions of these functions work correctly in the absence of multi-threding). Though we have implemented the buffer_create function for you, you can enhance it to make any kind of initialization. CourseNana.COM

state_t* buffer_create(int capacity):createsanin-memorybufferwith the specified capacity in bytes). Returns a pointer to state_t, see definition in helper.h. This function will be called once at the beginning, you can do any kind of initialization in it based on your implementation. CourseNana.COM

enum buffer_status buffer_send(state_t *buffer, void* data): writes data to the buffer. In case the buffer is full, the function waits till the buffer has space to write the new data. Returns BUFFER_SUCCESS for successfully writing data to the buffer, CLOSED_ERROR if the buffer is closed, and BUFFER_ERROR on encountering any other error. The size of data can be found out using the function get_msg_size(data). CourseNana.COM

enum buffer_status buffer_receive(state_t* buffer, void** data): Reads data from the given buffer and stores it in the function’s input parameter, data (Note that it is a double pointer). This is a blocking call i.e., the function only returns on a successful completion of receive In case the buffer is empty, the function waits till the buffer has some data to read. CourseNana.COM

·       Theotherexecutable,wordcount_sanitize,willbeusedtohelpdetectdataraces in your code. It can be used with any of the test cases by replacing ./wordcount with ./wordcount_sanitize. Any detected data races will be output to the ter- minal. You should implement code that does not generate any errors or warnings from the data race detector. CourseNana.COM

·       Valgrind is being used to check for memory leaks, report uses of uninitialized val-
ues, and detect memory errors such as freeing a memory space more than once. To
run a valgrind check by yourself, use the command:
valgrind -v --leak-check=full ./wordcount [test case name] [iters]. Note that wordcount sanitize should not be run with valgrind as the tools do CourseNana.COM

not behave well together. Only the wordcount executable should be used with val- grind. Valgrind will issue messages about memory errors and leaks that it detects for you to fix them. You should implement code that does not generate any valgrind errors or warnings. CourseNana.COM

To run all the supplied test cases, simply run the following command in the project folder: ./test.py. This runs all of the provided tests to autograde your assignment. CourseNana.COM

6 Performance evaluation CourseNana.COM

Your performance evaluation will study the impact of two parameters: (i) the number of mapper threads and (ii) the size of the shared memory buffer via which the mapper threads communicate with the reducer thread. For one of the input files provided by us, you will report how execution time varies with (i) and (ii) in the ranges described below. CourseNana.COM

For your performance evaluation, you will be using wordcount in a ”custom” mode (wherein the custom_eval() function is used). You may additionally use this mode
for any custom input files you may want to test your code with. Here is the usage
of
wordcount in this custom mode: ./wordcount custom eval #mapperthreads buffersize infile outfile. For example if you want to run wordcount with 2 mapper threads, a buffer size of 100B, input file perf eval.txt and output file perf out.txt, you will execute: ./wordcount custom eval 2 100 perf eval.txt perf out.txt. CourseNana.COM

To measure the execution time, simply use the time command. For example: time ./wordcount custom eval 2 100 perf eval.txt perf out.txt. You will con-
duct your performance evaluation on a W204 machine on the input file named
perf eval.txt that we are providing you. You are to vary the parameters (i) and (ii) as follows: CourseNana.COM

• numberofmapperthreads∈{1,2,4,8,16,32}withsizeofbuffer1000B, • sizeofbuffer∈{100B,1000B,10000B}with8mapperthreads. CourseNana.COM

You will write your results in a README file. The format of the README file as follows. The README file appears in the root of the repository. You should begin by filling in the name and email of each member of your group on the specified lines at the beginning of the README. The README has several sections for you to fill in. Each section has some text describing what you should write in that section. You should replace that text with your CourseNana.COM

own response for that section. For reporting your experimental results, you will find 9 lines starting with ’@Test:’. Each of these lines has the format: CourseNana.COM

  @Test:  <n maps>, <buffer size>, <minimum execution time in us> CourseNana.COM

For example: CourseNana.COM

  @Test: 1, 100, <minimum execution time in us> CourseNana.COM

Oneachoftheselinesreplacethe<minimum execution time in microseconds> field with the execution time (in microseconds) you found for this choice of parameters. You are to take the minimum of 5 runs for each such number that you report. Do NOT in- clude units (us). Do NOT edit any other part of this line. For example, if the minimum time value was 20 for the experiment with 1 mapper and buffer size of 100 B: CourseNana.COM

  @Test: 1, 100, 20 CourseNana.COM

  CourseNana.COM

Get in Touch with Our Experts

WeChat WeChat
Whatsapp WhatsApp
PSU代写,CMPSC 473代写,Operating Systems代写,The Pennsylvania State University代写,Programming Help代写,MapReduce代写,PSU代编,CMPSC 473代编,Operating Systems代编,The Pennsylvania State University代编,Programming Help代编,MapReduce代编,PSU代考,CMPSC 473代考,Operating Systems代考,The Pennsylvania State University代考,Programming Help代考,MapReduce代考,PSUhelp,CMPSC 473help,Operating Systemshelp,The Pennsylvania State Universityhelp,Programming Helphelp,MapReducehelp,PSU作业代写,CMPSC 473作业代写,Operating Systems作业代写,The Pennsylvania State University作业代写,Programming Help作业代写,MapReduce作业代写,PSU编程代写,CMPSC 473编程代写,Operating Systems编程代写,The Pennsylvania State University编程代写,Programming Help编程代写,MapReduce编程代写,PSUprogramming help,CMPSC 473programming help,Operating Systemsprogramming help,The Pennsylvania State Universityprogramming help,Programming Helpprogramming help,MapReduceprogramming help,PSUassignment help,CMPSC 473assignment help,Operating Systemsassignment help,The Pennsylvania State Universityassignment help,Programming Helpassignment help,MapReduceassignment help,PSUsolution,CMPSC 473solution,Operating Systemssolution,The Pennsylvania State Universitysolution,Programming Helpsolution,MapReducesolution,