SP Assessed Exercise 2
Concurrent Dependency Discoverer
1 Requirement
Large-scale systems developed in C and C++ tend to include a large number of .h files, both of a system variety (enclosed in < >) and non-system (enclosed in “ ”). The make utility and Makefiles are a convenient way to record dependencies between source files, and to minimize the amount of work that is done when the system needs to be rebuilt. Of course, the work will only be minimized if the Makefile exactly captures the dependencies between source and object files.
Some systems are extremely large, and it is difficult to keep the dependencies in the Makefile correct as many people make changes at the same time. Therefore, there is a need for a program that can crawl over source files, note any #include directives, recurse through the files those directives specify, and finally generate the correct dependency specifications.
#include directives for system files (enclosed in < >) are normally NOT specified in dependencies. Therefore, our system will focus on generating dependencies between source files and non-system #include directives (enclosed in “ ”).
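For example, given the fragment below (util.h stands in for any project-local header; the file names are illustrative), only the quoted include would be recorded as a dependency:

    #include <stdio.h>   // system include: NOT recorded as a dependency
    #include "util.h"    // non-system include: recorded, and itself crawled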
2 Specification
For very large software systems, a singly-threaded application to crawl the source files may take a long time. The purpose of this assessed exercise is to develop a concurrent include file crawler in C++.
On Moodle you are provided with a sequential C++17 include file crawler, dependencyDiscoverer.cpp. The main() function may take the following arguments:
-Idir – adds the directory dir to the list of directories searched for files named in #include directives
file.ext – the name of a source file to be scanned (the provided tests use files with the extensions .y, .l, and .c)
The crawler uses the following environment variables when it runs:
CRAWLER_THREADS – if this is defined, it specifies the number of worker threads that the application must create; if it is not defined, then two (2) worker threads should be created.
CPATH – if this is defined, it contains a list of directories separated by ':'; when a file named in an #include directive is not found in the current directory or in a directory specified by a -I argument, these directories are searched in order.
NOTE: You can set an environment variable in the shell with the following command:
% export CRAWLER_THREADS=3
For example, if CPATH is “/home/user/include:/usr/local/group/include” and “-Ikernel” is specified on the command line, then when processing
#include “x.h”
x.h will be located by searching in the following order:
./x.h
kernel/x.h
/home/user/include/x.h
/usr/local/group/include/x.h
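As an illustration of how this search order could be assembled, the sketch below builds the ordered directory list from the -I arguments and CPATH. The function name, and the assumption that -I directories are passed in with trailing slashes, are illustrative and not taken from dependencyDiscoverer.cpp:

    #include <cstdlib>
    #include <sstream>
    #include <string>
    #include <vector>

    // Build the ordered list of directories searched for a quoted #include:
    // the current directory first, then -I directories, then CPATH entries.
    std::vector<std::string> buildSearchPath(const std::vector<std::string>& includeDirs) {
      std::vector<std::string> dirs = { "./" };
      dirs.insert(dirs.end(), includeDirs.begin(), includeDirs.end());
      if (const char* cpath = std::getenv("CPATH")) {
        std::string dir;
        std::stringstream ss(cpath);
        while (std::getline(ss, dir, ':'))      // CPATH entries are ':'-separated
          if (!dir.empty()) dirs.push_back(dir + "/");
      }
      return dirs;
    }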
3 Design and Implementation
The key data structures, data flows, and threads in the concurrent version are shown in the figure below. This is a common leader/worker concurrency pattern. The main thread (leader) places file names to be processed in the work queue. Worker threads select a file name from the work queue, scan the file to discover dependencies, add these dependencies to the result Hash Map and, if new, to the work queue.
It should be possible to adjust the number of worker threads that process the accumulated work queue in order to speed up the processing. Since the Work Queue and the Hash Map are shared between threads, you will need to use concurrency control mechanisms to implement thread-safe access.
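A minimal sketch of one possible thread-safe Work Queue follows, assuming a std::list as the underlying container and a condition variable so that idle workers block rather than spin; the struct and member names are illustrative, not taken from the provided code:

    #include <condition_variable>
    #include <list>
    #include <mutex>
    #include <optional>
    #include <string>

    // A thread-safe queue of file names. Workers block in pop() until an
    // item arrives or the queue is shut down.
    struct WorkQueue {
      std::list<std::string> items;
      std::mutex m;
      std::condition_variable cv;
      bool done = false;  // set when no more work will ever be added

      void push(std::string name) {
        { std::lock_guard<std::mutex> lock(m); items.push_back(std::move(name)); }
        cv.notify_one();
      }

      // Returns std::nullopt once the queue is empty and done is set.
      std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !items.empty() || done; });
        if (items.empty()) return std::nullopt;
        std::string name = std::move(items.front());
        items.pop_front();
        return name;
      }

      void shutdown() {
        { std::lock_guard<std::mutex> lock(m); done = true; }
        cv.notify_all();
      }
    };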
3.1 How to proceed
You are provided with a working, sequential C++17 program called dependencyDiscoverer. Read the extensive comments in dependencyDiscoverer.cpp that explain the design of the application. Use the documentation at en.cppreference.com to check that you understand how the standard C++ containers are used in dependencyDiscoverer.cpp.
Build the program with the provided Makefile; you can then test it by running:
% cd test
% ../dependencyDiscoverer *.y *.l *.c
This should produce output identical to the provided output file, so the following command should print nothing when the output is correct:
% ../dependencyDiscoverer *.y *.l *.c | diff - output
NOTE: The university servers might throw an error saying that C++17 is not available. You need to use a more recent version of Clang. To obtain it, run the following in the command shell on one of the stlinux servers (not ssh or sibu):
% source /usr/local/bin/clang9.setup
Start to make the code concurrent by creating new thread-safe Work Queue and Hash Map data structures that encapsulate the existing C++ standard containers. Create a struct that stores the container as a member alongside the synchronization primitives (e.g. a std::mutex and a std::condition_variable) that protect it.
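For the Hash Map, such a wrapper might look like the sketch below. The mapping from a file name to a list of its dependencies mirrors the sequential program's use of the container, but the struct and method names are invented for illustration:

    #include <list>
    #include <mutex>
    #include <optional>
    #include <string>
    #include <unordered_map>

    // Thread-safe wrapper: every access to the container takes the lock.
    struct SafeHashMap {
      std::unordered_map<std::string, std::list<std::string>> map;
      std::mutex m;

      // Inserts an empty dependency list for key if absent; returns true
      // if the key was newly inserted (i.e. the file is new work).
      bool insertIfAbsent(const std::string& key) {
        std::lock_guard<std::mutex> lock(m);
        return map.emplace(key, std::list<std::string>{}).second;
      }

      void addDependency(const std::string& key, const std::string& dep) {
        std::lock_guard<std::mutex> lock(m);
        map[key].push_back(dep);
      }

      std::optional<std::list<std::string>> get(const std::string& key) {
        std::lock_guard<std::mutex> lock(m);
        auto it = map.find(key);
        if (it == map.end()) return std::nullopt;
        return it->second;  // returns a copy, so the caller needs no lock
      }
    };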
Once the single-threaded version works correctly, it should be straightforward to obtain the number of worker threads to create from the CRAWLER_THREADS environment variable and to create that many worker threads. A key technical challenge is to design a solution so that the main thread can determine that all the worker threads have finished (without busy waiting), so that it can then harvest the information in the Hash Map.
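One common solution is to count the items that have been queued but not yet fully processed, and to let the worker that completes the last item wake the main thread through a condition variable. A sketch under that assumption (none of these names come from the provided code):

    #include <condition_variable>
    #include <mutex>

    // Tracks how many queued file names have not yet been fully processed.
    // The main thread blocks in waitUntilIdle() instead of polling.
    struct CompletionTracker {
      int pending = 0;
      std::mutex m;
      std::condition_variable cv;

      void taskAdded() {                      // call when a file name is queued
        std::lock_guard<std::mutex> lock(m);
        ++pending;
      }

      void taskDone() {                       // call when a worker finishes a file
        std::lock_guard<std::mutex> lock(m);
        if (--pending == 0) cv.notify_all();  // last task wakes the main thread
      }

      void waitUntilIdle() {                  // main thread: block, don't spin
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return pending == 0; });
      }
    };

When waitUntilIdle() returns, the main thread can shut down the queue, join the workers, and then harvest the information in the Hash Map.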
3.2 Submission Options
As with Assessed Exercise 1, you have the option of submitting a less than complete implementation of this exercise. Your options are as follows:
- You may submit a sequential implementation of the crawler; it must use thread-safe data structures. If you select this option, you are constrained to 50% of the total marks.
- You may submit an implementation that supports a single worker thread in addition to the main/manager thread. If you select this option, you are constrained to 75% of the total marks.
- You may submit an implementation that completely conforms to the full specification in Section 2 above. If you select this option, you have access to 100% of the total marks.
The marking scheme is appended to this document.