SP Assessed Exercise 2
Concurrent Dependency Discoverer
1 Requirement
Large-scale systems developed in C and C++ tend to include a large number of .h files, both of a system variety (enclosed in < >) and non-system (enclosed in “ ”). The make utility and Makefiles are a convenient way to record dependencies between source files, and to minimize the amount of work that is done when the system needs to be rebuilt. Of course, the work will only be minimized if the Makefile exactly captures the dependencies between source and object files.
Some systems are extremely large, and it is difficult to keep the dependencies in the Makefile correct as many people make changes at the same time. Therefore, there is a need for a program that can crawl over source files, note any #include directives, recurse through the files those directives specify, and finally generate the correct dependency specifications.
#include directives for system files (enclosed in < >) are normally NOT specified in dependencies. Therefore, our system will focus on generating dependencies between source files and non-system #include directives (enclosed in “ ”).
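For example, given the fragment below (util.h stands in for any project-local header; the file names are illustrative), only the quoted include would be recorded as a dependency:

    #include <stdio.h>   // system include: NOT recorded as a dependency
    #include "util.h"    // non-system include: recorded, and itself crawled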
2 Specification
For very large software systems, a singly-threaded application to crawl the source files may take a long time. The purpose of this assessed exercise is to develop a concurrent include file crawler in C++.
On Moodle you are provided with a sequential C++17 include file crawler, dependencyDiscoverer.cpp. The main() function may take the following arguments:
-Idir – adds the directory dir to the list of directories searched for files named in #include directives
file.ext – the name of a source file to be scanned (the provided tests use files with the extensions .y, .l, and .c)
The crawler uses the following environment variables when it runs:
CRAWLER_THREADS – if this is defined, it specifies the number of worker threads that the application must create; if it is not defined, then two (2) worker threads should be created.
CPATH – if this is defined, it contains a list of directories separated by ':'; when a file named in an #include directive is not found in the current directory or in a directory specified by a -I argument, these directories are searched in order.
NOTE: You can set an environment variable in the shell with the following command:
% export CRAWLER_THREADS=3
For example, if CPATH is “/home/user/include:/usr/local/group/include” and “-Ikernel” is specified on the command line, then when processing
#include “x.h”
x.h will be located by searching in the following order:
./x.h
kernel/x.h
/home/user/include/x.h
/usr/local/group/include/x.h
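As an illustration of how this search order could be assembled, the sketch below builds the ordered directory list from the -I arguments and CPATH. The function name, and the assumption that -I directories are passed in with trailing slashes, are illustrative and not taken from dependencyDiscoverer.cpp:

    #include <cstdlib>
    #include <sstream>
    #include <string>
    #include <vector>

    // Build the ordered list of directories searched for a quoted #include:
    // the current directory first, then -I directories, then CPATH entries.
    std::vector<std::string> buildSearchPath(const std::vector<std::string>& includeDirs) {
      std::vector<std::string> dirs = { "./" };
      dirs.insert(dirs.end(), includeDirs.begin(), includeDirs.end());
      if (const char* cpath = std::getenv("CPATH")) {
        std::string dir;
        std::stringstream ss(cpath);
        while (std::getline(ss, dir, ':'))      // CPATH entries are ':'-separated
          if (!dir.empty()) dirs.push_back(dir + "/");
      }
      return dirs;
    }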
3 Design and Implementation
The key data structures, data flows, and threads in the concurrent version are shown in the figure below. This is a common leader/worker concurrency pattern. The main thread (leader) places file names to be processed in the work queue. Worker threads select a file name from the work queue, scan the file to discover dependencies, add these dependencies to the result Hash Map and, if new, to the work queue.
It should be possible to adjust the number of worker threads that process the accumulated work queue in order to speed up the processing. Since the Work Queue and the Hash Map are shared between threads, you will need to use concurrency control mechanisms to implement thread-safe access.
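A minimal sketch of one possible thread-safe Work Queue follows, assuming a std::list as the underlying container and a condition variable so that idle workers block rather than spin; the struct and member names are illustrative, not taken from the provided code:

    #include <condition_variable>
    #include <list>
    #include <mutex>
    #include <optional>
    #include <string>

    // A thread-safe queue of file names. Workers block in pop() until an
    // item arrives or the queue is shut down.
    struct WorkQueue {
      std::list<std::string> items;
      std::mutex m;
      std::condition_variable cv;
      bool done = false;  // set when no more work will ever be added

      void push(std::string name) {
        { std::lock_guard<std::mutex> lock(m); items.push_back(std::move(name)); }
        cv.notify_one();
      }

      // Returns std::nullopt once the queue is empty and done is set.
      std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !items.empty() || done; });
        if (items.empty()) return std::nullopt;
        std::string name = std::move(items.front());
        items.pop_front();
        return name;
      }

      void shutdown() {
        { std::lock_guard<std::mutex> lock(m); done = true; }
        cv.notify_all();
      }
    };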
3.1 How to proceed
You are provided with a working, sequential C++17 program called dependencyDiscoverer. Read the extensive comments in dependencyDiscoverer.cpp that explain the design of the application. Use the documentation at en.cppreference.com to check that you understand how the standard C++ containers are used in dependencyDiscoverer.cpp.
Build the program with the provided Makefile; you can then test it by running:
% cd test
% ../dependencyDiscoverer *.y *.l *.c
This should produce output identical to the provided output file, so the following command should print nothing when the output is correct:
% ../dependencyDiscoverer *.y *.l *.c | diff - output
NOTE: The university servers might throw an error saying that C++17 is not available. You need to use a more recent version of Clang. To obtain it, run the following in the command shell on one of the stlinux servers (not ssh or sibu):
% source /usr/local/bin/clang9.setup
Start to make the code concurrent by creating new thread-safe Work Queue and Hash Map data structures that encapsulate the existing C++ standard containers. Create a struct that stores the container as a member alongside the synchronization primitives (e.g. a std::mutex and a std::condition_variable) that protect it.
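For the Hash Map, such a wrapper might look like the sketch below. The mapping from a file name to a list of its dependencies mirrors the sequential program's use of the container, but the struct and method names are invented for illustration:

    #include <list>
    #include <mutex>
    #include <optional>
    #include <string>
    #include <unordered_map>

    // Thread-safe wrapper: every access to the container takes the lock.
    struct SafeHashMap {
      std::unordered_map<std::string, std::list<std::string>> map;
      std::mutex m;

      // Inserts an empty dependency list for key if absent; returns true
      // if the key was newly inserted (i.e. the file is new work).
      bool insertIfAbsent(const std::string& key) {
        std::lock_guard<std::mutex> lock(m);
        return map.emplace(key, std::list<std::string>{}).second;
      }

      void addDependency(const std::string& key, const std::string& dep) {
        std::lock_guard<std::mutex> lock(m);
        map[key].push_back(dep);
      }

      std::optional<std::list<std::string>> get(const std::string& key) {
        std::lock_guard<std::mutex> lock(m);
        auto it = map.find(key);
        if (it == map.end()) return std::nullopt;
        return it->second;  // returns a copy, so the caller needs no lock
      }
    };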
Once the single-threaded version works correctly, it should be straightforward to obtain the number of worker threads to create from the CRAWLER_THREADS environment variable and to create that many worker threads. A key technical challenge is to design a solution so that the main thread can determine that all the worker threads have finished (without busy waiting), so that it can then harvest the information in the Hash Map.
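One common solution is to count the items that have been queued but not yet fully processed, and to let the worker that completes the last item wake the main thread through a condition variable. A sketch under that assumption (none of these names come from the provided code):

    #include <condition_variable>
    #include <mutex>

    // Tracks how many queued file names have not yet been fully processed.
    // The main thread blocks in waitUntilIdle() instead of polling.
    struct CompletionTracker {
      int pending = 0;
      std::mutex m;
      std::condition_variable cv;

      void taskAdded() {                      // call when a file name is queued
        std::lock_guard<std::mutex> lock(m);
        ++pending;
      }

      void taskDone() {                       // call when a worker finishes a file
        std::lock_guard<std::mutex> lock(m);
        if (--pending == 0) cv.notify_all();  // last task wakes the main thread
      }

      void waitUntilIdle() {                  // main thread: block, don't spin
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return pending == 0; });
      }
    };

When waitUntilIdle() returns, the main thread can shut down the queue, join the workers, and then harvest the information in the Hash Map.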
3.2 Submission Options
As with Assessed Exercise 1, you have the option of submitting a less than complete implementation of this exercise. Your options are as follows:
- You may submit a sequential implementation of the crawler; it must use thread-safe data structures. If you select this option, you are constrained to 50% of the total marks.
- You may submit an implementation that supports a single worker thread in addition to the main/manager thread. If you select this option, you are constrained to 75% of the total marks.
- You may submit an implementation that completely conforms to the full specification in Section 2 above. If you select this option, you have access to 100% of the total marks.
The marking scheme is appended to this document.