CourseNana | INFR10076 Computer Architecture and Design - Practical 2: Dynamic Branch Prediction

Dynamic Branch Prediction CourseNana.COM

Computer Architecture and Design - Practical 2 CourseNana.COM

This handout describes the second practical assignment of the Computer Architecture and Design course, which represents 50% of the practical component of the course. It consists of a programming exercise culminating in a brief written report. Assessment of this practical will be based on the correctness of the code and the clarity of the report as explained below. The practical is to be solved individually. Please bear in mind the School of Informatics guidelines on Academic Misconduct. You must submit your solutions before the due date shown above. Follow the instructions provided in Section 3 for submission details. CourseNana.COM

In this practical, you are required to explore Branch Prediction techniques using the Intel Pin simulation tool. You are strongly advised to commence working on the simulator as soon as possible. CourseNana.COM

1 Overview CourseNana.COM

A branch predictor anticipates the outcome of a branch, before it is executed, to improve the instruction flow in the pipeline. In this practical you will investigate the accuracy of various branch predictors. Your assignment is to implement and evaluate three different branch predictors, which are: CourseNana.COM

1. Local branch predictor
2. Gshare global branch predictor
3. Tournament branch predictor that combines 1 and 2 CourseNana.COM

1.1 The Pin Tool CourseNana.COM

Pin (Intel [2021]) is a dynamic binary instrumentation framework that enables the creation of tools for analysing the dynamic behaviour of programs running natively on an Intel x86 machine. In the context of this assignment we use Pin to create tools for simulating branch predictors. Pin operates by instrumenting a program binary to intercept program execution at certain points and make call-backs into the simulation tool. Pin provides an API that allows you (the tool writer) to run high-level C++ functions in response to annotated runtime events, such as loads, stores, and branch instructions. A Pin tool is a set of call-back functions that use the Pin API and are compiled into a dynamic shared library that is linked with Pin when it runs. The tool specifies what events are of interest to it, and specifies what tool function gets called in response to each of those events. In this practical, you are given an example Pin tool that intercepts all conditional branch instructions and provides a call-back function to simulate the behaviour of an AlwaysTaken branch predictor. When run with a benchmark program, each conditional branch instruction encountered during program execution will trigger a call to the branch predictor’s call-back function, allowing it to simulate a branch predictor processing the entire sequence of conditional branch instructions executed by the benchmark. Your task is to extend this tool to simulate the three other branch predictors listed above, and then to use the tool to evaluate those branch predictors on three standard benchmark programs. CourseNana.COM

The directions below apply to DICE machines. For marking purposes, the testing of your code will be performed on a DICE machine. Therefore you are strongly encouraged to develop your code on a DICE machine. CourseNana.COM

The Pin tool and the benchmarks are available in a shared directory to which you have read-only access from a DICE machine. You first step will be to create a working directory called card_p2 in your home directory1 and then install the required Pin files in that directory. This can be done with the following commands on a DICE machine: CourseNana.COM

mkdir -p $HOME/card p2
cd $HOME/card p2 /group/teaching/card/p2/install.sh CourseNana.COM

The installer copies various files, creates a soft link to a README file, and creates a shell script called setup_pin.sh. It then runs that setup script, and tests the installation. You should see output from the install script similar to the output below: CourseNana.COM

[minos]s1234567: /group/teaching/card/p2/install.sh
Installing Pin for CARD P2 in /home/s1234567/card_p2 on Sat 7 Oct 15:52:32 BST 2023
1. copying Pin files...OK
2. Testing the installation

   - setting up the environment...OK
   - compiling the BPExample Pin tool...OK
   - running the BPExample Pin tool on the gromacs benchmark...OK
   Testing completed successfully

Environment has been set for using Pin
Your P2_WORK directory is /home/s1234567/card_p2

The setup_pin.sh script defines a number of environment variables. For example, $P2_WORK is the path to your Pin installation directory. See the README file installed in your $P2_WORK directory for more details on how to run each benchmark. But remember, that each time you start a new working session with your Pin tool you must source the Pin setup script using the following command, from within your Pin working directory: CourseNana.COM

1You can choose a directory name other than card p2 if you wish. You can also install in as many different directories as you wish. The setup pin.sh file created inside each installation directory will setup environment variables for working in that directory alone, so if you create multiple installation directories you must source the setup pin.sh script from the directory in which you want to work. CourseNana.COM

2 CourseNana.COM

source setup pin.sh CourseNana.COM

When Pin is run with the tool created from branch predictor example.cpp it requires a benchmark program binary, along with its arguments, to be given as the last com- mand line argument after a double-dash --, as shown in the README file. The Pin tool runs the benchmark, and while running it extracts information about all condi- tional branches and passes that information (one branch at a time) to the simulated branch predictor. The predictor implemented in branch predictor example.cpp is called AlwaysTakenBranchPredictor and (not surprisingly) it predicts that all condi- tional branches are Taken. Naturally, a good number of them will be Not Taken, so this example predictor is rather inaccurate. CourseNana.COM

The Pin tool outputs a set of statistics for the simulated branch predictor to a file just before it terminates. You can specify the output filename using one of the Pin command-line arguments outlined below, otherwise it will be BP stats.out by default. CourseNana.COM

The branch predictor simulator, defined in the branch predictor example.cpp, is lo- cated in the BPExample directory within your $P2_WORK directory. The simulator accepts three command line arguments: CourseNana.COM

The kind of predictor to be simulated, which can be: Always Taken (code pro- vided), Local, Gshare, Tournament. The default predictor is Always Taken. To run the Always Taken branch predictor, use the command line argument: -BP type always taken. The other option names are local, gshare, and tournament. CourseNana.COM
The number of entries in the Pattern History Table (PHT) of the predictor, which defaults to 1024 entries. To run the simulator with, for example, 4096 PHT entries, use the command line argument: -num BP entries 4096 CourseNana.COM
The name of the output file (default BP stats.out) specified with -o <filename>. For example to generate an output file called MyOutput.out, use the command line argument: -o MyOutput.out CourseNana.COM

In the path $P2 BENCHMARKS there are 3 different benchmark programs (gromacs, gobmk and sjeng), all from the SPEC2000 benchmark suite. These are pre-compiled to run on DICE. You must run each of your branch predictor experiments on all 3 benchmarks; that means evaluating each of the the 3 predictors, with 3 different PHT sizes, on 3 benchmarks – for a total of 27 distinct experiments 2. The setup_pin.sh script defines an environment variable for each benchmark containing the path to that benchmark. These are: $GROMACS PATH, $GOBMK PATH, and $SJENG PATH. CourseNana.COM

The instructions given in the README text file contain full information about how exactly to compile and run your branch prediction simulator, and the command line arguments required for each benchmark. Please follow these instructions carefully. CourseNana.COM

1.2 Branch Predictor Parameters CourseNana.COM

The experiments must be run with the following three different sizes for the Pattern History Tables (PHTs): 128, 1024, and 4096 entries. Note that you must assess all three PHT sizes for each of the different branch predictors. CourseNana.COM

2You may find it helpful to write a bash script to iterate through the experiments. 3 CourseNana.COM

Each PHT entry is a 2-bit saturating counter as explained in the lecture slides. Re- member that you are writing a simulator in a high-level language; you are not designing actual circuits. As such, you may use any data types for the counters (and other program variables). Regardless of the data type you use, it must behave like a saturating 2-bit counter. CourseNana.COM

Local Branch predictor CourseNana.COM

The Local Branch predictor uses per-branch history to make a prediction, i.e., the history of each branch is considered independently. The Predictor consists of two structures: CourseNana.COM

an array of 128 distinct Local History Registers (LHRs), that store the branch history of different branches CourseNana.COM
the PHT CourseNana.COM

The 7 least-significant bits (LSBs) of the Program Counter (PC) of the branch instruction are used to index into the array of LHRs and select the relevant LHR to be used for the prediction. Each LHR is composed of as many bits as necessary to index the PHT. For example, if the PHT contains 1024 entries, then each LHR should be 10 bits long. Each n-bit LHR contains the last n outcomes of all branches that map to that LHR (i.e. all branch PCs whose 7 LSBs are identical), with 0 denoting not taken and 1 denoting taken. All LHRs should be initialized to 0. The number of LHRs must be 128 and must stay constant across all experiments (i.e. 128 LHRs is a fixed parameter, independent of PHT size). CourseNana.COM

The selected LHR is used to index into the PHT, which yields the prediction. All PHT entries must be initialized to “11” (i.e., 3). CourseNana.COM
To train the predictor after each prediction, (a) the relevant LHR must be updated such that it reflects the new history, and (b) the PHT entry used to make the prediction must be strengthened in case of a correct prediction and weakened in case of misprediction. CourseNana.COM
- – To strengthen a saturating counter means to increment it if its prediction bit is ‘1’, and decrement it if the prediction bit is ‘0’. CourseNana.COM
- – To weaken a saturating counter means to decrement it if its prediction bit is ‘1’, and increment it if the prediction bit is ‘0’. CourseNana.COM
Figure 1 shows a pictorial view of the Local Branch Predictor, with 128 distinct LHRs, each of which consists of 12 bits of branch history, that is used to index into a 4096-entry PHT of 2-bit saturating counters. You are recommended to read Section 4 of McFarling’s paper on combined branch predictors (McFarling [1993]) for more information. CourseNana.COM

Figure 1: A Local Branch predictor with 128 Local History Registers and a PHT with 4096 2-bit saturating counters CourseNana.COM

Gshare Branch Predictor CourseNana.COM

The Gshare Branch predictor combines a Global History Register (GHR) and the LSBs of the branch’s PC to index into the PHT. The PC and the GHR are XORed in order to create the index to the PHT. CourseNana.COM
The Global History Register is composed of as many bits as necessary to index the PHT. For example, if the PHT contains 1024 entries, then the GHR should be 10 bits long. These 10 bits contain the outcomes of the last 10 branches, with 0 denoting not taken and 1 denoting taken. The GHR should be initialized to 0. All PHT entries must be initialized to “11” (i.e., 3). CourseNana.COM
To train the predictor after each prediction, the GHR must be updated such that it reflects the new global history and the used PHT entry must be strengthened in case of a correct prediction and weakened in case of misprediction. CourseNana.COM
Figure 2 shows a pictorial view of the Gshare Branch Predictor with a GHR of 10 bits that gets XORed with the 10 least-significant bits of the PC of the branch in order to index into the PHT that contains 1024 entries of 2-bit saturating counters. You are referred to Section 7 of McFarling [1993] for more information. For the purpose of this practical, the number of LSBs used from the branch PC should be equal to the number of bits of the GHR. CourseNana.COM

PC
10 LSBs CourseNana.COM

XOR CourseNana.COM

PHT 0 CourseNana.COM

1023 CourseNana.COM

Predict Not Taken CourseNana.COM

01 CourseNana.COM

0 CourseNana.COM

10 CourseNana.COM

Predict Taken CourseNana.COM

10-bit Global History Global History Register CourseNana.COM

Figure 2: Gshare Branch predictor with a 10-bit GHR and a 1024-entry PHT CourseNana.COM

Tournament Branch Predictor CourseNana.COM

The Tournament predictor is composed of three different predictors: the Local predictor, the Gshare predictor, and a meta-predictor that selects between the Local CourseNana.COM

5 CourseNana.COM

or the Gshare predictor to provide the actual prediction of the branch outcome. CourseNana.COM

The PHT of the meta-predictor is indexed with the LSBs of the Program Counter (PC) of the branch instruction. CourseNana.COM
In the meta-predictor, when the prediction bit (i.e. the MSB) of the saturating counter is 0, the Local predictor is selected to supply the prediction. When the MSB is 1, the Gshare predictor is selected. All PHT entries must be initialized to “11” (i.e., 3). CourseNana.COM
The Tournament predictor will follow a total update policy. Both the Local and the Gshare predictors will be trained after each branch is resolved. After a correct prediction the relevant meta-predictor entry must be strengthened. After a mis- prediction, the relevant meta-predictor entry (the counter) must be weakened, but only if the predictor that was not chosen was correct. If both the Local and the Gshare predictors were incorrect, there is no update to the meta-predictor. CourseNana.COM
Since the underlying behavior of the Local and Gshare predictors is unmodified in the Tournament scheme, you may choose to use your existing implementations of Local and Gshare predictors to implement the Tournament predictor. CourseNana.COM
Figure 3 shows a pictorial view of the Tournament Branch Predictor with a PHT that contains 1024 entries of 2-bit saturating counters. The 2-bit saturating counter indicates which predictor should be used (Local or Gshare) for the branch given its PC. CourseNana.COM
As before, the experiments must be run with PHTs of 3 different sizes (128, 1024 and 4096 entries). In each experiment the PHTs of the Local Predictor and the Gshare Predictor must all have the same size. I.e. when the meta-predictor PHT is 1024 entries, then the Gshare PHT is also 1024 entries and the Local Predictor PHT is also 1024 entries. CourseNana.COM

CourseNana.COM

Figure 3: Tournament Branch predictor with a a 1024-entry PHT CourseNana.COM

1.3 Simulator infrastructure CourseNana.COM

The three requested predictors must all be implemented inside the branch predictor example.cpp file. Inside this file, you must create a class with an appropriate name for each predictor. The file branch predictor example.cpp includes comments and CourseNana.COM

guidelines on how you must create your classes and the appropriate member functions. To help you get started, an implementation of the Always Taken predictor is supplied. You are advised to start from the code provided for the Always Taken predictor, and modify it to implement the functionality required for each specific predictor type.
CourseNana.COM

For the purpose of this practical you can ignore the Branch Target Buffer (BTB) (i.e. the implementation of a BTB is not needed). Hence, your simulated predictors need only generate a prediction of the outcome of the branch condition (Taken or Not-taken), without specifying the target address of the branch. Additionally the predictors do not need to identify whether an instruction is a conditional branch; they can assume that every instruction is a conditional branch – this is because the code you are working with passes only conditional branches into the prediction functions. CourseNana.COM

The contents of the file branch predictor example.cpp should be modified only as the above guidelines specify, and not in any other way. CourseNana.COM

2 Marking Scheme
Correctness (80 marks). Includes code functionality and quality for the following 3 CourseNana.COM

predictors: CourseNana.COM

1. Local Branch Predictor (30 marks)
2. Gshare Global Branch Predictor (30 marks) 3. Tournament Branch Predictor (20 marks) CourseNana.COM

Report (20 marks). You should write a short report (max 2 pages) on how you have implemented each predictor and where the predictor code is (source file, line numbers). In the report you should answer the following questions. Please refrain from using more than 3 to 4 lines for each question. CourseNana.COM

Why and how, varying the size of the PHT impacts (or not) the prediction accuracy for each of the predictors? CourseNana.COM
What are the advantages and disadvantages of the Local predictor? CourseNana.COM
What are the advantages and disadvantages of the Gshare predictor? CourseNana.COM
What benefits did you expect from the addition of the Tournament predictor? Did the results match your expectations? CourseNana.COM

Finally your report must contain 3 graphs (one for each benchmark) summarizing your results by showing the prediction accuracy of all three predictors for the various PHTs. The x axis should denote the PHT size and the y axis the prediction accuracy as a percentage. In each of the graphs the performance of all 3 predictors must be plotted. A sample chart is shown in figure 4. CourseNana.COM

7 CourseNana.COM

Figure 4: A sample of the chart that must be submitted in the Report. One such chart must exist in the report for each benchmark. CourseNana.COM

3 Format of your submission CourseNana.COM

Your submission should clearly indicate (in both the report and the code) which branch prediction techniques you have simulated completely and which you have only partially completed (if any). Please put comments in the code that you add to make it easy for the markers to understand. Remember that your simulator will be compiled and executed on a DICE machine, so you must ensure it works on DICE.
You are required to submit: CourseNana.COM

1. a .cpp source file containing the implementation of your Branch Predictors, and CourseNana.COM

2. a .pdf file containing your report. CourseNana.COM

Note that both files should be named using your student id. For example, a student with student-id s1234567 would submit s1234567.cpp and s1234567.pdf. CourseNana.COM

Submission is via the LEARN page for CARD. Navigate to the Gradebook tab, where you will see an item called Coursework 2 through which you can submit your report and code for this assignment. You may submit multiple times if you need to revise your submission, up to the deadline, after which your last submission will be the one that is marked. CourseNana.COM

4 Similarity Checking and Academic Misconduct CourseNana.COM

You must submit your own work. Any code that is not written by you must be clearly identified and explained through comments at the top of your files. Failure to do so is plagiarism. Note that you are required to take reasonable measures to protect your assessed work from unauthorized access. Detailed guidelines on what constitutes plagiarism and other issues of academic misconduct can be found at: CourseNana.COM

http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct CourseNana.COM

All submitted code is checked for similarity with other submissions using software tools such as MOSS. These tools have been very effective in the past at finding similarities and are not fooled by name changes and reordering of code blocks. CourseNana.COM

8 CourseNana.COM

5 Reporting Problems CourseNana.COM

Please post questions to Piazza if you have any issues regarding the practical and cannot find the answer in the README file. CourseNana.COM

References CourseNana.COM

Intel. Pin – A Dynamic Binary Instrumentation Tool, 2021. URL CourseNana.COM

https://software.intel.com/content/www/us/en/develop/articles/ pin-a-dynamic-binary-instrumentation-tool.html. CourseNana.COM

Scott McFarling. Combining Branch Predictors. Technical Report TN-36, Digital Equipment Corporation, June 1993. URL https://www.hpl.hp.com/techreports/ Compaq-DEC/WRL-TN-36.pdf. CourseNana.COM

INFR10076 Computer Architecture and Design - Practical 2: Dynamic Branch Prediction

Get in Touch with Our Experts