1. Homepage
  2. Programming
  3. EEE3530-01 COMPUTER ARCHITECTURE Spring 2023 Assignment 4: Branch Prediction

EEE3530-01 COMPUTER ARCHITECTURE Spring 2023 Assignment 4: Branch Prediction

Engage in a Conversation
South KoreaYonsei UniversityEEE3530-01COMPUTER ARCHITECTUREBranch PredictionC++Systems Programming

EEE3530-01 Spring 2023 Assignment 4: Branch Prediction Due: Wednesday, May 31, 2023, 11:59 PM 1 Introduction • The objective of this assignment is to implement a branch predictor in a processor pipeline. This assignment requires working on the Kite C++ code. • To begin the assignment, go to the kite/ directory that you worked for Assignments 1 and 2. • Download branch.sh , and execute it inside the kite/ directory. The script will update Kite to v1.11 and overwrite Makefile ,br_predictor. ,proc.cc , and program_code files for this assignment. •Makefile uses-DBR_PRED and-DDATA_FWD in this assignment, which will enable branch prediction and data forwarding in the processor pipeline by default. $ cd kite/ $ wget https://icsl.yonsei.ac.kr/wp-content/uploads/branch.sh $ chmod +x branch.sh $ ./branch.sh $ make g++ -g -Wall -O3 -DBR_PRED -DDATA_FWD -o alu.o -c alu.cc g++ -g -Wall -O3 -DBR_PRED -DDATA_FWD -o br_predictor.o -c br_predictor.cc ... • The following describes how Kite handles branch instructions in the processor pipeline. You do not have to touch anything in proc.cc , but the explanations will help you understand how branch instructions work. •Fetch stage: Instruction fetch and branch prediction –Open the proc.cc file, and find the void proc_t::fetch() function near the end of the file. –In the fetch stage, inst = inst_memory->read(pc) fetches an instruction from the instruction memory based on the program counter (PC). –If the fetched instruction is SB type (i.e., get_op_type(inst->op) == op_sb_type ), it makes branch prediction based on the instruction’s PC; inst->pred_taken = br_predictor->is_taken(inst) . The instruction PC can be retrieved by reading inst->pc in the function. inst->pred_taken stores if the branch is predicted to be taken (1) or not (0). –If the branch is predicted to be taken, the PC is set to a predicted branch target by consulting the branch target buffer (BTB); pc = br_target_buffer->get_target(inst->pc) . Otherwise, the PC remains PC+4. inst->pred_target stores the predicted branch target address. –If Kite does not use branch prediction (which is not the case in this assignment), PC is set to zero, which will halt instruction fetch. •Execute stage: Resolving branch condition and branch target –Invoid proc_t::execute() , ALU runs an instruction; alu->run(inst) . –Detailed ALU operations are in the void alu_t::run(inst_t m_inst) function of alu.cc . –Near the bottom of the long switch-case statement, there are cases for four branch instructions; op_beq , op_bge ,op_blt , and op_bne . –For instance, a beq instruction checks if its rs1 andrs2 values are equal; m_inst->rs1_val == m_inst->rs2_val . If so, the PC-relative address, m_inst->pc + (m_inst->imm << 1) , is the branch target. Otherwise, the branch target is m_inst->pc + 4 . 1 –m_inst->branch_taken andm_inst->branch_target are the actual branch state and target, which can be different from the predicted branch state and target (i.e., m_inst->pred_taken andm_inst-> pred_target ) obtained by branch prediction in the fetch stage if the prediction is incorrect. •Writeback stage: Updating the branch predictor and branch target buffer –Invoid proc_t::writeback() , a branch instruction updates the branch predictor and BTB based on the actual branch information resolved in the execute stage. –It calls br_predictor->update(inst) to update the branch predictor. If the branch is taken (i.e., inst->branch_taken == 1 ), a two-bit saturating counter is incremented. Otherwise, the counter is decremented. –If the branch is taken, it also updates the BTB by calling br_target_buffer->update(inst->pc, inst->branch_target) . It records the branch target address corresponding to the branch instruction’s PC in the BTB, which serves as a lookup table (or cache). –If the predicted branch target is different from the actual branch address (i.e., inst->pred_target != inst->branch_target ), the processor pipeline is flushed to get rid of wrong-path instructions, and the PC is set to the correct branch target; pc = inst->branch_target . –Flushing the processor pipeline removes all instructions staged in the pipeline registers (i.e., IF/ID, ID/EX, EX/MEM, and MEM/WB). It also flushes the ALU, and the dependency map of the register file is reset. • As described, the branch predictor and BTB are accessed twice in the processor pipeline for every branch instruction, once in the fetch stage and again in the writeback stage. • In the fetch stage, the processor executes inst->pred_taken = br_predictor->is_taken(inst) to pre- dict if a branch is taken. • If the branch is predicted to be taken, inst->pred_target = br_target_buffer->get_target(inst->pc) gets a predicted branch target. • The writeback stage calls br_predictor->update(inst) to update the branch predictor based on the actual branch outcome. • If the branch is taken, br_target_buffer->update(inst->pc, inst->branch_target) updates the branch target buffer. • Your work in this assignment is to fill in the four functions above, i.e., is_taken() andupdate() functions of the branch predictor and get_target() andupdate() functions of the BTB. 2 Implementation • The lecture slides showed the simplest branch predictor called a last-outcome predictor . A PC value is mapped to one of the entries in the counter array of the branch predictor, each containing a two-bit saturating counter. Each entry traces the last (or two) behavior(s) of a branch instruction. • Although such a branch predictor can handle biased branch patterns (e.g., loops), it does not work well with alternating branch behaviors such as T, NT, T, NT, T, NT, · · ·. • Suppose all two-bit saturating counters are initialized to ones (i.e., weakly not-taken). For the alternating T-NT pattern of T, NT, T, NT, T · · ·, the last-outcome predictor achieves 0% branch prediction accuracy since the counter toggles between 1 and 2 and misses every branch prediction. • Thus, the question addressed in this assignment is “Can we design a branch predictor that can also handle such branch patterns?” To achieve this, the branch predictor needs to observe longer branch history, not just the last one or two branch outcomes. • The following figure shows the structure of a two-level branch predictor comprised of a branch history register (BHR) and pattern history table (PHT). 2 Branch history register(BHR)Oldest branchMost recent branchhbitsPattern history table(PHT)320012212h-22h-11323012-bit saturating counter···Predicted to be takenhbits11111111101 (PC >> 2)XOR• The BHR consists of hbits, and the PHT has 2hentries of two-bit saturating counters. The example is shown forh= 10 bits. • The least significant bit (LSB) of the BHR shows the most recently committed branch outcome, and the most significant bit is the oldest branch result. • The BHR is XOR’d with pc >> 2 for indexing the PHT. For instance, the BHR value of [ 11 1111 1101 ] XOR’d with the PC of [ · · ·0000 0000 1100 ] results in the 10-bit index of [ 11 1111 1110 ], which points to the entry #1022 of the PHT. • The corresponding PHT entry in the example has a counter value of two, so the branch is predicted to be taken. • When the branch instruction reaches the writeback stage, it updates the branch predictor based on the actual branch result. The BHR is shifted to the left, and the branch state (i.e., 1 for taken or 0 for not-taken) is written to the LSB. • For instance, if the branch is really taken, the BHR becomes [ 11 1111 1011 ]. Then, the next branch instruction will reference the index #1019 of the PHT. • Notably, all branch instructions globally share the same BHR and PHT. This specific form of the two-level branch predictor is called gshare predictor . • However, globally sharing the BHR can cause a branch instruction to accidentally update a different PHT counter than what it referenced in the fetch stage if the PHT is indexed again using the BHR in the writeback stage. • After a branch instruction reads the BHR for indexing the PHT in the fetch stage, the BHR can be altered by other preceding in-flight branch instructions. Thus, when this instruction reaches the writeback stage, it may see a different BHR value, which can point to a different PHT entry. • To update the same PHT counter that the branch instruction referenced in the fetch stage, the instruction needs to capture the BHR or PHT information at the moment of branch prediction and carry it along the processor pipeline so that it can find the same PHT counter in the writeback stage. • You should add a variable to class inst_t ininst.h to store the necessary information during branch prediction. When the instruction updates the branch predictor, it uses this carry-on information to identify the right PHT counter instead of indexing the PHT with the BHR. •program_code contains a RISC-V code for testing. The code is given, so you will not work on assembly programming in this assignment. • The following shows a part of program_code with five branch instructions in doubly nested loops. 3 CourseNana.COM

Outer loop

addi x18, x10, 0 L1: bge x0, x18, L7 # Branch andi x19, x18, 1 addi x18, x18, -1 bne x19, x0, L2 # Branch jal x0, L1 CourseNana.COM

Inner loop

L2: addi x20, x0, 1 addi x21, x11, 0 L3: blt x21, x20, L6 # Branch andi x19, x20, 3 bne x19, x0, L4 # Branch addi x20, x20, 1 jal x0, L3 L4: addi x19, x19, -1 addi x20, x20, 1 bne x19, x0, L5 # Branch addi x20, x20, 4 L5: jal x0, L3 CourseNana.COM

End of the inner loop

L6: jal x0, L1 CourseNana.COM

End of the outer loop

• The first branch instruction (i.e., bge x0, x18, L7 ) is mostly not taken. The second branch (i.e., bne x19, x0, L2 ) is taken with 50% frequency, and the third branch (i.e., blt x21, x20, L6 ) is mostly not taken. • The fourth branch (i.e., bne x19, x0, L4 ) is taken with 75% frequency, and the fifth branch (i.e., bne x19, x0, L5 ) is taken with 67% frequency. • The default branch predictor in the skeleton code is an always-not-taken predictor, which returns 0 for every br_predictor->is_taken(inst) call. This predictor has 54.6% branch prediction accuracy as follows. $ ./kite program_code ... Start running ... Done. ======== [Kite Pipeline Stats] ========= Total number of clock cycles = 759838 Total number of stalled cycles = 0 Total number of executed instructions = 488631 Cycles per instruction = 1.555 Number of pipeline flushes = 90401 Number of branch mispredictions = 90401 Number of branch target mispredictions = 0 Branch prediction accuracy = 0.546 (108600/199001) ... • If the branch predictor is implemented to work as a last-outcome predictor with 1024 entries in the PHT indexed only by instruction PCs (i.e., pc >> 2 ) such as the lecture slides, it achieves 81.7% branch prediction accuracy. You do not have to implement the last-outcome predictor in this assignment, but it will be an interesting test. • In contrast, a gshare predictor with h= 10 bits achieves 99.9% branch prediction accuracy. Your assignment is to implement the gshare predictor with 99.9% prediction accuracy. • The following shows a simple BTB designed as a direct-mapped cache. 4 Branch target buffer(BTB)0128001210402t-22t-1523···tbits(PC >> 2)• It is indexed by the lowest tbits of pc >> 2 . The BTB has 16 entries for t= 4 bits in this assignment. Each entry of the BTB stores the target address of a taken branch. • For instance, if a branch instruction has the PC of [ ···0000 1011 1100 ], the lowest four bits of pc >> 2 points to the index #15 of the BTB. The predicted branch target address in the example is 104. • The following shows the definition of class br_predictor_t inbr_predictor.h , representing a branch predictor. // Branch predictor class br_predictor_t { public: br_predictor_t(unsigned m_hist_len); ˜br_predictor_t(); bool is_taken(inst_t m_inst); // Is a branch predicted to be taken? void update(inst_t m_inst); // Update a prediction counter. private: uint16_t bhr; // Branch history register (BHR) uint8_t pht; // Pattern history table (PHT) unsigned h; // h-bit branch history }; •br_predictor_t(unsigned m_hist_len) and∼br_predictor_t() are the constructor and destructor functions of the C++ class. These functions are automatically called when the class is created or deleted. •is_taken() is called in the fetch stage to make branch prediction based on an instruction’s PC (i.e., m_inst->pc ). • The writeback stage calls update() to update the branch predictor based on the actual branch state (i.e., m_inst ->branch_taken ). • The BHR is defined as a uint16_t variable, but you only need the lowest ten bits of the variable for h= 10 in this assignment. • The PHT is defined as a uint8_t array, where each entry represents a branch prediction counter. Since a two-bit saturating counter has a value between 0 and 3, it only uses the lowest two bits of uint8_t . • The following shows a branch predictor implementation in br_predictor.cc . Your assignment will be filling in the empty functions of this file. // Branch predictor br_predictor_t::br_predictor_t(unsigned m_hist_len) : 5 bhr(0), pht(0), h(m_hist_len) { // Create a pattern history table (PHT). pht = new uint8_t[1 << h]; } br_predictor_t::˜br_predictor_t() { // Deallocate the PHT. delete [] pht; } // Is a branch predicted to be taken? bool br_predictor_t::is_taken(inst_t m_inst) { // Predict always not taken. return false; } // Update a prediction counter. void br_predictor_t::update(inst_t m_inst) { bhr = 0; } • The first function is a C++ constructor of the class. This function is automatically called when a branch predictor is created. pht = new uint8_t[1 << h] creates a 2h-entry array, which is the same as pht = (uint8_t ) malloc(1 << h) in C language. The only thing you have to add in this function is to initialize the phtarray. • The C++ destructor of the br_predictor_t class is called when the branch predictor is destroyed at the end of a Kite simulation. delete [] pht is the same as free(pht) in C language. The function is complete, so you do not work on this function. •is_taken() makes a branch prediction based on an instruction’s PC and the BHR. The lowest ten bits of bhr is XOR’d with that of m_inst->pc >> 2 . The result is used for indexing the pht array. If the entry contains a counter value of two or greater, the branch is predicted to be taken, and the function returns 1. Otherwise, the branch is predicted to be not taken, and the function returns 0. Note that you need to store either the bhr value or pht index in m_inst to update the right counter later in the update() function. The skeleton code simply returns false , which makes it work as the always-not-taken predictor. You have to update the code to implement the gshare predictor with h= 10 bits. • The update() function updates bhr andpht. The BHR is shifted left by one bit, and the branch state of the instruction (i.e., m_inst->branch_taken ) is recorded in the lowest bit of the BHR. It also updates a PHT counter. However, the PHT cannot be indexed in the same way as what is_taken() does because the BHR value could have been changed since the branch instruction referenced it. This is where you need the bhrvalue orpht index stored in m_inst during the is_taken() call. The skeleton code meaninglessly writes 0 to bhr to avoid a compiler warning, which complains that the variable is never used. You have to update the code to implement a properly working gshare predictor. • The following shows the definition of class br_target_buffer_t inbr_predictor.h , representing a branch target buffer. // Branch target buffer class br_target_buffer_t { public: br_target_buffer_t(uint64_t m_size); ˜br_target_buffer_t(); uint64_t get_target(uint64_t m_pc); // Get a branch target address. void update(uint64_t m_pc, uint64_t m_target_addr); // Update the BTB. 6 private: uint64_t num_entries; // Number of BTB entries uint64_t buffer; // Target address array }; •br_target_buffer_t(uint64_t m_size) and∼br_target_buffer_t() are the C++ constructor and destructor of the class. • The get_target() function is called in the fetch stage to make a prediction on a branch target based on an instruction’s PC in the case the branch is predicted to be taken. • The writeback stage calls the update() function to update the BTB with the actual branch target address. •num_entries is the number of entries in the BTB, which is 16 in this assignment. The BTB is defined as a uint64_t array, where each entry stores the 64-bit address of a branch target. • The following shows the BTB implementation in br_predictor.cc . // Branch target buffer br_target_buffer_t::br_target_buffer_t(uint64_t m_size) : num_entries(m_size), buffer(0) { // Create a direct-mapped branch target buffer (BTB). buffer = new uint64_t[num_entries]; } br_target_buffer_t::˜br_target_buffer_t() { // Deallocate the target address array. delete [] buffer; } // Get a branch target address. uint64_t br_target_buffer_t::get_target(uint64_t m_pc) { // Always return PC = 0. return 0; } // Update the branch target buffer. void br_target_buffer_t::update(uint64_t m_pc, uint64_t m_target_addr) { } • The first function is a C++ constructor of the class. buffer = new uint64_t[num_entries] creates a uint64_t array, which is the same as buffer = (uint64_t )malloc(num_entries *sizeof(uint64_t)) in C language. The only thing you have to add in this function is to initialize buffer entries. • The destructor function deallocates buffer . The function is complete, so you do not work on this function. •get_target() predicts a branch target address. This process looks up the buffer array based on an instruc- tion’s PC (i.e., m_pc >> 2 ). It returns a branch target address stored in the identified BTB entry. Since the branch predictor of the skeleton code is the always-not-taken predictor, the BTB is not used and returns 0 for the predicted branch target. You have to update the function to implement a properly working BTB. • The update() function updates the BTB based on a branch instruction’s PC (i.e., m_pc ) with the actual branch target (i.e., m_target_addr ). Unlike the branch predictor, BTB indexing is not dependent on other branch instructions. It can be indexed in the same way as the get_target() function. 7 3 Submission • After implementing the branch predictor and BTB, run the program code in Kite. If correctly implemented, the gshare predictor should give 99.9% branch prediction accuracy for the program code. $ ./kite program_code ... Start running ... Done. ======== [Kite Pipeline Stats] ========= Total number of clock cycles = ... Total number of stalled cycles = 0 Total number of executed instructions = 488631 Cycles per instruction = ... Number of pipeline flushes = ... Number of branch mispredictions = ... Number of branch target mispredictions = ... Branch prediction accuracy = 0.999 (.../199001) ... • When the assignment is done, execute the tar.sh script in the kite/ directory. It will create a tar file named after your student ID, e.g., 2023143530.tar . Upload the tar file on LearnUs. Do not rename the file. 4 Grading Rules • The following is a general guideline for grading. A 30-point scale will be used for this assignment. The minimum score is zero, and negative scores will not be given. Grading rules are subject to change, and a grader may add a few extra rules for the fair evaluation of students’ efforts if necessary. -5 points: The submitted tar file is renamed to include redundant tags such as a student name, hw4, etc. -5 points: The submitted code does not have sufficient comments. You must make an effort to clearly explain what each part of your code intends to do. -10 points each: Every 5% branch prediction deficit loses 10 points. For instance, if your branch predictor achieves a prediction accuracy of 95% or greater, you get full credit. Between 90% and 95%, 10 points will be taken out, and a branch prediction accuracy between 85% and 90% will lose another 10 points. -30 points: No or late submission. Final grade = F: A submitted tar file is not from the student’s own implementation but copied from someone else. All students involved in the incidents will be given “F” for final grades. Final grade = F: A code is intentionally tweaked and faked to deceive a grader as if some effort was made. Such an attempt is considered unethical and will thus be seriously penalized. • Your teaching assistant (TA) will grade your assignments. If you think your assignment score is incorrect, discuss your concerns with the TA. Always be courteous when contacting the TA. If no agreement is made between you and the TA, elevate the case to the instructor to review your assignment. Refer to the course website for the contact information of the TA and instructor • Arguing for partial credits with no valid reasons will be treated as a cheating attempt, and such a student will lose all assignment scores. 8 CourseNana.COM

Get in Touch with Our Experts

WeChat (微信) WeChat (微信)
Whatsapp WhatsApp
South Korea代写,Yonsei University代写,EEE3530-01代写,COMPUTER ARCHITECTURE代写,Branch Prediction代写,C++代写,Systems Programming代写,South Korea代编,Yonsei University代编,EEE3530-01代编,COMPUTER ARCHITECTURE代编,Branch Prediction代编,C++代编,Systems Programming代编,South Korea代考,Yonsei University代考,EEE3530-01代考,COMPUTER ARCHITECTURE代考,Branch Prediction代考,C++代考,Systems Programming代考,South Koreahelp,Yonsei Universityhelp,EEE3530-01help,COMPUTER ARCHITECTUREhelp,Branch Predictionhelp,C++help,Systems Programminghelp,South Korea作业代写,Yonsei University作业代写,EEE3530-01作业代写,COMPUTER ARCHITECTURE作业代写,Branch Prediction作业代写,C++作业代写,Systems Programming作业代写,South Korea编程代写,Yonsei University编程代写,EEE3530-01编程代写,COMPUTER ARCHITECTURE编程代写,Branch Prediction编程代写,C++编程代写,Systems Programming编程代写,South Koreaprogramming help,Yonsei Universityprogramming help,EEE3530-01programming help,COMPUTER ARCHITECTUREprogramming help,Branch Predictionprogramming help,C++programming help,Systems Programmingprogramming help,South Koreaassignment help,Yonsei Universityassignment help,EEE3530-01assignment help,COMPUTER ARCHITECTUREassignment help,Branch Predictionassignment help,C++assignment help,Systems Programmingassignment help,South Koreasolution,Yonsei Universitysolution,EEE3530-01solution,COMPUTER ARCHITECTUREsolution,Branch Predictionsolution,C++solution,Systems Programmingsolution,