CS 433 / CSE 422: Computer System Organization HW 06 [5 problems, 100 points]
Covers Classes 22–25 Due: December 6, 2023
Instructions
Homework sets are due at 12:00 PM on the due date. Upload your answers, as described below, to Gradescope by then.
To hand in your homework, create a PDF either by scanning your paper work, or by generating it in a software tool to start with (e.g., a word processor, LaTeX). Ensure your PDF is readable, your work legible and rotated correctly. Upload your work to Gradescope and follow the Gradescope instructions carefully to ensure your work is graded properly. Points will be deducted if you do not follow directions, for instance, by not assigning problems to pages.
This homework has coding questions that must be uploaded to a separate assignment on Gradescope. Please upload any source code files that you have edited to Gradescope. Be careful to edit only the files mentioned in the problem, and make sure to comment your code so we can understand your thought process.
Remember: all files need to be generated and uploaded before the deadline, so leave ample time for submission!
Discussions about homework with a single partner are encouraged — think of this as giving hints, not solutions, to each other. However, homework must be written up individually (no copying is allowed), and all code must be written by you alone. If you discussed your homework solutions with someone else, either as the giver or receiver of information, your write-up must explicitly identify the partner.
You must show details of your work. There is no credit for just writing down an answer.
Homework Problems
1. [20 points] Let’s look at the MSI and MESI cache coherence protocols. For each part, list a series of state transitions for a single cache block, for a single cache.Assume your cache block starts in the invalid state. List and label the transitions under MSI, and under MESI.
As an example, here’s a transition that is equivalent in both MSI and MESI:
MSI: I → M → I MESI: I → M → I
Page 1 of 5
Thread A writes to the cache block, and then the block is evicted.
Thread B does not access the cache block.
-
[8 points] List a series of state transitions that will generate fewer coherence messages on the bus for MESI than for MSI (i.e., MESI is more efficient).
Explain both (1) how two threads could make this sequence happen, and (2) why one has fewer messages than the other.
-
[8 points] List a series of state transitions that will generate fewer coherence messages on the bus for MSI than for MESI (i.e., MSI is more efficient).
Explain both (1) how two threads could make this sequence happen, and (2) why one has fewer messages than the other.
-
[4 points] What is the minimum amount of metadata that you need to implement MESI? Can you reuse any cache metadata bits, and if so, how?
2. [24 points] Time to think about locks and atomics.
-
[4 points] Look at the following piece of code:
volatile int iters = 6; // shared memory
void myThread(int *iters) { --iters;
}
If we run four threads of myThread(), what are the possible values of iters after thethreads complete? Why?
-
[4 points] Now let’s change our code up a bit:
volatile int iters = 6; // shared memory
void myThread(int *iters) { while(iters > 0)
--iters;
}
After we run four threads of myThread(), will iters always be 0? Why or why not?
Page 2 of 5
C. [4 points] An atomic read–modify–write (RMW) avoids race conditions for locks. Why don’t we use atomic RMW for all memory operations?
-
[8 points] MESI can reduce coherence traffic compared to MSI for a test and test-and-set lock. Describe in a few sentences why MESI is better for this type of lock.
Make sure your answer is specific to the lock, and not just about general advantages of MESI over MSI.
-
[4 points] Does an atomic RMW solve the false sharing problem? Why or why not?
-
[11 points] Are you consistent or coherent?
-
[2 points] We have a cache block where false sharing occurs. Do we need to use coherence, consistency, or neither to keep the data synchronized?
-
[3 points] Let’s look at a very simple trace of instructions from two threads belonging to the same process:
Thread 0 Thread 1
a=2 b=a b=3
Before the threads start, a and b are shared memory integers, each in their own cache block, that are both initialized to 1.
In a machine that implements sequential consistency, what are the different values that b could hold after a valid execution?
-
[6 points] Now, we want you to get rid of exactly one of those possible values of b, by running the threads on a machine that implements weak consistency. Rewrite the traces for each thread, showing where you insert SYNCH operations (if any are needed), and tell us which value of b is no longer possible. You should not introduce a new possible output, and you should not reorder the instructions.
-
-
[10 points]
A. [3 points] In a CPU with big–little cores, how does coherence help when we move a thread from a big core to a little core?
Page 3 of 5
-
[3 points] Describe why a GPU is different from having a very wide version of a SIMD instruction set extension.
-
[4 points] Look up two processors online: (1) the Intel Xeon Platinum 8470 (a server processor), and (2) the Snapdragon 8 Gen 2 (a mobile SoC). See if you can find specification that list the components inside each processor.
List one key difference between the two, and explain why these are different.
Coding Problems
5. [35 points] You will implement the MESI cache coherence protocol in Python, along with any necessary movement of data, in a cache subsystem.
Please read the problem statement completely before starting.
Important: We are using Python 3 (available for download at the official Python website, https: // www. python. org/ downloads/ ) with no specific OS or code editor requirements. However, your code must run on the Gradescope Autograder or you will not receive any credit for this problem.
Download a ZIP file with starter code at https://courses.engr.illinois.edu/cs433/ fa2023/hw/hw06-q05.zip.
System Setup
We define a multi-core system with a private cache per core, a shared last-level cache (LLC), and a main memory. The main memory is assumed to be large enough to store all data that we will handle, and the LLC has enough space to prevent any evictions from it.
Cache Back End
We use a simple model of caching where each cache is a hashmap, which maps a cache block to a MemoryLocation object that contains the data, state, and address of a piece of data. The state of each location should be defined as None in main memory and the LLC.
Whenever a private cache is full, it will try and evict the least recently used and Invalid MemoryLocation object. If all of the data in the cache is valid, then it will evict the least recently used data.
Each eviction will call the evict_from_private_cache() function in the MESICoherence class.
A load instruction to the cache triggers the load_data() method in the MESICoherence class. This method should return the data, which is stored as an int (not as a MemoryLocation object).
Page 4 of 5
When a store instruction accesses the cache, the store_data() method in the MESICoherence class is called, and this method should not return anything.
You should assume that all data is stored before a load call is made to it.
Your assignment is to code the MESICoherence class. This class contains a list of private CPU caches in cpu_cache variables, an LLC in the llc variable, and main memory in the main_memory variable.
Assumptions
You should assume that the main memory stores the initial state for all addresses passed in the trace. This means that you can load data from the MainMemory object for all addresses used by the tester.
Your private caches should be write-back, write-allocate.
Trace Description
The first line in each provided trace file contains three integers that define the size of the private cache and shared LLC. Main memory is assumed to be infinite.
All subsequent lines of the trace define load/store instructions. Each of these instructions is definedasLD/ST, <CPU ID>, <ADDR>, <DATA>,wherethedatatagrepresentstheexpected value for a load.
Local Testing
coherence.py has an optional output argument that prints a large JSON file containing the internal state of the system. The autograder generates this file to test code, so make sure that the internal state of the system is consistent with what you are implementing.
Submission
Make sure that you comment your logic in code (you will lose points for code that is not thoroughly commented in English). Please submit the following file (make sure that you do not the complete homework folder):
• mesi.py
Your code must work on the Autograder to get points and code with syntax errors will get no
points.
Make sure you check Campuswire for clarifications, and ask questions whenever you are stuck. Remember that this problem is only a part of your homework. Happy coding!
Page 5 of 5