Project - Cache Organization and Performance Evaluation
In this assignment, you will become familiar with how caches work and how to evaluate their performance. To achieve these goals, you will first build a cache simulator and validate its correctness. Then you will use your cache simulator to study many different cache organizations and management policies as discussed in lecture and in Chapter 5 of Hennessy & Patterson.
Section 1 will walk you through how to build the cache simulator, and Section 2 will specify the performance evaluation studies you will undertake using your simulator.
1 Cache Simulator
In the first part of this assignment, you will build a cache simulator. The type of simulator you will build is known as a trace-driven simulator because it takes as input a trace of events, in this case memory references. The trace, which we will provide for you, was acquired on another machine. Once acquired, it can be used to drive simulation studies. In this assignment, the memory reference events specified in the trace(s) we will give you will be used by your simulator to drive the movement of data into and out of the cache, thus simulating its behavior. Trace-driven simulators are very effective for studying caches.
Your cache simulator will be configurable based on arguments given at the command line, and must support the following functionality:
- Total cache size
- Block size
- Unified vs. split I- and D-caches
- Associativity
- Write back vs. write through
- Write allocate vs. write no allocate
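As a rough illustration (not part of the provided template), these options could be collected into one configuration record that the command-line parsing code in main.c fills in; all field names below are our own:

typedef struct {
    int cache_size;    /* total cache size in bytes */
    int block_size;    /* block size in bytes */
    int split;         /* 0 = unified, 1 = split I- and D-caches */
    int assoc;         /* associativity; 1 = direct mapped */
    int write_back;    /* 1 = write back, 0 = write through */
    int write_alloc;   /* 1 = write allocate, 0 = write no allocate */
} cache_config;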
In addition to implementing the functionality listed above, your simulator must also collect and report several statistics that will be used to verify the correctness of the simulator, and that will be used for performance evaluation later in the assignment. In particular, your simulator must track:
- Number of instruction references
- Number of data references
- Number of instruction misses
- Number of data misses
- Number of words fetched from memory
- Number of words copied back to memory
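These statistics reduce to six counters that are updated as each reference is simulated. A sketch with illustrative names (the template may define its own statistics structure):

typedef struct {
    long inst_refs;          /* instruction references seen */
    long data_refs;          /* data references seen */
    long inst_misses;        /* instruction misses */
    long data_misses;        /* data misses */
    long words_fetched;      /* words fetched from memory */
    long words_copied_back;  /* words copied back to memory */
} cache_stats;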
1.1 Files
For your project, there are five program files in total, as indicated in the table below, which lists the file names and a short description of their contents.
File Name   Description
Makefile    Builds the simulator.
main.c      Top-level routines for driving the simulator.
main.h      Header file for main.c.
cache.c     Cache model.
cache.h     Header file for cache.c.
The “Makefile” is a UNIX make file. Try typing make in the local directory where you’ve copied the files. This will build the simulator from the program files that have been provided, and will produce an executable called “sim.” Of course, the executable doesn’t do very much since the files we have given you are only a template for the simulator.
The trace files are in ASCII format, so they are in human-readable form. Each line in the trace file represents a single memory reference and contains two numbers: a reference type, which is a number from 0 to 2, and a memory address. All other text following these two numbers should be ignored by your simulator. The reference type specifies what kind of memory reference is being performed, with the following encoding:
- 0 Data load reference
- 1 Data store reference
- 2 Instruction load reference
The number following the reference type is the byte address of the memory reference itself. This number is in hexadecimal format, and specifies a 32-bit byte address in the range 0-0xffffffff, inclusive.
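For example, a trace line might look like "2 408ed4" (an instruction fetch from byte address 0x408ed4). One simple way to read such records, though not the only one, is an sscanf whose format string stops after the two numbers, so any trailing text is ignored automatically:

#include <stdio.h>

/* Read one trace record; returns 1 on success, 0 at end of file.
   Lines that do not begin with the two expected numbers are skipped. */
int read_reference(FILE *fp, int *type, unsigned *addr)
{
    char line[256];
    while (fgets(line, sizeof(line), fp)) {
        if (sscanf(line, "%d %x", type, addr) == 2)
            return 1;
    }
    return 0;
}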
1.2 Building the Cache Model
There are many ways to construct the cache model. You will be graded only on the correctness of the model, so you are completely free to implement the cache model in any fashion you choose. In this section, we give some hints for an implementation that uses the data structures given in the template code.
1.2.1 Incremental Approach
The most important hint is a general software engineering rule of thumb: build the simulator by incrementally adding functionality. The biggest mistake you can make is to try to implement the cache functions all at once. Instead, build the very simplest cache model possible, and test it thoroughly before proceeding.
1. Build a unified, fixed block size, direct-mapped cache with a write-back write policy and a write allocate allocation policy.
2. Add variable block size functionality.
3. Add variable associativity functionality.
4. Add split organization functionality.
5. Add write through write policy functionality.
6. Add write no-allocate allocation policy functionality.
You can test your cache model at each stage by comparing the results you get from your simulator with the validation numbers which we will provide.
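To make the write-policy variations in the last two steps concrete, here is a hedged sketch of the bookkeeping for a single data store, using the illustrative cache_config and cache_stats records sketched earlier and assuming 4-byte words; the template does not require this structure:

void account_store(int hit, Pcache_line line,
                   cache_config *cfg, cache_stats *st)
{
    int words_per_block = cfg->block_size / 4;  /* assuming 4-byte words */

    if (hit) {
        if (cfg->write_back)
            line->dirty = 1;             /* memory updated only at eviction */
        else
            st->words_copied_back += 1;  /* write through: one word now */
    } else {
        st->data_misses++;
        if (cfg->write_alloc) {
            /* bring the whole block in (copying back a dirty victim
               first, if one is evicted), then write as in the hit case */
            st->words_fetched += words_per_block;
        } else {
            st->words_copied_back += 1;  /* write around: word to memory */
        }
    }
}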
1.2.2 Cache Structures
In cache.h, you will find the data structure cache, which implements most of the cache model. It contains several configuration constants (sizes, masks, and counts) which you should initialize in init_cache(). Consult the lectures and text to see how these constants are computed. The remaining three fields, LRU_head, LRU_tail, and set_contents, are the main structures that implement the cache. Let us first consider how to implement the simplest case, a direct-mapped cache. In this case, you only need the LRU_head field.
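For the configuration constants, a power-of-two geometry makes the computation direct. A sketch, assuming a LOG2() helper for the base-2 logarithm of a power of two (define one if your template lacks it); verify the field names against your copy of cache.h:

/* example: an 8 K-byte, 2-way cache with 128-byte blocks has
   8192 / (128 * 2) = 32 sets */
my_cache.n_sets = cache_size / (block_size * assoc);
my_cache.index_mask_offset = LOG2(block_size);
my_cache.index_mask = (my_cache.n_sets - 1) << my_cache.index_mask_offset;

/* decomposing a reference address into set index and tag */
index = (addr & my_cache.index_mask) >> my_cache.index_mask_offset;
tag   = addr >> (my_cache.index_mask_offset + LOG2(my_cache.n_sets));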
Once you have computed the number of sets in the cache, n_sets, you should allocate an array of cache line pointers:
/* one pointer per set; each entry starts out NULL (no valid line) */
my_cache.LRU_head =
    (Pcache_line *)malloc(sizeof(Pcache_line) * my_cache.n_sets);
The LRU_head array is the data structure that keeps track of all the cache lines in the cache: each element in this array is a pointer to a cache line that occupies that set, or a NULL pointer if there is no valid cache line in the set (initially, all elements in the array should be set to NULL). The cache line itself is kept in the cache_line data structure, also declared in cache.h.
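As a hedged sketch, for this assignment a cache line needs at least a tag, a dirty bit for the write-back policy, and the pointers that link the lines of a set into the LRU list described below; check your copy of cache.h for the exact declaration:

typedef struct cache_line_ {
    unsigned tag;                  /* tag bits of the cached block */
    int dirty;                     /* set by a store under write back */
    struct cache_line_ *LRU_next;  /* doubly linked list within a set */
    struct cache_line_ *LRU_prev;
} cache_line, *Pcache_line;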
1.2.3 Adding Associativity and LRU Replacement
Once you add associativity, each set holds multiple cache lines, and on a miss to a full set you must choose a victim according to the LRU replacement policy. One way to implement LRU is to keep the linked list in each set sorted by the order in which the cache lines were referenced. This can be done by removing a cache line from the linked list each time you reference it, and inserting it back into the linked list at the head. In this case, you should always evict the cache line at the tail of the list.
To build set-associative caches and to implement the LRU replacement policy described above, you should use the two routines, delete and insert, provided in the cache.c module. delete removes an item from a linked list, and insert inserts an item at the head of a linked list.
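Assuming delete and insert take the set's head pointer, tail pointer, and the line in question (check the exact signatures in cache.c), a reference that hits in set idx would be handled roughly as:

/* move the referenced line to the head of its set's LRU list */
delete(&my_cache.LRU_head[idx], &my_cache.LRU_tail[idx], line);
insert(&my_cache.LRU_head[idx], &my_cache.LRU_tail[idx], line);

/* on a miss to a full set, the victim is the least recently used line */
victim = my_cache.LRU_tail[idx];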
1.3 Validation
In Section 2, you will use your cache simulator to analyze the characteristics and performance of various cache configurations on the traces that we have provided for you. Before doing this, it is important to validate the accuracy of your cache simulator.
You should now have a cache simulator that can be configured (for total size, block size, unified versus split, associativity, write-through versus write-back, and write-allocate versus write-no-allocate). Furthermore, this simulator should be verified against the sample cache statistics that we have given you. (If you don't have such a cache simulator, do not attempt this portion of the assignment until you do.)
2 Performance Evaluation
2.1 Working Set Characterization
In this performance evaluation exercise, you will characterize the working set size of the three sample traces given.
Using your cache simulator, plot the hit rate of the cache as a function of cache size. Start with a cache size of 4 bytes, and increase the size by a factor of 2 each time until the hit rate becomes insensitive to further increases in cache size. Use a split cache organization so that you can separately characterize the behavior of instruction and data references (i.e., you will have two plots per sample trace: one for instructions, and one for data).
2.2 Impact of Block Size
Using the same split cache organization, plot the hit rate as a function of block size for each trace, again treating instruction and data references separately. Then answer the following questions:
- Explain why the hit rate vs. block size plot has the shape that it does. In particular, explain the relevance of spatial locality to the shape of this curve.
- What is the optimal block size (consider instruction and data references separately) for each trace?
- Is the optimal block size for instruction and data references different? What does this tell you about the nature of instruction references versus data references?
2.3 Impact of Associativity
Set your cache simulator for a split I- and D-cache organization, each of size 8 K-bytes. Set the block size to 128 bytes, use a write-back policy, and use a write-allocate policy. Plot the hit rate of the cache as a function of associativity, varying the associativity from 1 to 64 in powers of 2. Do this for the three traces, and make a separate plot for instruction and data references.
Answer the following questions:
- Explain why the hit rate vs. associativity plot has the shape that it does.
- Is there a difference between the plots for instruction and data references? What does this tell you about the difference in impact of associativity on instruction versus data references?
2.4 Memory Bandwidth
Using your cache simulator, compare the total memory traffic generated by a write-through versus a write-back cache. Use a split I- and D-cache organization. Simulate a couple of reasonable cache sizes (such as 8 K-bytes and 16 K-bytes), reasonable block sizes (such as 64 and 128 bytes), and reasonable associativities (such as 2 and 4). For now, use a write-no-allocate policy. Run 4 or 5 different simulations in total.
Answer the following questions:
- Which cache has the smaller memory traffic, the write-through cache or the write-back cache? Why?
- Is there any scenario under which your answer to the question above would flip? Explain.
Now use the same parameters, but fix the policy to write-back. Perform the same experiment, but this time compare the write-allocate and write-no-allocate policies.
Answer the following questions:
- Which cache has the smaller memory traffic, the write-allocate or the write-no-allocate cache? Why?
- Is there any scenario under which your answer to the question above would flip? Explain.