18-213 / 15-613, Fall 2023
Malloc Lab: Writing a Dynamic Storage Allocator
Assigned: Oct 23, 2023
This lab requires submitting two versions of your code: one as an initial checkpoint, and the second as your final version. The due dates of each part are indicated in the following table:
1 Introduction
In this lab you will write a dynamic memory allocator which will consist of the malloc, free, realloc, and calloc functions. Your goal is to implement an allocator that is correct, efficient, and fast.
We strongly encourage you to start early. The total time you spend designing and debugging can easily eclipse the time you spend coding.
Bugs can be especially pernicious and difficult to track down in an allocator, and you will probably spend a significant amount of time debugging your code. Buggy code will not get any credit.
This lab has been heavily revised from previous versions. Do not rely on advice or information you may find on the Web or from people who have done this lab before. It will most likely be misleading or outright wrong. Be sure to read all of the documentation carefully and especially study the baseline implementation we have provided.
2 Logistics
This is an individual project. You should do this lab on one of the Shark machines.
To get your lab materials, click “Download Handout” on Autolab, enter your Andrew ID, and follow the instructions. Then, clone your repository on a Shark machine by running:
The only file you will turn in is mm.c. All the code for your allocator must be in this file. The rest of the provided code allows you to evaluate your allocator. Using the command make will generate four driver programs: mdriver, mdriver-dbg, mdriver-emulate, and mdriver-uninit, as described in section 6. Your final autograded score is computed by driver.pl, as described in section 7.1.
To test your code for the checkpoint submission, run mdriver and/or driver.pl with the -C flag. To test your code for the final submission, run mdriver and/or driver.pl with no flags.
These commands will report accurate utilization numbers for your allocator. They will only report approximate throughput numbers. The Autolab servers will generate different throughput numbers, and the servers’ numbers will determine your actual score. This is discussed in more detail in Section 7.
3 Required Functions
Your allocator must implement the following functions. They are declared for you in mm.h and you will find starter definitions in mm.c. Note that you cannot alter mm.h in this lab.
bool mm_init(void);
void *malloc(size_t size);
void free(void *ptr);
void *realloc(void *ptr, size_t size);
void *calloc(size_t nmemb, size_t size);
bool mm_checkheap(int);
We provide you two versions of memory allocators:
mm.c: A fully-functional implicit-list allocator. We recommend that you use this code as your starting point. Note that the provided code does not implement block coalescing. The absence of this feature will cause external fragmentation to be very high, so you should implement coalescing. We strongly recommend considering all cases you need to implement before writing code for coalesce_block; the lecture slides should help you identify and reason about these cases.
mm-naive.c: A functional implementation that runs quickly but gets very poor utilization, because it never reuses any blocks of memory.
Your allocator must run correctly on a 64-bit machine. It must support a full 64-bit address space, even though current implementations of x86-64 machines support only a 48-bit address space.
Your submitted mm.c must implement the following functions:
bool mm_init(void): Performs any necessary initializations, such as allocating the initial heap area. The return value should be false if there was a problem in performing the initialization, true otherwise. You must reinitialize all of your data structures each time this function is called, because the drivers call your mm_init function every time they begin a new trace to reset to an empty heap.
void *malloc(size_t size): Returns a pointer to an allocated block payload of at least size bytes. The entire allocated block should lie within the heap region and should not overlap with any other allocated block.
Your malloc implementation must always return 16-byte aligned pointers, even if size is smaller than 16.
void free(void *ptr) : If ptr is NULL, does nothing. Otherwise, ptr must point to the beginning of a block payload returned by a previous call to malloc, calloc, or realloc and not already freed. This block is deallocated. Returns nothing.
void *realloc(void *ptr, size_t size): Changes the size of a previously allocated block.
If size is nonzero and ptr is not NULL, allocates a new block with at least size bytes of payload, copies as much data from ptr into the new block as will fit (that is, copies the smaller of size, or the payload size of ptr, bytes), frees ptr, and returns the new block.
If size is nonzero but ptr is NULL, does the same thing as malloc(size). If size is zero, does the same thing as free(ptr) and then returns NULL.
Your realloc implementation will have only minimal impact on measured throughput or utilization. A correct, simple implementation will suffice.
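A minimal sketch of the realloc rules above, under loud assumptions: toy_malloc, toy_free, and toy_realloc are hypothetical names, and the C library's malloc stands in for the lab heap purely so the example runs on its own (your mm.c may not call it). A 16-byte toy header records the payload size so the copy length can be computed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy header (assumption): remembers the payload size of each block. */
typedef struct {
    size_t payload_size;
    size_t pad; /* keeps the header 16 bytes long on 64-bit machines */
} toy_header_t;

/* Stand-in allocator built on libc malloc; not allowed in a real mm.c. */
static void *toy_malloc(size_t size) {
    toy_header_t *h = malloc(sizeof(toy_header_t) + size);
    if (h == NULL) {
        return NULL;
    }
    h->payload_size = size;
    return h + 1; /* payload starts right after the header */
}

static void toy_free(void *ptr) {
    if (ptr == NULL) {
        return; /* free(NULL) does nothing */
    }
    free((toy_header_t *)ptr - 1);
}

/* The simple strategy described above: allocate, copy, free. */
static void *toy_realloc(void *ptr, size_t size) {
    if (size == 0) {
        toy_free(ptr); /* same as free(ptr), then return NULL */
        return NULL;
    }
    if (ptr == NULL) {
        return toy_malloc(size); /* same as malloc(size) */
    }
    void *newptr = toy_malloc(size);
    if (newptr == NULL) {
        return NULL; /* the original block is left untouched */
    }
    size_t old_size = ((toy_header_t *)ptr - 1)->payload_size;
    memcpy(newptr, ptr, old_size < size ? old_size : size);
    toy_free(ptr);
    return newptr;
}
```

In a real allocator the old payload size would come from the block header your design already maintains, not from a separate toy header.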
void *calloc(size_t nmemb, size_t size): Allocates memory for an array of nmemb elements of size bytes each, initializes the memory to all bytes zero, and returns a pointer to the allocated memory.
Your calloc implementation will have only minimal impact on measured throughput or utilization. A correct, simple implementation will suffice.
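One detail that a simple calloc still has to get right is that nmemb * size can overflow size_t. The sketch below shows one way to guard the multiplication; calloc_total_size is a hypothetical helper name, not something the handout code is known to provide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: computes nmemb * size into *total.
 * Returns false (leaving *total untouched) if the product would
 * not fit in a size_t, in which case calloc should return NULL.
 */
static bool calloc_total_size(size_t nmemb, size_t size, size_t *total) {
    if (size != 0 && nmemb > SIZE_MAX / size) {
        return false; /* multiplication would wrap around */
    }
    *total = nmemb * size;
    return true;
}
```

A calloc built on this would call malloc with the computed total and then clear the payload with memset, which is one of the library functions this lab permits.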
bool mm_checkheap(int line): Scans the entire heap and checks it for errors. This function is called the heap consistency checker, or simply heap checker.
A quality heap checker is essential for debugging your malloc implementation. Many malloc bugs are too subtle to debug using conventional gdb techniques. A heap consistency checker can help you isolate the specific operation that causes your heap to become inconsistent.
Because of the importance of the consistency checker, it will be graded by hand; section 7.2 describes the requirements for your implementation in greater detail. We may also require you to write your heap checker before coming to office hours.
The mm_checkheap function takes a single integer argument that you can use any way you want. One technique is to use this argument to pass in the line number where it was called, using the __LINE__ macro:
mm_checkheap(__LINE__);
This allows you to print the line number where mm_checkheap was called, if you detect a problem with the heap.
The driver will sometimes call mm_checkheap; when it does this it will always pass an argument of 0.
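As a rough illustration of how the line argument can be used, the sketch below checks a single payload address for 16-byte alignment and reports the caller's line on failure. toy_checkheap is a hypothetical stand-in: a real mm_checkheap takes only the int argument and walks your own heap structures.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a heap checker. It stays silent when the
 * check passes and prints the calling line only when it fails.
 */
static bool toy_checkheap(const void *payload, int line) {
    if ((uintptr_t)payload % 16 != 0) {
        fprintf(stderr, "Heap error (checker called at line %d): "
                        "payload %p is not 16-byte aligned\n",
                line, (void *)(uintptr_t)payload);
        return false;
    }
    return true;
}
```

A call site would pass __LINE__ as the last argument, so the message identifies which operation left the heap inconsistent.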
The semantics of malloc, realloc, calloc, and free match the semantics of the functions with the same names in the C library. You can type man malloc in the shell for more documentation.
4 Support Routines
To satisfy allocation requests, dynamic memory allocators must themselves request memory from the operating system, using “primitive” system operations that are less flexible than malloc and free. In this lab, you will use a simulated version of one such primitive. It is implemented for you in memlib.c and declared in memlib.h.
void *mem_sbrk(intptr_t incr): Expands the heap by incr bytes, and returns a generic pointer to the first byte of the newly allocated heap area. If the heap cannot be made any larger, returns (void *) -1. (Caution: this is different from returning NULL.)
Each time your mm_init function is called, the heap has just been reset to zero bytes long.
mem_sbrk cannot make the heap smaller; it will fail (returning (void *) -1) if incr is negative.
Caution: The definition of “last valid byte” may not be intuitive! If your heap is 8 bytes large, then the last valid byte will be 7 bytes from the start, which is not an aligned address.
size_t mem_heapsize(void): Returns the current size of the heap in bytes.
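Because mem_sbrk signals failure with (void *) -1 and not NULL, callers need an explicit comparison. The sketch below shows that check; fake_mem_sbrk is a stand-in with a tiny fixed arena so the example compiles without memlib.c, and extend_heap is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for memlib (assumption): a 64-byte arena and a break offset. */
static unsigned char fake_heap[64];
static size_t fake_brk = 0;

/* Mimics the documented contract: fails on negative or oversized requests. */
static void *fake_mem_sbrk(intptr_t incr) {
    if (incr < 0 || (size_t)incr > sizeof(fake_heap) - fake_brk) {
        return (void *)-1;
    }
    void *old_brk = fake_heap + fake_brk;
    fake_brk += (size_t)incr;
    return old_brk;
}

/* Hypothetical helper: grows the heap and maps the failure value to NULL. */
static void *extend_heap(size_t size) {
    void *bp = fake_mem_sbrk((intptr_t)size);
    if (bp == (void *)-1) {
        return NULL; /* testing against NULL here would miss the failure */
    }
    return bp;
}
```

In mm.c the call would go to the real mem_sbrk; only the comparison against (void *) -1 carries over.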
You can also use the following standard C library functions, but only these: memcpy, memset, printf, fprintf, and sprintf.
Your mm.c code may only call the externally-defined functions that are listed in this section. Otherwise, it must be completely self-contained.
5 Programming Rules
• Any allocator that attempts to detect which trace is running will receive a penalty of 20 points. On the other hand, you should feel free to write an adaptive allocator, one that dynamically tunes itself according to the general characteristics of the different traces.
• You may not change any of the interfaces in mm.h, or any of the other C source files and headers besides mm.c. (Autolab only processes your mm.c; it will not see changes you make to any other file.) However, we strongly encourage you to use static helper functions in mm.c to break up your code into small, easy-to-understand segments.
• You may not change the Makefile (again, Autolab will not see any changes you make there) and your code must compile with no warnings using the warnings flags we selected.
• You are not allowed to declare large global data structures such as large arrays, trees, or lists in mm.c. You are allowed to declare small global arrays, structs, and scalar variables, and you may have as much constant data (defined with the const qualifier) as you like. Specifically, you may declare no more than 128 bytes of writable global variables, total. This is checked automatically, as described in Section 7.1.4.
The reason for this restriction is that global variables are not accounted for when calculating your memory utilization. If you need a large data structure for some reason, you should allocate space for it within the heap, where it will count toward external fragmentation.
• Dynamic memory allocators cannot avoid doing operations that the C standard labels as “undefined behavior.” They need to treat the heap as a single huge array of bytes and reinterpret those bytes as different data types at different times. It is rarely appropriate to write code in this style, but in this lab it is necessary.
We ask you to minimize the amount of undefined behavior in your code. For example, instead of directly casting between pointer types, you should explicitly alias memory through the use of unions. Additionally, you should confine the pointer arithmetic to a few short helper functions, as we have tried to do in the handout code.
• In the provided baseline code, we use a zero-length array to declare a payload element in the block struct. This is a non-standard compiler extension, which, in general, we discourage the use of, but in this lab we feel it is better than any available alternative.
A zero-length array is not the same as a C99 “flexible array member;” it can be used in places where a flexible array member cannot. For example, a zero-length array can be a member of a union. Using zero-length arrays this way is our recommended strategy for declaring a block struct that might contain payload data, or might contain something else (such as free list pointers).
• The practice of using macros instead of function definitions is now obsolete. Modern compilers can perform inline substitution of small functions, eliminating the overhead of function calls. Use of inline functions provides better type checking and debugging support.
In this lab, you may only use #define to define constants (macros with no parameters) and debugging macros that are enabled or disabled at compile time. Debugging macros must have names that begin with the prefix “dbg_” and they must have no effect when the macro-constant DEBUG is not defined.
Here are some examples of allowed and disallowed macro definitions:
#define DEBUG 1                               OK       Defines a constant
#define CHUNKSIZE (1<<12)                     OK       Defines a constant
#define WSIZE sizeof(uint64_t)                OK       Defines a constant
#define dbg_printf(...) printf(__VA_ARGS__)   OK       Debugging support
#define GET(p) (*(unsigned int *)(p))         Not OK   Has parameters
#define PACK(size, alloc) ((size)|(alloc))    Not OK   Has parameters
When you run make, it will run a program that checks for disallowed macro definitions in your code. This checker is overly strict—it cannot determine when a macro definition is embedded in a comment or in some part of the code that has been disabled by conditional-compilation directives. Nonetheless, your code must pass this checker without any warning messages.
• It is okay to adapt code for useful generic data structures and algorithms (e.g. linked lists, hash tables, search trees, and priority queues) from any external source (e.g. K&R, Wikipedia, The Art of Computer Programming) as long as it was not already part of a memory allocator. You must include (as a comment) an attribution of the origins of any borrowed code.
• Your allocator must always return pointers that are aligned to 16-byte boundaries, even if the allocation is smaller than 16 bytes. The driver will check this requirement.
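Several of the rules above can be sketched together: a block struct whose union overlays free list pointers with a zero-length payload array, small static functions in place of parameterized macros such as PACK, and a helper that rounds sizes up to the 16-byte alignment. All names here (block_t, links, pack, extract_size, extract_alloc, round_up) are illustrative assumptions and may differ from the handout's mm.c. The zero-length array is a GNU extension, so this assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;

/* Constants expressed without parameterized macros. */
static const size_t ALIGNMENT = 16;
static const word_t ALLOC_MASK = 0x1;
static const word_t SIZE_MASK = ~(word_t)0xF;

typedef struct block {
    word_t header; /* block size with the allocated flag in bit 0 */
    union {
        struct {
            struct block *next;
            struct block *prev;
        } links;         /* meaningful only while the block is free */
        char payload[0]; /* meaningful only while the block is allocated */
    };
} block_t;

/* Function replacement for a PACK-style macro. */
static word_t pack(size_t size, bool alloc) {
    return (word_t)size | (alloc ? ALLOC_MASK : 0);
}

static size_t extract_size(word_t word) {
    return (size_t)(word & SIZE_MASK);
}

static bool extract_alloc(word_t word) {
    return (word & ALLOC_MASK) != 0;
}

/* Rounds size up to the next multiple of n (n must be nonzero). */
static size_t round_up(size_t size, size_t n) {
    return n * ((size + (n - 1)) / n);
}
```

Because the union members share one offset, the same bytes serve as payload in an allocated block and as list pointers in a free one, without any pointer casts at the use sites.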
6 Driver Programs
Four driver programs are generated when you run make.
mdriver is used by Autolab to test your allocator’s correctness, utilization, and throughput on a standard set of benchmark traces.
mdriver-emulate is used by Autolab to test your allocator with a heap spanning the entire 64-bit address space. In addition to the standard benchmark traces, it will run a set of giant traces that make very large allocation requests.
As the name implies, this test is an emulation: it does not actually allocate exabytes of memory. However, it verifies that your allocator could handle allocations that large, if the hardware permitted them. Failing the checks performed by mdriver-emulate leads to grade penalties, as described in section 7.1.4.
mdriver-dbg is for you to use when debugging your allocator. It is the same program as mdriver, with three notable differences:
• It is compiled with DEBUG defined, which enables the dbg_ macros at the top of mm.c. Without this defined, functions like dbg_printf and dbg_assert will not have any effect.
• It is compiled with optimization level -O0, which allows GDB to display more meaningful debugging information.
• It uses the AddressSanitizer instrumentation tool to detect several classes of errors that are easy to make when writing an allocator.
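To make the effect of the DEBUG define concrete, here is one plausible shape for such debugging macros. This is a sketch under assumptions: the actual definitions at the top of mm.c may differ, and block_size_for, with its 8-byte header, is a hypothetical helper used only to exercise the macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef DEBUG
/* Debug builds: the macros forward to printf and assert. */
#define dbg_printf(...) printf(__VA_ARGS__)
#define dbg_assert(expr) assert(expr)
#else
/* Non-debug builds: the macros expand to nothing, arguments included. */
#define dbg_printf(...) ((void)0)
#define dbg_assert(expr) ((void)0)
#endif

/* Hypothetical helper: payload plus an assumed 8-byte header, rounded to 16. */
static size_t block_size_for(size_t payload_size) {
    dbg_assert(payload_size > 0);
    size_t asize = ((payload_size + 8 + 15) / 16) * 16;
    dbg_printf("payload %zu -> block %zu\n", payload_size, asize);
    return asize;
}
```

With DEBUG undefined, the arguments to the macros are never evaluated, so they should not contain side effects the rest of the code depends on.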
mdriver-uninit is also for you to use when debugging. It uses the MemorySanitizer instrumentation tool to detect uses of uninitialized memory.
mdriver-dbg, mdriver-emulate, and mdriver-uninit are much slower than mdriver, so they only report correctness and the utilization score for each trace. All four programs should report the same utilization scores for each trace that they all run (only mdriver-emulate runs the giant traces).
6.1 Trace files
The driver programs are controlled by a set of trace files that are included in the traces subdirectory. Each trace file contains a sequence of commands that instruct the driver to call your malloc, realloc, and free routines in some sequence. Autolab will use the same trace files to grade your work.
When the driver programs are run, they will process each trace file multiple times: once to make sure your implementation is correct, once to determine the space utilization, and between 3 and 20 more times to determine the throughput.
Some of the traces are short traces that are included mainly for detecting errors and debugging. Your utilization and performance scores on these traces do not count toward your grade. The traces that do count are marked with a ‘*’ in the output of mdriver.
6.2 Command-line arguments
The drivers accept the following command-line arguments.
-C: Apply the scoring standards for the checkpoint, rather than for the final submission.
-f tracefile: Only run the trace tracefile. Correctness, utilization, and performance are all tested.
-c tracefile: Only run the trace tracefile, and only test it for correctness. This still runs the trace twice, to verify that mm_init correctly resets your heap.
-v level: Set the verbosity level to the specified value. The level can be 0, 1, or 2; the default level is 1. Raising the verbosity level causes additional diagnostic information to be printed as each trace file is processed. This can help you determine which trace file is causing your allocator to fail.
-d level: Controls the amount of validity checking performed by the driver. This is separate from the DEBUG compile-time define.
At debug level 0, very little checking is done, which is useful when testing performance only.
At debug level 1, the driver checks allocation payloads to ensure that they are not overwritten by unrelated calls into your code. This is the default.
At debug level 2, the driver will also call your implementation of mm_checkheap after each operation. This mode is slow, but it can help identify the exact point at which an error occurs.
Additional arguments can be listed by running mdriver -h.
7 Scoring
Malloc Lab is worth 11% of your final grade in the course. This is divided into the Checkpoint and Final submissions, which have separate due dates.
The checkpoint submission, worth 4% of your final grade, is graded as follows:
Autograded score (7.1)   100 points
Heap checker (7.2)        10 points
Total                    110 points
The final submission, worth 7% of your final grade, is graded as follows:
Autograded score (7.1)   100 points
Code style (7.3)           4 points
Total                    104 points
7.1 Autograded score
driver.pl is the program that Autolab will use to calculate your score. For the checkpoint submission, it will be run with the -C flag; for the final submission, it will be run with no flags. This sets the grading standards, as described later in this section.
driver.pl computes the autograded score in two steps. First, mdriver is run to obtain the performance index P, which is a number between 0 and 100 (inclusive). If mdriver detects incorrect behavior on any trace, P will be zero. Otherwise, P is computed from both utilization and throughput, as described below (Section 7.1.1). Second, mdriver-emulate is run to detect forms of incorrect behavior that mdriver cannot detect. Incorrect behavior detected by mdriver-emulate will cause deductions from P, as described in Section 7.1.4.
Your autograded score is P minus any deductions. Separately, your code will be read by TAs and graded on the thoroughness of your heap checker (see Section 7.2) and for overall style (see Section 7.3).
7.1.1 Performance index
Both memory and CPU cycles are expensive system resources, so the performance index P is a weighted sum of your allocator’s space utilization and throughput. The weights are different for the checkpoint and the final submission:
Version      Utilization   Throughput
Checkpoint   20%           80%
Final        60%           40%
That is, in English, for the checkpoint your score will be computed as 20% utilization and 80% throughput, and for the final it will be 60% utilization and 40% throughput.
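The weighting above can be written as a small function. This is a sketch under an assumption about how the pieces combine: util_score and thru_score are taken to be the utilization and throughput scores, each already scaled to the range 0 to 1, and performance_index is a hypothetical name, not the code used by driver.pl.

```c
#include <stdbool.h>

/*
 * Hypothetical reconstruction of the weighted sum: returns a value
 * between 0 and 100 given component scores in [0, 1].
 */
static double performance_index(double util_score, double thru_score,
                                bool checkpoint) {
    double w_util = checkpoint ? 0.20 : 0.60;
    double w_thru = checkpoint ? 0.80 : 0.40;
    return 100.0 * (w_util * util_score + w_thru * thru_score);
}
```

For example, full utilization credit with half throughput credit gives about 60 under the checkpoint weights and about 80 under the final weights.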
7.1.2 Utilization
The utilization of a single trace is the peak ratio between the total amount of memory used by the driver at any one moment (i.e. allocated via malloc but not yet freed via free) and the size of the heap used by your allocator. All allocators have memory overhead: it is not possible to achieve a utilization of 100%. Your goal is to make the utilization as high as possible.
The utilization of your allocator, U, will be calculated as the average utilization across all traces. The associated score will be computed as follows:
• If U ≤ Umin then you will receive no credit for utilization.
• If Umin < U < Umax then your utilization score will scale linearly with U.
• If U ≥ Umax then you will receive full credit for utilization.
The values of Umin and Umax are different for the checkpoint and the final submission:
Version      Umin   Umax
Checkpoint   55%    58%
Final        55%    74%
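The three cases above amount to a clamped linear interpolation between Umin and Umax. The function below is a sketch of that rule; utilization_score is a hypothetical name, and the result is a fraction of full utilization credit, not a point value.

```c
/*
 * Fraction of utilization credit earned: 0 at or below u_min,
 * 1 at or above u_max, and linear in between.
 */
static double utilization_score(double u, double u_min, double u_max) {
    if (u <= u_min) {
        return 0.0;
    }
    if (u >= u_max) {
        return 1.0;
    }
    return (u - u_min) / (u_max - u_min);
}
```

With the final-submission thresholds, an average utilization of 64.5% sits halfway between 55% and 74% and so earns about half of the utilization credit.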
7.1.3 Throughput
The throughput of a single trace is measured by the average number of operations completed per second, expressed in kilo-operations per second or KOPS. A trace that takes T seconds to perform n operations will have a throughput of n/(1000 · T) KOPS.
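As a quick check of the formula, the sketch below computes KOPS from an operation count and an elapsed time. throughput_kops is a hypothetical name used for illustration only.

```c
/* Throughput in kilo-operations per second: n / (1000 * T). */
static double throughput_kops(unsigned long n_ops, double seconds) {
    return (double)n_ops / (1000.0 * seconds);
}
```

A trace of 50,000 operations that finishes in half a second therefore runs at 100 KOPS.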
Throughput measurements vary according to the type of CPU running the program. We will compensate for this machine dependency by evaluating the throughput of your implementations relative to those of reference implementations running on the same machine. For information on how this is done, see Appendix A.2.
The throughput of your allocator, T, will be calculated as the average throughput across all traces and then compared to the reference throughput Tref, which will change from checkpoint to final, and from machine to machine. The associated score will be computed as follows:
The throughput standards are set low enough that we expect your allocator will significantly exceed the requirements for Tmax. If you achieve this, it will also insulate you from run-to-run variations caused by system load.
Remember that throughput scores printed by mdriver on a Shark machine are only an indication of your allocator’s performance. Your scores on Autolab are the only scores that count.
7.1.4 Autograded deductions
The driver.pl program will also run the mdriver-emulate program (see Section 6), which emulates a full 64-bit address space. This may deduct points from your autograded score in the following circumstances:
• If mdriver-emulate fails to run successfully, 30 points will be deducted. This can happen if your code fails to support a full 64-bit address space; for example, if it uses int where a size_t is needed.
• If the utilization of a trace differs between mdriver and mdriver-emulate, 30 points will be deducted.
• If your program uses more than 128 bytes of global data (see the Programming Rules in Section 5), then up to 20 points will be deducted.
7.2 Heap Consistency Checker
10 points will be awarded based on the quality of your implementation of mm_checkheap. The heap checker will be graded for your checkpoint submission only. It will not be graded in your final submission.
We require that you check all of the invariants of your data structures. Specific items will be dependent on your design, so after making design changes, think about what changes you need to make to your heap checker. Some examples of what your heap checker should check:
• Checking the heap (implicit list, explicit list, segregated list):
– Check for epilogue and prologue blocks.
– Check each block’s address alignment.
– Check blocks lie within heap boundaries.
– Check each block’s header and footer: size (minimum size), previous/next allocate/free bit consistency, header and footer matching each other.
– Check coalescing: no consecutive free blocks in the heap.
• Checking the free list (explicit list, segregated list):
– All next/previous pointers are consistent (if A’s next pointer points to B, B’s previous pointer should point to A).
– All free list pointers are between mem_heap_lo() and mem_heap_high().
– Count free blocks by iterating through every block and traversing free list by pointers and see if they match.
– All blocks in each list bucket fall within bucket size range (segregated list).
Your heap checker should run silently until it detects some error in the heap. Then, and only then, should it print a message and return false. If it finds no errors, it should return true. It is very important that your heap checker run silently; otherwise, it will produce too much output to be useful on the large traces.
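Two of the free list checks listed above (pointer consistency and heap bounds) can be sketched as follows. The node layout and the name check_free_list are assumptions for illustration; in a real checker the bounds would come from the memlib heap boundary functions, and the nodes would be your own free blocks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical free list node: just the two link pointers. */
typedef struct node {
    struct node *next;
    struct node *prev;
} node_t;

/*
 * Walks a NULL-terminated doubly linked list. Silent on success;
 * prints one message and returns false at the first inconsistency.
 */
static bool check_free_list(const node_t *head, const void *lo,
                            const void *hi, int line) {
    const node_t *prev = NULL;
    for (const node_t *n = head; n != NULL; n = n->next) {
        uintptr_t addr = (uintptr_t)n;
        if (addr < (uintptr_t)lo || addr > (uintptr_t)hi) {
            fprintf(stderr, "line %d: free list node outside heap\n", line);
            return false;
        }
        if (n->prev != prev) {
            fprintf(stderr, "line %d: next/prev pointers disagree\n", line);
            return false;
        }
        prev = n;
    }
    return true;
}
```

This sketch assumes the list has no cycles; a fuller checker could bound the walk by the number of free blocks counted in the heap traversal.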
7.3 Style
4 points will be awarded based on the quality of your code style, following the Style Guidelines on the website at http://www.cs.cmu.edu/~18213/codeStyle.html. Style will be graded for your final submission only. It will not be graded for your checkpoint submission.
Some points to keep in mind for malloclab in particular:
• Version control. You must commit your code regularly using Git. This allows you to keep track of your changes, revert to older versions of your code, and regularly remind yourself of what you changed and why.
• Function comments. In addition to the overview header comment, each function must be preceded by a header comment that describes what the function does. Make sure to review the course style guide: we are expecting that for each function, you document at a minimum its purpose, arguments, return value, and any relevant preconditions or postconditions.
• Inline comments. You will want to use inline comments to explain code flow or code that is tricky.
• Extensibility. Your code should be modular, robust, and easily scalable. You should be able to easily change various parameters that define your allocator, without any changes in the actual operation of the program. For example, you should be able to arbitrarily change the number of segregated lists with minimal modifications.
Study the code in mm.c as an example of the desired coding style.
For formatting your code, we require that you use the clang-format tool, which automatically reformats your code according to the .clang-format configuration file. To invoke it, run make format. You are welcome to change the configuration settings to match your desired format. More information is available in the style guideline.
7.4 Handin Instructions
Make sure your code does not print anything during normal operation, and that all debugging macros have been disabled. Ensure that you have committed and pushed the latest version of your code to GitHub.
To submit your code, run make submit or upload your mm.c file to Autolab. Only the last version you submit will be graded.
8 Useful Tips
• You’ll find debugging macros defined in mm.c that provide contract functions, such as dbg_assert. We encourage making liberal use of these contracts to verify invariants and ensure correctness of your code.
• Use the drivers’ -c and -f options to run individual traces. During initial development, using short trace files will simplify debugging and testing.
• Use the drivers’ verbose mode. The -V option will also indicate when each trace file is processed, which will help you isolate errors.
• Use gdb to help you debug. This will help you isolate and identify out-of-bounds memory references. When debugging, use the mdriver-dbg binary, which is compiled with the -O0 flag to disable optimizations.
• Use gdb’s watch command to find out what changed some value you did not expect to have changed.
• Reduce obscure pointer arithmetic through the use of structs and unions. Although your data structures will be implemented in compressed form within the heap, you should strive to make them look as conventional as possible using struct and union declarations to encode the different fields. Examples of this style are shown in the baseline implementation.
• Encapsulate your pointer operations in functions. Pointer arithmetic in memory managers is confusing and error-prone because of all the casting that is necessary. You can significantly reduce the complexity by writing functions for your pointer operations. See the code in mm.c for examples. Remember: you are not allowed to define macros; the code in the textbook does not meet our coding standards for this lab.
• Your allocator must work for a 64-bit address space. The mdriver-emulate driver will specifically test this capability. You should allocate a full eight bytes for all of your pointers and size fields. (You can take advantage of the low-order bits for some of these values being zero due to the alignment requirements.) Make sure you do not inadvertently invoke 32-bit arithmetic with an expression such as 1<<32, rather than 1L<<32.
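The last tip can be illustrated with a short sketch. The helper names two_to_the and survives_narrowing are hypothetical. The point is that the left operand of a shift must already be 64 bits wide, and that a size squeezed through a 32-bit type silently loses its high bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Computes 2^k in 64-bit arithmetic (k must be below 64).
 * Shifting a plain int constant by 32 would be undefined behavior.
 */
static uint64_t two_to_the(unsigned k) {
    return (uint64_t)1 << k;
}

/* True if a size keeps its value after passing through a 32-bit type. */
static bool survives_narrowing(uint64_t size) {
    return (uint64_t)(uint32_t)size == size;
}
```

Sizes of 4 GiB and above fail the second check, which is the kind of truncation that the giant traces run by mdriver-emulate are designed to expose.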
9 Office Hours
This lab has a significant implementation portion that is much more involved than the prior labs. Expect to be debugging issues for several hours — there is no way around this.
TAs will NOT:
• Go line-by-line through your program to determine where things go wrong.
• Read your code to determine if everything you are doing is correct. There are simply too many subtle issues to go through everyone’s programs.
TAs will:
• Help you use GDB efficiently to monitor the state of your program.
• Go through conceptual discussions of what you are doing.
Here are some useful things to have ready before a TA comes to you:
• A non-trivial heap checker that exhaustively tests your implementation.
• Documentation of your data structures (drawings are great!)
• A detailed description of problems you are experiencing, and what you’ve looked into so far (not “my malloc doesn’t work and I don’t know why”)
As a reminder, check our office hours schedule on the course website. Start early so you can get help early!