A1: Cooperative Threads
Cooperative, User-Level Thread Package
In this assignment, you will create a library of functions that define a user-level threads package. Using your library, a program will be able to create threads, destroy them, and run them concurrently on a single processor.
1. Background on Threads
2. Using Threads
3. Cooperative Threads API
4. Solution Requirements
5. Implementation Details
6. Setup and Submission
7. Hints and Advice
Background on Threads
Threads and processes are key abstractions for enabling concurrency in operating systems. To gain a deeper understanding of how these abstractions are constructed, you will build the core of a user-level threads package. Implementing kernel-level threads and processes is not much different, but we'll do this assignment at user level since (a) installing new OS kernels on the teach.cs machines is problematic and (b) debugging user-level code is much easier than debugging kernel-level code.
Threads provide the illusion that different parts of your program are executing concurrently. In the de facto standard model of multithreaded execution, threads share the code, heap, and the runtime system. Each thread, however, has a separate stack and, naturally, a separate set of CPU registers. This programming model also provides synchronization primitives so that different threads can coordinate access to shared resources, but we will leave those concerns for the next assignment.
User-Level vs. Kernel Threads
For practical reasons, this assignment is done at user level: you will construct user threads by implementing a set of functions that your program will call directly to provide the illusion of concurrency. In contrast, modern operating systems provide kernel threads, and a user program invokes the corresponding kernel thread functions via system calls. Both types of threads use the same core techniques for providing the concurrency abstraction; you would build kernel threads in essentially the same way you build user threads in this assignment. Also, kernel processes are built using these techniques.
However, there are a few differences between kernel and user threads:
Multiprocessing
User-level threads provide the illusion of concurrency, but on machines with multiple processors kernel-level threads can provide actual concurrency. With user-level threads, the OS schedules the user process on one CPU, and the user-level threads package multiplexes the kernel thread associated with the process between one or more user-level threads. With kernel-level threads, the OS is aware of the different (kernel) threads, and it can simultaneously schedule these threads from the same process on different processors.
A key simplifying assumption for this assignment is that you will allow programs to multiplex some number (e.g., m) of user-level threads on one kernel thread. This means that at most one user-level thread is running at a time and that your runtime system has complete control over the interleaving of user-level threads with each other. More sophisticated systems implement m on n threads packages where m user-level threads are multiplexed across n kernel threads.
Asynchronous I/O
When a user-level thread makes a system call that blocks (e.g., reading a file from disk), the OS scheduler moves the process to the Blocked state and will not schedule it until the I/O has completed. Thus, even if there are other user-level threads within that process, they also have to wait. In contrast, when a kernel thread blocks for a system call, the OS scheduler may choose another kernel thread from the same process to run. Thus, some kernel threads may be running while others are waiting for I/O.
Timer interrupts
The OS scheduler uses timer interrupts to preempt a running kernel thread and switch the CPU to a different runnable kernel thread. Similar to blocking system calls, this stops the execution of all user-level threads in the process until the kernel thread is scheduled to run again. However, to switch between user-level threads that are multiplexed on a single kernel thread, we cannot rely on timer interrupts (those are delivered to the OS, and not to our thread library runtime). Instead, you will implement a cooperative threading model, where the currently running user-level thread must explicitly yield the processor to another user-level thread by calling a function provided by your library.
In the next assignment, we will simulate timer interrupts that cause the scheduler to switch from one thread or process to another by using POSIX signals. In your implementation, the threads library will "turn off interrupts" by blocking delivery of these signals using system calls. However, there is nothing to prevent the threads, themselves, from "turning off interrupts" the same way. Thus, even though we will implement "preemptive" threads, a "malicious" thread could turn off interrupts and not be preempted until it calls yield, thus hogging the CPU. Note that kernel schedulers don't have this problem. Only the privileged code in the kernel can turn off the real timer interrupts.
Using Threads
With your threads library, a typical program will look something like this:
int
main(int argc, char **argv)
{
        // Some initialization
        // Create some threads
        // wait for threads to finish
        // exit
}

// "main" function for thread i
void
thread_main_i(void *arg)
{
        // do some work
        // yield
        // do some more work
        // return (implicit thread exit)
}
Here thread_main_i is a programmer-supplied "main" function that the ith thread starts executing when it is first scheduled to run after its creation. (Note that different threads may have the same "main" function.) The thread can perform useful work by calling any other functions in the program, or voluntarily yielding to other threads.
A thread exits either explicitly or implicitly. It exits explicitly when it calls the thread_exit function in the thread library. It exits implicitly when its thread_main function returns. In addition, to give the program more control, a thread may call thread_kill to force another thread to exit.
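For example, a small complete program using this API might look as follows. Only the functions declared in thread.h (thread_init, thread_create, thread_yield, thread_exit) are part of the library; everything else here is made up for illustration, and the exact prototypes are the ones in thread.h.

#include <stdio.h>
#include "thread.h"

static void
hello(void *arg)
{
        const char *name = (const char *)arg;
        printf("hello from %s\n", name);
        thread_yield(THREAD_ANY);        /* let another thread run */
        printf("%s is back\n", name);
        /* returning here exits the thread implicitly */
}

int
main(int argc, char **argv)
{
        thread_init();
        thread_create(hello, "first");
        thread_create(hello, "second");
        /* keep yielding until no other thread is ready */
        while (thread_yield(THREAD_ANY) != THREAD_NONE)
                ;
        thread_exit(0);
}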
Cooperative Threads API
A key simplifying assumption in this assignment is that the threads are cooperative, i.e., each thread runs until it explicitly releases the CPU to another thread by yielding or by exiting. In contrast, preemptive threading systems allow a scheduler to interrupt a running thread at any time and switch the CPU to running a different thread.
The thread package provides several functions to allow application programs to perform thread management. In addition, there are a few conventions that application programs must follow to ensure proper and safe operation. A list of the functions that constitute the user-level threads API can be found in the thread.h file. The functions that you will be implementing for this assignment are summarized here:
void thread_init(void):
You can use this function to perform any initialization that is needed by your threading system. Here, you should also hand-craft the first user thread in the system. To do so, you should configure your thread state data structures so that the (kernel) thread that is running when your program begins (before any calls to thread_create) will appear as the first user thread in the system (with tid = 0). You will not need to allocate a stack for this thread, because it will run on the (user) stack allocated for this kernel thread by the OS.
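For instance, a minimal sketch (relying on the hypothetical threads array and current pointer described under Implementation Details below) might be:

/* Sketch only: make the already-running kernel thread appear as tid 0. */
void
thread_init(void)
{
        threads[0].id = 0;
        threads[0].state = RUNNING;   /* illustrative state name */
        threads[0].stack = NULL;      /* it runs on the OS-provided stack */
        current = &threads[0];
        /* its context will be saved the first time it yields */
}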
Tid thread_id():
This function returns the thread identifier of the currently running thread. The return value should lie between 0 and THREAD_MAX_THREADS-1 (inclusive). See Solution Requirements below.
Tid thread_yield(Tid to_tid):
This function suspends the caller and activates the thread given by the identifier to_tid. The caller is put on the ready queue and will run again later when it is scheduled. A reasonable policy is to add the caller to the tail of the ready queue to ensure fairness (so all other ready threads run before this thread is scheduled again - see the THREAD_ANY argument below). The value of to_tid may be the identifier of any available thread. It can also be one of the following constants:
THREAD_ANY: tells the thread system to run any thread in the ready queue. A reasonable policy is to run the thread at the head of the ready
queue, which ensures fairness. This policy is called FIFO (first-in, first-out), since the thread that first entered the ready queue (among the
threads that are currently in the ready queue) is scheduled first.
THREAD_SELF: tells the thread system to continue the execution of the caller. This function could be implemented as a no-op, but it may be
useful to explicitly switch to the current thread for debugging purposes.
The thread_yield function returns the identifier of the thread that took control as a result of the function call. Note that the caller does not get to
see the result until it gets its turn to run (later). The function may also fail and the caller continues execution immediately. To indicate the reason for failure, the call returns one of these constants:
THREAD_INVALID: alerts the caller that the identifier to_tid does not correspond to a valid thread.
THREAD_NONE: alerts the caller that there are no more threads, other than the caller, that are available to run, in response to a call with to_tid
set to THREAD_ANY.
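For instance, a caller might react to these outcomes as in the sketch below (ret and peer_tid are illustrative names, not part of the API):

/* Illustrative fragment: using thread_yield's return value. */
Tid ret;

ret = thread_yield(peer_tid);
if (ret == THREAD_INVALID) {
        /* peer_tid does not name a currently valid thread */
}

ret = thread_yield(THREAD_ANY);
if (ret == THREAD_NONE) {
        /* no other thread was ready, so the caller simply kept running */
} else {
        /* ret is the identifier of the thread that took control before
         * execution returned to the caller */
}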
Tid thread_create(void (*fn)(void *), void *arg):
This function creates a thread whose starting point is the function fn. The second argument, arg, is a pointer that will be passed to the function fn when the thread starts executing. The created thread is put on a ready queue but does not start execution yet. The caller of the thread_create function continues to execute after thread_create returns. Upon success, thread_create returns a thread identifier of type Tid. If thread_create fails, it returns a value that indicates the reason for failure as follows:
THREAD_NOMORE: alerts the caller that the thread package cannot create more threads. See Solution Requirements below.
THREAD_NOMEMORY: alerts the caller that the thread package could not allocate memory to create a stack of the desired size. See Solution Requirements below.
void thread_exit(int exit_code):
This function ensures that the current thread does not run after this call, i.e., this function should never return. If there are other threads in the system, one of them should be run. If there are no other threads (this is the last thread invoking thread_exit), then the program should exit with the supplied exit_code. A thread that is created later should be able to reuse this thread's identifier, but only after this thread has been destroyed. (Note that we will be making more use of the exit_code in the next assignment.)
Tid thread_kill(Tid victim):
This function kills another thread whose identifier is victim. The victim can be the identifier of any available thread. The killed thread should not run any further and the calling thread continues to execute. Upon success, this function returns the identifier of the thread that was killed. Upon failure, it returns the following:
THREAD_INVALID: alerts the caller that the identifier victim does not correspond to a valid thread, or is the current thread.
Solution Requirements
The first thread in the system (before the first call to thread_create) should have a thread identifier of 0.
Your threads system should support the creation of a maximum of THREAD_MAX_THREADS concurrent threads by a program (including the initial
main thread). Thus, the maximum value of the thread identifier should be THREAD_MAX_THREADS - 1 (since thread identifiers start from 0). Note
that when a thread exits, its thread identifier can be reused by another thread created later.
The thread_id() function should run in constant time (i.e., the time to find the TID of the current thread should not depend on the number of
threads that have been, or could be, created).
Your library must maintain a "thread control block" (a thread structure) for each thread that is running in the system. This is similar to the process
control block that an operating system implements to support process management. In addition, your library must maintain a queue of the
threads that are ready to run, so that when the current thread yields, the next thread in the ready queue can be run. Your library allows running a
fixed number of threads (THREAD_MAX_THREADS threads), so if it is helpful, you could allocate these structures statically (e.g., as a global array).
Each thread should have a stack of at least THREAD_MIN_STACK bytes. Your implementation must not statically allocate all stacks at initialization
time (e.g., using a global data structure). Instead, you must dynamically allocate a stack (e.g., using malloc()) whenever a new thread is
created (and delete one each time a thread is destroyed.)
Your library must use getcontext and setcontext to save and restore thread context state (see Implementation Details below), but it may
not use makecontext or swapcontext or any other existing C library code to manipulate a thread's context; you need to write the code to do
that yourself.
Your code must not make calls to any existing thread libraries (e.g., Linux pthreads), or borrow code from these libraries for this assignment.
Do not use any code from other students, or any code available on the Internet. When in doubt, please ask us.
Implementation Details
Thread Context
Each thread has per-thread state that represents the working state of the thread -- the thread's program counter, local variables, stack, etc. A thread context is a subset of this state that must be saved/restored from the processor when switching threads. (To avoid copying the entire stack, the thread context includes a pointer to the stack, not the entire stack.) Your library will store the thread context in a per-thread data structure (this structure is sometimes called the "thread control block").
Think carefully about what you need to include in your thread control block structure, and how these structures will be used to create and manage your threads. Consider how you will implement your ready queue. Remember that the thread_yield function allows a thread to yield the CPU to a specified thread, so you may need to remove a thread from the middle of the ready queue.
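As a rough sketch (none of these names or fields are required; they are just one plausible layout), the thread structure and ready queue might look like this:

/* Sketch of one possible thread structure and ready queue; the names,
 * state values, and list representation are illustrative, not required. */
#include <ucontext.h>
#include "thread.h"

enum thread_state { READY, RUNNING, EXITED };      /* add states as needed */

struct thread {
        Tid id;
        enum thread_state state;
        void *stack;              /* base address returned by malloc */
        ucontext_t context;       /* saved registers, filled by getcontext */
        struct thread *next;      /* link for the ready queue */
};

/* THREAD_MAX_THREADS is fixed, so the control blocks can live in a static
 * array; the queue is a linked list threaded through that array so that a
 * thread can also be removed from the middle (for thread_yield(tid)). */
static struct thread threads[THREAD_MAX_THREADS];
static struct thread *current;                     /* the running thread */
static struct thread *ready_head, *ready_tail;     /* FIFO ready queue */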
Saving and Restoring Thread Context
When a thread yields the CPU, the threads library must save the current thread's context, which contains the processor register values at the time
the thread yields the CPU. The library restores the saved context later when the thread gets its turn to run on the processor again. Additionally, the library creates a fresh context and a new stack when it creates a thread.
Fortunately, the C runtime system allows an application to retrieve its current context and store it in a memory location, and to set its current context to a predetermined value from a memory location. Your library will make use of these two existing C library calls: getcontext and setcontext.
Study the manual pages (http://linux.die.net/man/2/setcontext) of these two calls. Notice that getcontext saves the current context into a structure of type struct ucontext_t, which is typedef'd as type ucontext_t. So, if you allocate a ucontext_t and pass a pointer to that memory to a call to getcontext, the current registers and other context will be stored to that memory. Later, you can call setcontext to copy that state from that memory to the processor, restoring the saved state. (Hint: You almost certainly want a 'ucontext_t' as part of your thread control block data structure.)
The struct ucontext_t is defined in /usr/include/x86_64-linux-gnu/sys/ucontext.h on the teach.cs servers. Look at the fields of this struct in detail, especially the uc_mcontext and the uc_sigmask fields.
You will use getcontext and setcontext in two ways. First, to suspend a currently running thread (to run another one), you will use getcontext to save its state and later use setcontext to restore its state. Second, to create a new thread, you will use getcontext to create a valid context, but you will leave the current thread running; you (the current thread, actually) will then change a few registers in this valid context to initialize it as a new thread, and put this new thread into the ready queue; at some point, the new thread will be chosen by the scheduler, and it will run when setcontext is called on this new thread's context.
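For example, the switch in the first case might be structured roughly as follows. This is a sketch built on the hypothetical struct thread above; the volatile flag is how the code tells the first return of getcontext apart from the later resumption via setcontext.

#include <stdbool.h>
#include <ucontext.h>

/* Sketch: suspend the running thread and resume next.  Returns in the
 * original thread only when some other thread later switches back to it. */
static void
switch_to(struct thread *next)
{
        volatile bool setcontext_called = false;

        getcontext(&current->context);          /* save the caller's registers */
        if (!setcontext_called) {
                /* first return from getcontext: actually perform the switch */
                setcontext_called = true;
                current = next;
                setcontext(&next->context);     /* does not return */
        }
        /* second "return": another thread has restored the context saved
         * above, so the original caller resumes here */
}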
Modifying Thread Context
As noted above, when creating a thread, you can't just make a copy of the current thread's context (using getcontext). You also need to make a few changes to initialize the new thread:
You need to change the saved program counter register in the context to point to a stub function, described below, which should be the first
function the newly created thread runs.
You need to change the saved argument registers, described below, in the context to hold the arguments that are to be passed to the stub
function.
You need to allocate a new per-thread stack using malloc.
You need to change the saved stack pointer register in the context to point to the top of the new stack. (Warning: in x86-64, stacks grow down!)
In the real world, you would take advantage of an existing library function, makecontext, to make these changes to the copy of the current thread's context. The advantage of using this function is that it abstracts away the details of how a context is saved in memory, which simplifies things and helps portability. The disadvantage is that it abstracts away the details of how a context is saved in memory, which might leave you unclear about exactly what's going on.
In the spirit of "there is no magic", for this assignment you should not use makecontext or swapcontext. Instead, you must manipulate the fields in the saved ucontext_t directly. The T2 tutorial exercise will help you understand the ucontext structure.
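On the teach.cs machines (glibc on x86-64), the saved registers live in uc_mcontext.gregs and are indexed by constants such as REG_RIP, REG_RSP, REG_RDI and REG_RSI (these require _GNU_SOURCE to be defined before including ucontext.h). A sketch of what thread_create might do is shown below; t, fn, arg and stack_top stand in for your own variables, and the exact stack-pointer value is deliberately left vague.

/* Sketch only: set up a freshly saved context so the new thread starts
 * in thread_stub(fn, arg) on its own stack. */
getcontext(&t->context);                  /* start from a valid current context */

t->stack = malloc(THREAD_MIN_STACK);
if (t->stack == NULL)
        return THREAD_NOMEMORY;

/* stack_top: an address near t->stack + THREAD_MIN_STACK, adjusted to
 * satisfy the alignment rules discussed under "Contexts and Calling
 * Conventions" below. */
t->context.uc_mcontext.gregs[REG_RIP] = (greg_t)thread_stub;   /* program counter */
t->context.uc_mcontext.gregs[REG_RDI] = (greg_t)fn;            /* 1st argument */
t->context.uc_mcontext.gregs[REG_RSI] = (greg_t)arg;           /* 2nd argument */
t->context.uc_mcontext.gregs[REG_RSP] = (greg_t)stack_top;     /* new stack */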
The Stub Function
When you create a new thread, you want it to run the thread_main function that defines the work you want the thread to do. A thread exits implicitly when it returns from its thread_main function, much like the main program thread is destroyed by the OS when it returns from its main function in C, even when the main function doesn't invoke the exit system call. To implement a similar implicit thread exit, rather than having your thread begin by running the thread_main function directly, you should start the thread in a "stub" function that calls the thread_main function of the thread (much like main is actually called from the crt0 stub function in UNIX). In other words, thread_create should initialize a thread so that it starts in the thread_stub function shown below. When the thread runs for the first time, it will execute thread_stub, which will call thread_main. If the thread_main function returns, it will return to the stub function which will call thread_exit to terminate the thread.
/* thread starts by calling thread_stub. The arguments to thread_stub are the
* thread_main() function, and one argument to the thread_main() function.
*/
void
thread_stub(void (*thread_main)(void *), void *arg)
{
        thread_main(arg); // call thread_main() function with arg
        thread_exit(0);
}
In the above code, the argument thread_main is a pointer to the thread_main function that describes the real work the thread should do. Notice that in C, a function's name refers to the address of its code in memory. The second argument to thread_stub (arg) is the argument to pass to the thread_main function. We'll have the thread_main function take an argument that is a pointer to an arbitrary type so that you can pass it whatever you want.
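For example (all names here are invented for illustration), a program can pass several values to a thread by bundling them in a structure and passing its address, as long as that structure remains valid while the thread uses it:

/* Illustrative fragment: passing more than one value through void *arg. */
struct work {
        int id;
        const char *name;
};

static void
worker(void *arg)
{
        struct work *w = (struct work *)arg;
        /* ... use w->id and w->name ... */
}

/* in the creating thread: */
static struct work w0 = { 17, "disk scan" };
Tid tid = thread_create(worker, &w0);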
Contexts and Calling Conventions
The ucontext_t structure contains many data fields, but you only need to deal with four of them when creating new threads: the stack pointer, the program counter, and two argument registers. Other than that, you don't need to worry about the fields within the context variable, as long as you do not tamper with them. It is also a good idea to initialize each context with a getcontext call before modifying it, so that the fields you do not touch hold sensible values; otherwise you may see bizarre behavior.
Under the Posix C calling conventions in x86-64 (http://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI) , here's what things look like while any given function is executing:
Notice that while a procedure executes, it can allocate stack space by moving the stack pointer down (stack grows downwards). However, it can
find local variables, parameters, return addresses, and the old frame pointer (old %rbp) by indexing relative to the frame pointer (%rbp) register because its value does not change during the lifetime of a function call.
When a function needs to make a function call, it copies the arguments of the "callee" function (the function to be called) into the registers shown on the right in the x86-64 architecture. For example, the %rdi register will contain the first argument, the %rsi register will contain the second argument, etc. Then the caller saves the current instruction pointer (%rip) into the stack (shown as "return address" in the figure), and changes the instruction pointer to the callee function. At this point, the stack pointer (%rsp) points to the return address (shown in the figure). Note that the stack pointer points to the last pushed value in the stack.
The callee function then starts executing. It first pushes the frame pointer value of the caller function (shown as old %rbp) onto the stack, and then sets the frame pointer (%rbp) to the current stack pointer (%rbp = %rsp), so that it points to the old frame pointer value. Then the callee function decrements the stack pointer (shown as %rsp), and uses the space between the frame pointer and the stack pointer for its local variables, for saving or spilling other registers, etc. As an example, these three steps are performed by the first three instructions (push, mov and sub) in the main function shown below. The callee locates its variables, parameters, and the return address by using addresses relative to the fixed frame pointer (%rbp).
To return to the caller, a procedure simply copies the frame pointer (%rbp) to the stack pointer (%rsp = %rbp), effectively releasing the current frame. Then it pops the top stack item into %rbp to restore the %rbp of the caller function, and then uses the ret instruction to pop the old instruction pointer off the stack into the instruction register (%rip), returning control to the caller function. These steps are performed by the last two instructions (leaveq, retq) in the main function shown below.
This is a gdb listing for the test_basic program that we will provide you for this assignment. Run gdb on it (or on any program that you have compiled on the lab machines), as shown below, to see the instructions at the start and the end of a function (e.g., main function). Make sure that you understand what these instructions are doing, and are able to answer the questions in the listing below.
$ cd threads
$ gdb test_basic
...
(gdb) b main
Breakpoint 1 at 0x4009ab: file test_basic.c, line 7.
(gdb) run
...
Breakpoint 1, main (argc=1, argv=0x7fffffffe868) at test_basic.c:7
7 thread_init(0);
(gdb) disassemble main
Dump of assembler code for function main:
   0x000000000040099c <+0>:     push   %rbp
   0x000000000040099d <+1>:     mov    %rsp,%rbp
   0x00000000004009a0 <+4>:     sub    $0x10,%rsp
   0x00000000004009a4 <+8>:     mov    %edi,-0x4(%rbp)    // what is happening here?
   0x00000000004009a7 <+11>:    mov    %rsi,-0x10(%rbp)   // what is happening here?
=> 0x00000000004009ab <+15>:    mov    $0x0,%edi
   0x00000000004009b0 <+20>:    callq  0x401edc <thread_init>
   0x00000000004009b5 <+25>:    mov    $0x0,%eax
   0x00000000004009ba <+30>:    callq  0x4009dc <test_basic>
   0x00000000004009bf <+35>:    mov    $0x0,%eax
   0x00000000004009c4 <+40>:    leaveq
   0x00000000004009c5 <+41>:    retq
End of assembler dump.
One complication with the Posix C x86-64 (http://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions) calling convention is that it requires the frame pointer %rbp to be aligned to 16 bytes. This byte alignment (http://en.wikipedia.org/wiki/Data_structure_alignment) means that the value of %rbp, or the stack location (stack address) to which %rbp points, must be a multiple of 16. Otherwise, system libraries may crash (http://web.cecs.pdx.edu/~apt/cs322/x86-64.pdf) .
When you are creating a new thread that will execute the thread_stub function, you will need to set up the stack so that this calling convention is followed. In particular, you will need to think about the byte alignment for the stack pointer and the frame pointer when control is transferred to thread_stub.
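As a sketch of one common arrangement (verify the offset yourself against the prologue of thread_stub and the discussion above): round the top of the malloc'd stack down to a 16-byte boundary, then subtract 8 so the stack looks the way it would just after a call instruction; after thread_stub's push %rbp, the frame pointer then lands on a 16-byte boundary. The variable names below continue the earlier hypothetical sketches.

/* Illustrative alignment calculation; t is the same hypothetical thread
 * structure used earlier. */
unsigned long top = (unsigned long)t->stack + THREAD_MIN_STACK;
top = top - (top % 16);     /* round down to a 16-byte boundary */
top = top - 8;              /* mimic the stack right after a call instruction */
t->context.uc_mcontext.gregs[REG_RSP] = (greg_t)top;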
Setup and Submission
Log in to MarkUs (https://markus.teach.cs.toronto.edu/2022-09/) and go to CSC369.
You will find the starter files for this assignment on MarkUs. Click the button to 'Add starter files to repository', then clone your repository where you want to do your work. You should find the starter code in the A1/ subdirectory of your cloned repository. Build the starter code by typing make in the A1/ directory.
You should only modify the thread.c file in this assignment.
You can find the files you have modified by running the git status command.
You can commit your modified files to your local repository as follows:
git add thread.c
git commit -m "Committing changes for Assignment 1"
We suggest committing your changes frequently by re-running the commands above (with different meaningful messages to the commit command), so that you can go back to see the changes you have made over time, if needed.
Once you have tested your code, and committed it locally (check that by running git status), you can git push it back to MarkUs. We will collect and grade the last version pushed to MarkUs when the grace period expires. Grace tokens are consumed automatically based on the timestamp of your last push.
Hints and Advice
This assignment does not require writing a large number of lines of code. It does require you to think carefully about the code you write. Before you dive into writing code, it will pay to spend time planning and understanding the code you are going to write. If you think the problem through from beginning to end, this assignment will not be too hard. If you try to hack your way out of trouble, you will spend many frustrating nights in the lab.
As a start, here are some questions you should answer before you write code.
What fields will you need in your thread structure? Perhaps the most important is the thread state (e.g., running, etc.). Think about all the states
that you will need to support.
getcontext "returns" twice. When it is called directly, it initializes the context structure that is passed as an argument, and execution continues after the getcontext() call. Later, when setcontext() is called on that saved context, execution resumes at the instruction following the original getcontext() call, which looks like a second "return" from getcontext(). For this assignment, you will use this behavior, once when you create a context, and again when you switch to that context. What action will you take in each case? How will you tell which case you are in? (Hint: Look at the T2 tutorial exercise's use of getcontext() and setcontext().)
Most threads are created with thread_create, but the initial thread is there before your library is invoked. Nonetheless, the original thread must
be able to thread_yield to let other threads run, and other threads must be able to call thread_yield and let the original thread run. How is
this going to work?
A hard bug to find would be an overflow or underflow of the stack you allocate. How might such a bug manifest itself? What defensive
programming strategies can you use to detect stack overflow in a more controlled manner as the system runs?
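One simple defensive strategy for the stack-overflow question above (a sketch, not a requirement; STACK_CANARY and the field names are made up) is to write a known value at the lowest address of each allocated stack and check it whenever the thread is switched out:

#include <assert.h>

#define STACK_CANARY 0xdeadbeefcafe1234UL      /* made-up marker value */

/* when the stack is allocated in thread_create: */
*(unsigned long *)t->stack = STACK_CANARY;

/* whenever the thread yields or exits: */
assert(*(unsigned long *)t->stack == STACK_CANARY);   /* detect overflow */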
Note that when the initial thread in a C process returns, it calls the exit system call, which causes the OS to destroy the process, even if you
have other user-level threads in the process that want to run. It is possible for the initial thread (TID == 0) to call thread_exit() while other threads
are still running. How will you ensure that the program exits only when the last thread in your system exits?
Be careful. It is dangerous to use memory once it has been freed. In particular, you should not free the stack of the currently running thread in thread_exit while it is still running. So how will you make sure that the thread stack is eventually deallocated? How will you make sure that another thread that is created in between does not start using this stack (and then you inadvertently deallocate it)? You should convince yourself that your program would work even if you used a debugging malloc library that overwrites a block with dummy data when that block is free()'d.
Note that the stack of the initial thread (TID == 0) was not dynamically allocated by the threads library, and cannot be freed when this thread
calls thread_exit. How will you detect and handle this case?
Be careful. If you destroy a thread that is holding or waiting on a resource such as a lock (we will be implementing locks in the next
assignment), problems can occur. For example, deadlock may occur because the thread holding the lock may not have a chance to release the
lock. Similarly, if the thread waiting on a lock is destroyed, then the thread releasing the lock may wake up some other thread incorrectly (e.g.,
due to reusing thread id). For this reason, it is important to ensure that when thread_kill is invoked on a thread, the target thread should not
exit immediately. Instead, the next time the target thread runs, it should notice that it has been killed and exit. How will you implement this
functionality? In practice, operating systems provide a signal handler mechanism that allows threads to clean up their resources (e.g., locks)
before they exit.
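One way to structure this (a sketch with made-up names such as killed and is_valid_tid) is to record the kill in the victim's thread structure and have every thread check that flag when it next runs:

/* Sketch of deferred killing. */
Tid
thread_kill(Tid victim)
{
        if (!is_valid_tid(victim) || victim == thread_id())
                return THREAD_INVALID;
        threads[victim].killed = true;     /* mark it; do not switch to it */
        return victim;
}

/* Then, at every point where a thread resumes execution (e.g., right after
 * a context switch back into it), check the flag: */
if (current->killed)
        thread_exit(0);                    /* exits on the victim's own stack */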
What are the similarities and differences between thread_yield and thread_exit? Think carefully. It will be useful to encapsulate all that is
similar in a common function, which will help reduce bugs, and possibly make your code simpler.
We strongly recommend that your first milestone be to get thread_yield(THREAD_SELF) working for the initial thread (where your implementation stores and then restores the caller's state). Get this working before you try to implement thread_create or thread_exit.
Use a debugger. As an exercise, put a breakpoint at the instruction after you copy the current thread's state using getcontext. You can print the current values of the registers (in gdb, type info registers).
You can print the values stored in your thread struct and the thread context. For example, say current is a pointer to the thread structure associated with the currently running thread, and context is a field in this structure that stores the thread context. Then, in gdb, you can use p/x current->context to print the context stored by a call to getcontext.
You may find this particularly useful in making sure that the state you "restore" when you run a newly-created thread for the first time makes sense.
Start early, we mean it!
Testing Your Code
We have provided the program test_basic.c for testing this assignment. Use it to test your code.
You can also test your code by using our auto-tester program at any time by following the testing instructions (https://q.utoronto.ca/courses/315157/pages/running-the-autotester) .