Back Porting C++20 Coroutines to C++14 -- A Tutorial
This post gives another perspective on the C++20 coroutines by presenting a small library that emulates them using C++14 and a couple macros to aid readability. For example, the following two functions g20(), g14() both implement the same coroutine functionality. While the C++20 version is clearly a bit simpler, both are functionally equivalent and sufficiently readable. Next, we will unpack what the ‘‘coroutine transformation’’ is and how it can [partially] be implemented without specialized compiler support.
my_future<int> h(); // C++20 version my_future<int> g20() { int i = co_await h(); if (i == 42) i = 0; co_return i; } // C++14 version my_future<int> g14() { // Specify the return type and lambda capture and local variables. CO_BEGIN(my_future<int>, i = int{}); // assign the result of h() to i CO_AWAIT(i, h()); if (i == 42) i = 0; CO_RETURN(i); CO_END(); }
Due to certain limitations of macros, etc., all local variables that span a CO_XXXX macro must be “lambda captured” in the CO_BEGIN macro. We will discuss this more below but CO_BEGIN must be passed the return type followed by the body of a lambda capture as if the rest of the code is in a lambda that can not have local variables. For example, CO_END(my_future<int>, i = int{}); loosely gets translated into
[i = int{}] ()->my_future<int> { //<body-of-the-coroutine> }
This likely gives the reader a hint at how we will port C++20 coroutines back to C++14.
Coroutine Usage
First, let us review some basic coroutine functionality. If the reader is familiar with coroutines feel free to skip to the next section. Coroutines can be thought of as a stateful function that can “pause” its execution and then be resumed later by the caller or someone else.
co_await
There are a couple of situations where a coroutine might be paused, or “suspend” in coroutine terminology. The most common is when the coroutine executes a co_await call of some awaitable (e.g. some other coroutine) and the result is not ready yet. For example, in the functions above we call co_await h(). The function h() returns a my_future<int> which implements the interface of an “awaitable” with value type int. This interface allows the compiler to auto generate some code that determines if the underlying value is ready now or if the current coroutine should suspend while we wait for h() to complete.
To make the example a bit more concrete, h() could be fetching a value from some server on the internet which is slow. While we wait, we want to pause this coroutine and allow the caller to go do something else. Once h() finishes, our coroutine can be resumed either by h(), by some other thread or we can be notified in some way and manually resume the coroutine. The how/when/who resumes the coroutine can be customized by our return type my_future<int> and what we are co_awaiting on.
If the underlying int that is provided by h() is ready immediately, our coroutine does not need to suspend. Whenever the underlying value is ready, either immediately after the call to h() or when the coroutine is resumed, execution of the coroutine continues. The next thing to be executed is compiler generated code that passes the value held in the my_future<int> to the current coroutine, i.e. assigns it to int i in our examples. More details on this compiler generated code later. Finally, execution of the coroutine continues with the next statement after the co_await.
co_yield
Another example of when a coroutine can be suspended is with the co_yield keyword. The exact semantics of this is customizable but typically it is used to suspend the current coroutine and pass a value back to the caller. This is similar to returning a value but we are allowed to resume the coroutine later. Typically, the caller will use this value and then resume the coroutine which can co_yield another value back to the caller or run to completion. A classic example of this is to lazily generate a range of values. For example, range20, range14 both return a my_range<int> which allows the caller to iterate over the values that are co_yield out of the coroutine. This example prints the string 2,3,4,5,6,7,8,9,.
my_range<int> range20(int begin, int end) { for (int i = begin; i < end; ++i) co_yield i; } my_range<int> range14(int begin, int end) { CO_BEGIN(my_range<int>, begin, end, i = int{}); for (i = begin; i < end; ++i) CO_YIELD(i); CO_END(); } void example() { for (int i : range14(2, 10)) std::cout << i << ','; }
Importantly, the return type of the coroutine, in this case my_future<int>, must implement a certain interface in order for the coroutine body to interact with it. In brief, this interface consists of the return type having an associated promise_type which implements certain functions. For co_yield <expr>; statements, the <expr>; value is forwarded to promise_type::yield_value(<expr>) which, in our example, somehow forwards it to the return value my_range<int> so that the caller can consume it. Similar “customization points” are required for determining how the other situations are handled.
co_return
To return a value from a coroutine we must use co_return instead of normal return. This special return keyword helps the compiler know that the current function is a coroutine as opposed to a normal function. In particular, from the function signature my_future<int> g20() there is no way to tell if the body of the function is coroutine or a normal function that just happens to return a my_future<int>.
More generally, whenever the compiler sees at least one co_await, co_yield or co_return keyword, it will convert the function body into a coroutine by performing the “coroutine transformation.” Importantly, this transformation is just an implementation detail. The caller does not know or care if my_future<int> g20() is a coroutine or implements the required logic in some other way.
Next, we will discuss precisely that, how to implement the coroutine transformation manually and then introduce the macros you see above to aid readability.
The Coroutine Transformation
At the core of the coroutine specification is a relatively mechanical transformation. Every coroutine function is effectively transformed into two functions. The first (outer function) performs the following:
- Constructs the so-called promise type <function-return-type>::promise_type promise. The role of the promise type is to decide when to start and/or resume the coroutine and what to do when it yields a value or returns.
- Construct the return type <function-return-type> which in our first example is my_future<int>. This is obtained by calling promise.get_return_object(). This will be what is returned to the caller.
- Query the promise_type and decide if the coroutine body should be started now (before returning to the caller) or later (possibly by the caller). This is determined by the result of promise.initial_suspend(). If we should start now then we call the second function which contains the user code.
- Finally, we return the result of promise.get_return_object() to the caller.
The second function (inner function) of the transformation will contain the actual body of the coroutine that the user wrote. Each time co_await, co_yield or co_return is seen, code is inserted to perform the required logic and allow the coroutine to be suspended/resumed at this location. These are called suspension points.
Below is some C++ inspired pseudocode that loosely implements the required logic which we explain next. The outer function g_inlined is intended to have the same behavior as g20(), g14() above.
my_future<int> g_inline() { // Construct the coroutine frame and promise struct coroutine_frame { my_future<int>::promise_type promise; std::function<void(void)> inner_fn; void* suspend_address = &InitialSuspend; }; auto frame = new coroutine_frame; // define the inner function frame->inner_fn = [frame]() { // jump to wherever we left off. goto frame->suspend_address; InitialSuspend: // <---- beginning of user code. ----> // int i = co_await h(); auto awaiter_h = h(); if (!awaiter_h.await_ready()) { frame->suspend_address = &Suspend_1; awaiter_h.await_suspend(frame); return; } Suspend_1: int i = awaiter_h.await_resume(); // normal code if (i == 42) i = 0; // co_return i; frame->suspend_address = &FinalSuspend; frame->promise.return_value(i); // <---- end of user code. ----> // final suspend if (!promise.final_suspend().await_ready()) { frame->suspend_address = &FinalSuspend; return; } FinalSuspend: delete frame; }; my_future<int> ret = frame->promise.get_return_object(); // call the inner function if we are ready. if (frame->promise.initial_suspend().await_ready()) frame->inner_fn(); return ret; }
First, the so-called coroutine_frame is allocated. The frame will contain the promise_type and space for the local variables used in the inner function, which for clarity is being managed using std::function<void(void)>.
This inner function can be called many times with the desired logic being that the function should resume from wherever it left off. The code above achieves this using a hypothetical syntax for storing the addresses of goto labels. In particular, each time we enter the inner function we will perform goto suspend_address where suspend_address is the location in the function that is just after where we suspended last.
The first time we enter the function suspend_address will have the initial value of InitialSuspend: and begin executing the user code. At some point the execution will hit a co_* statement, in this case int i = co_await h(). This statement will be transformed as shown above to first query if the value return by auto awaiter_h = h() is ready to be consumed. This is achieved by calling awaiter_h.await_ready().
If awaiter_h.await_ready() returns true then we need to suspend the coroutine. This is achieved by updating the suspend_address to be just after the current co_await statement and calling awaiter_h.await_suspend(frame) which gives awaiter_h the responsibility to resume this coroutine whenever the value does become available. Finally, we return from the inner function, in this case back to the outer function and then to initial caller. Through some other mechanism, awaiter_h will eventually have a value (or exception) and the inner function will be resumed. This time the goto suspend_address statement will jump us back to just after the return that was taken last time, i.e. Suspend_1.
Alternatively, awaiter_h.await_ready() could have returned true and the program would similarly be at the Suspend_1 location. Regardless, the result of co_await h() statement is ready and can be obtained by calling awaiter_h.await_resume().
The co_return i statement is handled by calling promise.return_value(i); and then immidiately proceeding to the so-called final_suspend. The role of final_suspend is to determine if the current coroutine_frame which contains the user’s promise_type is safe to be deleted. Often it is preferred to not delete the frame at this time and instead allow the caller/consumer to delete the frame once they have obtained the result, i.e. i in this case.
Issues and Solutions
The code given above should give you an intuition about what the compiler is doing. However, there are several issues in emulating as described above.
goto suspent_point
First, is that standard C++ does not allow you to take the address of a goto label. To address this the goto statements can be replaced with a switch statement. The function body would then look like
{ switch (frame->suspend_point) { case SuspensionPoint::InitialSuspend: // int i = co_await h(); auto awaiter_h = h(); if (!awaiter_h.await_ready()) { frame->suspend_point = (SuspendPoint)1; awaiter_h.await_suspend(frame); return; } case (SuspensionPoint)1: int i = awaiter_h.await_resume(); //... case SuspensionPoint::FinalSuspend: delete frame; } }
Using a switch statement is almost functionally equivalent. The main downside is that users can not use switch statements across co_* keywords (looking forward CO_* macros). Interestingly, switch statements allow for case locations that are in a nested scope, e.g. using can suspend in the middle of a for-loop and the switch statement can jump back to it upon resume.
Local variables
The second issue regards saving the state of local variables between suspend and resume calls. For example, if before co_await h() the user has constructed some local variable and we suspended for h(), then when the function is resumed that local variable would be uninitialized.
The compiler handles this in the transformation by moving any declaration of local variables into the coroutine_frame as opposed to being the body of our inner function. They are then destroyed when the coroutine completes.
While one can manually perform the transformation, we will instead consider an alternative solution. In particular, we will place/require all local variables to be declared in the lambda capture. While this results in slightly different semantics, it will enable us to write the CO_BEGIN macro.
A similar issue regards the lifetime of the awaiter when the coroutine is suspended, i.e. awaiter_h in the example above. This can be resolved by storing the awaiter either in a fixed sized buffer in coroutine_frame or allocating it on the heap if it is too large to fit in the buffer.
std::coroutine_handle<promise_type>
The final issue regards compatibility with C++20 coroutines. Ideally we can choose to implement our coroutines manually using the pattern outlined here or use the C++20 keywords. However, the primary barrier in achieving this is that C++20 defines <awaiter>.await_suspend as taking a std::coroutine_handle<promise_type> as input while our example takes <function-name>::coroutine_frame*.
Intuitively, these two parameters represent the same thing. Both are a pointer to the coroutine frame and provide a way for the awaiter to resume this coroutine once the thing being awaited becomes ready. However, the way we resume the <function-name>::coroutine_frame* vs a C++20 coroutine are concretely different. Moreover, there is no way to convert our <function-name>::coroutine_frame* into a std::coroutine_handle<promise_type> since we can not modify that type.
This issue could be resolved by giving promise_type another customization point which std::coroutine_handle<promise_type> interacts with and then dispatches to the correct method for resuming to coroutine. However, this is not part of the standard.
Instead, we will define our own macoro::coroutine_handle<promise_type> which can hold either a std::coroutine_handle<promise_type> or a <function-name>::coroutine_frame*. As such, our coroutines will only work with a promise_type that can take a macoro::coroutine_handle<*> as input to <awaiter>.await_suspend(…).
This is somewhat unfortunate because thirdparty awaitable types which are intended for C++20 will most likely not work out of the box. However, this is probably unavoidable given that they will likely take other dependencies to C++20.
Allocations
The current strategy requires several allocations. First, we must allocate the coroutine_frame. C++20 defines a way to allow the user to control where this allocation happens. A description of it can be found in Lewis Baker’s excellent blog posts. A similar strategy can be applied to our framework with some modifications.
Another allocation likely occurs for the std::function<void(void)>. However, this is not required. We can instead define the function
template<typename lambda_type, typename promise_type> coroutine_frame<lambda_type, promise_type>* makeFrame(lambda_type&& inner_fn);
that returns a frame that stores the inner function with type lambda_type inside the coroutine_frame. Then the inner function will take a pointer to the frame as its only parameter.
The final allocation is with respect to the storage of any awaiters that are produced during an active co_await. As suggested previously, these can be partially handled by having an additional buffer within the coroutine_frame. However, it is still possible that the coroutine body awaits an awaiter that does not fit in the buffer and then an allocation would occur. However, it is possible for this to result in a compile time error. This can then be combined with allowing the buffer size to be a template parameter, i.e. an input to CO_BEGIN.
Combining these we could obtain a coroutine that is guaranteed to not to allocation any memory, something that C++20 coroutines can not currently achieve 100 percent of the time.
Macros for Readability
Using the pattern above we can define the macros shown in the beginning. A prototype of these ideas can be found in the macoro repo. Below is a simplified definition of the main macros. In summary, CO_BEGIN begins the call to makeFrame that allocates the frame with the inner function being stored inline. The closing brace of the lambda and makeFrame function is in CO_END. The first time we enter the inner function we finish the initial suspend (which was started in the outer function in CO_END) by calling await_resume and destroying the awaiter.
CO_AWAIT first computes the type of the awaiter and then constructs it within the frame (or on the heap). It then performs the await_ready/await_suspend/await_resume procedure. If it does suspend, it updates the suspend point with the line number.
CO_END performs the catch block, passing any exception to the promise and then performs the final suspend. This completes the definition of the inner function. CO_END then completes the outer function by constructing the return object and starting the initial suspend. If ready we start the coroutine by calling the inner function and then return the return type.
CO_BEGIN(return_type, …)
using promise_type = return_type::promise_type; using Handle = coroutine_handle<promise_type>; // make frame with inline body. auto frame = makeFrame<promise_type>( [__VA_ARGS__](FrameBase<promise_type>* frame) mutable { promise_type& promise = frame->promise; try { // jump to where we suspended switch (frame->getSuspendPoint()) { case SuspensionPoint::InitialSuspend: { // ------- initial suspend continued ------- using Awaiter = typename awaiter_for<promise_type, decltype(promise.initial_suspend())>; frame->getAwaiter<Awaiter>().await_resume(); frame->destroyAwaiter<Awaiter>(); } // -------- end of initial suspend --------- // ---------- beginning of user code -----------
CO_AWAIT(RETURN_VAL, EXPR)
{ using Awaiter = awaiter_for<promise_type, decltype(EXPR)>; { auto& awaiter = frame->constructAwaiter(EXPR); if (!awaiter.await_ready()) { // suspend-coroutine frame->setSuspendPoint((SuspensionPoint)__LINE__); // call awaiter.await_suspend(). If it's void return, then return true. if (await_suspend(awaiter, Handle::from_promise(promise))) return; } } // resume-point, __LINE__ will be the same in macro. case SuspensionPoint(__LINE__): RETURN_VAL = frame->getAwaiter<Awaiter>().await_resume(); frame->destroyAwaiter<Awaiter>(); }
CO_END()
// ---------- end of user code ----------- default: } } catch (...) { frame->promise.unhandled_exception(); } // ---------- beginning of final suspend ----------- using Awaiter = typename awaiter_for<promise_type, decltype(promise.final_suspend())>; auto& awaiter = frame->constructAwaiter(promise.final_suspend()); auto handle = Handle::from_promise(promise); if (!awaiter.await_ready()) { // suspend-coroutine frame->setSuspendPoint(SuspensionPoint::FinalSuspend); // call awaiter.await_suspend(). If it's void return, then return true. if (await_suspend(awaiter, handle)) return; } frame->getAwaiter<Awaiter>().await_resume(); frame->destroyAwaiter<Awaiter>(); handle.destroy(); // ---------- end of final suspend ----------- }); promise_type& promise = frame->promise; auto ret = promise.get_return_object(); // ------- beginning of initial suspend ---------- using Awaiter = awaiter_for<promise_type, decltype(promise.initial_suspend())>; auto& awaiter = frame->constructAwaiter(promise.initial_suspend()); auto handle = Handle::from_promise(promise); if (!awaiter.await_ready()) { // suspend-coroutine frame->setSuspendPoint(SuspensionPoint::InitialSuspend); //call awaiter.await_suspend(). If it's void return, then return true if (await_suspend(awaiter, handle)) return ret; } /*begin coroutine*/ (*frame)(frame); return ret;