Coroutines in C++/Boost (2)

Also see my previous article: Coroutines in C++/Boost.

C++ finally has native coroutine support as of C++20. The principal difference between coroutines and routines is that a coroutine can explicitly suspend and resume its progress through additional operations that preserve its execution state, and thus provides an enhanced control flow (maintaining the execution context).

1. Asymmetric vs Symmetric

From Boost:

An asymmetric coroutine knows its invoker, using a special operation to implicitly yield control specifically to its invoker.

By contrast, all symmetric coroutines are equivalent; one symmetric coroutine may pass control to any other symmetric coroutine. Because of this, a symmetric coroutine must specify the coroutine to which it intends to yield control.

So C++20 coroutines are asymmetric: a coroutine only knows its invoker. Because of this dependency, asymmetric coroutines can be chained, just as a normal function calls another one, without the goto-like semantics of symmetric coroutines.

C++23 generators are also asymmetric. They are resumed repeatedly to generate a series of return values.

2. Stackless vs Stackful

Again, from Boost:

In contrast to a stackless coroutine, a stackful coroutine can be suspended from within a nested stackframe. Execution resumes at exactly the same point in the code where it was suspended before.

With a stackless coroutine, only the top-level routine may be suspended. Any routine called by that top-level routine may not itself suspend. This prohibits providing suspend/resume operations in routines within a general-purpose library.

Well, these two terms are confusing, and tutorials and blogs describe them differently. As a rule of thumb, if the language has await/yield keywords, its coroutines are stackless; if it has something called a fiber, they are stackful.

Fibers are just like threads: they can be suspended at any stack frame. A stackless coroutine, by contrast, can suspend only at an explicit await/yield point.

Stackless coroutines all share the default stack, while each stackful coroutine is assigned a separate stack of its own. With stackless coroutines, the code is transformed into event handlers at compile time and driven at run time by an event engine, i.e. the scheduler of the stackless coroutines. Transferring control of the CPU to a stackless coroutine is merely a function call with an argument pointing to its context. Conversely, transferring control of the CPU to a stackful coroutine requires a context switch.
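The "function call with an argument pointing to its context" can be sketched without any coroutine support at all. The following is a hand-written approximation of the kind of state machine a compiler could generate for a loop that yields 0, 1, 2, ...; CounterFrame, resume and drain are hypothetical names, and real compilers produce something more elaborate.

```cpp
#include <cassert>
#include <vector>

// Hypothetical context ("coroutine frame"). Everything that must survive a
// suspension lives here instead of on the stack.
struct CounterFrame {
    int state = 0;    // which suspend point to continue from
    int i = 0;        // a "local variable" of the coroutine
    int limit = 0;    // the coroutine's argument
    int yielded = 0;  // the value handed back to the caller
    bool done = false;
};

// Roughly what could be generated from:
//     for (i = 0; i < limit; ++i) yield i;
// Resuming is an ordinary function call that receives a pointer to the context.
void resume(CounterFrame* f) {
    assert(!f->done);
    switch (f->state) {
    case 0:        // first entry: start of the body
        f->i = 0;
        break;
    case 1:        // re-entry right after the yield
        ++f->i;
        break;
    }
    if (f->i < f->limit) {
        f->yielded = f->i;
        f->state = 1;  // remember where to continue
        return;        // "suspend": a plain return on the caller's stack
    }
    f->done = true;
}

// The caller plays the role of the scheduler and keeps resuming the frame.
std::vector<int> drain(int limit) {
    CounterFrame frame;
    frame.limit = limit;
    std::vector<int> out;
    for (resume(&frame); !frame.done; resume(&frame)) out.push_back(frame.yielded);
    return out;
}
```

No separate stack is ever allocated here: between calls, the whole coroutine is just the CounterFrame object, which is also why it can only suspend at the points the transformation knows about.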

Here’s a summary of how coroutines are implemented in the most popular programming languages.

Language | Stackful coroutines (Fibers) | Stackless coroutines (await/yield)
Java | (Y2023) Virtual threads in Java 21 | n/a
C | n/a | n/a
C++ | n/a | (Y2020) co_await, co_yield, co_return in C++20
Python | n/a | (Y2015) async, await/yield in Python 3.5
C# | n/a | (Y2012) async, await/yield in C# 5.0
JavaScript | n/a | (Y2017) async, await/yield in ES2017
PHP | (Y2021) Fiber in PHP 8.1 | n/a
Go | (Y2012) Goroutine in Go 1.0; (Y2020) asynchronously preemptible in Go 1.14 | n/a
Objective-C | n/a | n/a
Swift | n/a | (Y2021) async, await/yield in Swift 5.5
Rust | n/a | (Y2019) async, await in Rust 1.39

Reference

Boost.Coroutine2
Fibers under the magnifying glass
Stackful Coroutine Made Fast

Linux System Call

The HelloWorld application is much simpler than the Windows one. Just put the system call number into %eax and the arguments into %ebx, %ecx and %edx, then trigger interrupt 0x80.
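As a rough C++ counterpart, the same write system call can be issued through the libc syscall() wrapper. This is a swapped-in technique, not the int 0x80 assembly itself: the wrapper loads the registers according to the platform's ABI, so the sketch also runs on 64-bit Linux. raw_write is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>
#include <sys/syscall.h>  // SYS_write
#include <unistd.h>       // syscall(), STDOUT_FILENO

// Write a C string to a file descriptor by system call number.
// Returns the number of bytes written, or -1 on error (errno is set).
long raw_write(int fd, const char* text) {
    assert(text != nullptr);
    // On 32-bit x86 this corresponds to: number in %eax, fd in %ebx,
    // buffer in %ecx, length in %edx, followed by int 0x80.
    return syscall(SYS_write, fd, text, std::strlen(text));
}
```

A call such as raw_write(STDOUT_FILENO, "Hello World!\n") prints the greeting without going through the C library's buffered I/O.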

Windows System Call Sequence and Simulation

There are hundreds of documents describing how Windows implements its system calls, using int 2e or sysenter. But I could find no runnable code that shows exactly how it works, so I wrote some myself.

The C code requires only the SDK to compile, since I have copied all the DDK definitions inline. It opens the file C:\test.txt and writes Hello World! to it. Quite simple. I first tried a HelloWorld console application, but after some reverse engineering and reading code from the ReactOS project, its call sequence turned out to be far more complex than I had expected (Wine does not help, since it does not implement a Win32-compatible call sequence in the console case). This code is the basis of our further investigation. It invokes NtCreateFile(), NtWriteFile() and NtClose() in ntdll.dll through dynamic loading:

I found that the handle value and all three function pointers are fixed, at least on my Windows XP (SP3). This is probably caused by the preferred base address of ntdll.dll. The code should work on all Windows platforms, since it has no hardcoded values.

Now, translate the C code into assembly. Error handling is omitted:

Compile the code with:

The assembly code of NtCreateFile(), NtWriteFile() and NtClose() is copied directly from ntdll.dll. For NtCreateFile(), 25h is the system service number used to index into KiServiceTable (the SSDT, System Service Dispatch Table) to locate the kernel function that handles the call.

System service numbers vary between Windows versions. This is why using them directly to invoke system calls is not recommended. I only demonstrate the approach here. For Windows XP, the values of the three numbers are 25h, 112h and 19h, while for Windows 7 they are 42h, 18ch and 32h. Change them yourself if you’re running Windows 7. For a complete list of system service numbers, refer here or disassemble your ntdll.dll manually :). The output executable is a tiny one, only 3KB in size, since it eliminates the usage of the CRT. Moreover, it has an empty list of imported functions!
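To illustrate why hardcoding is fragile, the numbers quoted above can be collected in a small lookup table. This is only a sketch: WinVersion, ServiceNumbers and service_numbers are hypothetical names, and the table covers just the two versions and three calls mentioned in this article.

```cpp
#include <cassert>
#include <cstdint>

// The two Windows versions discussed in the text (names are made up here).
enum class WinVersion { XP, Seven };

// System service numbers for the three calls used by the sample.
struct ServiceNumbers {
    std::uint32_t nt_create_file;
    std::uint32_t nt_write_file;
    std::uint32_t nt_close;
};

// Returns the service numbers quoted in the article for the given version.
ServiceNumbers service_numbers(WinVersion v) {
    switch (v) {
    case WinVersion::XP:    return {0x25, 0x112, 0x19};
    case WinVersion::Seven: return {0x42, 0x18c, 0x32};
    }
    assert(false && "unknown Windows version");
    return {0, 0, 0};
}
```

Every additional Windows release would need another row, which is exactly the maintenance burden that calling through ntdll.dll avoids.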

At 7ffe0300h is a pointer to the following code:

NOTE: The assembly code may work only when compiled as a 32-bit application. 64-bit mode is not tested and needs modification to work.

One last point: it seems the STR_HELLO string must be aligned to an 8-byte boundary. Otherwise, you will get error code 0x80000002 (STATUS_DATATYPE_MISALIGNMENT).